Using k-anonymization for registry data: pitfalls and alternatives

  • Sten Anspal, Estonian Centre for Applied Research, Tallinn
  • Mart Kaska, Estonian Centre for Applied Research, Tallinn
  • Indrek Seppo, Estonian Centre for Applied Research, Tallinn
Keywords: privacy-preserving computing, k-anonymization


We describe an applied study of ICT students' employment in Estonia based on data from two national registries. The study offered an opportunity to compare results obtained from k-anonymized data with those obtained on the novel Sharemind platform for privacy-preserving statistical computing, which allows confidential data to be used for research without loss of information. The comparison reveals substantial differences in the estimated employment rates of students. The results illustrate, on the basis of a real-world study, how k-anonymization can introduce considerable bias into estimates. Although privacy-preserving computing entails some inconvenience because the original microdata are not revealed to the statistician, this can be offset by greater confidence in the results.
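
As a purely illustrative sketch of how such bias can arise, the Python snippet below applies k-anonymization by suppression to a handful of synthetic records and compares the employment rate before and after. Everything in it is an assumption made for illustration: the records are invented toy data (not the registry data used in the study), the field names `curriculum`, `birth_year` and `employed` and both function names are hypothetical, and record suppression is only one of several ways to achieve k-anonymity, not necessarily the one applied in the study.

```python
from collections import Counter


def k_anonymize_by_suppression(records, quasi_ids, k):
    """Drop every record whose quasi-identifier combination occurs fewer than k times."""
    def key(record):
        return tuple(record[q] for q in quasi_ids)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


def employment_rate(records):
    """Share of records flagged as employed (employed is 0 or 1)."""
    return sum(r["employed"] for r in records) / len(records)


# Synthetic toy data (hypothetical): one large group with mixed outcomes
# and one small group in which everyone is employed.
records = (
    [{"curriculum": "A", "birth_year": 1990, "employed": 1}] * 3
    + [{"curriculum": "A", "birth_year": 1990, "employed": 0}] * 3
    + [{"curriculum": "B", "birth_year": 1985, "employed": 1}] * 2
)

full_rate = employment_rate(records)

# With k = 3 the two-record group (B, 1985) is suppressed entirely.
anonymized = k_anonymize_by_suppression(records, ["curriculum", "birth_year"], k=3)
anonymized_rate = employment_rate(anonymized)

print(f"lossless data:     {full_rate:.3f}")        # 5 of 8 employed
print(f"k-anonymized data: {anonymized_rate:.3f}")  # 3 of 6 employed
```

In this toy case the suppressed group differs systematically from the retained one, so the estimate shifts from 0.625 to 0.5. When small groups do not differ from large ones, suppression costs precision but need not bias the estimate.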