Using k-anonymization for registry data: pitfalls and alternatives

Sten Anspal, Mart Kaska, Indrek Seppo

Abstract


We describe an applied study of ICT students' employment in Estonia based on data from two national registries. The study offered an opportunity to compare results obtained from k-anonymized data with those obtained using the novel Sharemind platform for privacy-preserving statistical computing, which offers a way to use confidential data for research without loss of information. A comparison of results using k-anonymized and lossless data indicates substantial differences in estimates of students' employment rates. The results illustrate, on the basis of a real-world study, how k-anonymization can lead to considerable bias in estimates. While privacy-preserving computing does entail inconveniences because the original microdata are not revealed to the statistician, these can be offset by greater confidence in the results.

Keywords


privacy-preserving computing; k-anonymization


DOI: http://dx.doi.org/10.12697/ACUTM.2017.21.05

ISSN 1406–2283 (print)
ISSN 2228–4699 (online)