Geometric Data Perturbation for Privacy Preserving Outsourced Data Mining
Document Type
Article
Publication Date
12-2011
Abstract
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance privacy protection and data utility, which are normally considered as a pair of conflicting factors. We argue that selectively preserving the task/model specific information in perturbation will help achieve better privacy guarantee and better data utility. One type of such information is the multidimensional geometric information, which is implicitly utilized by many data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models will deliver a comparable level of model quality over the geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different level of attacks. Finally, we use this evaluation framework to study a few attacks to geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide satisfactory privacy guarantee but also preserve modeling accuracy well.
Repository Citation
Chen, K.,
& Liu, L.
(2011). Geometric Data Perturbation for Privacy Preserving Outsourced Data Mining. Knowledge and Information Systems, 29 (3), 657-695.
https://corescholar.libraries.wright.edu/knoesis/45
DOI
10.1007/s10115-010-0362-4