Archive
Differentially Private Grids for Geospatial Data
Summarize - Uniform grid approach for balancing the noise error and the non-uniformity error - Proposed a method for choosing the grid size. - Traditional differentially private methods for 2D datasets : quadtrees, kd...
Summarize
- Uniform grid approach for balancing the noise error and the non-uniformity error
- Proposed a method for choosing the grid size.
- Traditional differentially private methods for 2D datasets : quadtrees, kd-trees (Cormode et al., 2012; Xiao et al., 2010)
- Uniform grid method : apply grid on domain and execute count queries
- Key problem is the best grid size
- Proposed where is the number of data and is some small const.
Problem formulation
- : 2-dimensional geospatial dataset
- Goal : publish a synopsis of the dataset to accurately answer count queries
- Count query : specifies a grid and asks for the number of data points in that grid
- Query rectangle : A set of grid asked by a query
Two sources of error
Noise error
- To satisfy differential privacy, one adds an independent noise to each cell.
- The answer for count query then has sum of the noisy counts.
- If the noise has variance and we count cells then the standard deviation of answer becomes .
- That is, if we make the resolution higher, more grids are contained for same answer, which results larger variance.
Non-uniformity error
- Caused by cells that intersect with the query rectangle but not contained in it.
- If the data points are not distributed uniformly, the non-uniformity error occurs.
- It depends on the number of data points in the intersected cell
- If we make the resolution higher, the non-uniformity error gets lower.
Previous approaches
Recursive partitioning
- (Xiao et al., 2010) proposed spatial indexing method, KD-trees
- KD tree recursively split along some dimension. Split points were chose to have an uniformity among subregions.
- (Cormode et al., 2012) proposed some method based on KD-trees, using the median of the partiton dimension as the split points.
- or based on quadtree, which recursively divides the nodes into four equal regions.
Hierarchical Transformations
- Recursive partitioning essentially builds hierarchy over data points.
- (Xiao et al., 2011) proposed the Privlet method using wavelet transformation.
- Applies a Haar wavelet transformation to the frequency matrix of the dataset.
Methodology
Uniform Grid method
-
Partitions the data domain into grid cells of equal size.
-
The grid size(resolution) is recommended to be
where is the number of data points. worked well.
Proof
-
Sensitivity of count query is .
-
Thus the noise follows the Laplace distribution , with standard deviation of .
-
For grid, suppose a query selects proportion of the domain.
-
Then, the query has standard deviation of .
-
For proportion query, there are cells along the border of the query.
-
That is, it includes on the order of cells, which results the non-uniformity error for some constant .
Thus the error becomes
and to minimize it, we should pick as equation where .
Adaptive Grid method
- The main disadvantage of UG is that we treat each region without concerning their sparsity.
- Instead of making equipartition, we first lay a coarse grid over the data.
- Then we issue a count query for each cell using a privacy budget , where .
- AG then again partitions each cell based on which is the noisy count of the cell.
Constrained Inference
Let be the noisy count of a specific first-level cell at AG method. Then, count is partitioned into counts ( second-level grid). Then, one obtains more accurate count by
This value goes to the following update:
For AG, the second-level grid resolution becomes
where indicates the remaining privacy budget,
Choosing is not that important than since the density of first-level grid affects on the second-level partition. But should not be too small, thus it was set to
Also was set to .
Error
For query , denotes the correct answer and denotes the answered value using the histogram using method . Then, the relative error is defined as
where were set to avoid division by zero.
Experiment (GPT summarized)
In the experimental results section, the authors conducted extensive experiments using four real datasets to compare the accuracy of different methods and validate their analysis of parameter choices. The datasets include road intersections GPS coordinates, check-ins from a location-based social networking website, landmarks in the United States, and US storage facility locations.
The relative error is primarily considered, defined as the absolute error divided by the maximum of the correct answer or a threshold value. The experiments involve various query sizes, and the errors are computed for randomly generated queries per size. Two values of epsilon () are used ( and ) for privacy considerations.
The authors compare KD-tree-based methods (KD-standard and KD-hybrid) with Uniform Grid (UG), Adaptive Grid (AG), and Privlet methods. They also explore the effect of adding hierarchies to UG and evaluate adaptive grids' performance with varying parameters. The evaluation metrics include line graphs for the arithmetic mean of relative errors and candlesticks for a clearer comparison among different algorithms.
Results show that choosing an appropriate grid size is crucial, and the suggested sizes by the authors generally align well with experimentally observed optimal sizes. The authors compare KD-hybrid with UG and find that UG tends to perform better in most cases. Additionally, the authors investigate the impact of adding hierarchies and find that while it can improve accuracy, the benefit is relatively small.
The AG method consistently outperforms other methods, including KD-hybrid, Quadopt, UG with the lowest observed relative error size, and Privlet on the same grid size. The AG method with suggested grid sizes performs competitively with the experimentally observed best grid sizes. The authors also conduct a final comparison using absolute error, confirming that AG consistently and significantly outperforms other methods across various datasets and privacy settings.
References
-
G. Cormode, C. Procopiuc, D. Srivastava, E. Shen, and T. Yu, “Differentially Private Spatial Decompositions,” in 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA: IEEE, Apr. 2012, pp. 20–31. doi: 10.1109/ICDE.2012.16.
-
Y. Xiao, L. Xiong, and C. Yuan, Differentially Private Data Release through Multidimensional Partitioning. 2010, p. 168. doi: 10.1007/978-3-642-15546-8_11.
-
W. Qardaji, Weining Yang, and Ninghui Li, “Differentially private grids for geospatial data,” in 2013 IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, QLD: IEEE, Apr. 2013, pp. 757–768. doi: 10.1109/ICDE.2013.6544872.