Archive

A Neural Approach to Spatiotemporal data release with Differential Privacy

Intro - Most existing work on DP-publication of location data focuses on single snapshot releases (Cormode et al., 2012) - To release multiple snapshot, user-level privacy is required - Two key aspects : Bound sensiti...

2024. 1. 16.6 min read

Intro

  • Most existing work on DP-publication of location data focuses on single snapshot releases (Cormode et al., 2012)
  • To release multiple snapshot, user-level privacy is required
  • Two key aspects : Bound sensitivity and Add noise, with denoising step

Preliminaries

Problem Formulation

Goal

Release high-resolution density information of dataset DD.

  • Build a M×M×TM\times M\times T histogram HH over DD, where M,TM,T is determined spatial and temporal resolution parameter.

Queries

  1. Range count queries

    • Given the query range(max,min of lat, lon, time)
    • Return : the number of user location in DD that satisfy this range
    • Metric : relative error metric RE(y,u)=yumax{u,ψ}\text{RE}(y,u) = \dfrac{\vert y-u\vert }{\max\lbrace u,\psi\rbrace }
    • where ψ\psi is smoothing factor to avoid division by zero.
  2. Nearest hotspot queries

    • Given query location qq, density threshold vv, spatio-temporal extent SRSR
    • Return : closest cell to qq within extents SRSR that contains at least vv locations.
    • Metric : MAE between distance H,H^H,\hat H for distance penalty. Regret (deviation of the reported density of the found hotspot from given H^,v\hat H, v) for hotspot density estimation error
    • Regret(q)=0\text{Regret}(q)=0 if the reported hotspot meets the density threshold vv.
  3. Forecasting queries

    • Given timeseries of spatial densities and forecasting horizon hh
    • Return : The prediction of count of location reports for hh future timesteps.
    • Metric : symmetric mean absolute percentage errors
    sMAPE=1ht=1hFtAt(At+Ft)/2 \text{sMAPE} = \frac{1}{h}\sum_{t=1}^{h}\dfrac{\vert F_{t}-A_{t}\vert }{(A_{t}+F_{t})/2}

    where AtA_{t} are the true counts from HH in the hh timesteps and FtF_{t} are the hh predicted counts from a forecasting algorithm fitted to the historical data from H^\hat H.

Data

  • Spatiotemporal data : have density patterns through both temporal and spatial axis. (e.g. weekday vs weekend pattern at commercial area) (Yang et al., n.d.)
  • Distribution across users : power law distribution

VDRVAE-based Density Release

  • Differentially private release of spatiotemporal data

Data collection

  • Goal is to create the ϵ\epsilon-DP histogram (Figure (d)).
  • Laplace mechanism with sensitivity Δ=kmax\Delta=k_\text{max} i.e. the maximum number of points.
Hˉ=H+Laplace(kmaxϵ) \bar H = H + \text{Laplace}\left(\dfrac{k_{\text{max}}}{\epsilon}\right)
  • Bound sensitivity by sampling a maximum of kk points per user.

Learned Denoising

  1. Spatiotemporal dataset are similar to video(seq. of images)
  2. Regularized representation learning > learns denoised representation without overfitting
  3. Multi-resolution learning can capture spatio-temporal patterns at different resolution

Design Principles

  • Denoising with regularized representation learning

    : Derive a denoised histogram H^\hat H from VAE that minimizes reconstruction loss H^Hˉ\Vert \hat H - \bar H\Vert . By setting the dimensionality of representation lower than Hˉ\bar H, the representation will capture the patterns in Hˉ\bar H.

  • Multi-resolution Learning

    : Prepare the training set with rr different resolution scales. Expect to improve denoising accuracy via MRL.

Algorithm

Algorithm Learned DenoisingInput: Set of noisy 2D histograms, HˉOutput: Set of denoised 2D histograms, H^TˉHˉFor j2 to r do:For Hˉi in Hˉ do:HˉijHistogram from aggregating j×j blocks of HˉiTˉTˉHˉijInitialize encoder, decoderWhile convergence do:z,T^encoder(Tˉ:θe),decoder(z:θd)LC(T^)iTˉiT^i2LG(z)d(z,Γ)LαLG+LCθθeθdUpdate θ in direction θL \begin{align} &\textbf{Algorithm } \text{Learned Denoising}\\ &\textbf{Input: } \text{Set of noisy 2D histograms, } \bar H\\ &\textbf{Output: } \text{Set of denoised 2D histograms, } \hat H \\ &\quad \bar T \leftarrow \bar H\\ &\quad \textbf{For } j\leftarrow 2 \text{ to } r \textbf{ do:} \\ &\quad \quad \textbf{For } \bar H_{i}\text{ in } \bar H \textbf{ do:}\\ &\quad \quad \quad \bar H_{i}^{j} \leftarrow \text{Histogram from aggregating } j\times j \text{ blocks of } \bar H_{i} \\ &\quad \quad \quad \bar T \leftarrow \bar T \cup \bar H_{i}^{j}\\ &\quad \text{Initialize encoder, decoder}\\ &\quad \textbf{While } \text{convergence} \textbf{ do:}\\ &\quad \quad z,\hat T \leftarrow \text{encoder}(\bar T:\theta_{e}),\text{decoder}(z:\theta_{d})\\ &\quad \quad L_{C}(\hat T) \leftarrow \sum_{i}\Vert \bar T_{i}-\hat T_{i}\Vert^{2}\\ &\quad \quad L_{G}(z) \leftarrow d(z,\Gamma)\\ &\quad \quad L \leftarrow \alpha L_{G}+L_{C}\\ &\quad \quad \theta \leftarrow \theta_{e}\cup \theta_{d}\\ &\quad \quad \text{Update }\theta \text{ in direction } -\nabla_{\theta}L \end{align}

Model

  • Given a noisy ST histogram Hˉ={Hˉi}i=1T\bar H= \lbrace \bar H_i\rbrace _{i=1}^{T} where Hˉi\bar H_i is a spatial histogram at ii-th timestamp.
  • For regularized representation learning, use Convolutional VAE.
  • encoder(.;θe)\text{encoder}(.;\theta_{e}) : 2D histogram (any resolution) > representation Rn×l\in \mathbb{R}^{n\times l}
  • decoder(.;θd)\text{decoder}(.;\theta_{d}) : representation > 2D histogram
  • Inference : Given Hˉi\bar H_i, obtain H^\hat H (3D histogram).

VQ-VAE (Razavi et al., 2019)

  • Γ={e1,,eB}\Gamma=\lbrace e_{1},\ldots,e_{B}\rbrace : codebook of BB different discrete encodings
  • For training set Tˉ\bar T, encoder(Tˉ:θe)\text{encoder}(\bar T:\theta_{e}) produces a set of representations zz.
  • LG(z)=d(z,Γ)L_{G}(z) = d(z,\Gamma) : regulariaztion loss. distance between representation and codebook
  • LC(T^)=i=1TˉTˉiT^i2L_{C}(\hat T) = \sum_{i=1}^{\vert \bar T\vert }\Vert \bar T_{i}-\hat T_{i}\Vert^{2} : reconstruction loss
  • Optimization becomes:
minθα×LG(z)+LC(T^) \min_\theta \alpha\times L_{G}(z) + L_{C}(\hat T)
  • Parameters
    • BB : variability of the encoding space (representation power)
    • α\alpha : How much the encoder is forced to adhere to the codebook.
    • Experiment : α=1,β=128\alpha=1, \beta=128

Statistical Analysis

Let XicX_{i}^{c} be the Bernoulli random variable indicates whether iith point falls in a cell cc. Suppose the iith point is sampled from Uniform(D)\text{Uniform}(D). Let EXic=μc\mathrm{E}X_{i}^{c}=\mu_{c}. The estimator is given by

θc=γ(iXic+Laplace(kϵ)). \theta_{c}=\gamma \left(\sum_{i}X_{i}^{c}+\text{Laplace}\left( \frac{k}{\epsilon}\right)\right).

Then, the bias and variance can be calculated as follows.

Bias(θc)=E[θcNμc]=μc(γnN)Var(θc)=γ2(nμc(1μc+2k2ϵ2) \begin{align} \text{Bias}(\theta_{c}) &= \mathrm{E}[\theta_{c}-N\mu_{c}] =\mu_{c}(\gamma n-N)\\ \mathrm{Var}\left(\theta_{c}\right)&= \gamma^{2}\left(n\mu_{c}(1-\mu_{c}+2k^{2}\epsilon^{-2}\right) \end{align}

where N=DN=\vert D\vert is the total number of data points and n=Dsn=\vert D_{s}\vert is the total number of sampled data. Thus, the MSE is given by

MSE(θc)=γ2(nμc(1μc)+2k2ϵ2)+μc2(γnN)2. \text{MSE}(\theta_{c}) = \gamma^{2}\left(n\mu_{c}(1-\mu_{c})+2k^{2}\epsilon^{-2}\right) + \mu_{c}^{2}(\gamma n-N)^{2}.

By differentiating w.r.t. γ\gamma, we obtain

γ=nNC2mk2ϵ2+(1C)n+Cn2 \gamma = \dfrac{nNC}{2mk^{2}\epsilon^{-2}+(1-C)n+Cn^{2}}

minimizes cMSE(θc)\sum_{c}\text{MSE}(\theta_{c}) where C=c=1mμc2C=\sum_{c=1}^{m}\mu_{c}^{2} is a data-dependent constant.

Theorem

The following VDR algorithm is ϵ\epsilon-DP.

Algorithm VDRInput: Dataset D,ϵ, discretization resolution parameter M,T,k,COutput: Privatized 3D histogram H^ of DDssample k points per user in DHsM×M×T histogram of DsHˉ=Hs+Laplace(kϵ)H^Denoise Hˉ via VAEReturn γC×H^ \begin{align} &\textbf{Algorithm } \text{VDR}\\ &\textbf{Input: } \text{Dataset } D, \epsilon, \text{ discretization resolution parameter } M,T, k,C\\ &\textbf{Output: } \text{Privatized 3D histogram } \hat H \text{ of } D\\ &\quad D_{s} \leftarrow \text{sample } k \text{ points per user in } D\\ &\quad H_{s} \leftarrow M\times M\times T \text{ histogram of } D_{s}\\ &\quad \bar H = H_{s} + \text{Laplace}\left( \frac{k}{\epsilon}\right)\\ &\quad \hat H \leftarrow \text{Denoise } \bar H \text{ via VAE}\\ &\textbf{Return } \gamma C\times \hat H \end{align}

References

  • Ahuja, R., Zeighami, S., Ghinita, G., & Shahabi, C. (2023). A Neural Approach to Spatio-Temporal Data Release with User-Level Differential Privacy. Proceedings of the ACM on Management of Data, 1(1), 1–25. https://doi.org/10.1145/3588701
  • Razavi, A., Oord, A. van den, & Vinyals, O. (2019). Generating Diverse High-Fidelity Images with VQ-VAE-2 (arXiv:1906.00446). arXiv. https://doi.org/10.48550/arXiv.1906.00446
  • Cormode, G., Procopiuc, C., Srivastava, D., Shen, E., & Yu, T. (2012). Differentially Private Spatial Decompositions. 2012 IEEE 28th International Conference on Data Engineering, 20–31. https://doi.org/10.1109/ICDE.2012.16