
Differentially Private Markov Random Field


2024. 1. 18. · 6 min read

Introduction

Markov random field

Let $G=(V,E)$ be a graph with $p$ nodes and $C_{t}(G)$ be the set of cliques of size at most $t$ in $G$. A Markov random field with alphabet size $k$ and $t$-order interactions is a distribution $\mathcal{D}$ over $[k]^{p}$ such that

p(\mathbf{x}) \propto \exp \left(\sum_{c\in C_{t}(G)}\psi_{c}(\mathbf{x}_{c})\right), \tag{1}

where $\mathbf{x}_{c}$ is the restriction of $\mathbf{x}$ to the clique $c$. The factorization in equation (1) is given by the Hammersley–Clifford theorem.

  • $t=2$ : pairwise MRF
  • $k=2$ : binary MRF
  • $k=t=2$ : Ising model (binary and pairwise)

For a pairwise MRF the maximal clique size is $2$, so each potential takes the form $\psi_{c}(\mathbf{x}_{c})=w_{ij}x_{i}x_{j}$ for $\mathbf{x}_{c}=\lbrace x_{i},x_{j}\rbrace$.

Learning problem

  1. Structure learning: recover the edge set of $G$ (learning the adjacency structure).
  2. Parameter learning: learn the structure and the potentials $\psi_{I}$ for all cliques $I$ of size at most $t$.

Preliminaries

Markov random field

Ising model

The $p$-variable Ising model is a distribution $\mathcal{D}(A,\theta)$ on $\lbrace -1,1\rbrace^{p}$ that satisfies

p(\mathbf{x}) \propto \exp \left(\sum_{1\leq i\leq j\leq p}A_{i,j}x_{i}x_{j}+\sum_{i\in [p]}\theta_{i}x_{i}\right),

where $A$ is a symmetric weight matrix with $A_{ii}=0$ for $i=1,\ldots,p$ and $\theta\in \mathbb{R}^{p}$ is a mean-field vector. The corresponding MRF graph is $G=(V,E)$, with an edge $(i,j)$ whenever $A_{i,j}\neq 0$.

The width of D(A,θ)\mathcal{D}(A,\theta) is defined as

\lambda(A,\theta) = \max_{i\in[p]} \left(\sum_{j\in[p]} \vert A_{i,j}\vert +\vert \theta_{i}\vert \right)

The minimum edge weight is defined as

\eta(A,\theta) = \min_{i,j\in[p],\, A_{i,j}\ne 0}\vert A_{i,j}\vert
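Both quantities are simple functions of $(A,\theta)$. A minimal plain-Python sketch (the matrix `A` and vector `theta` below are toy values chosen only for illustration):

```python
def width(A, theta):
    # lambda(A, theta): max over nodes i of sum_j |A_ij| + |theta_i|
    p = len(theta)
    return max(
        sum(abs(A[i][j]) for j in range(p)) + abs(theta[i])
        for i in range(p)
    )

def min_edge_weight(A):
    # eta(A, theta): smallest |A_ij| over the non-zero entries of A
    return min(abs(a) for row in A for a in row if a != 0)

# toy 3-node Ising model: edges (0,1) with weight 0.5 and (1,2) with weight -0.2
A = [[0.0, 0.5, 0.0],
     [0.5, 0.0, -0.2],
     [0.0, -0.2, 0.0]]
theta = [0.1, 0.0, -0.3]

print(width(A, theta))       # node 1 attains the max: 0.5 + 0.2 + 0.0
print(min_edge_weight(A))    # the weakest edge, |−0.2|
```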

Pairwise graphical model

For a set of weight matrices $\mathcal{W}$, the $p$-variable pairwise graphical model is a distribution $\mathcal{D}(\mathcal{W},\Theta)$ on $[k]^{p}$ that satisfies

p(\mathbf{x})\propto \exp \left(\sum_{1\le i\le j\le p}W_{i,j}(x_{i},x_{j})+\sum_{i\in[p]}\theta_{i}(x_{i})\right),

where $\Theta=\lbrace \theta_{i}\in \mathbb{R}^{k}:i\in[p]\rbrace$ is a set of mean-field vectors. The width and minimum edge weight are defined as below.

\begin{align}
\lambda(\mathcal{W},\Theta) &= \max_{i\in[p],a\in[k]}\left(\sum_{j\in[p]\setminus\lbrace i\rbrace}\max_{b\in[k]}\vert W_{i,j}(a,b)\vert + \vert\theta_{i}(a)\vert \right)\\
\eta(\mathcal{W},\Theta) &= \min_{(i,j)\in E} \max_{a,b}\vert W_{i,j}(a,b)\vert
\end{align}
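The pairwise versions can be sketched the same way; here each edge weight is a $k\times k$ matrix. The dictionary layout `W[(i, j)]` and all numeric values below are illustrative assumptions, not notation from the source:

```python
def pairwise_width(W, Theta, p, k):
    """lambda(W, Theta): for each node i and label a, sum the largest-magnitude
    compatible entry over each neighbor, plus |theta_i(a)|; take the max."""
    def Wij(i, j):
        if (i, j) in W:
            return W[(i, j)]
        if (j, i) in W:
            # same edge seen from the other endpoint: transpose the matrix
            return [[W[(j, i)][b][a] for b in range(k)] for a in range(k)]
        return [[0.0] * k for _ in range(k)]

    return max(
        sum(max(abs(Wij(i, j)[a][b]) for b in range(k))
            for j in range(p) if j != i) + abs(Theta[i][a])
        for i in range(p) for a in range(k)
    )

def pairwise_min_edge(W):
    """eta(W, Theta): min over edges of the largest entry magnitude."""
    return min(max(abs(x) for row in M for x in row) for M in W.values())

# toy model: p = 3, k = 2, a single edge (0, 1)
W = {(0, 1): [[0.5, -0.1], [0.0, 0.2]]}
Theta = [[0.1, 0.0], [0.0, 0.0], [0.3, -0.3]]
print(pairwise_width(W, Theta, p=3, k=2))  # node 0, label 0: 0.5 + 0.1
print(pairwise_min_edge(W))                # only edge: max entry 0.5
```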

Parameter learning

Given samples $X_{1},\ldots,X_{n}\sim \mathcal{D}$, a parameter learning algorithm outputs a matrix $\hat A$ such that

\max_{i,j\in[p]}\vert A_{i,j}-\hat A_{i,j}\vert\leq \alpha.

Privacy

Concentrated Differential Privacy (zCDP)

A randomized algorithm $\mathcal{A}:\mathcal{X}^{n}\to\mathcal{S}$ satisfies $\rho$-zCDP if for every pair of neighboring datasets $X,X'\in\mathcal{X}^{n}$,

\forall \alpha\in (1,\infty)\quad D_{\alpha}\left(\mathcal{A}(X)\Vert \mathcal{A}(X')\right) \leq \rho\alpha,

where $D_{\alpha}$ is the $\alpha$-Rényi divergence.

The Rényi divergence of order $\alpha$ (also called the $\alpha$-divergence) is defined as below.

D_\alpha(P\Vert Q) = \frac{1}{\alpha-1}\log \left(\sum_{i=1}^{n} \frac{p_{i}^{\alpha}}{q_{i}^{\alpha-1}}\right),

where $P,Q$ are discrete probability distributions with $p_{i}=\Pr(X=x_{i})$ for $X\sim P$, and $q_{i}$ defined analogously for $Q$.
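For discrete distributions the definition translates directly into code; a minimal sketch (the two distributions are arbitrary toy examples):

```python
import math

def renyi_divergence(P, Q, alpha):
    """D_alpha(P || Q) for discrete distributions given as probability lists."""
    assert alpha > 1
    s = sum(p ** alpha / q ** (alpha - 1) for p, q in zip(P, Q))
    return math.log(s) / (alpha - 1)

P = [0.5, 0.5]
Q = [0.75, 0.25]
print(renyi_divergence(P, P, 2.0))  # divergence of a distribution from itself is 0
print(renyi_divergence(P, Q, 2.0))  # log(0.25/0.75 + 0.25/0.25) = log(4/3)
```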

(Bun & Steinke, 2016)

  1. If $\mathcal{A}$ satisfies $\epsilon$-DP, then $\mathcal{A}$ is $\frac{\epsilon^{2}}{2}$-zCDP.
  2. If $\mathcal{A}$ satisfies $\frac{\epsilon^{2}}{2}$-zCDP, then $\mathcal{A}$ satisfies $\left(\frac{\epsilon^{2}}{2}+\epsilon\sqrt{2\log(1/\delta)},\delta\right)$-DP for every $\delta>0$.
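Item 2, written with $\rho=\frac{\epsilon^{2}}{2}$, is the conversion $\rho$-zCDP $\Rightarrow$ $\left(\rho+2\sqrt{\rho\log(1/\delta)},\delta\right)$-DP. A small sketch of that bookkeeping (natural logarithm assumed):

```python
import math

def zcdp_to_dp(rho, delta):
    """(epsilon, delta)-DP guarantee implied by rho-zCDP.

    With rho = eps0**2 / 2 this reduces to the statement above:
    eps0**2/2 + eps0 * sqrt(2 * log(1/delta)).
    """
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

eps0 = 1.0
rho = eps0 ** 2 / 2.0      # item 1: pure eps0-DP gives eps0^2/2-zCDP
print(zcdp_to_dp(rho, 1e-6))
```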

(Dwork et al., 2010)

For a sequence of DP algorithms $\mathcal{A}_{1},\ldots,\mathcal{A}_{T}$ and their composition $\mathcal{A}=\mathcal{A}_{T}\circ\cdots\circ \mathcal{A}_{1}$, the following hold.

  1. If $\mathcal{A}_{1},\ldots,\mathcal{A}_{T}$ are $(\epsilon_{0},\delta_{1}),\ldots,(\epsilon_{0},\delta_{T})$-DP respectively, then for every $\delta_{0}>0$, $\mathcal{A}$ is $(\epsilon,\delta)$-DP for $\epsilon=\epsilon_{0}\sqrt{6T\log(\frac{1}{\delta_{0}})}$ and $\delta=\delta_{0}+\sum_{t}\delta_{t}$.
  2. If $\mathcal{A}_{1},\ldots,\mathcal{A}_{T}$ are $\rho_{1},\ldots,\rho_{T}$-zCDP respectively, then $\mathcal{A}$ is $\rho$-zCDP for $\rho=\sum_{t}\rho_{t}$.

Parameter Learning of Pairwise MRF

Private sparse logistic regression

Consider training data $D=\lbrace d^{1},\ldots,d^{n}\rbrace \overset{iid}{\sim} P$ where $d^{i}=(x^{i},y^{i})$, $\Vert x^{i}\Vert_{\infty}\leq 1$, $y^{i}\in\lbrace -1,1\rbrace$. To minimize the population logistic loss $\mathrm{E}\left[\log(1+e^{-Y\langle w,X\rangle})\right]$, consider the following empirical risk minimization.

w = \mathop{\arg\min}\limits_{w'} \mathcal{L}(w';D), \qquad \mathcal{L}(w;D) = \frac{1}{n}\sum_{j}\log \left(1+e^{-y^{j}\langle w,x^{j}\rangle}\right)

Then, the following algorithm satisfies $\rho$-zCDP. In spatial statistics, this model is also known as the auto-logistic model.

Algorithm

\begin{align}
&\textbf{Algorithm } \text{Private Frank-Wolfe algorithm}\\
&\textbf{Input: } D,\ \mathcal{L},\ \text{convex set } \mathcal{C}=\lbrace w\in \mathbb{R}^{p}:\Vert w\Vert_{1}\le \lambda\rbrace \text{ with corner set } S\\
&\textbf{For } t=1 \text{ to } T-1 \textbf{ do:}\\
&\quad \forall s\in S,\ \alpha_{s} \leftarrow \langle s,\nabla \mathcal{L}(w_{t};D)\rangle+\text{Laplace}\left(0, \dfrac{L_{1}\Vert \mathcal{C}\Vert_{1}\sqrt{T}}{n\sqrt{\rho}}\right)\\
&\quad \tilde w_{t}\leftarrow \mathop{\arg\min}\limits_{s\in S}\alpha_{s}\\
&\quad w_{t+1}=(1-\mu_{t})w_{t}+ \mu_{t}\tilde w_{t}\quad \text{where } \mu_{t}=\frac{2}{t+2}\\
&\textbf{end for}\\
&\textbf{Output: } w^{priv} = w_{T}
\end{align}
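A runnable Python sketch of the loop above, under two stated assumptions: $S$ is taken as the $2p$ corners $\pm\lambda e_{j}$ of the $\ell_{1}$ ball, and $\Vert\mathcal{C}\Vert_{1}$ is taken as its $\ell_{1}$ diameter $2\lambda$ (the data and all parameter values in the usage are toys):

```python
import math
import random

def laplace(scale, rng):
    # inverse-CDF sampling of Laplace(0, scale)
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_frank_wolfe(X, y, lam, rho, T, L1=1.0, rng=random):
    """Sketch of the private Frank-Wolfe update over C = {w : ||w||_1 <= lam}."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    scale = L1 * (2.0 * lam) * math.sqrt(T) / (n * math.sqrt(rho))
    for t in range(1, T):
        # gradient of the empirical logistic loss at w_t
        grad = [0.0] * p
        for xm, ym in zip(X, y):
            margin = ym * sum(wj * xj for wj, xj in zip(w, xm))
            coef = -ym / (1.0 + math.exp(margin))
            for j in range(p):
                grad[j] += coef * xm[j] / n
        # noisy linear minimization over the 2p corners of the l1 ball
        best_j, best_sign, best_val = 0, 1.0, float("inf")
        for j in range(p):
            for sign in (1.0, -1.0):
                val = sign * lam * grad[j] + laplace(scale, rng)
                if val < best_val:
                    best_j, best_sign, best_val = j, sign, val
        mu = 2.0 / (t + 2)
        w = [(1.0 - mu) * wj for wj in w]
        w[best_j] += mu * best_sign * lam
    return w

# toy usage: two features, three samples
random.seed(0)
X = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 1.0]
w_priv = private_frank_wolfe(X, y, lam=1.0, rho=1.0, T=20)
print(w_priv)  # each iterate is a convex combination of corners, so ||w||_1 <= lam
```

Since every iterate is a convex combination of points in $\mathcal{C}$, the output stays in the $\ell_{1}$ ball regardless of how much noise is added.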

Theorem

For $w^{priv}$ from the algorithm above, with probability at least $1-\beta$ the following holds.

\mathrm{E}\left[l(w^{priv};(X,Y))\right]-\min_{w\in \mathcal{C}}\mathrm{E}\left[l(w;(X,Y))\right] \le O \left( \frac{\lambda^{\frac{4}{3}}\log\left(\frac{np}{\beta}\right)}{(n\sqrt{\rho})^{\frac{2}{3}}} + \frac{\lambda\log(\frac{1}{\beta})}{\sqrt{n}} \right)

Privately Learning Ising Models

Lemma (Klivans & Meka, 2017)

Let $Z\sim \mathcal{D}(A,\theta)$ with $Z\in\lbrace -1,1\rbrace^{p}$. Then for all $i\in[p]$,

\Pr(Z_{i}=1\mid Z_{-i}=x) = \sigma \left(\sum_{j\ne i}2A_{i,j}x_{j}+2\theta_{i}\right) =: \sigma(\langle w,x'\rangle), \quad \text{where } x'=(x,1).

By this lemma, the weight matrix can be estimated by solving a logistic regression for each node.
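The lemma can be checked numerically on a tiny model by brute-force enumeration of the normalizing constant (the 3-node weights below are arbitrary toy values):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# toy 3-node Ising model
A = [[0.0, 0.3, 0.0], [0.3, 0.0, -0.4], [0.0, -0.4, 0.0]]
theta = [0.1, -0.2, 0.0]
p = 3

def unnorm(x):
    # unnormalized probability exp(sum_{i<j} A_ij x_i x_j + sum_i theta_i x_i)
    e = sum(A[i][j] * x[i] * x[j] for i in range(p) for j in range(i + 1, p))
    e += sum(theta[i] * x[i] for i in range(p))
    return math.exp(e)

# exact conditional Pr(Z_0 = 1 | Z_1 = 1, Z_2 = -1): only Z_0 varies
exact = unnorm((1, 1, -1)) / (unnorm((1, 1, -1)) + unnorm((-1, 1, -1)))

# lemma: sigma(sum_{j != 0} 2 A_{0,j} x_j + 2 theta_0)
lemma = sigmoid(2 * A[0][1] * 1 + 2 * A[0][2] * (-1) + 2 * theta[0])

print(exact, lemma)  # the two values agree
```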

Algorithm

\begin{align}
&\textbf{Algorithm } \text{Privately Learning Ising Models}\\
&\textbf{Input: } \lbrace z^{1},\ldots,z^{n}\rbrace,\ \text{upper bound } \lambda \text{ on } \lambda(A,\theta),\ \text{privacy parameter }\rho\\
&\textbf{For } i=1 \text{ to } p \textbf{ do:}\\
&\quad \forall m\in[n],\ x^{m}=(z_{-i}^{m},1),\ y^{m}=z_{i}^{m}\\
&\quad w^{priv}\leftarrow \mathcal{A}_{PFW}(D,\mathcal{L},\rho',\mathcal{C}),\quad \rho'=\frac{\rho}{p}\\
&\quad \forall j\in [p]\setminus\lbrace i\rbrace,\ \hat A_{i,j}=\frac{1}{2}w_{\tilde j}^{priv}\ \text{where } \tilde j=j - \mathbf{1}_{(j>i)}\\
&\textbf{end for}\\
&\textbf{Output: } \hat A \in \mathbb{R}^{p\times p}
\end{align}

Since each of the $p$ node-wise regressions is $\frac{\rho}{p}$-zCDP, the composition property of zCDP implies the algorithm above satisfies $\rho$-zCDP.
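The index shift $\tilde j=j-\mathbf{1}_{(j>i)}$ just re-aligns the regression weights for $z_{-i}$ (plus a trailing intercept) with row $i$ of $\hat A$. A small sketch of that bookkeeping (the weight values are placeholders):

```python
def fill_row(i, w_priv, p):
    """Map w_priv (p-1 weights for z_{-i}, then an intercept) into row i of A_hat.

    Coordinate i was removed from the regressors, so entry j of row i reads
    w_priv[j - 1] when j > i and w_priv[j] when j < i; A_hat[i][i] stays 0.
    """
    row = [0.0] * p
    for j in range(p):
        if j == i:
            continue
        jt = j - (1 if j > i else 0)   # tilde j = j - 1_{(j > i)}
        row[j] = 0.5 * w_priv[jt]      # A_hat[i][j] = w_priv[jt] / 2
    return row

# p = 4, node i = 1: w_priv holds weights for z_0, z_2, z_3 and an intercept
row = fill_row(1, [0.8, -0.6, 0.4, 0.05], 4)
print(row)  # [0.4, 0.0, -0.3, 0.2] -- the intercept entry 0.05 is unused
```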

Privately Learning General Pairwise Models

Given $n$ i.i.d. samples $\lbrace z^{1},\ldots,z^{n}\rbrace$ drawn from an unknown $\mathcal{D}(\mathcal{W},\Theta)$, the goal is to design a $\rho$-zCDP estimator $\hat{\mathcal{W}}$ such that, with probability at least $\frac{2}{3}$,

\left\vert W_{i,j}(u,v)-\hat W_{i,j}(u,v)\right\vert\leq \alpha

for all $i\ne j\in[p]$ and all $u,v \in[k]$.

Lemma

As with the Ising model, a pairwise MRF $Z\sim \mathcal{D}(\mathcal{W},\Theta)$ satisfies the following property.

\Pr(Z_{i}=u\mid Z_{i}\in\lbrace u,v\rbrace ,Z_{-i}=x) = \sigma \left(\sum_{j\neq i} \left(W_{i,j}(u,x_{j})-W_{i,j}(v,x_{j})\right)+\theta_{i}(u)-\theta_{i}(v)\right)

for any $i\in[p]$ and $u\neq v\in[k]$.

The estimation algorithm is again based on the private Frank-Wolfe algorithm, and it satisfies $\rho$-zCDP.

Structure Learning of Graphical Models

Theorem (Wu et al., 2019)

There exists a (non-private) algorithm that learns the structure of a pairwise graphical model with probability at least $\frac{2}{3}$. It requires $n=O \left(\dfrac{\lambda^{2}k^{4}e^{14\lambda}\log(pk)}{\eta^{4}}\right)$ samples.

Building on the theorem above, an $(\epsilon,\delta)$-DP structure learning algorithm can be obtained with the following sample complexity.

n=O(λ2k4log(pk)log(1δ)ϵη4) n = O \left(\dfrac{\lambda^{2}k^{4}\log(pk)\log(\frac{1}{\delta})}{\epsilon\eta^{4}}\right)

Lower Bounds

Theorem

Any $\epsilon$-DP structure learning algorithm for an Ising model with minimum edge weight $\eta$ requires

n = \Omega \left(\frac{\sqrt{p}}{\eta \epsilon}+ \frac{p}{\epsilon}\right)

samples. Also, a $\rho$-zCDP algorithm requires $n=\Omega \left(\sqrt{\dfrac{p}{\rho}}\right)$ samples.

For a $p$-variable pairwise MRF, an $\epsilon$-DP algorithm requires $n=\Omega \left(\dfrac{\sqrt{p}}{\eta\epsilon}+ \dfrac{k^{2}p}{\epsilon}\right)$ samples and a $\rho$-zCDP algorithm requires $n=\Omega \left(\sqrt{\dfrac{k^{2}p}{\rho}}\right)$ samples.

References

  • M. Bun and T. Steinke, “Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds,” in Theory of Cryptography Conference (TCC), 2016.
  • C. Dwork, G. N. Rothblum, and S. Vadhan, “Boosting and Differential Privacy,” in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), 2010, pp. 51–60. doi: 10.1109/FOCS.2010.12.
  • A. Klivans and R. Meka, “Learning Graphical Models Using Multiplicative Weights,” in 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017.
  • H. Zhang, G. Kamath, J. Kulkarni, and Z. S. Wu, “Privately Learning Markov Random Fields,” in International Conference on Machine Learning (ICML), PMLR, 2020, pp. 11129–11140. Available: http://proceedings.mlr.press/v119/zhang20l.html
  • S. Wu, S. Sanghavi, and A. G. Dimakis, “Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models,” arXiv:1810.11905, 2019. doi: 10.48550/arXiv.1810.11905.