Abstract
An affiliation network is a kind of two-mode social network with two different sets of nodes (namely, a set of actors and a set of social events) and edges representing the affiliation of the actors with the social events. Asymptotic theory for a differentially private estimator of the parameters in the \(p_{0}\) model has been established. However, the \(p_{0}\) model focuses only on binary edges in one-mode networks. In many cases, the connections in affiliation (two-mode) networks are weighted, taking values in a finite discrete set. In this paper, we derive the consistency and asymptotic normality of the moment estimators of the parameters in affiliation finite discrete weighted networks with a differentially private degree sequence. Simulation studies and a real data example illustrate our theoretical results.
References
Albert R, Barabási A (2002) Statistical mechanics of complex networks. Rev Mod Phys 74(1):47–97
Bickel PJ, Chen A, Levina E et al (2011) The method of moments and degree distributions for network models. Ann Stat 39(5):2280–2301
Blitzstein J, Diaconis P (2011) A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Math 6(4):489–522
Britton T, Deijfen M, Martin-Löf A (2006) Generating simple random graphs with prescribed degree distribution. J Stat Phys 124(6):1377–1397
Chatterjee S, Diaconis P, Sly A (2011) Random graphs with a given degree sequence. Ann Appl Probab 21:1400–1435
Cutillo LA, Molva R, Strufe T (2010) Privacy preserving social networking through decentralization
Doreian P, Batagelj V, Ferligoj A (1994) Partitioning networks based on generalized concepts of equivalence. J Math Sociol 19(1):1–27
Dwork C, Smith A (2006) Calibrating noise to sensitivity in private data analysis. In: Proceedings of the 3rd theory of cryptography conference, pp 265–284
Dwork C, McSherry F, Nissim K, Smith A (2006) Calibrating noise to sensitivity in private data analysis. Theory of cryptography conference. Springer, Berlin, pp 265–284
Dzemski A (2017) An empirical model of dyadic link formation in a network with unobserved heterogeneity. Working Papers in Economics
Fan Y, Zhang H, Yan T (2020) Asymptotic theory for differentially private generalized \(\beta \)-models with parameters increasing. Stat Interface 13(3):385–398
Fienberg SE (2012) A brief history of statistical models for network analysis and open challenges. J Comput Graph Stat 21(4):825–839
Graham BS (2017) An econometric model of network formation with degree heterogeneity. Econometrica 85(4):1033–1063
Hay M, Miklau G, Jensen D (2010) Privacy-aware knowledge discovery: novel applications and new techniques. CRC Press, Boca Raton, pp 459–498
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: 2009 Ninth IEEE International Conference on Data Mining, pp 169–178. IEEE
He X, Chen W, Qian W (2020) Maximum likelihood estimators of the parameters of the log-logistic distribution. Stat Pap 61(5):1875–1892
Hillar C, Wibisono A (2013) Maximum entropy distributions on graphs. arXiv:1301.3321
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301):13–30
Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J Am Stat Assoc 76(373):33–50
Karwa V, Slavković A (2016) Inference using noisy degrees: differentially private \(\beta \)-model and synthetic graphs. Ann Stat 44(1):87–112
Kasiviswanathan SP, Nissim K, Raskhodnikova S, Smith A (2013) Analyzing graphs with node differential privacy. Theory of cryptography conference. Springer, Berlin, pp 457–476
Lang S (1993) Real and functional analysis. Springer, New York
Loeve M (1977) Probability theory, 4th edn. Springer, New York
Lu W, Miklau G (2014) Exponential random graph estimation under differential privacy. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining
Mosler K (2017) Ernesto Estrada and Philip A. Knight (2015): a first course in network theory. Stat Pap 58(4):1283–1284
Nissim K, Raskhodnikova S, Smith A (2007) Smooth sensitivity and sampling in private data analysis. In: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp 75–84
Pan L, Yan T (2019) Asymptotics in the \(\beta \)-model for networks with a differentially private degree sequence. Commun Stat 49:4378–4393
Snijders TA, Lomi A, Torló VJ (2013) A model for the multiplex dynamics of two-mode and one-mode networks, with an application to employment preference, friendship, and advice. Soc Netw 35(2):265–276
Su L, Qian X, Yan T (2018) A note on a network model with degree heterogeneity and homophily. Stat Probab Lett 138:27–30
Vershynin R, Eldar Y (2012) Compressed sensing, theory and applications. Cambridge University Press, Cambridge
Wang Q, Yan T, Leng C, Zhu J (2020) Two-mode networks: inference with as many parameters as actors and differential privacy
Yan T (2020) Directed networks with a differentially private bi-degree sequence. Stat Sin. https://doi.org/10.5705/ss.202019.0215
Yan T, Xu J (2013) A central limit theorem in the \(\beta \)-model for undirected random graphs with a diverging number of vertices. Biometrika 100:519–524
Yan T, Zhao Y, Qin H (2015) Asymptotic normality in the maximum entropy models on graphs with an increasing number of parameters. J Multivar Anal 133:61–76
Yan T, Leng C, Zhu J (2016) Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann Stat 44(1):31–57
Yan T, Jiang B, Fienberg SE, Leng C (2019) Statistical inference in a directed network model with covariates. J Am Stat Assoc 114(526):857–868
Yuan M, Chen L, Yu PS (2011) Personalized privacy protection in social networks. Proc VLDB Endow 4(2):141–150
Zhang Y, Chen S, Hong Q, Yan T (2016) Directed weighted random graphs with an increasing bi-degree sequence. Stat Probab Lett 119:235–240
Zhang Y, Qian X, Qin H, Yan T (2017) Affiliation network with an increasing degree sequence. arXiv:1702.01906
Zhao Y, Levina E, Zhu J et al (2012) Consistency of community detection in networks under degree-corrected stochastic block models. Ann Stat 40(4):2266–2292
Zhou T, Ren J, Medo M, Zhang Y (2007) Bipartite network projection and personal recommendation. Phys Rev E 76(4):046115
Zhou B, Pei J, Luk WS (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor Newsl 10(2):12–22
Acknowledgements
We are very grateful to two anonymous referees and the Editor for their valuable comments that have greatly improved the manuscript. Luo’s research is partially supported by the National Natural Science Foundation of China (No. 11801576) and by the Fundamental Research Funds for the Central Universities (South-Central University for Nationalities, CZQ19010).
Appendix
In this appendix, we present the proofs of Theorems 1 and 2. We start with some preliminaries. For a vector \({\mathbf {x}}=(x_1,\dots ,x_n)^{T}\in \mathbb {R}^n\), denote the \(\ell _{\infty }\) norm of \({\mathbf {x}}\) by \(\Vert {\mathbf {x}}\Vert _{\infty }=\max _{1\le i\le n}\mid x_i\mid \). For an \(n\times n\) matrix \(J=(J_{i,j})\), \(\Vert J\Vert _{\infty }\) denotes the matrix norm induced by the \(\Vert \cdot \Vert _{\infty }\)-norm on vectors in \(\mathbb {R}^n\):
$$\begin{aligned} \Vert J\Vert _{\infty }=\max _{{\mathbf {x}}\ne {\mathbf {0}}}\frac{\Vert J{\mathbf {x}}\Vert _{\infty }}{\Vert {\mathbf {x}}\Vert _{\infty }}=\max _{1\le i\le n}\sum _{j=1}^{n}\mid J_{i,j}\mid . \end{aligned}$$
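Let D be an open convex subset of \(\mathbb {R}^n\). We say that an \(n\times n\) function matrix \(F({\mathbf {x}})\), whose elements \(F_{ij}({\mathbf {x}})\) are functions of the vector \({\mathbf {x}}\), is Lipschitz continuous on D if there exists a real number \(\lambda \) such that for any \({\mathbf {v}}\in \mathbb {R}^n\) and any \(\mathbf {x,y}\in D,\)
$$\begin{aligned} \Vert F({\mathbf {x}}){\mathbf {v}}-F({\mathbf {y}}){\mathbf {v}}\Vert _{\infty }\le \lambda \Vert {\mathbf {x}}-{\mathbf {y}}\Vert _{\infty }\Vert {\mathbf {v}}\Vert _{\infty }, \end{aligned}$$
where \(\lambda \) may depend on n but is independent of \({\mathbf {x}}\) and \({\mathbf {y}}\). For fixed \(n\), \(\lambda \) is a constant.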
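As a quick numerical illustration of the induced norm, the following plain-Python sketch computes \(\Vert J\Vert _{\infty }\) as the largest absolute row sum (the standard closed form of the norm induced by \(\Vert \cdot \Vert _{\infty }\)) and compares it with a brute-force maximisation of \(\Vert J{\mathbf {x}}\Vert _{\infty }\) over sign vectors \({\mathbf {x}}\in \{-1,1\}^n\). The example matrix and the helper names are illustrative only.

```python
from itertools import product


def vec_inf_norm(x):
    """ell_infinity norm of a vector: max_i |x_i|."""
    return max(abs(v) for v in x)


def mat_inf_norm(J):
    """Closed form of the induced infinity-norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in J)


def matvec(J, x):
    """Matrix-vector product J x for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in J]


def induced_norm_bruteforce(J):
    """max ||J x||_inf over the unit ball ||x||_inf <= 1.

    ||J x||_inf is convex in x, so its maximum over the cube [-1, 1]^n is
    attained at a vertex, i.e. at a sign vector x in {-1, 1}^n.
    """
    n = len(J[0])
    return max(vec_inf_norm(matvec(J, x)) for x in product((-1, 1), repeat=n))


J = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0],
     [-4.0, 0.0, 0.0]]
print(mat_inf_norm(J), induced_norm_bruteforce(J))
```

Both functions return 4.0 for this matrix; the brute-force version is exponential in n and is meant only as a cross-check.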
We restate Lemmas 1–3 of Yan et al. (2016) here as three lemmas, which will be used in the proofs.
Lemma 2
If \(V\in {{\mathcal {L}}}_{mn}(q,Q)\) with \(Q/q=o(n),\) then for large enough n,
where \(c_1\) is a constant that does not depend on M, m and n, and \(\Vert A \Vert := \max _{i,j}|a_{i,j}|\) for a general matrix \(A=(a_{i,j})\).
Note that if Q and q are bounded constants, then the upper bound of the above approximation error is of order \((mn)^{-1},\) indicating that S is a highly accurate approximation to \(V^{-1}.\) Further, Lemma 2 immediately yields the following lemma.
Lemma 3
If \(V\in {{\mathcal {L}}}_{mn}(q,Q)\) with \(Q/q=o(n),\) then for a vector \({\mathbf {x}}\in R^{m+n-1},\)
where \(x_{m+n}:=\sum _{i=1}^{m}x_{i}-\sum _{i=m+1}^{m+n-1}x_{i}.\)
Lemma 4
Define a system of equations:
where \(f(\cdot )\) is a function with a continuous third derivative. Let \(D\subset \mathbb {R}^{m+n-1}\) be a convex set and assume that for any \(\mathbf {x,y,v}\in D\), we have
where \(F^{'}({\varvec{\theta }})\) is the Jacobian matrix of F at \({\varvec{\theta }}\) and \(F^{'}_{i}({\varvec{\theta }})\) is the gradient of \(F_i\) at \({\varvec{\theta }}.\) Consider \({\varvec{\theta }}^{(0)}\in D\) with \(\Omega ({\varvec{\theta }}^{(0)},2\xi )\subset D\), where \(\xi =\parallel [F^{'}({\varvec{\theta }}^{(0)})]^{-1}F({\varvec{\theta }}^{(0)})\parallel _{\infty }\). For any \({\varvec{\theta }} \in \Omega ({\varvec{\theta }}^{(0)},2\xi ),\) we assume
For \(k=0,1,2,\dots ,\) define the Newton iterates \({\varvec{\theta }}^{(k+1)}={\varvec{\theta }}^{(k)}-[F^{'}({\varvec{\theta }}^{(k)})]^{-1}F({\varvec{\theta }}^{(k)}).\) Let
If \(\xi \rho <1/2,\) then \({\varvec{\theta }}^{(k)} \in \Omega ({\varvec{\theta }}^{(0)},2\xi ),k=1,2,\dots ,\) are well defined and satisfy
Further, \(\lim _{k\rightarrow \infty }{\varvec{\theta }}^{(k)}\) exists and the limiting point is precisely the solution of \(F({\varvec{\theta }})=0\) in the range \({\varvec{\theta }} \in \Omega ({\varvec{\theta }}^{(0)},2\xi ).\)
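A small numerical illustration of the Newton iterates in Lemma 4 may help fix ideas. The sketch below runs \({\varvec{\theta }}^{(k+1)}={\varvec{\theta }}^{(k)}-[F^{'}({\varvec{\theta }}^{(k)})]^{-1}F({\varvec{\theta }}^{(k)})\) on a two-equation test system of our own choosing, \(F(x,y)=(x^2+y-3,\ x+y^2-5)\) with root \((1,2)\), computes \(\xi \) at the starting point, and checks that every iterate stays in \(\Omega ({\varvec{\theta }}^{(0)},2\xi )\), which we assume here to be the \(\ell _\infty \) ball of radius \(2\xi \). It does not compute \(\rho \), whose definition is not reproduced in this appendix, so it illustrates the conclusion of the lemma rather than verifying its hypotheses.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def newton(F, jac, theta0, tol=1e-12, max_iter=50):
    """Newton iterates theta^(k+1) = theta^(k) - [F'(theta^(k))]^{-1} F(theta^(k))."""
    theta = list(theta0)
    history = [list(theta)]
    for _ in range(max_iter):
        step = solve(jac(theta), F(theta))
        theta = [t - s for t, s in zip(theta, step)]
        history.append(list(theta))
        if max(abs(s) for s in step) < tol:
            break
    return theta, history


def F_example(t):
    """Illustrative system with root (1, 2); not taken from the paper."""
    return [t[0] ** 2 + t[1] - 3, t[0] + t[1] ** 2 - 5]


def jac_example(t):
    """Jacobian of F_example."""
    return [[2 * t[0], 1.0], [1.0, 2 * t[1]]]


theta0 = [1.2, 1.8]
# xi = || [F'(theta0)]^{-1} F(theta0) ||_inf, as in Lemma 4.
xi = max(abs(s) for s in solve(jac_example(theta0), F_example(theta0)))
root, history = newton(F_example, jac_example, theta0)
# Omega(theta0, 2 xi) is taken to be the ell_infinity ball of radius 2 xi (assumption).
stays_in_ball = all(max(abs(a - b) for a, b in zip(h, theta0)) <= 2 * xi
                    for h in history)
print(root, xi, stays_in_ball)
```

Here \(\xi \approx 0.207\), and all iterates remain within \(2\xi \) of the starting point while converging quadratically to \((1,2)\).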
1.1 Appendix A: Proof of Theorem 1
We define a system of functions:
Note that the solution to the equation \(F({\varvec{\theta }})=0\) is precisely the private estimator of the parameters. The Jacobian matrix \(F^{'}({\varvec{\theta }})\) of \(F({\varvec{\theta }})\) can be calculated as follows. For \(i=1,\dots ,m,\)
and for \(j=1,\dots ,n-1\)
Since \(2\Vert \varvec{\theta }\Vert _{\infty } \ge \alpha _{i} + \beta _{j}\), we have:
Therefore, we have
Combining the above three inequalities yields
Here, the notation “\(\sum _{0\le k \ne l\le r-1}\)” is a shorthand for the double summation “\(\sum _{k=0}^{r-1}\sum _{l=0, l\ne k}^{r-1}\)”. Therefore,
On the other hand, it is easy to verify that
Consequently, when \(\varvec{\theta } \in \Omega (\varvec{\theta }^{*},2\xi )\), for any \(i \ne j\), we have
By the definition of \(\mathcal{L}_{mn}(q,Q)\), we have \(-F^{'}(\varvec{\theta }) \in \mathcal{L}_{mn}(q,Q)\), where
The constants \(K_{1}, K_{2}\) and \(\xi \) in the upper bounds of Lemma 4 are given below.
Lemma 5
Take \(D = \mathbb {R}^{m+n-1}\) and \(\varvec{\theta }^{(0)} = \varvec{\theta }^{*}\) in Lemma 4. Assume
Then we can choose the constants \(K_1\), \(K_2\) and \(\xi \) in Lemma 4 as
where \(c_{11},c_{12}\) are constants.
Proof
For fixed m, n, we first derive \(K_1\) and \(K_2\) in the inequalities of Lemma 4. Let \({\mathbf {x}},{\mathbf {y}}\in R^{m+n-1}\) and
Then, for \(i=1,\dots ,m\), we have
Since \(k+l-2s \le 2(r-1)\) and
we have
By the mean value theorem for vector-valued functions (Lang 1993, p.341), we have
where
Therefore,
Similarly, for \(i=m+1,\dots ,m+n-1,\) we also have \(F^{'}_i({\mathbf {x}})-F^{'}_i({\mathbf {y}})=J^{(i)}({\mathbf {x}}-{\mathbf {y}})\) and \(\sum _{s,l}\mid J^{(i)}_{(s,l)}\mid \le 4(m-1)(r-1)^{3}.\)
Consequently,
and for all \({\mathbf {v}}\in \mathbb {R}^{m+n-1}\),
so we choose \(K_1=4(n-1)(r-1)^{3}\) and \(K_2=2(n-1)(r-1)^{3}\) in the inequalities of Lemma 4.
It is easy to see that \(-F^{'}({ {\varvec{\theta }}}^{*})\in \mathcal{L}_{mn}(q_{*},Q_{*})\), where
Note that
where \(c_{11},c_{12}\) are constants. \(\square \)
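The shorthand double sum \(\sum _{0\le k \ne l\le r-1}\) used in the proof of Theorem 1 can be understood through the variance of a single edge weight. Assuming (our reading; the model is defined in the main text, not in this appendix) that an edge weight a takes values in \(\{0,\dots ,r-1\}\) with probability proportional to \(e^{ax}\), where \(x=\alpha _i+\beta _j\), one has \(\mathrm {Var}(a)=\sum _{0\le k \ne l\le r-1}(k-l)^2e^{(k+l)x}\big /\big (2(\sum _{k=0}^{r-1}e^{kx})^2\big )\), and \(\mathrm {Var}(a)=\frac{d}{dx}\mathbb {E}(a)\), which is why such sums appear in the entries of \(F^{'}\). The sketch below checks both identities numerically; the function names are hypothetical.

```python
import math


def weights(x, r):
    """Unnormalised probabilities exp(k x), k = 0, ..., r-1 (assumed model)."""
    return [math.exp(k * x) for k in range(r)]


def mean(x, r):
    """E(a) under the assumed model."""
    w = weights(x, r)
    return sum(k * wk for k, wk in enumerate(w)) / sum(w)


def variance_direct(x, r):
    """Var(a) = E(a^2) - E(a)^2."""
    w = weights(x, r)
    total = sum(w)
    m1 = sum(k * wk for k, wk in enumerate(w)) / total
    m2 = sum(k * k * wk for k, wk in enumerate(w)) / total
    return m2 - m1 ** 2


def variance_double_sum(x, r):
    """Same variance written with the sum over 0 <= k != l <= r-1."""
    total = sum(weights(x, r))
    s = sum((k - l) ** 2 * math.exp((k + l) * x)
            for k in range(r) for l in range(r) if k != l)
    return s / (2 * total ** 2)


def mean_derivative(x, r, h=1e-5):
    """Central finite difference of E(a) with respect to x."""
    return (mean(x + h, r) - mean(x - h, r)) / (2 * h)


for r in (2, 3, 5):
    for x in (-1.0, 0.0, 0.7):
        print(r, x, variance_direct(x, r), variance_double_sum(x, r),
              mean_derivative(x, r))
```

The double-sum form follows from \(\sum _{k,l}(k^2-kl)e^{(k+l)x}=\frac{1}{2}\sum _{k,l}(k-l)^2e^{(k+l)x}\); the sketch is a consistency check, not part of the proof.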
We present several results that will be used in the following lemmas. Recall that a random variable X is sub-exponential with parameter \(\kappa > 0\) (Vershynin 2012) if
$$\begin{aligned} [\mathbb {E}|X|^{p}]^{1/p}\le \kappa p \quad \text {for all } p\ge 1, \end{aligned}$$
and sub-exponential random variables satisfy the following concentration inequality.
Theorem 3
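(Corollary 5.17 in Vershynin (2012)). Let \(X_{1},\ldots ,X_{n}\) be independent centered random variables, and suppose each \(X_{i}\) is sub-exponential with parameter \(\kappa \). Then for every \(\epsilon >0\),
$$\begin{aligned} \mathbb {P}\left( \left| \frac{1}{n}\sum _{i=1}^{n}X_{i}\right| \ge \epsilon \right) \le 2\exp \left[ -\gamma n\min \left( \frac{\epsilon ^{2}}{\kappa ^{2}},\frac{\epsilon }{\kappa }\right) \right] , \end{aligned}$$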
where \(\gamma >0\) is an absolute constant.
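Note that if X is a sub-exponential random variable with parameter \(\kappa \), then the centered random variable \(X-\mathbb {E}[X]\) is also sub-exponential with parameter \(2 \kappa \). This follows from the triangle inequality applied to the p-norm, followed by Jensen’s inequality for \(p \ge 1\):
$$\begin{aligned} [\mathbb {E}|X-\mathbb {E}X|^{p}]^{1/p}\le [\mathbb {E}|X|^{p}]^{1/p}+|\mathbb {E}X|\le 2[\mathbb {E}|X|^{p}]^{1/p}\le 2\kappa p. \end{aligned}$$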
Lemma 6
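Let X be a discrete Laplace random variable with the probability distribution
$$\begin{aligned} \mathbb {P}(X=x)=\frac{1-\lambda }{1+\lambda }\lambda ^{|x|},\quad x\in \mathbb {Z},\ \lambda \in (0,1). \end{aligned}$$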
Then X is sub-exponential with parameter \(2( \log \frac{1}{\lambda } )^{-1}\).
Proof
Note that
It follows that
\(\square \)
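Lemma 6 can be sanity-checked numerically. The sketch below evaluates \(\mathbb {E}|X|^p\) for the discrete Laplace law \(\mathbb {P}(X=x)=\frac{1-\lambda }{1+\lambda }\lambda ^{|x|}\) (the form we assume for the display in Lemma 6) by summing the series directly, and checks \([\mathbb {E}|X|^p]^{1/p}\le \kappa p\) with \(\kappa =2(\log \frac{1}{\lambda })^{-1}\) for \(p=1,\dots ,20\) and a few values of \(\lambda \). As a by-product, \(\mathbb {E}X^2=2\lambda /(1-\lambda )^2\), the per-variable variance that enters \(s_{mn}^2\) in Proposition 1. A finite check of this kind is an illustration, not a proof; the function names are hypothetical.

```python
import math


def abs_moment(p, lam, rel_tol=1e-15):
    """E|X|^p for P(X = x) = (1 - lam) / (1 + lam) * lam**|x|, x in Z.

    The terms x**p * lam**x increase up to x ~ p / log(1/lam) and then
    decrease, so the series is truncated once past that point and negligible.
    """
    const = (1 - lam) / (1 + lam)
    peak = p / math.log(1 / lam)
    total, x = 0.0, 1
    while True:
        term = x ** p * lam ** x
        total += term
        if x > peak and term < rel_tol * total:
            break
        x += 1
    return 2 * const * total  # symmetric law: x and -x contribute equally


def lemma6_holds(lam, p_max=20):
    """Check [E|X|^p]^{1/p} <= kappa * p for p = 1..p_max, kappa = 2 / log(1/lam)."""
    kappa = 2 / math.log(1 / lam)
    return all(abs_moment(p, lam) ** (1 / p) <= kappa * p
               for p in range(1, p_max + 1))


for lam in (0.1, 0.5, 0.9):
    print(lam, abs_moment(2, lam), 2 * lam / (1 - lam) ** 2, lemma6_holds(lam))
```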
The following lemma ensures that condition (A5) holds with high probability.
Lemma 7
Let \(\kappa _{mn}=2(r-1)(-\log \lambda _{mn})^{-1}=4(r-1)^{2}/\epsilon _{mn}\), where \(\lambda _{mn} \in (0,1)\). We have
Proof
By Hoeffding’s inequality (Hoeffding 1963) and \(m<n\), we have
Therefore,
Similarly, we have
Consequently,
Then, we have
Note that \(\{e_i^+\}_{i=1}^n\) and \(\{ e_i^- \}_{i=1}^n\) are independent discrete Laplace random variables and, by Lemma 6, sub-exponential with the same parameter \(\kappa _{mn}\). By the concentration inequality in Theorem 3, we have
and
where \(\gamma \) is an absolute constant appearing in the concentration inequality. So, with probability at least \(1-4n/(n-1)^2-2/n\), we have
Similarly, with probability at least \(1-4n/(n-1)^2-2/n\), we have
Let A and B be the events:
Consequently, as n goes to infinity, we have
This completes the proof. \(\square \)
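The quantities in Lemma 7 can be made concrete with a short simulation. The sketch assumes (our reading; the release mechanism itself is described in the main text, not here) that the private degree sequence is obtained by adding independent discrete Laplace noise with \(\lambda _{mn}=\exp (-\epsilon _{mn}/(2(r-1)))\) to each degree. It samples that noise as the difference of two independent geometric variables (a standard construction), checks that \(\kappa _{mn}=2(r-1)(-\log \lambda _{mn})^{-1}\) equals \(4(r-1)^2/\epsilon _{mn}\), and evaluates the bound \(1-4n/(n-1)^2-2/n\) from the proof. The function names and the example degrees are hypothetical.

```python
import math
import random


def lam_mn(eps, r):
    """lambda_mn = exp(-eps / (2(r - 1))), as in Proposition 1."""
    return math.exp(-eps / (2 * (r - 1)))


def kappa_mn(eps, r):
    """kappa_mn = 2(r - 1) / (-log lambda_mn); algebraically 4(r - 1)^2 / eps."""
    return 2 * (r - 1) / (-math.log(lam_mn(eps, r)))


def geometric(lam, rng):
    """P(G = k) = (1 - lam) lam^k, k = 0, 1, ..., by inverting P(G >= k) = lam^k."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return math.floor(math.log(u) / math.log(lam))


def discrete_laplace(lam, rng):
    """Difference of two iid geometrics: P(X = x) = (1 - lam)/(1 + lam) lam^|x|."""
    return geometric(lam, rng) - geometric(lam, rng)


def privatize_degrees(d, b, eps, r, rng):
    """Hypothetical release step: add independent discrete Laplace noise to every degree."""
    lam = lam_mn(eps, r)
    return ([di + discrete_laplace(lam, rng) for di in d],
            [bj + discrete_laplace(lam, rng) for bj in b])


def lemma7_prob_bound(n):
    """The probability bound 1 - 4n/(n - 1)^2 - 2/n used in the proof of Lemma 7."""
    return 1 - 4 * n / (n - 1) ** 2 - 2 / n


# m = 3 actors, n = 4 events, edge weights in {0, 1, 2} (r = 3); degrees are made up.
rng = random.Random(2022)
d_tilde, b_tilde = privatize_degrees([5, 7, 3], [4, 6, 2, 3], eps=1.0, r=3, rng=rng)
print(d_tilde, b_tilde, kappa_mn(1.0, 3), lemma7_prob_bound(1000))
```

With \(\epsilon _{mn}=1\) and \(r=3\), \(\kappa _{mn}=4\cdot 2^2/1=16\); the probability bound exceeds 0.99 already at \(n=1000\) and tends to one as n grows.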
It can easily be checked (Zhang et al. 2016) that \(-F'(\theta )\in \mathcal {L}_n(m, M)\), where \(M=(r-1)^2/2\) and \(m= 1/2( 1+ e^{ 2\Vert \theta \Vert _\infty })^2\). We are now ready to present the proof of Theorem 1.
Proof of Theorem 1
Assume that condition (A5) holds. Recall the Newton iterates in Lemma 4, \( {\varvec{\theta }}^{(k+1)}={\varvec{\theta }}^{(k)}-[F^{'}( {\varvec{\theta }}^{(k)})]^{-1}F( {\varvec{\theta }}^{(k)})\), with \( {\varvec{\theta }}^{(0)}= {\varvec{\theta }}^{*}.\) If \( {\varvec{\theta }}\in \Omega ( {\varvec{\theta }}^{*},2\xi )\), then \(-F^{'}({\varvec{\theta }})\in {{\mathcal {L}}}_{mn}(q,Q)\) with
To apply Lemma 4, we need to calculate \(\xi \) and \(\rho \xi \). Let
By Lemma 7, we have
By Lemma 2,
By Lemma 5, we have
By Lemma 5 and condition (A5), for sufficiently small \(\xi \),
Note that if \((1+\kappa ) e^{12\Vert \theta ^*\Vert _\infty } = o( (n/\log n)^{1/2} )\), then \(\xi =o(1)\) and \(\rho \xi \rightarrow 0\) as \(n\rightarrow \infty \). Consequently, there exists N such that \(\rho \xi <\frac{1}{2}\) for all \(n\ge N\); by Lemma 4, \(\lim _{k\rightarrow \infty }{\varvec{\theta }}^{(k)}\) then exists. Denote the limit by \(\widehat{\varvec{\theta }}\); it satisfies
By Lemma 7, condition (A5) holds with probability approaching one, and thus the above inequality also holds with probability approaching one. The uniqueness of the parameter estimator follows from Proposition 5 in Sect. 5 of Yan et al. (2016). \(\square \)
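The estimator in Theorem 1 is the solution of \(F({\varvec{\theta }})=0\), obtained in the proof as the limit of Newton iterates. The sketch below solves such moment equations numerically under explicit assumptions, since the defining equations are not reproduced in this appendix: edge weights \(a_{i,j}\in \{0,\dots ,r-1\}\) with \(\mathbb {P}(a_{i,j}=a)\propto \exp \{a(\alpha _i+\beta _j)\}\), \(\beta _n=0\) for identifiability, and F written as expected degrees minus observed (in practice, noisy) degrees, which may differ in sign from the paper's F. The Jacobian uses \(\frac{d}{dx}\mathbb {E}(a)=\mathrm {Var}(a)\). Fed with exact expected degrees, the solver should return the true parameters; all names and numbers are hypothetical.

```python
import math


def mean_var(x, r):
    """Mean and variance of a in {0,...,r-1} with P(a) proportional to exp(a x)."""
    w = [math.exp(k * x) for k in range(r)]
    total = sum(w)
    m1 = sum(k * wk for k, wk in enumerate(w)) / total
    m2 = sum(k * k * wk for k, wk in enumerate(w)) / total
    return m1, m2 - m1 ** 2


def unpack(theta, m, n):
    """theta = (alpha_1..alpha_m, beta_1..beta_{n-1}); beta_n = 0 is assumed."""
    return theta[:m], list(theta[m:]) + [0.0]


def F_and_jacobian(theta, d, b, m, n, r):
    """Moment equations E(degree) - degree and their Jacobian (assumed sign convention)."""
    alpha, beta = unpack(theta, m, n)
    p = m + n - 1
    F = [-float(di) for di in d] + [-float(bj) for bj in b[:n - 1]]
    J = [[0.0] * p for _ in range(p)]
    for i in range(m):
        for j in range(n):
            mu, v = mean_var(alpha[i] + beta[j], r)
            F[i] += mu
            J[i][i] += v                  # dF_i / d alpha_i
            if j < n - 1:                 # the n-th event equation is dropped
                F[m + j] += mu
                J[i][m + j] = v           # dF_i / d beta_j
                J[m + j][i] = v           # dF_{m+j} / d alpha_i
                J[m + j][m + j] += v      # dF_{m+j} / d beta_j
    return F, J


def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    k = len(rhs)
    M = [list(row) + [c] for row, c in zip(A, rhs)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            for t in range(c, k + 1):
                M[i][t] -= f * M[c][t]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        x[i] = (M[i][k] - sum(M[i][t] * x[t] for t in range(i + 1, k))) / M[i][i]
    return x


def moment_estimator(d, b, m, n, r, tol=1e-12, max_iter=100):
    """Newton iterates for F(theta) = 0, started at theta = 0."""
    theta = [0.0] * (m + n - 1)
    for _ in range(max_iter):
        F, J = F_and_jacobian(theta, d, b, m, n, r)
        step = solve(J, F)
        theta = [t - s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


# Hypothetical example: m = 3 actors, n = 4 events, weights in {0, 1, 2}.
m, n, r = 3, 4, 3
alpha_true, beta_true = [0.2, -0.1, 0.3], [0.1, -0.2, 0.15, 0.0]
d = [sum(mean_var(a + bb, r)[0] for bb in beta_true) for a in alpha_true]
b = [sum(mean_var(a + bb, r)[0] for a in alpha_true) for bb in beta_true]
theta_hat = moment_estimator(d, b, m, n, r)
print([round(t, 6) for t in theta_hat])
```

With noisy degrees in place of d and b, the same routine gives the private estimate, provided a solution exists for the observed values.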
1.2 Appendix B: Proof of Theorem 2
We first present a proposition. Since \(d_i=\sum ^{n}_{j= 1}a_{i,j}\) and \(b_j=\sum ^{m}_{i= 1}a_{i,j}\) are sums of n and m independent random variables, respectively, by the central limit theorem for the bounded case in Loeve (1977), p. 289, we know that \({v_{i,i}}^{-1/2}(d_i-\mathbb {E}(d_i))\) and \({v_{m+j,m+j}}^{-1/2}(b_j-\mathbb {E}(b_j))\) are asymptotically standard normal if \(v_{i,i}\) and \(v_{m+j,m+j}\) diverge. Note that
Following Yan (2020), we have the following proposition.
Proposition 1
Let \(\kappa _{mn}=2(r-1)(-\log \lambda _{mn})^{-1}\), where \(\lambda _{mn}=\exp (-\epsilon _{mn}/(2(r-1)))\).
(i) If \(\kappa _{mn} (\log n)^{1/2} e^{2\Vert \theta ^*\Vert _\infty }=o(1)\) and \(e^{\Vert \theta ^*\Vert _\infty }=o( n^{1/2} )\), then for any fixed \(k \ge 1\), as \(n\rightarrow \infty \), the vector consisting of the first k elements of \(S({\tilde{g}} - \mathbb {E}g )\) is asymptotically multivariate normal with mean zero and covariance matrix given by the upper left \(k \times k\) block of S.
(ii) Let
$$\begin{aligned} s_{mn}^2=\mathrm {Var}\left( \sum _{i=1}^m e_i^+ - \sum _{i=1}^{n-1} e_i^-\right) = (m+n-1)\frac{ 2\lambda _{mn}}{ (1-\lambda _{mn})^2}. \end{aligned}$$
Assume that \(s_{mn}/v_{m+n,m+n}^{1/2} \rightarrow c\) for some constant c. For any fixed \(k \ge 1\), the vector consisting of the first k elements of \(S({\tilde{g}} - \mathbb {E}g )\) is asymptotically k-dimensional multivariate normal with mean \({\mathbf {0}}\) and covariance matrix
$$\begin{aligned} \mathrm {diag}\left( \frac{1}{v_{1,1}}, \ldots , \frac{1}{v_{k,k}}\right) + \left( \frac{1}{v_{m+n,m+n}} + \frac{s_{mn}^2}{v_{m+n,m+n}^2}\right) {\mathbf {1}}_k {\mathbf {1}}_k^\top , \end{aligned}$$
where \({\mathbf {1}}_k\) is a k-dimensional column vector with all entries 1.
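For concreteness, the noise variance \(s_{mn}^2\) and the limiting covariance matrix in Proposition 1(ii) can be assembled as below. The sketch uses the per-variable discrete Laplace variance \(2\lambda _{mn}/(1-\lambda _{mn})^2\) and independence of the \(m+n-1\) noise terms, as in the display for \(s_{mn}^2\). The variances \(v_{i,i}\) and \(v_{m+n,m+n}\) are model quantities defined in the main text; the numbers used here are placeholders, not values from the paper, and the function names are hypothetical.

```python
def s2_mn(m, n, lam):
    """Var(sum_{i<=m} e_i^+ - sum_{i<=n-1} e_i^-) for independent discrete Laplace
    noise terms, each with variance 2 lam / (1 - lam)^2."""
    return (m + n - 1) * 2 * lam / (1 - lam) ** 2


def limiting_covariance(v_diag, v_last, s2):
    """diag(1/v_11, ..., 1/v_kk) + (1/v_last + s2 / v_last^2) * 1_k 1_k^T,
    where v_last stands for v_{m+n,m+n}."""
    k = len(v_diag)
    common = 1 / v_last + s2 / v_last ** 2
    return [[(1 / v_diag[i] if i == j else 0.0) + common for j in range(k)]
            for i in range(k)]


# Placeholder values (not from the paper).
cov = limiting_covariance([2.0, 4.0], v_last=5.0, s2=10.0)
print(cov)
print(s2_mn(3, 4, 0.5))
```

Setting `s2=0.0` removes the contribution of the privacy noise, leaving \(\mathrm {diag}(1/v_{1,1},\dots ,1/v_{k,k})+v_{m+n,m+n}^{-1}{\mathbf {1}}_k{\mathbf {1}}_k^\top \).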
To complete the proof of Theorem 2, we need the following two lemmas.
Lemma 8
Let \(R=V^{-1}-S\) and \(U=Cov[R\{{\mathbf {g}}-\mathbb {E}{\mathbf {g}}\}]\). Then
Proof
Note that
where I is the \((m+n-1)\times (m+n-1)\) identity matrix, and by Yan et al. (2016), we have
Thus,
\(\square \)
Lemma 9
Let \(\kappa _{mn}=2(r-1)(-\log \lambda _{mn})^{-1}=4(r-1)^2\epsilon _{mn}^{-1}\). If \((1+\kappa _{mn})^2 e^{ 18\Vert \theta ^*\Vert _\infty } = o( (n/\log n)^{1/2} )\), then for any i,
Proof
The proof is very similar to that of Lemma 9 in Yan et al. (2016); it only requires verifying that all the steps still hold when g is replaced by \({\tilde{g}}\). \(\square \)
Proof of Theorem 2
By Lemma 9 and noting that \(V^{-1}=S+R\), we have
By (A6) and \(\Vert {\tilde{g}} - g \Vert _\infty = O_p( \kappa _{mn} \sqrt{\log n})\), we have
where
If \(\kappa _{mn} e^{6\Vert \theta ^*\Vert _\infty } = o( (n/\log n)^{1/2})\), then \([R \{ {\widetilde{g}} - g \}]_i=o_p(n^{-1/2})\). Combining this with Lemma 8 yields
Consequently,
Theorem 2 immediately follows from Proposition 1. \(\square \)
Cite this article
Luo, J., Liu, T. & Wang, Q. Affiliation weighted networks with a differentially private degree sequence. Stat Papers 63, 367–395 (2022). https://doi.org/10.1007/s00362-021-01243-2