
Precision Threshold and Noise: An Alternative Framework of Sensitivity Measures

  • Conference paper
Privacy in Statistical Databases (PSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9867)


Abstract

At many national statistical organizations (NSOs), linear sensitivity measures such as the prior-posterior and dominance rules provide the basis for assessing statistical disclosure risk in tabular magnitude data. However, these measures are not always well suited to issues present in survey data, such as negative values, respondent waivers and sampling weights. To address this gap, this paper introduces the Precision Threshold and Noise (PTN) framework, which defines a new class of sensitivity measures. These measures extend existing theory by relaxing certain restrictions, providing a powerful, flexible and functional tool for NSOs in the assessment of disclosure risk.


Notes

  1. Many NSOs have developed software to assess disclosure risk in tabular data; see [3, 8] for examples. For a detailed description of the prior-posterior and dominance rules, with examples, we refer the reader to [4, Chap. 4]. The expression of these rules as linear measures is given in [1] and [7, Chap. 6].

  2. All theorem proofs appear in the Appendix.

References

  1. Cox, L.H.: Disclosure risk for tabular economic data. In: Doyle, P., Lane, J., Theeuwes, J., Zayatz, L. (eds.) Confidentiality, Disclosure and Data Access, Chap. 8. North-Holland, Amsterdam (2001)


  2. Daalmans, J., de Waal, T.: An improved formulation of the disclosure auditing problem for secondary cell suppression. Trans. Data Priv. 3(3), 217–251 (2010)


  3. Hundepool, A., van de Wetering, A., Ramaswamy, R., de Wolf, P., Giessing, S., Fischetti, M., Salazar-Gonzalez, J., Castro, J., Lowthian, P.: \(\tau \)-ARGUS users manual, version 3.5. ESSnet project (2011)


  4. Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Nordholt, E.S., Spicer, K., De Wolf, P.P.: Statistical Disclosure Control. John Wiley & Sons, Hoboken (2012)


  5. O’Malley, M., Ernst, L.: Practical considerations in applying the pq-rule for primary disclosure suppressions. http://www.bls.gov/osmr/abstract/st/st070080.htm

  6. Tambay, J.L., Fillion, J.M.: Strategies for processing tabular data using the g-confid cell suppression software. In: Joint Statistical Meetings, Montréal, Canada, pp. 3–8 (2013)


  7. Willenborg, L., De Waal, T.: Elements of Statistical Disclosure Control. Lecture Notes in Statistics, vol. 155. Springer, New York (2001)


  8. Wright, P.: G-Confid: Turning the tables on disclosure risk. Joint UNECE/Eurostat work session on statistical data confidentiality. http://www.unece.org/stats/documents/2013.10.confidentiality.html


Acknowledgments

The author is very grateful to Peter Wright, Jean-Marc Fillion, Jean-Louis Tambay and Mark Stinner for their thoughtful feedback on this paper and the PTN framework in general. Additionally, the author thanks Peter Wright and Karla Fox for supporting the author’s interest in this field of research.

Author information


Correspondence to Darren Gray.


Appendix

1.1 Proof of Theorem 1

Proof

We start with the first statement, assuming \(\tau _1 \ne \sigma _1\). As \(f_t(\tau _1)\ge f_t (t) \) for any t and \(f_s (\sigma _1 ) \ge f_s (s) \) for any s, it follows from (6) that

$$\begin{aligned} S(\tau _1,\sigma _1) \ge S(t,s) \end{aligned}$$

for any pair (t, s), proving the first part of the theorem.

For the second part, we begin with the condition that \(\tau _1 = \sigma _1 \). Now, suppose \((\tau _1,\sigma _2 )\) is not maximal. Then there exists a maximal pair \( (\tau _i,\sigma _j ) \) with \( (i,j) \ne (1,2) \) such that \( f_t (\tau _i )+f_s (\sigma _j )>f_t (\tau _1 )+f_s (\sigma _2 ) \). As \( f_t (\tau _1 )\ge f_t (\tau _i ) \) by the ordering of \(\tau \), it follows that \( f_s (\sigma _j )>f_s (\sigma _2 ) \), and we can conclude that \( j=1 \). Then \( (\tau _i,\sigma _j )=(\tau _i,\sigma _1 ) \) for some \( i \ne 1 \), since the respondent \(\tau _1 = \sigma _1\) cannot be both target and suspect. But \( f_t (\tau _2 )\ge f_t (\tau _i ) \) for any \( i\ne 1 \), and so \( S(\tau _2,\sigma _1 )\ge S(\tau _i,\sigma _1 ) \). This shows that if \( (\tau _1,\sigma _2 ) \) is not maximal, \( (\tau _2,\sigma _1 ) \) must be, completing the proof.    \(\square \)
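Theorem 1 can be sanity-checked by comparing a brute-force search over all target-suspect pairs with the candidate pairs the theorem names. The sketch below is illustrative only: it assumes \(S(t,s) = f_t(t) + f_s(s) - \sum _r N(r)\) (the single-pair case of (13)), treats \(f_t\) and \(f_s\) as arbitrary per-respondent scores drawn as random integers rather than survey data, and drops the constant \(-\sum _r N(r)\) because it does not change which pair is maximal. The function names are hypothetical.

```python
import random

def max_pair_brute(f_t, f_s):
    """Largest f_t(t) + f_s(s) over all ordered pairs of distinct respondents.

    The constant -sum_r N(r) in S(t, s) is omitted: it does not change which pair is maximal.
    """
    n = len(f_t)
    return max(f_t[t] + f_s[s] for t in range(n) for s in range(n) if t != s)

def max_pair_theorem1(f_t, f_s):
    """Evaluate only the candidate pairs named in Theorem 1."""
    n = len(f_t)
    tau = sorted(range(n), key=lambda r: f_t[r], reverse=True)    # non-ascending f_t
    sigma = sorted(range(n), key=lambda r: f_s[r], reverse=True)  # non-ascending f_s
    if tau[0] != sigma[0]:
        # First statement: (tau_1, sigma_1) is maximal.
        return f_t[tau[0]] + f_s[sigma[0]]
    # Second statement: one of (tau_1, sigma_2) or (tau_2, sigma_1) is maximal.
    return max(f_t[tau[0]] + f_s[sigma[1]], f_t[tau[1]] + f_s[sigma[0]])

# Hypothetical integer-valued target and suspect functions for random cells.
rng = random.Random(0)
for _ in range(1000):
    n = rng.randint(2, 7)
    f_t = [rng.randint(0, 50) for _ in range(n)]
    f_s = [rng.randint(0, 50) for _ in range(n)]
    assert max_pair_brute(f_t, f_s) == max_pair_theorem1(f_t, f_s)
```

Integer scores keep the comparison exact; ties are harmless because the theorem holds for any non-ascending ordering.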

1.2 Proof of Theorem 2

Proof

When all respondents are self-aware, \(f_t=PT+N\) and \(f_s=N\), and consequently any ordering that results in non-ascending PT, N also results in non-ascending \(f_t\), \(f_s\). Setting \(\tau =\sigma =\eta \) and applying Theorem 1, we conclude that one of \((\eta _1,\eta _2)\) or \((\eta _2,\eta _1)\) is maximal. From (6) we can see that

$$S(\eta _1,\eta _2)-S(\eta _2,\eta _1) =PT(\eta _1)-PT(\eta _2) \ge 0$$

showing \(S(\eta _1,\eta _2) \ge S(\eta _2,\eta _1)\) and \((\eta _1,\eta _2)\) is maximal.   \(\square \)
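The sketch below checks Theorem 2 numerically under its stated conditions: \(SN = 0\) for every respondent, and an index order that is non-ascending in both PT and N (obtained here by sorting two random integer lists independently, so \(\eta _1, \eta _2\) are indices 0 and 1). It reads \(S(t,s)\) off (9) for a single target and suspect; the data and function names are hypothetical.

```python
import random

def pair_sensitivity(t, s, PT, N, SN):
    """S(t, s) for one target t and one suspect s, read off (9):
    (PT(t) + N(t)) + (N(s) - SN(s)) - sum_r N(r)."""
    return (PT[t] + N[t]) + (N[s] - SN[s]) - sum(N)

def check_theorem2(PT, N):
    """Assumes SN = 0 (self-aware) and that index order is non-ascending in both PT and N.
    Returns True if (eta_1, eta_2) = (0, 1) attains the maximum over all pairs."""
    n = len(PT)
    SN = [0] * n
    best = max(pair_sensitivity(t, s, PT, N, SN)
               for t in range(n) for s in range(n) if t != s)
    return best == pair_sensitivity(0, 1, PT, N, SN)

rng = random.Random(1)
for _ in range(1000):
    n = rng.randint(2, 7)
    # Sorting each list independently gives one ordering that is non-ascending in both.
    PT = sorted((rng.randint(0, 40) for _ in range(n)), reverse=True)
    N = sorted((rng.randint(0, 40) for _ in range(n)), reverse=True)
    assert check_theorem2(PT, N)
```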

1.3 Proof of Theorem 3

Proof

The proof is self-evident for cells with two or fewer respondents, so we assume there are at least three. Applying Theorem 1 and noting \(f_s=N\), we can conclude that there exists a maximal pair of the form \((\eta _i,\eta _j)\) with \(j \le 2\). As this pair is maximal, it can be used to calculate cell sensitivity:

$$ S^1_1=S(\eta _i,\eta _j)=PT(\eta _i)-\sum _{r \ne i,j} N(\eta _r) $$

As \(j \le 2\), if \(i \ge 3 \) then exactly one of \(N(\eta _1)\) or \(N(\eta _2)\) is included in the summation above. Both of these are \(\ge N(\eta _i)\) by the ordering of \(\eta \), and \(N(\eta _i) \ge PT(\eta _i)\) by assumption. Since the remaining noise terms in the summation are non-negative, this means \(S^1_1 \le 0\) and the cell is safe. Consequently, if the cell is sensitive, there must exist a maximal pair of the form \((\eta _i,\eta _j)\) with both \(i,j \le 2\), completing the proof.    \(\square \)
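The following sketch checks the conclusion of Theorem 3 by brute force. Because the theorem statement is not reproduced in this section, its conditions are inferred from the proof and should be read as assumptions: every respondent is self-aware (\(SN = 0\), so \(f_s = N\)), \(PT(r) \le N(r)\) for every respondent, noise is non-negative, and respondents are indexed in non-ascending order of N. The data are random integers and the function names are hypothetical.

```python
import random

def self_aware_pair(t, s, PT, N):
    """S(t, s) when SN = 0: PT(t) minus the noise of every respondent other than t and s."""
    return PT[t] - sum(N[r] for r in range(len(N)) if r not in (t, s))

def cell_sensitivity(PT, N):
    """S^1_1 by brute force over all ordered pairs of distinct respondents."""
    n = len(PT)
    return max(self_aware_pair(t, s, PT, N) for t in range(n) for s in range(n) if t != s)

def top_two_sensitivity(PT, N):
    """Best pair drawn only from eta_1 and eta_2 (indices 0 and 1, the largest N)."""
    return max(self_aware_pair(0, 1, PT, N), self_aware_pair(1, 0, PT, N))

rng = random.Random(2)
for _ in range(2000):
    n = rng.randint(3, 6)
    N = sorted((rng.randint(0, 30) for _ in range(n)), reverse=True)  # eta: non-ascending N
    PT = [rng.randint(0, N[r]) for r in range(n)]                     # assumed PT(r) <= N(r)
    s11 = cell_sensitivity(PT, N)
    if s11 > 0:
        # Sensitive cell: some maximal pair lies within {eta_1, eta_2}.
        assert s11 == top_two_sensitivity(PT, N)
```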

1.4 Interpreting Arbitrary Linear Sensitivity Measures in \(S^{n_t}_{n_s}\) Form

All linear sensitivity measures of the form \(\sum _r \alpha _r x_r\) can be expressed in PTN form, provided they satisfy the following conditions:

  • Finite number of non-negative coefficients

  • All positive coefficients have the same value, say \(\alpha _+\)

  • All negative coefficients have the same value, say \(\alpha _-\).

Assuming these conditions are met, an equivalent PTN sensitivity measure can be defined as follows:

  • Set \(n_t\) equal to the number of positive coefficients

  • Set \(n_s\) equal to the number of coefficients equal to zero

  • Set \(PT(r)=\alpha _+ x_r\) for all r

  • Set \(N(r)=|\alpha _-| x_r\) and \(SN(r)=0\) for all r
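The conversion above can be sketched in Python. The representation is an assumption made for illustration: a measure is given by its finite list of leading non-negative coefficients (ranks 1, 2, ...) plus a single negative coefficient \(\alpha _-\) applied to every later rank, and the leading coefficients are assumed to list positives before zeros, as in the usual rules. The function names are hypothetical. The example uses a p% rule with p = 15, written in the linear form \(0.15\,x_1 - \sum _{r \ge 3} x_r\).

```python
from typing import List, Tuple

def linear_to_ptn(leading: List[float], alpha_minus: float) -> Tuple[int, int, float, float]:
    """Map sum_r alpha_r x_r to PTN parameters (n_t, n_s, alpha_plus, |alpha_minus|).

    `leading` lists the finitely many non-negative coefficients for ranks 1, 2, ...;
    every later rank is assumed to carry the single negative coefficient `alpha_minus`.
    Assumption: the leading coefficients list positives first, then zeros.
    """
    if alpha_minus >= 0:
        raise ValueError("alpha_minus must be negative")
    if any(a < 0 for a in leading) or list(leading) != sorted(leading, reverse=True):
        raise ValueError("leading coefficients must be non-negative and non-increasing")
    positives = [a for a in leading if a > 0]
    if not positives or len(set(positives)) != 1:
        raise ValueError("need at least one positive coefficient, all with the same value")
    n_t = len(positives)          # number of positive coefficients
    n_s = len(leading) - n_t      # number of zero coefficients
    return n_t, n_s, positives[0], abs(alpha_minus)

def ptn_values(x: List[float], alpha_plus: float, abs_alpha_minus: float):
    """PT(r) = alpha_plus * x_r, N(r) = |alpha_minus| * x_r and SN(r) = 0 for every respondent."""
    PT = [alpha_plus * v for v in x]
    N = [abs_alpha_minus * v for v in x]
    SN = [0.0] * len(x)
    return PT, N, SN

# Illustrative p% rule with p = 15: coefficients 0.15, 0, then -1 for every later rank.
print(linear_to_ptn([0.15, 0.0], -1.0))  # (1, 1, 0.15, 1.0)
```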

We show that the resulting PTN cell sensitivity measure is equivalent to \(\sum _r \alpha _r x_r\) by first writing (7) as follows:

$$\begin{aligned} S(T,S) = \sum _{t \in T}\left( PT(t)+N(t) \right) +\sum _{s \in S} \left( N(s) - SN(s) \right) - \sum _{r } N(r) \end{aligned}$$
(9)

Substituting in the appropriate PTN values gives

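$$\begin{aligned} S(T,S) &= \sum _{t \in T}\left( \alpha _+ x_t+|\alpha _-| x_t \right) +\sum _{s \in S} |\alpha _-| x_s - \sum _{r } |\alpha _-| x_r \\ &= \sum _{t \in T} \alpha _+ x_t - \sum _{r \notin T \cup S} |\alpha _-| x_r \end{aligned}$$
(10)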

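Under these substitutions, (9) weights each target by \(\alpha _+ + |\alpha _-|\), each suspect by \(|\alpha _-|\) and every other respondent by zero, less a constant. S(T, S) is therefore maximized by selecting T and S from the largest \(n_t + n_s\) respondents, with T containing the largest \(n_t\). If respondents are indexed in non-ascending order, sensitivity is maximized when \(T=\left\{ 1,\ldots ,n_t \right\} \) and \(S= \left\{ n_t+1,\ldots ,n_t+n_s \right\} \). Cell sensitivity is then given by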

$$\begin{aligned} S^{n_t}_{n_s} = \sum _{r =1}^{n_t} \alpha _+ x_r -\sum _{r > n_t + n_s} |\alpha _-| x_r \end{aligned}$$
(11)

which is exactly \(\sum _r \alpha _r x_r\).
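The equivalence can also be checked by brute force: maximize (9) over every pair of disjoint sets (T, S) of sizes \(n_t\) and \(n_s\), with PT, N and SN set as in the list above, and compare the result with (11). This is a sketch under stated assumptions: the integer coefficients \(\alpha _+ = 3\) and \(|\alpha _-| = 20\) (a p% rule with p = 15, scaled by 20) keep the comparison exact, contributions are random non-negative integers, and the function names are hypothetical.

```python
import itertools
import random

def ptn_sensitivity(x, n_t, n_s, alpha_plus, abs_alpha_minus):
    """Maximise (9) over disjoint T (size n_t) and S (size n_s), with
    PT = alpha_plus * x, N = |alpha_minus| * x and SN = 0."""
    n = len(x)
    PT = [alpha_plus * v for v in x]
    N = [abs_alpha_minus * v for v in x]
    total_N = sum(N)
    return max(
        sum(PT[t] + N[t] for t in T) + sum(N[s] for s in S) - total_N  # SN(s) = 0
        for T in itertools.combinations(range(n), n_t)
        for S in itertools.combinations([r for r in range(n) if r not in T], n_s)
    )

def linear_sensitivity(x, n_t, n_s, alpha_plus, abs_alpha_minus):
    """(11): alpha_plus times the n_t largest values, minus |alpha_minus| times
    every value ranked beyond n_t + n_s."""
    xs = sorted(x, reverse=True)
    return alpha_plus * sum(xs[:n_t]) - abs_alpha_minus * sum(xs[n_t + n_s:])

rng = random.Random(3)
for _ in range(500):
    x = [rng.randint(0, 100) for _ in range(rng.randint(2, 7))]
    for n_t, n_s in [(1, 1), (2, 0), (1, 2)]:
        if n_t + n_s <= len(x):
            assert ptn_sensitivity(x, n_t, n_s, 3, 20) == linear_sensitivity(x, n_t, n_s, 3, 20)
```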

1.5 Proof of Theorem 4

We begin with a simple lemma:

Lemma 1

Let T and S be non-intersecting sets of respondents. Let k be a respondent in neither, and assume \(SN(k) \le N(k)\). Then

$$\begin{aligned} S(T,S) \le S(T,S \cup k) \le S(T \cup k,S) \end{aligned}$$
(12)

Proof

We write (7) in maximal form, substituting in the target and suspect functions:

$$\begin{aligned} S(T,S) = \sum _{t \in T} f_t(t) + \sum _{s \in S} f_s(s)- \sum _{r} N(r) \end{aligned}$$
(13)

Then \(S(T, S \cup k) - S(T,S) = f_s(k)\), and comparing (13) with (9) gives \(f_s(k) = N(k) - SN(k)\). This is non-negative because \(SN(k) \le N(k)\) by assumption (we expect this to hold anyway, as a respondent should never know less about their own contribution than the general public), which proves the first inequality. Similarly, \(S(T \cup k, S) - S(T, S \cup k) = f_t(k) - f_s(k)\), so the second inequality holds because \(f_t \ge f_s\) for all respondents, including k.    \(\square \)
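Lemma 1 can be spot-checked on random data. The sketch assumes \(f_t = PT + N\) and \(f_s = N - SN\), as obtained by comparing (13) with (9), and draws \(PT \ge 0\) and \(0 \le SN \le N\) so that both \(f_s \ge 0\) and \(f_t \ge f_s\) hold. The sets, values and the function name `sens` are hypothetical.

```python
import random

def sens(T, S, PT, N, SN):
    """S(T, S) per (13), with f_t = PT + N and f_s = N - SN (read off from (9))."""
    target_part = sum(PT[t] + N[t] for t in T)
    suspect_part = sum(N[s] - SN[s] for s in S)
    return target_part + suspect_part - sum(N)

rng = random.Random(4)
for _ in range(2000):
    n = rng.randint(1, 8)
    PT = [rng.randint(0, 30) for _ in range(n)]   # assumed PT(r) >= 0
    N = [rng.randint(0, 30) for _ in range(n)]
    SN = [rng.randint(0, v) for v in N]           # assumed 0 <= SN(r) <= N(r)
    people = list(range(n))
    rng.shuffle(people)
    k, rest = people[0], people[1:]               # k lies in neither T nor S
    a = rng.randint(0, len(rest))
    b = rng.randint(a, len(rest))
    T, S = set(rest[:a]), set(rest[a:b])          # disjoint by construction
    assert sens(T, S, PT, N, SN) <= sens(T, S | {k}, PT, N, SN) <= sens(T | {k}, S, PT, N, SN)
```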

With this lemma, the proof of Theorem 4 is almost trivial:

Proof

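Let (T, S) be maximal with respect to \(S^{n_t}_{n_s}\). We know there exists at least one respondent \(k \notin T \cup S\), and by Lemma 1, \(S^{n_t}_{n_s} = S(T, S) \le S(T, S \cup k) \le S^{n_t}_{n_s+1}\), where the last inequality holds because \(S^{n_t}_{n_s+1}\) is the maximum over all pairs with \(n_t\) targets and \(n_s+1\) suspects. This proves that \(S^{n_t}_{n_s} \le S^{n_t}_{n_s+1}\).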

For the second inequality, we note that any set pair that is maximal with respect to \(S^{n_t}_{n_s+1}\) can be written in the form \((T, S \cup k)\) for some T of size \(n_t\), S of size \(n_s\) and single respondent k. Applying Lemma 1 once more gives \(S^{n_t}_{n_s+1} = S(T, S \cup k) \le S(T \cup k , S) \le S^{n_t+1}_{n_s}\), completing the proof.    \(\square \)
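Theorem 4 can likewise be checked by brute force for small cells: compute \(S^{n_t}_{n_s}\) as the maximum of S(T, S) over all disjoint (T, S) of the given sizes and confirm the chain of inequalities whenever at least one respondent lies outside \(T \cup S\), that is, \(n_t + n_s + 1\) does not exceed the number of respondents. The same assumptions as for Lemma 1 apply (\(PT \ge 0\), \(0 \le SN \le N\), S(T, S) read off (9)); the data and function names are hypothetical.

```python
import itertools
import random

def sens(T, S, PT, N, SN):
    """S(T, S) read off (9): targets add PT + N, suspects add N - SN, minus sum_r N(r)."""
    return sum(PT[t] + N[t] for t in T) + sum(N[s] - SN[s] for s in S) - sum(N)

def cell_sensitivity(n_t, n_s, PT, N, SN):
    """S^{n_t}_{n_s}: maximum of S(T, S) over disjoint T, S with |T| = n_t, |S| = n_s."""
    n = len(PT)
    return max(
        sens(T, S, PT, N, SN)
        for T in itertools.combinations(range(n), n_t)
        for S in itertools.combinations([r for r in range(n) if r not in T], n_s)
    )

rng = random.Random(5)
for _ in range(300):
    n = rng.randint(2, 6)
    PT = [rng.randint(0, 30) for _ in range(n)]   # assumed PT(r) >= 0
    N = [rng.randint(0, 30) for _ in range(n)]
    SN = [rng.randint(0, v) for v in N]           # assumed 0 <= SN(r) <= N(r)
    for n_t in range(1, n):
        for n_s in range(n - n_t):                # leaves at least one respondent outside T and S
            a = cell_sensitivity(n_t, n_s, PT, N, SN)
            b = cell_sensitivity(n_t, n_s + 1, PT, N, SN)
            c = cell_sensitivity(n_t + 1, n_s, PT, N, SN)
            assert a <= b <= c
```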


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gray, D. (2016). Precision Threshold and Noise: An Alternative Framework of Sensitivity Measures. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_2


  • DOI: https://doi.org/10.1007/978-3-319-45381-1_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45380-4

  • Online ISBN: 978-3-319-45381-1

  • eBook Packages: Computer Science, Computer Science (R0)
