Privacy Risk Assessment for Text Data Based on Semantic Correlation Learning

  • Conference paper
Wireless Algorithms, Systems, and Applications (WASA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12939)

Abstract

Privacy risk assessment determines the extent to which generalization and obfuscation should be applied to sensitive data. In this paper, we propose PriTxt, a method for evaluating the privacy risk of text data by exploiting semantic correlation. Using definitions derived from the General Data Protection Regulation (GDPR), PriTxt first defines the private features that relate to individual privacy. A word-embedding model is then built with the word2vec algorithm to identify quasi-sensitive words. Finally, the privacy risk of a given text is evaluated by aggregating the weighted risks of the sensitive and quasi-sensitive words it contains.
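The abstract does not state the scoring formula, so the following is only a minimal sketch of one plausible reading of the pipeline, not the authors' implementation. It assumes that a quasi-sensitive word is one whose embedding has cosine similarity above a threshold to some sensitive word, that its risk is the sensitive word's weight scaled by that similarity, and that the text-level score is the mean word risk. The toy vectors, the `SENSITIVE` weights, the `threshold` value, and the function names are all hypothetical; in practice the vectors would come from a trained word2vec model.

```python
import math

# Toy 3-dimensional embeddings standing in for a trained word2vec model.
# All values are hypothetical and chosen only for illustration.
EMBEDDINGS = {
    "diagnosis": [0.9, 0.1, 0.0],
    "hospital": [0.8, 0.3, 0.1],
    "weather": [0.0, 0.1, 0.9],
    "picnic": [0.1, 0.2, 0.8],
}

# Hypothetical sensitive vocabulary with risk weights in [0, 1].
# Every key here must also have an entry in EMBEDDINGS.
SENSITIVE = {"diagnosis": 1.0}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_risk(word, threshold=0.8):
    """Risk of one word: its own weight if sensitive, a similarity-scaled
    weight if quasi-sensitive (assumed rule), and 0.0 otherwise."""
    if word in SENSITIVE:
        return SENSITIVE[word]
    vec = EMBEDDINGS.get(word)
    if vec is None:
        return 0.0  # out-of-vocabulary words carry no risk in this sketch
    best = 0.0
    for sensitive_word, weight in SENSITIVE.items():
        sim = cosine(vec, EMBEDDINGS[sensitive_word])
        if sim >= threshold:
            best = max(best, weight * sim)
    return best


def text_risk(text, threshold=0.8):
    """Aggregate word risks into a text-level score (mean, by assumption)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word_risk(w, threshold) for w in words) / len(words)
```

With these toy values, `text_risk("hospital diagnosis")` scores higher than `text_risk("weather picnic")`, because "hospital" lies close to the sensitive word "diagnosis" in the embedding space while "weather" and "picnic" do not. Averaging keeps scores comparable across texts of different lengths; a sum or a maximum would be equally defensible aggregation choices under the abstract's wording.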

Acknowledgment

This work is supported by the Humanities and Social Sciences Planning Project of the China Ministry of Education under Grant No. 19YJAZH099.

Author information

Corresponding author

Correspondence to Ping Xiong.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xiong, P., Liang, L., Zhu, Y., Zhu, T. (2021). Privacy Risk Assessment for Text Data Based on Semantic Correlation Learning. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12939. Springer, Cham. https://doi.org/10.1007/978-3-030-86137-7_22

  • DOI: https://doi.org/10.1007/978-3-030-86137-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86136-0

  • Online ISBN: 978-3-030-86137-7

  • eBook Packages: Computer Science, Computer Science (R0)
