
SPARK-Based Partitioning Algorithm for k-Anonymization of Large RDFs

  • Conference paper
Advanced Multimedia and Ubiquitous Engineering (MUE 2019, FutureTech 2019)

Abstract

Privacy protection for Resource Description Framework (RDF) data is very important because RDF (i.e., linked data) is a widely used format for published data in many areas, including government open data, personal health care, and social relationships. Because such data can include private information about individuals or companies and can expose that information to third parties, several anonymization models have been proposed to preserve privacy in practice, and k-anonymity in particular has gained attention in research. Recently, several RDF anonymization models have also been proposed. However, current approaches focus on the anonymization model and on a metric for measuring information loss, and do not consider large-scale RDF data. In this paper, we propose an efficient anonymization method for large-scale RDF data: a greedy partitioning algorithm for RDF anonymization implemented on Apache Spark, a leading platform for big data processing. Experiments on synthetic datasets show that the proposed method requires less running time than previous methods.
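To make "greedy partitioning for k-anonymization" concrete, the sketch below flattens RDF triples into one record per subject and then applies a Mondrian-style greedy median split (the technique of LeFevre et al.), stopping when no cut leaves at least k subjects on each side. This is a swapped-in, single-machine illustration, not the authors' algorithm: the predicate names `ex:age` and `ex:zip`, the helper functions, and the assumption that every quasi-identifier is a single numeric value are all hypothetical. It does not use the Spark API; on Spark, the independent sub-partitions produced by each split are what one would expect to distribute across executors.

```python
from collections import Counter, defaultdict


def to_records(triples, qi_predicates):
    """Group (subject, predicate, object) triples into one record per subject.

    Assumption (hypothetical): each quasi-identifier predicate has a single
    numeric object. Subjects missing any quasi-identifier are dropped.
    """
    records = defaultdict(dict)
    for s, p, o in triples:
        if p in qi_predicates:
            records[s][p] = o
    return {s: r for s, r in records.items() if len(r) == len(qi_predicates)}


def greedy_partition(subjects, records, qis, k):
    """Mondrian-style greedy median split; every returned partition has >= k subjects."""

    def spread(p):
        # Raw (unnormalized) value range of predicate p within this partition.
        vals = [records[s][p] for s in subjects]
        return max(vals) - min(vals)

    # Greedy choice: try the widest attribute first, then fall back to narrower ones.
    for p in sorted(qis, key=spread, reverse=True):
        ordered = sorted(subjects, key=lambda s: records[s][p])
        split_val = records[ordered[len(ordered) // 2]][p]
        left = [s for s in ordered if records[s][p] < split_val]
        right = [s for s in ordered if records[s][p] >= split_val]
        if len(left) >= k and len(right) >= k:
            return (greedy_partition(left, records, qis, k)
                    + greedy_partition(right, records, qis, k))
    # No allowable cut: this partition becomes one equivalence class.
    return [subjects]


def generalize(partitions, records, qis):
    """Replace each subject's values with the (min, max) range of its partition."""
    out = {}
    for part in partitions:
        ranges = {p: (min(records[s][p] for s in part), max(records[s][p] for s in part))
                  for p in qis}
        for s in part:
            out[s] = ranges
    return out


def is_k_anonymous(generalized, k):
    """True if every combination of generalized quasi-identifiers occurs >= k times."""
    counts = Counter(tuple(sorted(v.items())) for v in generalized.values())
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    qis = ["ex:age", "ex:zip"]
    people = {"ex:s1": (25, 13053), "ex:s2": (27, 13068), "ex:s3": (31, 13053),
              "ex:s4": (35, 14850), "ex:s5": (40, 14853), "ex:s6": (52, 14850)}
    triples = []
    for s, (age, zipcode) in people.items():
        triples += [(s, "ex:age", age), (s, "ex:zip", zipcode), (s, "ex:name", "hidden")]
    records = to_records(triples, set(qis))
    parts = greedy_partition(list(records), records, qis, k=2)
    print(parts)  # [['ex:s1', 'ex:s3', 'ex:s2'], ['ex:s4', 'ex:s6', 'ex:s5']]
    print(is_k_anonymous(generalize(parts, records, qis), k=2))  # True
```

The recursion only ever accepts a cut when both halves keep at least k subjects, so k-anonymity holds by construction; `is_k_anonymous` is included as an independent check. Because the split attribute is chosen by raw range, attributes with large numeric domains (such as the ZIP code here) dominate; a fuller implementation would normalize each range by the attribute's domain.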


Acknowledgement

This work was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-01417) supervised by the IITP (Institute for Information & Communications Technology Promotion); by an IITP grant funded by the Korea government (MSIP) (No. R0113-15-0005, Development of a Unified Data Engineering Technology for Large-scale Transaction Processing and Real-Time Complex Analytics); and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2018R1D1A1B07048380).

Author information

Corresponding author: Dong-Hyuk Im.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Temuujin, O., Jeon, M., Seo, K., Ahn, J., Im, DH. (2020). SPARK-Based Partitioning Algorithm for k-Anonymization of Large RDFs. In: Park, J., Yang, L., Jeong, YS., Hao, F. (eds) Advanced Multimedia and Ubiquitous Engineering. MUE 2019, FutureTech 2019. Lecture Notes in Electrical Engineering, vol 590. Springer, Singapore. https://doi.org/10.1007/978-981-32-9244-4_41


  • DOI: https://doi.org/10.1007/978-981-32-9244-4_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-32-9243-7

  • Online ISBN: 978-981-32-9244-4

  • eBook Packages: Engineering, Engineering (R0)
