Using Entities in Knowledge Graph Hierarchies to Classify Sensitive Information

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13390)

Abstract

Text classification has been shown to be effective in helping human reviewers identify sensitive information when reviewing documents for public release. However, automatically classifying sensitive information is difficult, since sensitivity often depends on contextual knowledge that must be inferred from the text. For example, the mention of a specific named entity is unlikely to provide enough context to determine automatically whether the information is sensitive. Knowing the conceptual role of the entity, e.g. whether the entity is a politician or a terrorist, can provide useful additional context. Human sensitivity reviewers draw on their prior knowledge of such context when making sensitivity judgements, but statistical or contextualized classifiers cannot easily resolve these cases from the text alone. In this paper, we propose a feature extraction method that models entities in a hierarchical structure, based on the underlying structure of Wikipedia, to generate a more informative representation of entities and their roles. Our experiments, on a test collection containing real-world sensitivities, show that our proposed approach results in a significant improvement in sensitivity classification performance (2.2% BAC, McNemar’s Test, p < 0.05) compared to a text-based sensitivity classifier.
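
As a rough illustration of the general idea of representing entities by their position in a hierarchy, the following minimal sketch expands each entity found in a document into its ancestor categories and counts them as features. It is not the method from the paper: the toy hierarchy `PARENTS`, the placeholder entity names, the `ENT=`/`CAT=` feature prefixes, and the `max_depth` cut-off are all assumptions made for the example, and a real system would obtain entities from an entity linker and categories from a Wikipedia-derived knowledge graph.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent categories); not taken from the paper.
PARENTS = {
    "Person_A": ["Head_of_Government"],
    "Head_of_Government": ["Politician"],
    "Politician": ["Person"],
    "Person_B": ["Terrorist"],
    "Terrorist": ["Criminal"],
    "Criminal": ["Person"],
}


def ancestors(node, max_depth=3):
    """Return {ancestor: depth} for ancestors within max_depth hops (breadth-first)."""
    seen = {}
    frontier = [node]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for parent in PARENTS.get(current, []):
                if parent not in seen:
                    seen[parent] = depth
                    next_frontier.append(parent)
        frontier = next_frontier
    return seen


def hierarchy_features(entities, max_depth=3):
    """Count entity tokens and ancestor-category tokens for one document."""
    features = Counter()
    for entity in entities:
        features[f"ENT={entity}"] += 1
        for category in ancestors(entity, max_depth):
            features[f"CAT={category}"] += 1
    return features


# Entities assumed to have been linked in a single document.
doc_entities = ["Person_A", "Person_B"]
features = hierarchy_features(doc_entities)
```

In this sketch, two documents mentioning different individuals can still share features such as `CAT=Politician`, which is the kind of role information a purely text-based classifier would not see. The resulting counts could be appended to ordinary text features before training a classifier.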

Acknowledgements

E. Frayling, C. Macdonald and I. Ounis acknowledge the support of Innovate UK through a Knowledge Transfer Partnership (# 12040). All authors thank SVGC Ltd. for their support.

Author information

Corresponding authors

Correspondence to Erlend Frayling or Craig Macdonald.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Frayling, E., Macdonald, C., McDonald, G., Ounis, I. (2022). Using Entities in Knowledge Graph Hierarchies to Classify Sensitive Information. In: Barrón-Cedeño, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham. https://doi.org/10.1007/978-3-031-13643-6_10

  • DOI: https://doi.org/10.1007/978-3-031-13643-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13642-9

  • Online ISBN: 978-3-031-13643-6

  • eBook Packages: Computer Science (R0)
