Using Entities in Knowledge Graph Hierarchies to Classify Sensitive Information

Frayling, Erlend; Macdonald, Craig; McDonald, Graham; Ounis, Iadh

doi:10.1007/978-3-031-13643-6_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13390)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Text classification has been shown to be effective in helping human reviewers identify sensitive information when reviewing documents for release to the public. However, automatically classifying sensitive information is difficult, since sensitivity often depends on contextual knowledge that must be inferred from the text. For example, the mention of a specific named entity is unlikely to provide enough context to determine automatically whether the information is sensitive. Knowing the conceptual role of the entity, e.g. whether the entity is a politician or a terrorist, can provide useful additional context. Human sensitivity reviewers draw on their prior knowledge of such contextual information when making sensitivity judgements, but statistical or contextualized classifiers cannot easily resolve these cases from the text alone. In this paper, we propose a feature extraction method that models entities in a hierarchical structure, based on the underlying structure of Wikipedia, to generate a more informative representation of entities and their roles. Our experiments, on a test collection containing real-world sensitivities, show that our proposed approach results in a significant improvement in sensitivity classification performance (2.2% BAC, McNemar’s Test, p < 0.05) compared to a text-based sensitivity classifier.
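The idea of representing an entity by its position in a hierarchy can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the entity names, the function names and the `max_depth` parameter are all hypothetical, and a real system would derive the hierarchy from a Wikipedia-based knowledge graph. The sketch only shows how each mentioned entity might be expanded into features for its ancestor concepts, so that two different entities can share a feature such as "Person".

```python
from collections import Counter

# Hypothetical toy hierarchy mapping each node to its parent concepts.
# These names are illustrative assumptions, not data from the paper.
PARENTS = {
    "Entity_A": ["Politician"],
    "Entity_B": ["Terrorist"],
    "Politician": ["Person"],
    "Terrorist": ["Person"],
    "Person": [],
}


def ancestors(node, parents, max_depth=3):
    """Return (ancestor, depth) pairs reachable from node within max_depth levels."""
    seen = {}
    frontier = [node]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for parent in parents.get(current, []):
                if parent not in seen:
                    seen[parent] = depth  # record the shallowest depth only
                    next_frontier.append(parent)
        frontier = next_frontier
    return sorted(seen.items(), key=lambda item: (item[1], item[0]))


def entity_features(entities, parents, max_depth=3):
    """Count one feature per entity mention plus one per ancestor concept."""
    feats = Counter()
    for entity in entities:
        feats[f"entity={entity}"] += 1
        for ancestor, depth in ancestors(entity, parents, max_depth):
            feats[f"level{depth}={ancestor}"] += 1
    return feats


# Entities assumed to have been linked in a document by some entity linker.
features = entity_features(["Entity_A", "Entity_B"], PARENTS)
```

Feature counts of this kind could then be appended to a text-based representation before training a classifier; how the paper combines them is not stated in the abstract.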

Acknowledgements

E. Frayling, C. Macdonald and I. Ounis acknowledge the support of Innovate UK through a Knowledge Transfer Partnership (# 12040). All authors thank SVGC Ltd. for their support.

Author information

Authors and Affiliations

University of Glasgow, Glasgow, G12 8QQ, UK
Erlend Frayling, Craig Macdonald, Graham McDonald & Iadh Ounis

Corresponding authors

Correspondence to Erlend Frayling or Craig Macdonald.

Editor information

Editors and Affiliations

University of Bologna, Forlì, Italy
Alberto Barrón-Cedeño
University of Padua, Padova, Italy
Giovanni Da San Martino
University of Bologna, Bologna, Italy
Mirko Degli Esposti
Istituto di Scienza e Tecnologie dell'Informazione “Alessandro Faedo”, Pisa, Italy
Fabrizio Sebastiani
University of Glasgow, Glasgow, UK
Craig Macdonald
University Milano-Bicocca, Milan, Italy
Gabriella Pasi
TU Wien, Vienna, Austria
Allan Hanbury
Leipzig University, Leipzig, Germany
Martin Potthast
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro

About this paper

Cite this paper

Frayling, E., Macdonald, C., McDonald, G., Ounis, I. (2022). Using Entities in Knowledge Graph Hierarchies to Classify Sensitive Information. In: Barrón-Cedeño, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham. https://doi.org/10.1007/978-3-031-13643-6_10

DOI: https://doi.org/10.1007/978-3-031-13643-6_10
Published: 25 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13642-9
Online ISBN: 978-3-031-13643-6
eBook Packages: Computer Science, Computer Science (R0)
