Abstract
By combining semantically annotated documents with semantically annotated services, digital solutions can automatically retrieve and assign documents not only to their own services but also to those provided by others, improving the experience of their users. Most of the information exchanged within and between services still travels on paper or over email, is mostly unstructured, and lacks any form of annotation. Manual and semi-automatic approaches cannot cope with the large volumes of heterogeneous, constantly flowing data in this scenario, which raises the issue of automatic annotation. In this paper, three data mining algorithms are used to annotate a set of documents, and their results are compared to manually provided annotations.
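As a rough illustration of what comparing automatic annotations against manual ones can look like, the sketch below scores a set of predicted labels with micro-averaged precision, recall and F1. The abstract does not say which measures the paper uses, so these metrics are an assumption, and the function name `score_annotations`, the document identifiers and the labels are all hypothetical.

```python
from typing import Dict, Set


def score_annotations(
    predicted: Dict[str, Set[str]],
    manual: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 over all documents.

    Both arguments map a document id to the set of annotation labels
    assigned to it. Metric choice is an assumption, not the paper's.
    """
    tp = fp = fn = 0
    # Visit every document seen by either the algorithm or the annotator.
    for doc_id in manual.keys() | predicted.keys():
        pred = predicted.get(doc_id, set())
        gold = manual.get(doc_id, set())
        tp += len(pred & gold)   # labels both agree on
        fp += len(pred - gold)   # labels only the algorithm assigned
        fn += len(gold - pred)   # labels only the annotator assigned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: two documents with manual and automatic labels.
manual = {"doc1": {"Species", "Vessel"}, "doc2": {"Harbour"}}
predicted = {"doc1": {"Species", "Harbour"}, "doc2": {"Harbour"}}
scores = score_annotations(predicted, manual)
print(scores)
```

Running the same function once per algorithm, against the same manual annotations, would give directly comparable scores for the three approaches.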
Acknowledgements
The present work has been developed under the EUREKA - ITEA2 Project INVALUE (ITEA-13015), INVALUE Project (ANI|P2020 17990), and has received funding from FEDER Funds through NORTE2020 program and from National Funds through FCT under the project UID/EEA/00760/2013.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Canito, A., Marreiros, G., Corchado, J.M. (2019). Automatic Document Annotation with Data Mining Algorithms. In: Rocha, Á., Adeli, H., Reis, L., Costanzo, S. (eds) New Knowledge in Information Systems and Technologies. WorldCIST'19 2019. Advances in Intelligent Systems and Computing, vol 930. Springer, Cham. https://doi.org/10.1007/978-3-030-16181-1_7
DOI: https://doi.org/10.1007/978-3-030-16181-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16180-4
Online ISBN: 978-3-030-16181-1
eBook Packages: Intelligent Technologies and Robotics (R0)