Abstract
Current research in large-scale information management systems focuses on unsupervised methods and techniques for information processing. Such approaches scale with the present-day exponential growth in information processing needs. In this paper we address the problem of automatically evaluating the quality of a completely unsupervised metadata extraction process in the Digital Libraries domain. In particular, we investigate the quality of the metadata produced by a specific extraction methodology for scientific documents. We propose and discuss precise quality metrics and measure how these metrics evolve as a function of the information extracted from the repository and of the repository size.
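To make the idea of "quality dynamics" concrete, here is a minimal sketch of tracking a quality metric as the repository grows: the metric is recomputed on increasingly large prefixes of the extracted records, producing a (size, score) curve. The record layout, the `REQUIRED` field list, and the `completeness` metric are all hypothetical placeholders chosen for illustration; they are not the quality metrics defined in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record type: the fields an unsupervised extractor might emit for one document.
@dataclass
class Record:
    fields: Dict[str, str]

# Hypothetical placeholder metric (NOT the paper's metric): the share of records
# in which every required field was extracted with a non-empty value.
REQUIRED = ("title", "authors", "year")

def completeness(records: List[Record]) -> float:
    if not records:
        return 0.0
    complete = sum(
        all(r.fields.get(f, "").strip() for f in REQUIRED) for r in records
    )
    return complete / len(records)

def quality_curve(records: List[Record], step: int) -> List[Tuple[int, float]]:
    """Evaluate the metric on growing prefixes of the repository.

    Returns (repository size, score) pairs for sizes step, 2*step, ...;
    a trailing partial chunk smaller than `step` is not evaluated.
    """
    return [(n, completeness(records[:n])) for n in range(step, len(records) + 1, step)]

repo = [
    Record({"title": "A", "authors": "X", "year": "2006"}),
    Record({"title": "B", "authors": "", "year": "2007"}),  # missing authors
    Record({"title": "C", "authors": "Y", "year": "2005"}),
    Record({"title": "D", "authors": "Z", "year": "2003"}),
]
print(quality_curve(repo, 2))  # [(2, 0.5), (4, 0.75)]
```

Any per-record or per-collection metric can be swapped in for `completeness`; the prefix-based curve is just one simple way to observe how a score changes with repository size.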
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ivanyukovich, A., Marchese, M., Reuther, P. (2007). Assessing Quality Dynamics in Unsupervised Metadata Extraction for Digital Libraries. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_41
DOI: https://doi.org/10.1007/978-3-540-74851-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74850-2
Online ISBN: 978-3-540-74851-9
eBook Packages: Computer Science, Computer Science (R0)