Abstract
Document digitization is becoming increasingly common in many fields, and there is growing interest in enabling computers to understand the contents of digitized documents. Since the early 1990s, natural language processing research based on large annotated corpora, such as the Penn TreeBank, has advanced considerably. Applying the methods of corpus-based research, we built a syntactically annotated corpus of theorem descriptions taken from a book on set theory, and extracted a grammar model of theorems from that corpus, as a first step toward computer understanding of mathematical documents.
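As a rough illustration of what extracting a grammar model from a syntactically annotated corpus can involve, the following minimal sketch reads bracketed parse trees, counts the production rules they contain, and estimates each rule's probability by relative frequency. It is not the authors' procedure: the Penn TreeBank-style bracket notation, the tag names, the two toy sentences, and the function names (`tokenize`, `parse`, `productions`, `estimate_grammar`) are all assumptions made for this example.

```python
from collections import Counter

def tokenize(text):
    """Split a bracketed tree string into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one bracketed tree into (label, children); leaves are plain strings."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a tree must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return node()

def productions(tree):
    """Yield every rule (lhs, rhs) used in the tree, top-down."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)

def estimate_grammar(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in productions(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy corpus; these sentences and tags are invented for illustration.
TOY_CORPUS = [
    "(S (NP (DT Every) (NN set)) (VP (VBZ has) (NP (DT a) (NN subset))))",
    "(S (NP (DT Every) (NN function)) (VP (VBZ has) (NP (DT a) (NN domain))))",
]

grammar = estimate_grammar(parse(tokenize(s)) for s in TOY_CORPUS)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

Relative-frequency counting is only one simple way to turn annotated trees into a grammar; a real corpus of theorem descriptions would also need a convention for annotating mathematical formulas, which this sketch does not address.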
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baba, Y., Suzuki, M. (2003). An Annotated Corpus and a Grammar Model of Theorem Description. In: Asperti, A., Buchberger, B., Davenport, J.H. (eds) Mathematical Knowledge Management. MKM 2003. Lecture Notes in Computer Science, vol 2594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36469-2_8
DOI: https://doi.org/10.1007/3-540-36469-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00568-1
Online ISBN: 978-3-540-36469-6
eBook Packages: Springer Book Archive