Abstract
We introduce a technique for inducing a refinement of the set of part-of-speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency structure setting. The set of clusters is determined automatically by means of a quality measure over the probabilistic automata that describe words in a bilexical grammar. Each resulting cluster defines a new part-of-speech tag. We evaluate the resulting tag set in a state-of-the-art phrase structure parser and show that the induced part-of-speech tags significantly improve the parser's accuracy.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Domínguez, M.A., Infante-Lopez, G. (2008). Searching for Part of Speech Tags That Improve Parsing Models. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_13
DOI: https://doi.org/10.1007/978-3-540-85287-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)