Searching for Part of Speech Tags That Improve Parsing Models

Domínguez, Martín Ariel; Infante-Lopez, Gabriel

doi:10.1007/978-3-540-85287-2_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5221)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

We introduce a technique for inducing a refinement of the set of part of speech tags related to verbs. We cluster verbs according to their syntactic behavior in a dependency structure setting. The set of clusters is automatically determined by means of a quality measure over the probabilistic automata that describe words in a bilexical grammar. Each of the resulting clusters defines a new part of speech tag. We try out the resulting tag set in a state-of-the-art phrase structure parser and we show that the induced part of speech tags significantly improve the accuracy of the parser.
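The general idea of grouping verbs by syntactic behavior and turning each group into a new tag can be illustrated with a minimal sketch. It is not the procedure of the paper: the abstract describes words through probabilistic automata in a bilexical grammar and selects the clusters with a quality measure, whereas the sketch below substitutes relative frequencies of dependent-tag sequences and a greedy grouping with a fixed Jensen-Shannon threshold. The toy observations, the threshold value, the `VB_<n>` naming scheme, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2

# Toy data, invented for illustration: (verb, POS tags of its dependents).
OBSERVATIONS = [
    ("eat", ("NN",)), ("eat", ("NN",)), ("eat", ("NN", "IN")),
    ("devour", ("NN",)), ("devour", ("NN",)),
    ("sleep", ()), ("sleep", ("IN",)), ("sleep", ()),
    ("arrive", ()), ("arrive", ("IN",)),
]


def dependent_distributions(observations):
    """Relative frequency of each dependent-tag sequence, per verb."""
    counts = defaultdict(Counter)
    for verb, deps in observations:
        counts[verb][deps] += 1
    return {
        verb: {deps: n / sum(c.values()) for deps, n in c.items()}
        for verb, c in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def cluster_verbs(dists, threshold=0.3):
    """Greedy grouping (assumed stand-in for the paper's search):
    a verb joins the first cluster whose representative is within the threshold."""
    clusters = []  # list of (representative distribution, member verbs)
    for verb in sorted(dists):
        for representative, members in clusters:
            if js_divergence(dists[verb], representative) <= threshold:
                members.append(verb)
                break
        else:
            clusters.append((dists[verb], [verb]))
    return [members for _, members in clusters]


def refined_tags(clusters, base_tag="VB"):
    """Each cluster defines one new tag, e.g. VB_0, VB_1 (hypothetical naming)."""
    return {
        verb: f"{base_tag}_{i}"
        for i, members in enumerate(clusters)
        for verb in members
    }


clusters = cluster_verbs(dependent_distributions(OBSERVATIONS))
tags = refined_tags(clusters)
print(tags)
```

On this toy data the verbs that take a noun dependent ("eat", "devour") receive one tag and the verbs that mostly occur without one ("sleep", "arrive") receive another. In a setting like the one the abstract describes, the refined tags would then replace the original verb tags in the training data of a parser.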

Author information

Authors and Affiliations

Grupo de Procesamiento de Lenguaje Natural, Universidad Nacional de Córdoba, Argentina
Martín Ariel Domínguez & Gabriel Infante-Lopez
Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina
Gabriel Infante-Lopez

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 41296, Göteborg, Sweden
Bengt Nordström & Aarne Ranta

About this paper

Cite this paper

Domínguez, M.A., Infante-Lopez, G. (2008). Searching for Part of Speech Tags That Improve Parsing Models. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_13

DOI: https://doi.org/10.1007/978-3-540-85287-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)
