Backoff DOP: Parameter Estimation by Backoff

Buratto, Luciano; Sima’an, Khalil

doi:10.1007/978-3-540-39398-6_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The Data Oriented Parsing (DOP) model currently achieves state-of-the-art parsing on benchmark corpora. However, existing DOP parameter estimation methods are known to be biased, and ad hoc adjustments are needed in order to reduce the effects of these biases on performance. This paper presents a novel estimation procedure that exploits a unique property of DOP: different derivations can generate the same parse-tree. We show that the different derivations represent different “Markov orders” that the DOP model interpolates together. The idea behind the present method is to combine the different derivation orders by backoff instead of interpolation. This allows for a novel estimation procedure that employs Katz backoff for estimation. We report on experiments showing error reduction of up to 15% with respect to earlier methods.
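
The backoff idea named in the abstract can be illustrated on ordinary word bigrams. The sketch below is not the paper's DOP estimator: it works on word sequences rather than subtrees, and it assumes a fixed absolute discount in place of the Good-Turing discounting that Katz backoff uses. The function name `train_backoff_bigram` and the `discount` parameter are hypothetical, chosen only for this illustration. What it shows is the core mechanism: an observed higher-order event keeps a discounted relative frequency, and the probability mass freed by discounting is given to unobserved events in proportion to the lower-order model.

```python
from collections import Counter


def train_backoff_bigram(tokens, discount=0.5):
    """Return p(word, prev) for a backoff bigram model.

    Assumption: a fixed absolute discount stands in for the
    Good-Turing discounting of Katz's original formulation.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # A token counts as a context only when another token follows it.
    contexts = Counter(tokens[:-1])
    total = sum(unigrams.values())

    def p_unigram(word):
        return unigrams[word] / total

    def p(word, prev):
        if contexts[prev] == 0:
            # Context never observed: use the lower-order model directly.
            return p_unigram(word)
        if bigrams[(prev, word)] > 0:
            # Observed bigram: discounted relative frequency.
            return (bigrams[(prev, word)] - discount) / contexts[prev]
        # Unobserved bigram: back off to the unigram model, scaled so
        # that the distribution over all words still sums to one.
        seen = [w for (h, w) in bigrams if h == prev]
        reserved = discount * len(seen) / contexts[prev]
        unseen_mass = 1.0 - sum(p_unigram(w) for w in seen)
        if unseen_mass <= 1e-12:
            return 0.0
        return reserved * p_unigram(word) / unseen_mass

    return p


tokens = "the cat sat on the mat the cat ran".split()
p = train_backoff_bigram(tokens)
vocab = sorted(set(tokens))
for prev in vocab:
    print(prev, round(sum(p(w, prev) for w in vocab), 6))
```

An interpolated model would instead mix the bigram and unigram estimates for every event, observed or not. In the backoff version the lower-order estimate is consulted only when the higher-order event was never seen.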

Author information

Authors and Affiliations

Institute for Logic, Language and Computation (ILLC), University of Amsterdam, Amsterdam, The Netherlands
Luciano Buratto & Khalil Sima’an

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek & Pavel Mautner

Cite this paper

Buratto, L., Sima’an, K. (2003). Backoff DOP: Parameter Estimation by Backoff. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_6

DOI: https://doi.org/10.1007/978-3-540-39398-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive