Building a Pronominalization Model by Feature Selection and Machine Learning

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

Pronominalization is an important component of generating coherent text. In this paper, we identify features that influence pronominalization and construct a pronoun generation model using various machine learning techniques. The old entities, which are the targets of pronominalization, are categorized into three types according to their tendency in the attentional state: Cb and old-Cp, both derived from a Centering model, and the remaining old entities. We construct a pronoun generation model for each type. Eighty-seven texts from three genres are gathered for training and testing. Using these texts, we verify that our proposed features account well for pronominalization in Korean, and we show by t-test that our model significantly outperforms previous ones at the 99% confidence level. We also identify central features that strongly influence pronominalization across genres.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roh, JE., Lee, JH. (2005). Building a Pronominalization Model by Feature Selection and Machine Learning. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_60

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_60

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
