Abstract

This chapter first analyzes the general relation between linguistic analysis and computational method, using automatic word form recognition as a familiar example. This example exhibits a number of properties that are methodologically characteristic of all components of grammar. The chapter then presents methods for investigating the frequency distribution of words in natural language.

References

  1. See O. Jespersen 1921, p. 341–346.

  2. See K. Hess, J. Brustkern and W. Lenders 1983.

  3. Cf. H. Bergenholtz 1989, D. Biber 1994, N. Oostdijk and P. de Haan (eds.) 1994.

  4. The consequences of the tagset choice on the results of the corpus analysis are mentioned in S. Greenbaum and N. Yibin 1994, p. 34.

  5. The use of HMMs for the grammatical tagging of corpora is described in, e.g., G. Leech, R. Garside and E. Atwell 1983, I. Marshall 1983, S. DeRose 1988, R. Sharman 1990, P. Brown, V. Della Pietra, et al. 1991. See also K. Church and L.R. Mercer 1993.

  6. Meanwhile, the tagged BNC-lists have been removed from the web.

  7. Unfortunately, neither G. Leech 1995 nor L. Burnard 1995 specifies what exactly constitutes an error in tagging the BNC. However, a new project to improve the tagger was started in June 1995. It is called ‘The British National Corpus Tag Enhancement Project’, and its results were originally scheduled to be made available in September 1996.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hausser, R. (2001). Corpus analysis. In: Foundations of Computational Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04337-0_16

  • DOI: https://doi.org/10.1007/978-3-662-04337-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-07626-8

  • Online ISBN: 978-3-662-04337-0

  • eBook Packages: Springer Book Archive
