Corpus Linguistics

  • Living reference work entry
Encyclopedia of Big Data

Introduction

Corpus linguistics is, broadly speaking, the application of “big data” to the science of linguistics. Unlike traditional linguistic analysis [caricatured by Fillmore (1992) as “armchair linguistics”], which relies on native-speaker intuition and introspection, corpus linguistics relies on large samples of text to analyze the distribution of linguistic items quantitatively. The field has therefore tended to focus on what a computer can easily measure and quantify, such as words, phrases, and word-based grammar, rather than on more abstract concepts such as discourse or formal syntax. With the advent of high-powered computers and the increased availability of machine-readable texts, corpus linguistics has become a major force in modern linguistic research.
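As a rough illustration of what quantifying the distribution of words and phrases can look like in practice, the following Python sketch counts single words and two-word sequences in a tiny sample. The sample sentence, the regular-expression tokenizer, and the function names `tokenize` and `ngram_counts` are all assumptions made for this example; they are not taken from the entry, and real corpus work typically uses far larger texts and more careful tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n=1):
    """Count contiguous n-grams: n=1 gives word counts, n=2 two-word sequences."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Toy "corpus" invented for illustration only (hypothetical data).
sample = "The cat sat on the mat. The dog sat on the log."

tokens = tokenize(sample)
word_freq = ngram_counts(tokens, 1)
bigram_freq = ngram_counts(tokens, 2)

# Most frequent words and two-word sequences in the sample.
print(word_freq.most_common(3))
print(bigram_freq.most_common(2))
```

Counts like these are raw frequencies; comparing corpora of different sizes would normally require normalizing them, for example to occurrences per thousand or per million words.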

History

The use of corpora for language analysis long predates computers. Theologians were making Biblical concordances in the eighteenth century, and Samuel Johnson started a tradition followed to this day (e.g., most famously by the Oxford English Dictionary)...

Further Readings

  • Fillmore, C. J. (1992). “Corpus linguistics” or “computer-aided armchair linguistics”. In J. Svartvik (Ed.), Directions in corpus linguistics: Proceedings of Nobel symposium 82, 4–8 August 1991 (pp. 35–60). Berlin: Mouton de Gruyter.

  • Juola, P. (2006). Authorship attribution. Foundations and Trends in Information Retrieval, 1(3), 233–334.

  • Kennedy, G. (1998). An introduction to corpus linguistics. London: Longman.

  • Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.

  • McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge: Cambridge University Press.

  • Meyer, C. F. (2002). English corpus linguistics: An introduction. Cambridge: Cambridge University Press.


Author information

Correspondence to Patrick Juola.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Juola, P. (2018). Corpus Linguistics. In: Schintler, L., McNeely, C. (eds) Encyclopedia of Big Data. Springer, Cham. https://doi.org/10.1007/978-3-319-32001-4_523-1

  • DOI: https://doi.org/10.1007/978-3-319-32001-4_523-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32001-4

  • Online ISBN: 978-3-319-32001-4

  • eBook Packages: Springer Reference Business and Management; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences
