
Part of the book series: Textbooks on Political Analysis (TPA)


Abstract

This chapter introduces concepts and techniques for using unstructured text as a data source. We first review examples of the kinds of existing text data you may encounter, then discuss how to turn that text into a form more amenable to the quantitative analyses we are likely to perform.
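The abstract describes turning raw text into a form suitable for quantitative analysis. The snippet below is a minimal sketch of one common version of that step. It is not the authors' code, which may rely on libraries such as NLTK, spaCy, or textacy listed in the notes. It tokenizes a few sentences, counts terms per document, and applies tf-idf weighting in the spirit of Spärck Jones (1972). The stopword list, the regular-expression tokenizer, and the example "speeches" are hypothetical, illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption; real pipelines
# usually take a library-provided list such as NLTK's or spaCy's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "we", "our"}


def tokenize(text):
    """Lowercase the text, keep runs of letters/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def document_term_counts(docs):
    """Return the sorted vocabulary and one term-count Counter per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def tf_idf(docs):
    """Weight each term by its raw count times log(N / document frequency)."""
    vocab, counts = document_term_counts(docs)
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    matrix = []
    for c in counts:
        # Counter returns 0 for absent terms, so every row spans the full vocabulary.
        row = [c[term] * math.log(n_docs / df[term]) for term in vocab]
        matrix.append(row)
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical example sentences, not data from the chapter.
    speeches = [
        "We must cut taxes for working families.",
        "We must protect health care for working families.",
        "Cut spending and cut taxes.",
    ]
    vocab, matrix = tf_idf(speeches)
    # Print the non-zero weights for the third speech.
    for term, weight in zip(vocab, matrix[2]):
        if weight > 0:
            print(f"{term}: {weight:.3f}")
```

Run as a script, this prints `cut: 0.811`, `spending: 1.099`, and `taxes: 0.405` for the third speech: "spending" appears in only one document, so it scores highest. Raw count times log(N/df) is only one tf-idf variant. Library implementations often smooth the idf term or normalize rows, so their weights will differ.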

Electronic Supplementary Material: The online version of this chapter (https://doi.org/10.1007/978-3-030-36826-5_14) contains supplementary material, which is available to authorized users.


Notes

  1. https://www.cs.cmu.edu/~enron/.
  2. They can be a bad signal partly because URL shorteners obscure the final destination.
  3. https://www.nltk.org/.
  4. https://stanfordnlp.github.io/CoreNLP/.
  5. https://spacy.io/.
  6. https://github.com/CivilServiceUSA/us-senate.
  7. To replicate this study exactly, use the CSV version of the dataset included with the book at https://dataverse.harvard.edu/dataverse/python-book.
  8. https://www.congress.gov/help/field-values/member-bioguide-ids.
  9. https://github.com/chartbeat-labs/textacy.

References

  1. Lewis, J. B., Poole, K., Rosenthal, H., Boche, A., Rudkin, A., & Sonnet, L. (2017). Voteview: Congressional roll-call votes database. https://voteview.com/
  2. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781
  3. Reese, D. (2012). Is Sen. Claire McCaskill a moderate? The Washington Post. Retrieved August 22, 2013.
  4. Spärck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21.
  5. van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Cutler, J., Dickenson, M. (2020). Case Study: Natural Language Processing. In: Computational Frameworks for Political and Social Research with Python. Textbooks on Political Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-36826-5_14
