Abstract
The main focus of this textbook thus far has been the analysis of numerical data. Text analytics, introduced in this chapter, is concerned with examining and understanding data in the form of words, which tends to be less structured and therefore more complex to analyze. It uses tools such as those available in R to extract meaning from large amounts of text. The chapter describes two approaches, bag-of-words and natural language processing (NLP), and focuses on the first. The bag-of-words approach attributes no meaning to the order of words; its applications include clustering or segmentation of documents and sentiment analysis. NLP, by contrast, uses the order and “type” of words to infer meaning, and so deals more with issues such as parts of speech.
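To make the bag-of-words idea concrete, the following is a minimal sketch of a document-term matrix. It assumes Python and its standard library rather than the R tooling the chapter itself uses, and the toy documents and the helper names `tokenize` and `document_term_matrix` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy documents, not taken from the chapter's dataset.
docs = [
    "the dog bit the man",
    "the man bit the dog",
    "ice cream tastes great",
]

vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)

# The first two documents differ only in word order, so their rows are identical.
print(matrix[0] == matrix[1])
```

Because each row records only how often each word occurs, "the dog bit the man" and "the man bit the dog" become indistinguishable. That is the sense in which bag-of-words ignores word sequence, and it is the information an NLP approach would try to retain.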
Notes
1. Tutorial link: http://www.mjdenny.com/Text_Processing_In_R.html (accessed on Dec 27, 2017).
2. The dataset “icecream.csv” can be downloaded from the book’s website.
3. https://raw.githubusercontent.com/sudhir-voleti/profile-script/master/sudhir%20shiny%20app%20run%20lists.txt (accessed on Dec 27, 2017).
4. https://www.youtube.com/watch?v=tN6FYIOe0bs (accessed on Dec 27, 2017). Sudhir Voleti is the creator of the video.
5. http://wordnet.princeton.edu/ (accessed on Feb 7, 2018).
6. The NLTK package and documentation are available at http://www.nltk.org/ (accessed on Feb 10, 2018).
7. The Apache OpenNLP package and documentation are available at https://opennlp.apache.org/ (accessed on Feb 10, 2018).
Electronic Supplementary Material
Supplementary Data 1: Generate_Document_Word_Matrix (CPP, 1 kb)
Supplementary Data 2: Github shiny code (R, 3 kb)
Supplementary Data 3: Icecream (R, 4 kb)
Supplementary Data 4: Ice-cream (TXT, 129 kb)
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Voleti, S. (2019). Text Analytics. In: Pochiraju, B., Seshadri, S. (eds) Essentials of Business Analytics. International Series in Operations Research & Management Science, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-319-68837-4_9
DOI: https://doi.org/10.1007/978-3-319-68837-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68836-7
Online ISBN: 978-3-319-68837-4
eBook Packages: Business and Management (R0)