The Role of Pre-processing in Twitter Sentiment Analysis

  • Conference paper
Intelligent Computing Methodologies (ICIC 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8589)

Abstract

Sentiment analysis of social networks has attracted increasing attention in recent years, and Twitter, one of the most popular social networking platforms, has become a major research topic in this domain. Sentiment analysis is usually treated as a classification problem. When a classifier is trained on tweet data, it must cope with a large amount of noise caused by the shortness of tweets, marks, irregular words, and so on. In this work we explore the impact of pre-processing methods on Twitter sentiment classification. We evaluate the effects of URLs, negation, repeated letters, stemming, and lemmatization. Experimental results on the Stanford Twitter Sentiment Dataset show that classification accuracy rises when URL feature reservation, negation transformation, and repeated-letter normalization are employed, and falls when stemming and lemmatization are applied. Moreover, we obtain a better result by augmenting the original feature space with bigram and emotion features. Applying these measures together yields a classification accuracy of 85.5%.
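
The three pre-processing steps reported as helpful (URL feature reservation, negation transformation, repeated-letter normalization) and the bigram augmentation can be illustrated with a minimal sketch. The abstract does not give the exact rules, so everything below is an assumption made for illustration: the `URL` placeholder token, the regular expressions, the small table of irregular contractions, the choice to collapse runs of three or more letters down to two, and the function names `preprocess` and `extract_features`.

```python
import re

# Assumed patterns; the paper's actual rules are not stated in the abstract.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NEGATION_RE = re.compile(r"(\w+)n't\b")
REPEAT_RE = re.compile(r"(\w)\1{2,}")
TOKEN_RE = re.compile(r"URL|[a-z0-9']+")

# Contractions that the generic "n't" rule would split incorrectly.
IRREGULAR = {"won't": "will not", "can't": "can not", "shan't": "shall not"}


def preprocess(tweet):
    """Return the tokens of a tweet after three normalization steps."""
    text = tweet.lower().replace("\u2019", "'")  # unify curly apostrophes
    # URL feature reservation: keep a placeholder token instead of dropping the URL.
    text = URL_RE.sub(" URL ", text)
    # Negation transformation: expand contracted negations to an explicit "not".
    for contraction, expansion in IRREGULAR.items():
        text = text.replace(contraction, expansion)
    text = NEGATION_RE.sub(r"\1 not", text)
    # Repeated-letter normalization: "sooooo" -> "soo".
    text = REPEAT_RE.sub(r"\1\1", text)
    return TOKEN_RE.findall(text)


def extract_features(tweet):
    """Unigrams augmented with bigrams built from adjacent tokens."""
    tokens = preprocess(tweet)
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


if __name__ == "__main__":
    print(preprocess("I don't like thisss!!! sooooo bad http://t.co/abc"))
    print(extract_features("not goood"))
```

Collapsing to two letters rather than one keeps an elongated word distinguishable from its dictionary form in some cases ("soo" versus "so"), which is one plausible way to retain the emphasis signal; whether the authors made the same choice is not stated in the abstract.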

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bao, Y., Quan, C., Wang, L., Ren, F. (2014). The Role of Pre-processing in Twitter Sentiment Analysis. In: Huang, DS., Jo, KH., Wang, L. (eds) Intelligent Computing Methodologies. ICIC 2014. Lecture Notes in Computer Science, vol 8589. Springer, Cham. https://doi.org/10.1007/978-3-319-09339-0_62

  • DOI: https://doi.org/10.1007/978-3-319-09339-0_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09338-3

  • Online ISBN: 978-3-319-09339-0

  • eBook Packages: Computer Science (R0)
