Using Stylometric Features for Sentiment Classification

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Abstract

This paper presents a comparative study of text feature extraction methods for statistical learning of sentiment classification. Feature extraction is one of the most important steps in a classification system. We compare stylometry against two baseline methods, TF-IDF and Delta TF-IDF. Stylometry is a research area of linguistics that uses statistical techniques to analyze literary style. To assess the viability of stylometry, we created a corpus of product reviews from the most traditional online service in Portuguese, namely Buscapé, gathering 2000 reviews about smartphones. We use three classifiers, Support Vector Machine (SVM), Naive Bayes, and J48, to evaluate whether stylometry achieves higher accuracy than TF-IDF and Delta TF-IDF in sentiment classification. The best results were obtained with the SVM classifier: 82.75% accuracy with stylometry, 72.62% with Delta TF-IDF, and 56.25% with TF-IDF. The results show that stylometry is a feasible method for sentiment classification, outperforming the baseline methods in accuracy, and suggest that the approach is promising.
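As a rough illustration of what "stylometric features" can mean in practice, the following minimal Python sketch computes a handful of surface-level style statistics for a single review. The abstract does not list the features actually used in the paper, so the function name `stylometric_features` and every feature in it (word count, average word length, type-token ratio, and so on) are assumptions chosen for illustration, not the authors' feature set.

```python
import re
import string


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative style statistics for one document.

    The feature set is hypothetical; the paper's own features are not
    given in the abstract.
    """
    # Lowercased word tokens; \w matches accented letters in Python 3.
    words = re.findall(r"\w+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "n_chars": len(text),
        "n_words": n_words,
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_len": n_words / len(sentences) if sentences else 0.0,
        "n_punctuation": sum(ch in string.punctuation for ch in text),
        "n_exclamations": text.count("!"),
    }


# Example on a made-up Portuguese review (not taken from the Buscapé corpus).
features = stylometric_features("Ótimo celular! A bateria dura muito, muito tempo.")
```

A vector of such values, one per review, could then be passed to any standard classifier in place of a TF-IDF term vector. Unlike term weights, these statistics do not depend on the vocabulary of the corpus.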

Corresponding author

Correspondence to Rafael T. Anchiêta.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Anchiêta, R.T., Neto, F.A.R., de Sousa, R.F., Moura, R.S. (2015). Using Stylometric Features for Sentiment Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_15

  • DOI: https://doi.org/10.1007/978-3-319-18117-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science (R0)
