Using Stylometric Features for Sentiment Classification

Anchiêta, Rafael T.; Neto, Francisco Assis Ricarte; de Sousa, Rogério Figueiredo; Moura, Raimundo Santos

doi:10.1007/978-3-319-18117-2_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper is a comparative study of text feature extraction methods for statistical learning of sentiment classification. Feature extraction is one of the most important steps in classification systems. We compare stylometry with two baseline methods, TF-IDF and Delta TF-IDF, for sentiment classification. Stylometry is a research area of linguistics that uses statistical techniques to analyze literary style. To assess the viability of stylometry, we created a corpus of product reviews from Buscapé, the most traditional Portuguese-language online service, gathering 2,000 reviews of smartphones. We use three classifiers, Support Vector Machine (SVM), Naive Bayes, and J48, to evaluate whether stylometry achieves higher accuracy than the TF-IDF and Delta TF-IDF methods. The SVM classifier gave the best results: 82.75% accuracy with stylometry, compared with 72.62% with Delta TF-IDF and 56.25% with TF-IDF. These results show that stylometry is a feasible method for sentiment classification, outperforming the baseline methods in accuracy, and that the approach is promising.
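The abstract does not list the stylometric features that were used, so the following is only an illustrative sketch: a minimal Python function, assuming a hand-picked set of common style markers (text length, average word length, type-token ratio and hapax ratio as lexical-richness measures, punctuation counts, and the share of uppercase letters). The function name `stylometric_features` and the feature set are hypothetical, not the authors' exact method. One practical difference from term weighting is that every review maps to the same short, fixed-length numeric vector, whatever its vocabulary.

```python
import re
from collections import Counter

# Unicode-aware word pattern: in Python 3, \w also matches accented letters such as "ó".
WORD_RE = re.compile(r"\w+")


def stylometric_features(text: str) -> dict[str, float]:
    """Hypothetical stylometric vector for one review (illustrative feature set)."""
    words = WORD_RE.findall(text.lower())
    n_words = len(words)
    counts = Counter(words)
    letters = [ch for ch in text if ch.isalpha()]
    hapax = sum(1 for c in counts.values() if c == 1)  # words used exactly once

    def ratio(num: float, den: int) -> float:
        return num / den if den else 0.0  # guard against empty reviews

    return {
        "n_chars": float(len(text)),
        "n_words": float(n_words),
        "avg_word_len": ratio(sum(len(w) for w in words), n_words),
        "type_token_ratio": ratio(len(counts), n_words),  # lexical richness
        "hapax_ratio": ratio(hapax, n_words),
        "exclamations": float(text.count("!")),
        "questions": float(text.count("?")),
        "upper_ratio": ratio(sum(ch.isupper() for ch in letters), len(letters)),
    }


print(stylometric_features("Bom celular! Bateria boa, bom!"))
```

Stacking one such vector per review yields a small dense matrix that could be fed to an SVM, Naive Bayes, or a decision tree such as J48.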

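For the two baselines, here is a minimal Python sketch of both weighting schemes, assuming naive whitespace tokenization and add-one smoothing (both our assumptions, not details from the paper). Plain TF-IDF multiplies a term's count in a review by log(N / df_t) and ignores class labels. Delta TF-IDF, as we understand Martineau and Finin's formulation, subtracts the term's IDF over the negative training reviews from its IDF over the positive ones: V(t,d) = C(t,d) * log2(|P| / P_t) - C(t,d) * log2(|N| / N_t), where C(t,d) is the count of t in review d, |P| and |N| are the numbers of positive and negative training reviews, and P_t and N_t are the numbers of those reviews containing t. The helper names (`idf_table`, `delta_idf_table`, `weight_document`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption; real preprocessing would differ).
    return text.lower().split()


def document_frequencies(docs: list[str]) -> Counter:
    # Number of documents in which each term appears at least once.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def idf_table(docs: list[str]) -> dict[str, float]:
    # Plain IDF: log(N / df_t); class labels are ignored.
    n = len(docs)
    return {t: math.log(n / c) for t, c in document_frequencies(docs).items()}


def delta_idf_table(pos_docs: list[str], neg_docs: list[str],
                    smoothing: float = 1.0) -> dict[str, float]:
    # log2(|P| / P_t) - log2(|N| / N_t), with `smoothing` added to numerator and
    # denominator (assumption) so terms absent from one class avoid division by zero.
    pos_df = document_frequencies(pos_docs)
    neg_df = document_frequencies(neg_docs)
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    return {
        t: math.log2((n_pos + smoothing) / (pos_df[t] + smoothing))
        - math.log2((n_neg + smoothing) / (neg_df[t] + smoothing))
        for t in set(pos_df) | set(neg_df)
    }


def weight_document(doc: str, table: dict[str, float]) -> dict[str, float]:
    # Multiply each term's raw count C(t, d) by its (delta) IDF; unseen terms are dropped.
    counts = Counter(tokenize(doc))
    return {t: c * table[t] for t, c in counts.items() if t in table}


pos = ["ótimo celular bateria boa", "celular ótimo"]
neg = ["celular ruim bateria ruim", "tela ruim"]
idf = idf_table(pos + neg)
delta = delta_idf_table(pos, neg)
print(weight_document("bateria ruim ruim", idf))
print(weight_document("bateria ruim ruim", delta))
```

With this toy data, plain TF-IDF gives "bateria", "ruim", and "ótimo" the same IDF, since each appears in two of the four reviews, whereas Delta TF-IDF gives the evenly distributed "bateria" a weight of zero and opposite-signed weights to "ruim" and "ótimo". Under this sign convention negative-leaning terms score positive; for a linear classifier such as an SVM, what matters is the magnitude and the relative sign.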


Author information

Authors and Affiliations

Instituto Federal do Piauí, Teresina, Brazil
Rafael T. Anchiêta
Universidade Federal de Pernambuco, Recife, Brazil
Francisco Assis Ricarte Neto
Universidade Federal do Piauí, Teresina, Brazil
Rogério Figueiredo de Sousa & Raimundo Santos Moura


Corresponding author

Correspondence to Rafael T. Anchiêta.

Editor information

Editors and Affiliations

Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico
Alexander Gelbukh


About this paper

Cite this paper

Anchiêta, R.T., Neto, F.A.R., de Sousa, R.F., Moura, R.S. (2015). Using Stylometric Features for Sentiment Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_15


DOI: https://doi.org/10.1007/978-3-319-18117-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)
