Simplifying the Classification of App Reviews Using Only Lexical Features

Shah, Faiz Ali; Sirts, Kairit; Pfahl, Dietmar

doi:10.1007/978-3-030-29157-0_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1077)

Included in the following conference series:

International Conference on Software Technologies

Abstract

User reviews submitted to app marketplaces contain information that falls into different categories, e.g., feature evaluation, feature request, and bug report. This information helps developers improve the quality of mobile applications. However, because of the large volume of reviews received every day, manually classifying user reviews into these categories is not feasible. Therefore, automatic classification methods based on machine learning are desirable. In this study, we address the problem of automatically classifying app review sentences (as opposed to full reviews) into different categories. We compare the simplest textual machine learning classifier, which uses only lexical features (the so-called Bag-of-Words (BoW) approach), with the more complex models used in previous work, which adopt rich linguistic features. We find that the performance of the simple BoW model is very competitive and that it has the advantage of not requiring any external linguistic tools to extract the features. Moreover, we experiment with deep-learning-based Convolutional Neural Network (CNN) models, which have recently achieved state-of-the-art results in many classification tasks. We find that, on average, the CNN models do not perform significantly better than the simple BoW model. Finally, a manual analysis of misclassification errors and data annotations suggests that review sentences taken in isolation do not always contain enough information to make a correct prediction. Thus, we suggest that adopting neural models that incorporate additional contextual knowledge might improve the classification performance.
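
To make the Bag-of-Words idea concrete, the following is a minimal sketch of a sentence classifier that uses only word counts as features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so multinomial Naive Bayes with add-one smoothing is assumed here, the tokenizer is a simple regular expression, and the six training sentences are invented toy data. Only the Python standard library is used so that the example stays self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (lexical features only)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


class BowNaiveBayes:
    """Multinomial Naive Bayes over Bag-of-Words counts (assumed classifier)."""

    def fit(self, sentences, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of sentences
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size  # add-one smoothing
            score = math.log(n / total)                # log prior
            for tok in tokenize(sentence):
                if tok in self.vocab:                  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; the category names follow the abstract.
train = [
    ("the app crashes every time i open the camera", "bug report"),
    ("login fails with an error after the update", "bug report"),
    ("please add a dark mode option", "feature request"),
    ("it would be great if you could add offline maps", "feature request"),
    ("the search function works great and is fast", "feature evaluation"),
    ("i love the new photo editor", "feature evaluation"),
]
model = BowNaiveBayes().fit([s for s, _ in train], [y for _, y in train])

print(model.predict("the app crashes after the update"))
print(model.predict("please add offline mode"))
```

The sketch needs no part-of-speech tagger or parser, which is the practical advantage the abstract attributes to the BoW model. It also treats each sentence in isolation, so it has no access to the surrounding review context that the error analysis identifies as sometimes necessary.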

Notes

1. https://guxd.github.io/srminer/appendix.html
2. https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
3. http://www.nltk.org/
4. https://stanfordnlp.github.io/CoreNLP/
5. https://spacy.io/
6. https://code.google.com/archive/p/word2vec/
7. http://scikit-learn.org/stable/
8. https://github.com/dennybritz/cnn-text-classification-tf
9. https://www.tensorflow.org/
10. There are no examples from the sentence type Feature Request because all sentences in our sample annotated with that type contained an aspect term.

Acknowledgments

We are grateful to Xiaodong Gu for sharing the review dataset for this study. This research was supported by the institutional research grant IUT20-55 of the Estonian Research Council and the Estonian Center of Excellence in ICT research (EXCITE).

Author information

Authors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Faiz Ali Shah, Kairit Sirts & Dietmar Pfahl

Corresponding author

Correspondence to Faiz Ali Shah.

Editor information

Editors and Affiliations

Information Systems Group, University of Twente, Enschede, The Netherlands
Marten van Sinderen
Wrocław University of Economics, Wrocław, Poland
Leszek A. Maciaszek

Cite this paper

Shah, F.A., Sirts, K., Pfahl, D. (2019). Simplifying the Classification of App Reviews Using Only Lexical Features. In: van Sinderen, M., Maciaszek, L. (eds) Software Technologies. ICSOFT 2018. Communications in Computer and Information Science, vol 1077. Springer, Cham. https://doi.org/10.1007/978-3-030-29157-0_8

DOI: https://doi.org/10.1007/978-3-030-29157-0_8
Published: 13 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29156-3
Online ISBN: 978-3-030-29157-0
eBook Packages: Computer Science, Computer Science (R0)
