
Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification

  • Conference paper
  • In: Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2020)

Abstract

Since Google introduced BERT, a large number of pre-trained models have been proposed, and using them to solve text classification problems has become mainstream. However, BERT's self-attention complexity grows quadratically with text length, so BERT is not suitable for processing long text. XLNet, a newer pre-trained model, was subsequently proposed to address long text classification, but it requires more GPUs and a longer fine-tuning time than BERT. To the best of our knowledge, no previous work has combined traditional feature selection methods with BERT for long text classification. In this paper, we use classic feature selection methods to shorten long texts and then use the shortened texts as the input to BERT. Finally, we conduct extensive experiments on a public data set and a real-world data set from China Telecom. The experimental results show that our methods are effective in helping BERT process long text.
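The approach the abstract describes is straightforward to prototype: score the words of each long document with a classic feature selection method, keep only the highest-scoring words, and feed the shortened text to an off-the-shelf BERT classifier. Below is a minimal sketch of this idea using TF-IDF as the scoring method; the function shorten_corpus, the scikit-learn-based scoring, and the 510-word budget are illustrative assumptions, not the authors' actual implementation.

    # A minimal sketch, assuming TF-IDF as the feature selection method.
    from sklearn.feature_extraction.text import TfidfVectorizer

    def shorten_corpus(docs, max_words=510):
        """Keep each document's highest-TF-IDF words, preserving word order.

        510 leaves room for BERT's [CLS] and [SEP] tokens; WordPiece may
        still split words into subwords, so treat the budget as approximate.
        """
        vectorizer = TfidfVectorizer()
        tfidf = vectorizer.fit_transform(docs)   # shape: (n_docs, n_terms)
        vocab = vectorizer.vocabulary_           # term -> column index
        analyzer = vectorizer.build_analyzer()   # same tokenization as fit

        shortened = []
        for row, doc in zip(tfidf, docs):
            scores = row.toarray().ravel()
            words = analyzer(doc)

            def score(w):
                j = vocab.get(w)
                return scores[j] if j is not None else 0.0

            # Rank word positions by score, keep the best, restore order.
            ranked = sorted(range(len(words)),
                            key=lambda i: score(words[i]), reverse=True)
            keep = sorted(ranked[:max_words])
            shortened.append(" ".join(words[i] for i in keep))
        return shortened

    # The shortened texts can then be tokenized and classified with any
    # standard pre-trained BERT, since they now fit its input limit.

Any other classic scoring function, such as mutual information or the chi-square statistic, can be swapped in for the TF-IDF scores without changing the rest of the pipeline.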

Acknowledgments

This research was sponsored by Zhejiang Lab (2020AA3AB05).

Author information


Corresponding author

Correspondence to Bin Cao.


Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Wang, K., Huang, J., Liu, Y., Cao, B., Fan, J. (2021). Combining Feature Selection Methods with BERT: An In-depth Experimental Study of Long Text Classification. In: Gao, H., Wang, X., Iqbal, M., Yin, Y., Yin, J., Gu, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 349. Springer, Cham. https://doi.org/10.1007/978-3-030-67537-0_34

  • DOI: https://doi.org/10.1007/978-3-030-67537-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67536-3

  • Online ISBN: 978-3-030-67537-0

  • eBook Packages: Computer Science, Computer Science (R0)
