Abstract
Text classification is a fundamental task used across many sub-domains of natural language processing, such as information extraction and semantic understanding. For general text classification problems, deep learning models such as Bi-LSTM, Transformer, and BERT have achieved good performance. In this paper, however, we consider a new problem: a special scenario in text classification in which the entities to be classified have only a weak sequential relationship with one another. A typical example is the block classification of resumes, where sequential relationships exist among the different blocks. To exploit this sequential feature, we propose an effective hybrid model that combines a fully connected neural network and a block-level recurrent neural network through feature fusion. Experimental results show that the average F1-score of our model on three real-world datasets of 1,400 resumes is 5.5–11% higher than that of existing mainstream algorithms.
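The hybrid idea described in the abstract can be illustrated with a minimal NumPy sketch of a forward pass. This is not the authors' implementation: the abstract does not specify the recurrent cell, the layer sizes, or how the features are fused, so the plain tanh recurrence, the concatenation-based fusion, and all names (`init_params`, `hybrid_forward`, `W_fc`, `W_rec`, and so on) are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(n_features, n_hidden, n_classes, seed=0):
    """Randomly initialised weights (hypothetical sizes; untrained)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        # Fully connected branch: looks at each block on its own.
        "W_fc": rng.normal(0.0, scale, (n_features, n_hidden)),
        "b_fc": np.zeros(n_hidden),
        # Block-level recurrent branch: carries context from earlier blocks.
        "W_in": rng.normal(0.0, scale, (n_features, n_hidden)),
        "W_rec": rng.normal(0.0, scale, (n_hidden, n_hidden)),
        "b_rnn": np.zeros(n_hidden),
        # Output layer applied to the fused (concatenated) features.
        "W_out": rng.normal(0.0, scale, (2 * n_hidden, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def hybrid_forward(blocks, p):
    """Classify every block of one document.

    blocks: array of shape (n_blocks, n_features), one row per block,
            in the order the blocks appear in the resume.
    Returns class probabilities of shape (n_blocks, n_classes).
    """
    # Branch 1: per-block fully connected layer with ReLU.
    fc_feats = np.maximum(0.0, blocks @ p["W_fc"] + p["b_fc"])

    # Branch 2: simple tanh recurrence over the block sequence
    # (assumption: the paper may use a gated cell such as GRU or LSTM).
    h = np.zeros(p["W_rec"].shape[0])
    states = []
    for x in blocks:
        h = np.tanh(x @ p["W_in"] + h @ p["W_rec"] + p["b_rnn"])
        states.append(h)
    rnn_feats = np.stack(states)

    # Feature fusion by concatenation (assumption), then softmax classifier.
    fused = np.concatenate([fc_feats, rnn_feats], axis=1)
    return softmax(fused @ p["W_out"] + p["b_out"])


# Usage: one resume with 5 blocks, 8 features per block, 4 block classes.
params = init_params(n_features=8, n_hidden=16, n_classes=4)
resume = np.random.default_rng(1).normal(size=(5, 8))
probs = hybrid_forward(resume, params)
print(probs.shape)  # (5, 4)
```

In this sketch the fully connected branch scores each block from its own content, while the recurrent branch lets the prediction for a block depend on the blocks that precede it, which is one way to use a weak ordering signal without relying on it entirely.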
Acknowledgement
The authors thank Zhejiang Lab (111007-PI2001) and the Zhejiang Provincial Natural Science Foundation (LZ21F030001) for their support.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, Q. et al. (2021). An Effective Algorithm for Classification of Text with Weak Sequential Relationships. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_28
DOI: https://doi.org/10.1007/978-3-030-86475-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86474-3
Online ISBN: 978-3-030-86475-0
eBook Packages: Computer Science, Computer Science (R0)