An Effective Algorithm for Classification of Text with Weak Sequential Relationships

Xu, Qiqiang; Zhang, Ji; Yu, Ting; Zhang, Wenbin; Zhang, Mingli; Luo, Yonglong; Chen, Fulong; Liu, Zhen

doi:10.1007/978-3-030-86475-0_28

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12924)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Text classification is a fundamental task widely used across sub-domains of natural language processing, such as information extraction and semantic understanding. For general text classification problems, deep learning models such as Bi-LSTM, Transformer, and BERT have achieved good performance. In this paper, however, we consider a new problem: text classification in a special scenario where the entities to be classified have a weak sequential relationship with one another. A typical example is the block classification of resumes, where sequential relationships exist among the different blocks. To exploit this sequential feature, we propose an effective hybrid model that combines a fully connected neural network model and a block-level recurrent neural network model through feature fusion. Experimental results show that the average F1-score of our model on three real-world datasets of 1,400 resumes is 5.5–11% higher than that of existing mainstream algorithms.
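
The hybrid idea described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' architecture: the class name `HybridBlockClassifier`, the layer sizes, the plain Elman-style recurrence, the fusion by concatenation, and the random untrained weights are all assumptions chosen for illustration. It only shows how a per-block fully connected branch and a block-level recurrent branch might be fused so that each block's prediction depends on both its own features and the blocks that precede it.

```python
import math
import random


def linear(x, w, b):
    """Affine map: w is a list of rows (out_dim x in_dim), x and b are vectors."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init(rows, cols, rng):
    """Small random weight matrix (untrained, illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class HybridBlockClassifier:
    """Hypothetical sketch: FC branch + block-level RNN branch, fused by concatenation."""

    def __init__(self, n_features, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Fully connected branch: looks at each block in isolation.
        self.w_fc = init(n_hidden, n_features, rng)
        self.b_fc = [0.0] * n_hidden
        # Recurrent branch: carries a hidden state from block to block.
        self.w_in = init(n_hidden, n_features, rng)
        self.w_rec = init(n_hidden, n_hidden, rng)
        self.b_rnn = [0.0] * n_hidden
        # Output layer over the fused (concatenated) features.
        self.w_out = init(n_classes, 2 * n_hidden, rng)
        self.b_out = [0.0] * n_classes

    def predict_proba(self, blocks):
        """Return one class distribution per block, in document order."""
        h = [0.0] * self.n_hidden
        zeros = [0.0] * self.n_hidden
        outputs = []
        for x in blocks:
            # Local features from the fully connected branch (ReLU).
            local = [max(0.0, a) for a in linear(x, self.w_fc, self.b_fc)]
            # Sequential features: new state from this block and the previous state.
            from_input = linear(x, self.w_in, self.b_rnn)
            from_state = linear(h, self.w_rec, zeros)
            h = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            # Feature fusion by concatenating the two branches.
            fused = local + h
            outputs.append(softmax(linear(fused, self.w_out, self.b_out)))
        return outputs


# Toy "resume": four blocks, each already encoded as a 3-dimensional feature vector.
resume = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.0, 0.3, 0.8], [0.7, 0.7, 0.0]]
model = HybridBlockClassifier(n_features=3, n_hidden=4, n_classes=5)
probs = model.predict_proba(resume)
```

Because the recurrent branch carries state across blocks, the distribution predicted for a block generally differs depending on which blocks came before it, which is the sequential signal the abstract refers to. A real system would train these weights and would likely use a gated recurrent unit and learned text embeddings rather than the toy vectors used here.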

Acknowledgement

The authors gratefully acknowledge support from Zhejiang Lab (111007-PI2001) and the Zhejiang Provincial Natural Science Foundation (LZ21F030001).

Author information

Authors and Affiliations

Nanjing University of Aeronautics and Astronautics, Nanjing, China
Qiqiang Xu
University of Southern Queensland, Toowoomba, Australia
Ji Zhang
Zhejiang Lab, Hangzhou, China
Ting Yu
Carnegie Mellon University, Pittsburgh, USA
Wenbin Zhang
Montreal Neurological Institute, McGill University, Montreal, Canada
Mingli Zhang
Anhui Normal University, Wuhu, China
Yonglong Luo & Fulong Chen
Guangdong Pharmaceutical University, Guangzhou, China
Zhen Liu

Corresponding author

Correspondence to Ji Zhang.

Editor information

Editors and Affiliations

University of Vienna, Vienna, Austria
Christine Strauss
Johannes Kepler University Linz, Linz, Oberösterreich, Austria
Gabriele Kotsis
Vienna University of Technology, Vienna, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Xu, Q. et al. (2021). An Effective Algorithm for Classification of Text with Weak Sequential Relationships. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12924. Springer, Cham. https://doi.org/10.1007/978-3-030-86475-0_28

DOI: https://doi.org/10.1007/978-3-030-86475-0_28
Published: 01 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86474-3
Online ISBN: 978-3-030-86475-0
eBook Packages: Computer Science, Computer Science (R0)