An Efficient Text Labeling Framework Using Active Learning Model

Tandra, Sulochana; Nautiyal, Akshay; Gupta, Deepa

doi:10.1007/978-981-15-3914-5_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1148)

Abstract

Electronic medical discharge summaries contain a wealth of information, but extracting useful structured information from such unstructured text is challenging. Supervised machine learning (ML) algorithms can achieve good performance in extracting useful relations between entities, but they require large annotated datasets. Manual annotation is expensive and time-consuming because it requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models: the classifier is trained on a limited number of samples yet is expected to reach high performance, which saves both time and annotation cost. Active learning is well suited to datasets where annotation cost is high and a decent classifier must be trained from the annotated data available. The key factor in an active learning model’s success is its selection of the samples that need annotation. The more informative the samples are, the less time it takes to train the supervised model to high accuracy. Thus, the query strategy used for sample selection plays a vital role in the AL process. In this study, we aim to develop a novel query strategy that selects the most informative samples from the dataset and can thereby accelerate the improvement of the supervised model’s performance. The query strategy is designed using deep reinforcement learning techniques such as actor-critic. The performance of the sample selection strategy is measured by the accuracy of the model after a predefined number of iterations.
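
To make the role of the query strategy concrete, here is a minimal sketch of a pool-based active learning loop. It does not reproduce the chapter's method: the abstract does not specify the actor-critic strategy, so the sketch swaps in plain margin-based uncertainty sampling as a stand-in. The names `NearestCentroid`, `active_learning_loop`, `oracle`, `seed_idx` and `budget` are hypothetical, the toy classifier merely stands in for the supervised model, and the seed set is assumed to contain at least two classes.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy stand-in for the supervised model: one centroid per class."""

    def fit(self, X, y):
        self.centroids = {
            c: centroid([x for x, label in zip(X, y) if label == c])
            for c in set(y)
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))

    def margin(self, x):
        """Gap between the two smallest class distances (small = uncertain)."""
        d = sorted(sq_dist(x, c) for c in self.centroids.values())
        return d[1] - d[0]


def active_learning_loop(pool, oracle, seed_idx, budget):
    """Label `budget` extra samples, always querying the most uncertain one.

    `oracle` plays the domain expert: it returns the label of a sample.
    Assumption: the seed indices cover at least two classes.
    """
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    model = NearestCentroid()
    for _ in range(budget):
        model.fit([pool[i] for i in labeled], [labeled[i] for i in labeled])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query strategy (stand-in): the sample with the smallest margin.
        query = min(unlabeled, key=lambda i: model.margin(pool[i]))
        labeled[query] = oracle(pool[query])
    # Final fit so the returned model has seen every acquired label.
    model.fit([pool[i] for i in labeled], [labeled[i] for i in labeled])
    return model, labeled


# Usage on a tiny hand-made pool; the labelling rule is an arbitrary assumption.
pool = [(-3.0, -3.0), (-2.0, -2.5), (-0.5, -0.2), (0.3, 0.1), (2.0, 2.5), (3.0, 3.0)]
truth = {p: int(p[0] + p[1] > 0) for p in pool}
model, labeled = active_learning_loop(pool, truth.get, seed_idx=[0, 5], budget=2)
print(sorted(labeled))
```

Only the line that picks `query` would change under a different strategy; a learned policy, such as the actor-critic one the abstract describes, would replace the `min(...)` call with its own choice of sample.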

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
Sulochana Tandra, Akshay Nautiyal & Deepa Gupta

Corresponding author

Correspondence to Sulochana Tandra.

Editor information

Editors and Affiliations

School of CS/IT, Indian Institute of Information Technology and Management-Kerala (IIITM-K), Technopark Campus, Trivandrum, Kerala, India
Sabu M. Thampi
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
Ljiljana Trajkovic
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
P. Nagabhushan
College of Computer Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy
Engineering Academy of Serbia, University of Belgrade, Belgrade, Serbia
Zoran Bojkovic
Indian Institute of Space Science and Technology, Trivandrum, Kerala, India
Deepak Mishra

About this paper

Cite this paper

Tandra, S., Nautiyal, A., Gupta, D. (2020). An Efficient Text Labeling Framework Using Active Learning Model. In: Thampi, S., et al. Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-3914-5_11

DOI: https://doi.org/10.1007/978-981-15-3914-5_11
Published: 06 May 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3913-8
Online ISBN: 978-981-15-3914-5
eBook Packages: Intelligent Technologies and Robotics (R0)