An Efficient Text Labeling Framework Using Active Learning Model

  • Conference paper
Intelligent Systems, Technologies and Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1148)

Abstract

Electronic medical discharge summaries contain a wealth of information, but extracting useful structured information from such unstructured text is challenging. Supervised machine learning (ML) algorithms can perform well at extracting relations between entities, but they require large annotated datasets. Manual annotation is expensive and time-consuming because it requires domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models. The classifier is trained on a limited number of carefully chosen samples while still aiming for the best achievable performance, which saves both time and annotation cost. AL is well suited to settings where annotation is costly and a reasonably accurate classifier must be trained from the limited annotated data available. The key factor in an active learning model’s success is its selection of the samples that need annotation. The more informative the selected samples are, the less time it takes to train the supervised model to high accuracy. The query strategy used for sample selection therefore plays a vital role in the AL process. In this study, we aim to develop a novel query strategy that selects the most informative samples from the dataset and thereby accelerates the improvement of the supervised model’s performance. The query strategy is designed using deep reinforcement learning techniques such as actor-critic. The performance of the sample selection strategy is evaluated by measuring the accuracy of the model after a predefined number of iterations.
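
The pool-based selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's method: it swaps in simple margin-based uncertainty sampling for the actor-critic query strategy, and it uses a toy nearest-centroid classifier on synthetic 2-D points in place of discharge-summary text. All names (`make_blobs`, `train_centroids`, `margin`, `active_learning_loop`, `oracle`) are hypothetical and chosen only for this example.

```python
import math
import random


def make_blobs(n, rng):
    """Synthetic stand-in for annotated data: two 2-D Gaussian classes."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        cx = -2.0 if y == 0 else 2.0
        data.append(((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)), y))
    return data


def train_centroids(labeled):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def margin(centroids, x):
    """Gap between the two smallest centroid distances (small = uncertain).

    Assumes at least two classes are present in the labeled set.
    """
    d = sorted(math.dist(x, c) for c in centroids.values())
    return d[1] - d[0]


def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


def active_learning_loop(seed, pool, oracle, held_out, budget):
    """Query `budget` labels one at a time, retraining after each query."""
    labeled, pool, history = list(seed), list(pool), []
    for _ in range(budget):
        model = train_centroids(labeled)
        history.append(accuracy(model, held_out))
        if not pool:
            break
        # Query strategy (assumption: uncertainty sampling, not actor-critic).
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))  # oracle plays the annotator
    model = train_centroids(labeled)
    history.append(accuracy(model, held_out))
    return model, labeled, history


rng = random.Random(0)
data = make_blobs(60, rng)
held_out = make_blobs(100, rng)
labels = dict(data)  # the "annotator" simply looks up the true label
seed = [next(d for d in data if d[1] == 0), next(d for d in data if d[1] == 1)]
pool = [x for x, y in data if (x, y) not in seed]
model, labeled, history = active_learning_loop(
    seed, pool, labels.__getitem__, held_out, budget=10
)
print(history)  # held-out accuracy after each retraining round
```

Evaluating `history` after a fixed number of rounds mirrors the abstract's evaluation idea of measuring model accuracy after a predefined number of iterations. A learned query strategy would replace only the line that picks `query`.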

Author information

Corresponding author

Correspondence to Sulochana Tandra.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Tandra, S., Nautiyal, A., Gupta, D. (2020). An Efficient Text Labeling Framework Using Active Learning Model. In: Thampi, S., et al. Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-3914-5_11
