Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder

Choi, Jaeyoon; Ruis, Andrew R.; Cai, Zhiqiang; Eagan, Brendan; Shaffer, David Williamson

doi:10.1007/978-3-031-31726-2_3

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1785)

Included in the following conference series:

International Conference on Quantitative Ethnography

Abstract

In quantitative ethnography (QE) studies, which often involve large datasets that cannot be entirely hand-coded by human raters, researchers have used supervised machine learning approaches to develop automated classifiers. However, QE researchers are rightly concerned about the amount of human coding that may be required to develop classifiers that achieve the high levels of accuracy that QE studies typically require. In this study, we compare a neural network, a powerful traditional supervised learning approach, with nCoder, an active learning technique commonly used in QE studies, to determine which technique requires less human coding to produce a sufficiently accurate classifier. To do this, we constructed multiple training sets from a large dataset used in prior QE studies and designed a Monte Carlo simulation to test the performance of the two techniques systematically. Our results show that nCoder can achieve high predictive accuracy with significantly less human-coded data than a neural network.
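The general shape of such a Monte Carlo comparison over training-set sizes might be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the synthetic one-feature dataset, the toy midpoint classifier, the use of plain accuracy as the metric, and every function name here are hypothetical stand-ins, and neither nCoder nor the neural network from the study is reproduced.

```python
import random
import statistics


def make_dataset(n, rng):
    """Synthetic stand-in for coded data: one numeric feature, binary label."""
    data = []
    for _ in range(n):
        label = rng.random() < 0.5
        # Positive items centre on 1.0, negative items on -1.0 (assumption).
        feature = rng.gauss(1.0 if label else -1.0, 1.0)
        data.append((feature, int(label)))
    return data


def train_midpoint(train):
    """Toy classifier: a threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 0.0  # fallback when one class is missing from the sample
    return (statistics.fmean(pos) + statistics.fmean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of test items whose predicted label matches the coded label."""
    hits = sum((x > threshold) == bool(y) for x, y in test)
    return hits / len(test)


def monte_carlo(pool, test, sizes, trials, rng):
    """Mean test accuracy per training-set size over repeated random draws."""
    results = {}
    for size in sizes:
        scores = []
        for _ in range(trials):
            train = rng.sample(pool, size)  # one simulated hand-coded set
            scores.append(accuracy(train_midpoint(train), test))
        results[size] = statistics.fmean(scores)
    return results


def smallest_sufficient_size(results, target):
    """Smallest training-set size whose mean accuracy reaches the target."""
    for size in sorted(results):
        if results[size] >= target:
            return size
    return None


rng = random.Random(0)
pool = make_dataset(2000, rng)
test = make_dataset(1000, rng)
results = monte_carlo(pool, test, sizes=[5, 20, 80, 320], trials=200, rng=rng)
for size, mean_acc in results.items():
    print(size, round(mean_acc, 3))
```

Running the same loop once per technique and comparing the output of `smallest_sufficient_size` at a chosen target would give one rough measure of how much human-coded data each technique needs.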

Acknowledgements

This work was funded in part by the National Science Foundation (DRL-1661036, DRL-1713110, DRL-2100320), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.

Author information

Authors and Affiliations

University of Wisconsin-Madison, Madison, WI, 53706, USA
Jaeyoon Choi, Andrew R. Ruis, Zhiqiang Cai, Brendan Eagan & David Williamson Shaffer

Corresponding author

Correspondence to Jaeyoon Choi.

Editor information

Editors and Affiliations

University of Oslo, Oslo, Norway
Crina Damşa
Drexel University School of Education, Philadelphia, PA, USA
Amanda Barany

About this paper

Cite this paper

Choi, J., Ruis, A.R., Cai, Z., Eagan, B., Shaffer, D.W. (2023). Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_3

DOI: https://doi.org/10.1007/978-3-031-31726-2_3
Published: 29 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31725-5
Online ISBN: 978-3-031-31726-2
eBook Packages: Computer Science, Computer Science (R0)
