Abstract
In quantitative ethnography (QE) studies, which often involve datasets too large to be hand-coded entirely by human raters, researchers have used supervised machine learning approaches to develop automated classifiers. However, QE researchers are rightly concerned about the amount of human coding that may be required to develop classifiers that achieve the high levels of accuracy QE studies typically require. In this study, we compare a neural network, a powerful traditional supervised learning approach, with nCoder, an active learning technique commonly used in QE studies, to determine which technique requires less human coding to produce a sufficiently accurate classifier. To do this, we constructed multiple training sets from a large dataset used in prior QE studies and designed a Monte Carlo simulation to test the performance of the two techniques systematically. Our results show that nCoder can achieve high predictive accuracy with significantly less human-coded data than a neural network.
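The general shape of such a training-set-size comparison can be sketched as a Monte Carlo loop: for each candidate number n of hand-coded excerpts, repeatedly draw n excerpts at random, train a classifier, and record how often it agrees with held-out human codes at or above a Cohen's kappa threshold. This is a minimal sketch under stated assumptions, not the authors' procedure. The stand-in learner is a multinomial naive Bayes classifier (neither nCoder nor a neural network), the 0.65 kappa cut-off is an assumed acceptance criterion, and the function names and toy excerpts are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Sequence

# A hand-coded excerpt: (text, code), where code is 1 if the construct is present, else 0.
Example = tuple[str, int]
Classifier = Callable[[str], int]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa between two binary codings of the same excerpts."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n  # rate of code 1 for each rater
    p_e = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if p_e == 1:  # both raters gave every excerpt the same single code
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def train_naive_bayes(train: Sequence[Example]) -> Classifier:
    """Hypothetical stand-in learner: multinomial naive Bayes with add-one smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, code in train:
        word_counts[code].update(text.lower().split())
        doc_counts[code] += 1
    vocab_size = len(set(word_counts[0]) | set(word_counts[1]))
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}

    def classify(text: str) -> int:
        scores = {}
        for c in (0, 1):
            # Smoothed log prior, so a class absent from a small draw is still scored.
            score = math.log((doc_counts[c] + 1) / (len(train) + 2))
            for w in text.lower().split():
                # The extra +1 in the denominator reserves mass for unseen words.
                score += math.log((word_counts[c][w] + 1) / (totals[c] + vocab_size + 1))
            scores[c] = score
        return int(scores[1] > scores[0])

    return classify


def monte_carlo(
    pool: Sequence[Example],
    test: Sequence[Example],
    train_fn: Callable[[Sequence[Example]], Classifier],
    sizes: Sequence[int],
    reps: int = 100,
    threshold: float = 0.65,  # assumed acceptance criterion
    seed: int = 0,
) -> dict[int, float]:
    """Share of random training draws of each size whose classifier reaches kappa >= threshold."""
    rng = random.Random(seed)
    truth = [code for _, code in test]
    results = {}
    for n in sizes:
        hits = 0
        for _ in range(reps):
            draw = rng.sample(pool, n)  # simulate hand-coding n randomly chosen excerpts
            clf = train_fn(draw)
            predicted = [clf(text) for text, _ in test]
            if cohen_kappa(truth, predicted) >= threshold:
                hits += 1
        results[n] = hits / reps
    return results


if __name__ == "__main__":
    # Hypothetical toy corpus (NOT data from the study); code 1 marks a design trade-off.
    gen = random.Random(42)
    filler = "the team met today and we wrote a report".split()
    signal = "tradeoff cost versus safety".split()

    def excerpt(code: int) -> Example:
        words = gen.sample(filler, 5) + (gen.sample(signal, 2) if code else [])
        gen.shuffle(words)
        return " ".join(words), code

    data = [excerpt(int(gen.random() < 0.3)) for _ in range(600)]
    print(monte_carlo(data[:500], data[500:], train_naive_bayes, sizes=[5, 10, 20, 40], reps=50))
```

Passing a different function as `train_fn` is the intended extension point for comparing learners. An active learner would not fit this loop unchanged: rather than drawing excerpts at random, it chooses which excerpts a human codes next, so it would need its own selection step inside the repetition.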
Acknowledgements
This work was funded in part by the National Science Foundation (DRL-1661036, DRL-1713110, DRL-2100320), the Wisconsin Alumni Research Foundation, and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin-Madison. The opinions, findings, and conclusions do not reflect the views of the funding agencies, cooperating institutions, or other individuals.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Choi, J., Ruis, A.R., Cai, Z., Eagan, B., Shaffer, D.W. (2023). Does Active Learning Reduce Human Coding?: A Systematic Comparison of Neural Network with nCoder. In: Damşa, C., Barany, A. (eds) Advances in Quantitative Ethnography. ICQE 2022. Communications in Computer and Information Science, vol 1785. Springer, Cham. https://doi.org/10.1007/978-3-031-31726-2_3
DOI: https://doi.org/10.1007/978-3-031-31726-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31725-5
Online ISBN: 978-3-031-31726-2
eBook Packages: Computer Science, Computer Science (R0)