Abstract
Educators, institutions, and certification agencies often want to know if students are being evaluated appropriately and completely with regard to a standard. To help educators understand if examinations are well-balanced or topically correct, we explore the challenge of classifying exam questions into a concept hierarchy.
While the general problems of text classification and retrieval are widely studied, our domain is unusual in that the concept hierarchy is expert-built yet lacks the benefit of being a well-used knowledge base.
We propose a variety of approaches to this “small-scale” Information Retrieval challenge. We use an external corpus of Q&A data to expand concept descriptions, and propose a model that uses the hierarchy information effectively in conjunction with existing retrieval models. This new approach is more effective than typical unsupervised approaches and more robust to limited training data than commonly used text-classification or machine-learning methods.
In keeping with our goal of providing a service that helps educators better understand their exams, we also explore interactive methods, focusing on low-cost relevance feedback signals within the concept hierarchy to provide further gains in accuracy.
Notes
- 1. Even most standardized tests require test-takers to sign agreements not to distribute or mention the questions, even after the exam is taken.
- 2. User Fiire; http://chemistry.stackexchange.com/questions/4250. This example is displayed in lieu of the proprietary ACS data.
- 3.
- 4. The beta version of chemistry.stackexchange.com.
Acknowledgments
The authors thank Prof. Thomas Holme of Iowa State University’s Department of Chemistry for making the data used in this study available and Stephen Battisti of UMass’ Center for Educational Software Development for help accessing and formatting the data.
This work was supported in part by the Center for Intelligent Information Retrieval and in part by NSF grant numbers IIS-0910884 and DUE-1323469. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Foley, J., Allan, J. (2016). Retrieving Hierarchical Syllabus Items for Exam Question Analysis. In: Ferro, N., et al. Advances in Information Retrieval. ECIR 2016. Lecture Notes in Computer Science(), vol 9626. Springer, Cham. https://doi.org/10.1007/978-3-319-30671-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30670-4
Online ISBN: 978-3-319-30671-1