Automated Grading of Exam Responses: An Extensive Classification Benchmark

  • Conference paper
  • First Online:

Discovery Science (DS 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12986)

Abstract

Automated grading of free-text exam responses is a challenging task, owing to factors such as the scarcity of training data and bias in the graders' ground-truth labels. In this paper, we focus on the automated grading of free-text responses, formulating the problem as binary classification with two class labels: low-grade and high-grade. We present a benchmark of four machine learning methods under three experiment protocols on two real-world datasets: one of Cyber-crime exam responses in Arabic and one of Data Mining exam responses in English, the latter presented for the first time in this work. Using a range of metrics for binary classification and answer ranking, we illustrate the benefits and drawbacks of the benchmarked methods. Our results suggest that standard models with individual word representations can, in some cases, achieve predictive performance competitive with deep neural language models that use context-based representations, on both binary classification and answer ranking for free-text response grading. Lastly, we discuss the pedagogical implications of our findings by identifying potential pitfalls and challenges in building predictive models for such tasks.
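As a rough illustration of this problem formulation, responses can be binarised into low- and high-grade classes and evaluated with both classification and ranking metrics. The following is a minimal sketch only; the toy answers, the 50% grade threshold, and the TF-IDF plus logistic-regression baseline are assumptions made here for illustration, not details taken from the paper.

    # Minimal sketch of the binary low-/high-grade formulation described in the
    # abstract. The toy answers, the 50% grade threshold, and the choice of a
    # TF-IDF + logistic-regression baseline are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    responses = [
        "A decision tree splits the data on the feature with the highest gain.",
        "Overfitting means the model memorises the training set.",
        "I do not remember the definition.",
        "Cross-validation estimates generalisation error using held-out folds.",
        "Something about trees and data.",
        "Regularisation penalises large weights to reduce overfitting.",
    ]
    grades = [5, 4, 0, 5, 1, 4]                            # numeric exam scores (toy values)
    max_grade = 5
    labels = [int(g >= 0.5 * max_grade) for g in grades]   # 1 = high-grade, 0 = low-grade

    X_train, X_test, y_train, y_test = train_test_split(
        responses, labels, test_size=0.33, random_state=0, stratify=labels
    )

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]             # probabilities, usable for answer ranking
    print("F1:", f1_score(y_test, predictions))
    print("ROC AUC (ranking quality):", roc_auc_score(y_test, scores))

Reporting both a classification metric (F1) and a ranking metric (ROC AUC) mirrors the paper's dual evaluation of binary classification and answer ranking.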

Supported by the AutoGrade project of Stockholm University.

Notes

  1. We use TfidfVectorizer for feature extraction with all parameters set to default (see the sketch after these notes).

  2. https://commoncrawl.org/.

  3. The size of the flatten layer restricts us from running the models for more repetitions.

  4. https://github.com/dsv-data-science/autograde_DS2021.
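
Footnote 1 refers to scikit-learn's TfidfVectorizer left entirely at its default settings. The following is a minimal sketch of what that configuration implies; the example answers are invented for illustration and are not data from the paper.

    # Illustration of footnote 1: TfidfVectorizer with all parameters at their
    # scikit-learn defaults (lowercasing, word-level tokenisation, no stop-word
    # removal, unigrams only, l2-normalised tf-idf weights).
    # The example answers below are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer

    answers = [
        "Overfitting means the model memorises the training data.",
        "A model overfits when it fits noise instead of the true signal.",
    ]

    vectorizer = TfidfVectorizer()              # defaults only, as stated in the footnote
    features = vectorizer.fit_transform(answers)

    print(features.shape)                       # (number of answers, vocabulary size)
    print(sorted(vectorizer.vocabulary_)[:5])   # first few vocabulary terms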

Acknowledgements

This work was supported by the AutoGrade project (https://datascience.dsv.su.se/projects/autograding.html) of the Dept. of Computer and Systems Sciences at Stockholm University.

Author information

Corresponding author

Correspondence to Jimmy Ljungman.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ljungman, J. et al. (2021). Automated Grading of Exam Responses: An Extensive Classification Benchmark. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_1

  • DOI: https://doi.org/10.1007/978-3-030-88942-5_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88941-8

  • Online ISBN: 978-3-030-88942-5

  • eBook Packages: Computer Science, Computer Science (R0)
