Abstract
Over the past decade, diagnostic classification models (DCMs) have become an active area of psychometric research. Despite their use, the reliability of examinee estimates in DCM applications has seldom been reported. In this paper, a reliability measure for the categorical latent variables of DCMs is defined. Using theory- and simulation-based results, we show that DCMs uniformly provide greater examinee estimate reliability than IRT models for tests of the same length, a result that follows from the smaller range of values the latent variable estimates can take in DCMs. We demonstrate this result by comparing DCM and IRT reliability for a series of models estimated with data from an end-of-grade test, and conclude with a discussion of how DCMs can change the character of large-scale testing, either by shortening tests that measure examinees unidimensionally or by providing more reliable multidimensional measurement for tests of the same length.
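The intuition behind the abstract's claim can be illustrated with a small parallel-forms simulation. The sketch below is not the authors' estimator; it is a rough, self-contained illustration under assumed item parameters (slip and guess rates of .25 for a single-attribute mastery model, Rasch difficulties spaced over [-2, 2] for IRT). Reliability is computed in both cases as the correlation between examinee estimates on two parallel forms of the same length: classifications for the DCM-like model, EAP trait estimates for the Rasch model. All function names here are illustrative, not from the paper.

```python
import math
import random

random.seed(0)
N, J = 2000, 15  # examinees, items per form

# ---- DCM side: one binary attribute, mastery-type items (slip = guess = .25) ----
slip, guess = 0.25, 0.25
alpha = [random.random() < 0.5 for _ in range(N)]  # true mastery status

def dcm_form():
    """Simulate one parallel form: masters answer correctly w.p. 1-slip, others w.p. guess."""
    return [[random.random() < ((1 - slip) if a else guess) for _ in range(J)]
            for a in alpha]

def classify(form):
    """MAP mastery classification with known item parameters and a flat prior."""
    out = []
    for resp in form:
        ll1 = sum(math.log(1 - slip) if x else math.log(slip) for x in resp)
        ll0 = sum(math.log(guess) if x else math.log(1 - guess) for x in resp)
        out.append(ll1 > ll0)
    return out

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Phi coefficient between classifications on two parallel forms
dcm_rel = pearson([float(c) for c in classify(dcm_form())],
                  [float(c) for c in classify(dcm_form())])

# ---- IRT side: Rasch model, same test length, EAP scoring over a grid ----
theta = [random.gauss(0, 1) for _ in range(N)]
b = [-2 + 4 * j / (J - 1) for j in range(J)]  # item difficulties

def irt_form():
    return [[random.random() < 1 / (1 + math.exp(-(t - bj))) for bj in b]
            for t in theta]

grid = [-4 + 8 * g / 40 for g in range(41)]  # quadrature points

def eap(resp):
    """Posterior mean of theta over the grid with an N(0,1) prior."""
    num = den = 0.0
    for q in grid:
        like = 1.0
        for x, bj in zip(resp, b):
            p = 1 / (1 + math.exp(-(q - bj)))
            like *= p if x else 1 - p
        w = like * math.exp(-q * q / 2)
        num += w * q
        den += w
    return num / den

irt_rel = pearson([eap(r) for r in irt_form()], [eap(r) for r in irt_form()])

print(f"DCM parallel-forms reliability (phi): {dcm_rel:.3f}")
print(f"IRT parallel-forms reliability (r):   {irt_rel:.3f}")
```

With these assumed parameters the binary classification agrees across forms far more often than the continuous trait estimates correlate, mirroring the abstract's point that estimates with a smaller range of possible values can be recovered more reliably from the same number of items.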
Acknowledgments
We would like to thank Terry Ackerman, Allan Cohen, Jeff Douglas, Robert Henson, John Poggio, and John Willse for their helpful comments and critiques of the concepts and text presented in this paper. Complete syntax for running all analyses herein and resulting program output are available at the first author’s website.
This research was funded by National Science Foundation grants DRL-0822064, SES-0750859, and SES-1030337. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Cite this article
Templin, J., Bradshaw, L. Measuring the Reliability of Diagnostic Classification Model Examinee Estimates. J Classif 30, 251–275 (2013). https://doi.org/10.1007/s00357-013-9129-4