Abstract
This study reports a preliminary investigation into the construct validity of an analytic rating scale developed for a school-based English speaking test. Informed by the theory of interpretative validity argument, the study examined the plausibility and accuracy of three warrants deemed essential to the construct validity of the rating scale. Methodologically, it used the Many-Facets Rasch Model (MFRM) and Structural Equation Modeling (SEM) in conjunction to examine the three warrants and their respective rebuttals. Although the MFRM analysis largely supported the first two warrants, the results indicated that the category structure of the rating scale did not function as intended and hence needed further revision. In the SEM analysis, a multitrait-multimethod (MTMM) confirmatory factor analysis (CFA) design was employed, whereby four MTMM models were specified, evaluated, and compared. The results lent support to the third warrant but raised legitimate concerns over common method bias. The study has implications for future revisions of the rating scale and the speaking assessment in the interest of improved validity; it also has methodological implications for developers of performance assessments and validators of rating scales.
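To make the MFRM approach concrete, the following is a minimal sketch of the many-facets rating scale model that underlies this kind of analysis: the log-odds of an examinee receiving category k rather than k-1 is modeled as examinee ability minus task difficulty, rater severity, and the category threshold. All parameter values below are illustrative, not estimates from this study.

```python
import math

def category_probs(ability, task_difficulty, rater_severity, thresholds):
    """Probabilities of each rating category under a many-facets
    rating scale model (Andrich-style thresholds).

    thresholds: list of K step parameters tau_1..tau_K for a
    (K+1)-category scale; tau_0 is fixed at 0.
    """
    # Net logit for this examinee-task-rater combination.
    logit = ability - task_difficulty - rater_severity
    # Cumulative sums of (logit - tau_m) form the log-numerators.
    cum = [0.0]  # category 0
    for tau in thresholds:
        cum.append(cum[-1] + logit - tau)
    exps = [math.exp(c) for c in cum]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: able examinee (1.5 logits), lenient rater (-0.5)
probs = category_probs(ability=1.5, task_difficulty=0.0,
                       rater_severity=-0.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])  # highest category is most probable
```

A disordered or rarely modal category in such a model is the kind of evidence that leads to the "category structure did not function as intended" finding reported above.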
Notes
1. CFI: Comparative Fit Index; GFI: Goodness of Fit Index; SRMR: Standardized Root Mean Square Residual; RMSEA: Root Mean Square Error of Approximation.
2. The numbers in brackets indicate the cutoff values for acceptable goodness of fit between the model and the empirical data.
3. Typical annual undergraduate enrollment at FDU is around 3000.
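As a sketch of how the fit indices in note 1 are screened against cutoffs, the snippet below encodes the commonly cited guidelines (CFI and GFI at or above .90, SRMR and RMSEA at or below .08). These are conventional values from the SEM literature, not the specific cutoffs used in this study.

```python
# Conventional cutoffs for SEM goodness-of-fit indices; values are
# widely cited guidelines, not thresholds taken from this chapter.
CUTOFFS = {
    "CFI":   ("min", 0.90),  # Comparative Fit Index >= .90
    "GFI":   ("min", 0.90),  # Goodness of Fit Index >= .90
    "SRMR":  ("max", 0.08),  # Standardized Root Mean Square Residual <= .08
    "RMSEA": ("max", 0.08),  # Root Mean Square Error of Approximation <= .08
}

def failing_indices(indices):
    """Return the subset of fit indices that violate their cutoff."""
    failures = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        ok = value >= cutoff if direction == "min" else value <= cutoff
        if not ok:
            failures[name] = value
    return failures

# Hypothetical model fit: only RMSEA (0.10) exceeds its cutoff.
print(failing_indices({"CFI": 0.95, "GFI": 0.91,
                       "SRMR": 0.05, "RMSEA": 0.10}))
```

Comparing several MTMM models, as in this study, typically combines such absolute cutoffs with differences in fit between nested models.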
Acknowledgments
The study reported in this chapter was supported by the National Social Sciences Fund of the People’s Republic of China under the project title of “Development and Validation of Standards in Language Testing” (Grant No: 13CYY032), and the Research Project of National Foreign Language Teaching in Higher Education under the project title of “Teacher-, Peer-, and Self-assessment in Translation Teaching: A Many-Facets Rasch Modeling Approach” (Grant No: 2014SH0008A). Part of this research was published in the third issue of Foreign Language Education in China (Quarterly) in 2015.
Copyright information
© 2016 Springer Science+Business Media Singapore
Cite this paper
Fan, J., Bond, T. (2016). Using MFRM and SEM in the Validation of Analytic Rating Scales of an English Speaking Assessment. In: Zhang, Q. (eds) Pacific Rim Objective Measurement Symposium (PROMS) 2015 Conference Proceedings. Springer, Singapore. https://doi.org/10.1007/978-981-10-1687-5_3
DOI: https://doi.org/10.1007/978-981-10-1687-5_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-1686-8
Online ISBN: 978-981-10-1687-5
eBook Packages: Behavioral Science and Psychology (R0)