Combining Tests and Setting Standards

  • Chapter
International Handbook of Research in Medical Education

Part of the book series: Springer International Handbooks of Education (SIHE, volume 7)

Summary

In testing, the area of standards and standard-setting remains relatively unsettled. The chapter begins with definitions of scores and standards, and describes norm-referenced score interpretation, domain-referenced score interpretation, relative standards, and absolute standards. It then reviews the work related to the credibility of standards and outlines some of the more common standard-setting techniques used with MCQ-based tests and clinical examinations. Finally, because it is sometimes useful to combine scores from several related assessments, information is presented on when and how to do so.

Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Norcini, J., Guille, R. (2002). Combining Tests and Setting Standards. In: Norman, G.R., et al. International Handbook of Research in Medical Education. Springer International Handbooks of Education, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0462-6_30

  • DOI: https://doi.org/10.1007/978-94-010-0462-6_30

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3904-8

  • Online ISBN: 978-94-010-0462-6

  • eBook Packages: Springer Book Archive
