Fit for purpose and modern validity theory in clinical outcomes assessment

Edwards, Michael C.; Slagle, Ashley; Rubright, Jonathan D.; Wirth, R. J.

doi:10.1007/s11136-017-1644-z

Special Section: Test Construction (by invitation only)
Published: 07 July 2017

Volume 27, pages 1711–1720, (2018)

Quality of Life Research

Michael C. Edwards¹,⁴ (ORCID: orcid.org/0000-0003-2824-7585),
Ashley Slagle²,
Jonathan D. Rubright³ &
R. J. Wirth⁴

“In casual terms, we can define validity as measuring the right thing, and reliability as measuring the thing right.” [1] (p. 11).

Abstract

Purpose

The US Food and Drug Administration (FDA), as part of its regulatory mission, is charged with determining whether a clinical outcome assessment (COA) is “fit for purpose” when used in clinical trials to support drug approval and product labeling. In this paper, we review (and offer some commentary on) the current state of affairs in COA development, evaluation, and use, with a focus on one question: How do you know you are measuring the right thing? In the psychometric literature, this concept is referred to broadly as validity, and it has itself evolved over many years of research and application.

Review

After a brief introduction, the first section will review current ideas about “fit for purpose” and how it has been viewed by FDA. This section will also describe some of the unique challenges to COA development/evaluation/use in the clinical trials space. Following this, we provide an overview of modern validity theory as it is currently understood in the psychometric tradition. This overview will focus primarily on the perspective of validity theorists such as Messick and Kane whose work forms the backbone for the bulk of high-stakes assessment in areas such as education, psychology, and health outcomes.

Conclusions

We situate the concept of fit for purpose within the broader context of validity. By comparing and contrasting the approaches and the situations where they have traditionally been applied, we identify areas of conceptual overlap as well as areas where more discussion and research are needed.

Notes

What a test measures goes by many names: construct, trait, latent variable, dimension, or domain. We use “construct” throughout the remainder of this document as the generic referent to what tests measure. It is a commonly used term and nicely conveys the core idea that what we are trying to measure is a theoretical construction.
We use terms like assessment, scale, inventory, and test interchangeably in this paper. While “test” is the dominant term in the educational arena (from which much validity theory has emanated), it is generic with respect to the larger points being made here.

References

Thissen, D., & Wainer, H. (2001). Test scoring. Mahwah, NJ: Lawrence Erlbaum Associates.
U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, Center for Devices and Radiological Health. (2009). Guidance for industry patient-reported outcome measures: Use in medical product development to support labeling claims. Retrieved January 30, 2017, from http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Published December 2009
FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. Retrieved January 30, 2017, from https://www.ncbi.nlm.nih.gov/books/NBK338448/
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Kline Leidy, N., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 1—eliciting concepts for a new PRO instrument. Value in Health, 14, 967–977.
Patrick, D. L., Burke, L. B., Gwaltney, C. J., Kline Leidy, N., Martin, M. L., Molsen, E., et al. (2011). Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 2—assessing respondent understanding. Value in Health, 14, 978–988.
U.S. Department of Health and Human Services, Food and Drug Administration. (2016). Clinical outcome assessment (COA): Frequently asked questions. Retrieved January 30, 2017, from http://www.fda.gov/Drugs/DevelopmentApprovalProcess/DrugDevelopmentToolsQualificationProgram/ucm370261.htm
U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research. (2015). Gastroparesis: Clinical Evaluation of Drugs for Treatment Guidance for Industry. Retrieved January 30, 2017, from https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM455645.pdf
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Thorndike, E. L. (1918). The nature, purposes, and general methods of measurements of educational products. In G. M. Whipple (Ed.), The measurement of educational products. Seventeenth yearbook of the National Society for the Study of Education, Part II (pp. 16–24). Bloomington, IL: Public School Publishing Company.
American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin Supplement, 51(2), 1–38.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Pitoniak, M. J., Sireci, S. G., & Luecht, R. M. (2002). A multitrait-multimethod validity investigation of scores from a professional licensure examination. Educational and Psychological Measurement, 62(3), 498–516.
Ebel, R. L. (1956). Obtaining and reporting evidence on content validity. Educational and Psychological Measurement, 16(3), 269–282.
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45, 83–117.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American Psychologist, 30, 955–966.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. Braun (Eds.), Test validity (pp. 33–45). Hillsdale, NJ: Lawrence Erlbaum.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: American Council on Education and Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342.
Cronbach, L. J. (1980). Selection theory for a political world. Public Personnel Management, 9(1), 37–50.
House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum.
Kane, M. T. (1992). An argument-based approach to validation. Psychological Bulletin, 112, 527–535.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: American Council on Education and Praeger.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071.
Hays, R. D., & Hadorn, D. (1992). Responsiveness to change: An aspect of validity, not a separate dimension. Quality of Life Research, 1, 73–75.
Terwee, C. B., Dekker, F. W., Wiersinga, W. M., Prummel, M. F., & Bossuyt, P. M. (2003). On assessing responsiveness of health-related quality of life instruments: Guidelines for instrument evaluation. Quality of Life Research, 12(4), 349–362.

Author information

Authors and Affiliations

Arizona State University, PO Box 871104, Tempe, AZ, 85287-1104, USA
Michael C. Edwards
Aspen Consulting, LLC, 619 S 11th St, Philadelphia, PA, 19147, USA
Ashley Slagle
National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104, USA
Jonathan D. Rubright
Vector Psychometric Group, LLC, 847 Emily Lane, Chapel Hill, NC, 27516, USA
Michael C. Edwards & R. J. Wirth

Corresponding author

Correspondence to Michael C. Edwards.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants performed by the authors.

Additional information

Ashley Slagle is a former FDA employee. The regulatory perspective offered in this manuscript is her own and, while reflecting her experience with FDA, is not intended to present any official FDA position.

About this article

Cite this article

Edwards, M.C., Slagle, A., Rubright, J.D. et al. Fit for purpose and modern validity theory in clinical outcomes assessment. Qual Life Res 27, 1711–1720 (2018). https://doi.org/10.1007/s11136-017-1644-z

Accepted: 01 July 2017
Published: 07 July 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11136-017-1644-z
