Abstract
Purpose
In developing item banks for patient-reported outcomes (PROs), nonparametric techniques are often used to investigate empirical item response curves, whereas final banks usually rely on parsimonious parametric models. A flexible approach based on monotonic polynomials (MP) offers a compromise by modeling items with both complex and simple response curves. This paper investigates the suitability of MPs for PRO data.
Method
Using PROMIS Wave 1 data (N = 15,725) for Physical Function, we fitted an MP model and the graded response model (GRM). We compared both models in terms of overall model fit, latent trait estimates, and item/test information. We quantified possible GRM item misfit using approaches that compute discrepancies with the MP. Through simulations, we investigated the ability of the MP to perform well versus the GRM under identical data collection conditions.
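To make the contrast between the two models concrete, the following is a minimal sketch of how an MP could replace the linear predictor of the GRM. It assumes a cubic polynomial whose derivative is a strictly positive quadratic, which is one way to guarantee monotonicity; the parameter names (lam, alpha, tau), the intercept values, and the function names are hypothetical and are not taken from the paper or its supplementary code.

```python
import math

def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def mono_poly(theta, lam, alpha=None, tau=None):
    """Monotonic polynomial in theta.

    With alpha and tau omitted this is the linear term lam * theta,
    i.e. the usual GRM slope. Otherwise it is a cubic whose derivative
    lam * (1 - 2*alpha*theta + (alpha**2 + exp(tau)) * theta**2)
    is positive everywhere when lam > 0 (assumed parameterization).
    """
    if alpha is None or tau is None:
        return lam * theta
    beta = alpha ** 2 + math.exp(tau)
    return lam * (theta - alpha * theta ** 2 + beta * theta ** 3 / 3.0)

def category_probs(theta, intercepts, lam, alpha=None, tau=None):
    """Graded-response category probabilities.

    intercepts must be strictly decreasing; P(X >= k) is modeled as
    logistic(m(theta) + intercepts[k-1]) and category probabilities
    are differences of adjacent cumulative probabilities.
    """
    m = mono_poly(theta, lam, alpha, tau)
    cum = [1.0] + [logistic(m + c) for c in intercepts] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def expected_score(theta, intercepts, lam, alpha=None, tau=None):
    """Expected item score, with categories scored 0, 1, 2, ..."""
    probs = category_probs(theta, intercepts, lam, alpha, tau)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical five-category item (illustrative values only).
intercepts = [2.0, 0.5, -1.0, -2.5]
grm_probs = category_probs(-1.0, intercepts, lam=1.5)
mp_probs = category_probs(-1.0, intercepts, lam=1.5, alpha=0.3, tau=-1.0)
```

Because the linear case falls out when the higher-order terms are dropped, a sketch like this treats the GRM as a special case of the MP model, which is what makes a direct comparison of the two fits natural.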
Results
A likelihood ratio test (p < 0.001) and AIC (but not BIC) indicated better fit for the MP. Latent trait estimates and expected test scores were comparable between models, but we observed higher information for the MP in the lower range of physical functioning. Many items were flagged as possibly misfitting, and simulations supported the performance of the MP. Nevertheless, discrepancies between the MP and GRM were small.
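AIC and BIC can disagree in a sample this large because BIC charges ln(N) per parameter, roughly 9.66 for N = 15,725, whereas AIC charges 2. The sketch below illustrates that mechanism only. The log-likelihoods and parameter counts are made-up placeholders, not results from the paper; only the sample size is taken from the abstract.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + math.log(n_obs) * n_params

def lr_statistic(loglik_nested, loglik_full):
    """Likelihood ratio statistic for a nested model vs. a fuller one."""
    return 2.0 * (loglik_full - loglik_nested)

N_OBS = 15725  # sample size reported in the abstract

# Hypothetical values for illustration only (NOT the paper's results).
grm = {"loglik": -100000.0, "n_params": 500}
mp = {"loglik": -99750.0, "n_params": 600}

lr = lr_statistic(grm["loglik"], mp["loglik"])
extra_params = mp["n_params"] - grm["n_params"]

aic_prefers_mp = aic(mp["loglik"], mp["n_params"]) < aic(grm["loglik"], grm["n_params"])
bic_prefers_mp = (bic(mp["loglik"], mp["n_params"], N_OBS)
                  < bic(grm["loglik"], grm["n_params"], N_OBS))

print("LR statistic:", lr, "on", extra_params, "extra parameters")
print("AIC prefers MP:", aic_prefers_mp)
print("BIC prefers MP:", bic_prefers_mp)
```

In this toy setup the fit gain per extra parameter falls between the two penalties, so AIC favors the more flexible model while BIC favors the simpler one, the same pattern the results describe.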
Conclusion
The MP approach allows inclusion of items with complex response curves into PRO item banks. Information for the physical functioning item bank may be greater than originally thought for low levels of physical functioning. This may translate into small improvements if an MP approach is used.
Data availability
As described in the method section, data used in this manuscript are available in the public domain.
Code availability
Examples of estimation of the monotonic polynomial model are available in Supplementary Materials.
Notes
For a recent discussion on the merits of collapsing categories, see Harel and Steele [25].
Estimation options were changed slightly to increase computational speed and are described in Supplementary Materials.
References
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
Fries, J. F., Bruce, B., & Cella, D. (2005). The promise of PROMIS: Using item response theory to improve assessment of patient-reported outcomes. Clinical and Experimental Rheumatology, 23(5 Suppl 39), S53–S57.
Choi, S. W., Schalet, B., Cook, K. F., & Cella, D. (2014). Establishing a common metric for depressive symptoms: Linking the BDI-II, CES-D, and PHQ-9 to PROMIS depression. Psychological Assessment, 26, 513–527. https://doi.org/10.1037/a0035768
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometric Monographs. https://doi.org/10.1002/j.2333-8504.1968.tb00153.x
Samejima, F. (1972). A general model of free-response data. Psychometric Monographs No. 18. Psychometric Society.
Samejima, F. (2010). The general graded response model. In M. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models: Developments and applications (pp. 77–107). Taylor & Francis.
Rose, M., Bjorner, J. B., Gandek, B., Bruce, B., Fries, J. F., & Ware, J. E. (2014). The PROMIS physical function item bank was calibrated to a standardized metric and shown to improve measurement efficiency. Journal of Clinical Epidemiology, 67, 516–526. https://doi.org/10.1016/j.jclinepi.2013.10.024
Meijer, R. R., & Baneke, J. J. (2004). Analyzing psychopathology items: A case for nonparametric item response theory modeling. Psychological Methods, 9, 354–368. https://doi.org/10.1037/1082-989X.9.3.354
Patient-Reported Outcomes Measurement Information System. (2013). PROMIS instrument development and validation scientific standards version 2.0. Retrieved from http://www.healthmeasures.net/images/PROMIS/PROMISStandards_Vers2.0_Final.pdf
Falk, C. F., & Cai, L. (2016). Semi-parametric item response functions in the context of guessing. Journal of Educational Measurement, 53, 229–247. https://doi.org/10.1111/jedm.12111
Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21, 22–40. https://doi.org/10.1080/08957340701796464
Falk, C. F. (2019). Model selection for monotonic polynomial item response models. Quantitative psychology: The 83rd Annual Meeting of the Psychometric Society, New York, NY, 2018 (pp. 75–85). Springer. https://doi.org/10.1007/978-3-030-01310-3_7
Falk, C. F. (2020). The monotonic polynomial graded response model: Implementation and a comparative study. Applied Psychological Measurement, 44, 465–481. https://doi.org/10.1177/0146621620909897
Falk, C. F., & Cai, L. (2016). Maximum marginal likelihood estimation of a monotonic polynomial generalized partial credit model with applications to multiple group analysis. Psychometrika, 81, 434–460. https://doi.org/10.1007/s11336-014-9428-7
Liang, L., & Browne, M. W. (2015). A quasi-parametric method for fitting flexible item response functions. Journal of Educational and Behavioral Statistics, 40, 5–34. https://doi.org/10.3102/1076998614556816
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
Feuerstahler, L. M. (2016). Exploring alternate latent trait metrics with filtered monotonic polynomial IRT models (PhD thesis). Department of Psychology, University of Minnesota.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443–459. https://doi.org/10.1007/BF02293801
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195. https://doi.org/10.1007/BF02293979
Feuerstahler, L. M. (2019). Metric transformations and the filtered monotonic polynomial item response model. Psychometrika, 84, 105–123. https://doi.org/10.1007/s11336-018-9642-9
Choi, S. W., Reise, S. P., Pilkonis, P., Hays, R. D., & Cella, D. (2010). Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms. Quality of Life Research, 19, 125–136. https://doi.org/10.1007/s11136-009-9560-5
Cella, D. (2015). PROMIS 1 wave 1. Harvard Dataverse. https://doi.org/10.7910/DVN/0NGAKG
Liu, H. H., Cella, D., Gershon, R., Shen, J., Morales, L. S., Riley, W., & Hays, R. D. (2010). Representativeness of the PROMIS internet panel. Journal of Clinical Epidemiology, 63, 1169–1178. https://doi.org/10.1016/j.jclinepi.2009.11.021
Harel, D., & Steele, R. J. (2018). An information matrix test for the collapsing of categories under the partial credit model. Journal of Educational and Behavioral Statistics, 43, 721–750.
Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck depression inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255–270. https://doi.org/10.1037/1040-3590.6.3.255
Rose, M., Bjorner, J. B., Becker, J., Fries, J. F., & Ware, J. E. (2008). Evaluation of a preliminary physical function item bank supported the expected advantages of the patient-reported outcomes measurement information system (PROMIS). Journal of Clinical Epidemiology, 61, 17–33. https://doi.org/10.1016/j.jclinepi.2006.06.025
Sijtsma, K., & van der Ark, L. A. (2003). Investigation and treatment of missing item scores in test and questionnaire data. Multivariate Behavioral Research, 38, 505–528. https://doi.org/10.1207/s15327906mbr3804_4
van der Ark, L. A., & Sijtsma, K. (2005). The effect of missing data imputation on Mokken scale analysis. In L. A. van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences (pp. 147–166). Lawrence Erlbaum.
van Ginkel, J. R., van der Ark, L. A., & Sijtsma, K. (2007). Multiple imputation of item scores in test and questionnaire data, and influence on psychometric results. Multivariate Behavioral Research, 42, 387–414. https://doi.org/10.1080/00273170701360803
Wind, S. A., & Patil, Y. J. (2018). Exploring incomplete rating designs with Mokken scale analysis. Educational and Psychological Measurement, 78, 319–342. https://doi.org/10.1177/0013164416675393
Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M., Estabrook, R., Bates, T. C., Maes, H. H., & Boker, S. M. (2016). OpenMx 2.0: Extended structural equation and statistical modeling. Psychometrika, 81, 535–549. https://doi.org/10.1007/s11336-014-9435-8
Pritikin, J. N., Hunter, M. D., & Boker, S. M. (2015). Modular open-source software for item factor analysis. Educational and Psychological Measurement, 75, 458–475. https://doi.org/10.1177/0013164414554615
Pritikin, J. N. (2016). rpf: Response probability functions. Retrieved from https://CRAN.R-project.org/package=rpf
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431–444. https://doi.org/10.1177/014662168200600405
Chalmers, R. P. (2018). Model-based measures for detecting and quantifying response bias. Psychometrika, 83, 696–732. https://doi.org/10.1007/s11336-018-9626-9
Chalmers, R. P., Counsell, A., & Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76, 114–140. https://doi.org/10.1177/0013164415584576
Edelen, M. O., Stucky, B. D., & Chandra, A. (2015). Quantifying “problematic” DIF within an IRT framework: Application to a cancer stigma index. Quality of Life Research, 24, 95–103. https://doi.org/10.1007/s11136-013-0540-4
Organization for Economic Cooperation and Development. (2017). PISA 2015 technical report. Organization for Economic Cooperation and Development.
Waller, N. G., & Feuerstahler, L. (2017). Bayesian modal estimation of the four-parameter item response model in real, realistic, and idealized data sets. Multivariate Behavioral Research, 52, 350–370. https://doi.org/10.1080/00273171.2017.1292893
Feuerstahler, L. M. (2018). Sources of error in IRT trait estimation. Applied Psychological Measurement, 42, 359–375. https://doi.org/10.1177/0146621617733955
Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15, 113–141. https://doi.org/10.1207/S15324818AME1502_01
Douglas, J., & Cohen, A. (2001). Nonparametric item response function estimation for assessing parametric model fit. Applied Psychological Measurement, 25, 234–243. https://doi.org/10.1177/01466210122032046
Liang, T., & Wells, C. S. (2009). A model fit statistic for generalized partial credit model. Educational and Psychological Measurement, 69, 913–928. https://doi.org/10.1177/0013164409332222
Liang, T., & Wells, C. S. (2015). A nonparametric approach for assessing goodness-of-fit of IRT models in a mixed format test. Applied Measurement in Education, 28, 115–129. https://doi.org/10.1080/08957347.2014.1002918
Maydeu-Olivares, A. (2005). Further empirical results on parametric versus nonparametric IRT modeling of Likert-type personality data. Multivariate Behavioral Research, 40, 261–279.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Acknowledgements
We acknowledge the support of a research grant from the Fonds de recherche du Québec – Nature et technologies [2019-NC-255344] to the first author.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Falk, C.F., Fischer, F. More flexible response functions for the PROMIS physical functioning item bank by application of a monotonic polynomial approach. Qual Life Res 31, 37–47 (2022). https://doi.org/10.1007/s11136-021-02873-7
DOI: https://doi.org/10.1007/s11136-021-02873-7