INTRODUCTION

SymTrak-23 is a measure recently validated for identifying and monitoring clinically actionable symptoms and impairments in patients attending primary care with multimorbidity.1,2 Multiple chronic conditions (MCCs),3 complex health care needs,4 and symptoms (physical, emotional, and cognitive)5,6 are common clinical presentations among elderly patients. Symptoms account for over half of all US outpatient visits annually7 and predict health care utilization and costs, quality of life, work disability, and mortality.5,6,8,9

SymTrak-23 was developed to target the most prevalent and disabling symptoms and functional impairments experienced by older adults, including SPADE symptoms (sleep disturbance, pain, anxiety, depression, [low] energy/fatigue) as well as impairments related to mobility, cognition, and vision or hearing. Moreover, these domains were selected because they are potentially clinically actionable in terms of evaluation and management. Both patient and caregiver versions of SymTrak-23 were validated1,2 because up to 57% of elderly patients are accompanied by an informal caregiver,10 and some older patients are unwilling or unable to complete self-report measures, meriting proxy measures.11

The present paper develops and validates an abbreviated 8-item SymTrak scale (SymTrak-8).1,2 Internal and test–retest reliability as well as criterion validity, convergent validity, and sensitivity to change were assessed. Our goal was to address the same domains covered by SymTrak-23. Thus, the total score for SymTrak-8 and SymTrak-23 is a measure of the overall burden of symptoms and functional impairments associated with multimorbidity in older adults. Secondarily, the total score can be used as an indicator of health-related quality of life (HRQOL) by health care systems and clinical researchers.

METHODS

Setting and Sample

The 600 participants (200 patient-caregiver dyads and 200 non-dyadic patients without an identified caregiver) were recruited from an academic-affiliated primary care network of clinics. The study was approved by the institutional review board and all participants provided written informed consent. Patient inclusion criteria were as follows: (1) age ≥ 65 years, (2) ≥ one primary care visit in the past 12 months, (3) ≥ one chronic condition according to medical records, and (4) for those participants who had an identified informal caregiver, the caregiver had to be ≥ 21 years of age and willing to participate in the study. Patient exclusion criteria were as follows: permanent residency at a long-term care facility or the presence of a severe mental illness such as bipolar disorder or schizophrenia.

Measures

A brief survey, consisting of demographics, SymTrak-23, and the Health Utility Index Mark 3 (HUI3)12 was completed by participants (N = 600) by interview at baseline and 3 months post-baseline. A subsample (n = 180) consisting of 60 patient-caregiver dyads and 60 non-dyadic individual patients completed an interview 24 h after baseline. All interviews were telephone-administered by research assistants.

The HUI3 is a preference-based measure of HRQOL; the HUI3 overall utility score is a continuous variable ranging from −0.36 to 1, where negative scores represent health states considered worse than death, 0 represents death, and 1 represents perfect health.12 SymTrak-23 consists of 23 items that measure symptoms and functional impairments. The item-response options are as follows: 0 = never, 1 = sometimes, 2 = often, 3 = always. Two general health items (Poor, Fair, Good, Very Good, Excellent), rated separately for physical and emotional health, were also administered to assess construct validity. All scales (SymTrak-23, HUI3, general health ratings) were self-reported by patients and proxy-reported by caregivers about their dyadic patient’s health status.

Analysis

Item-level psychometrics were examined, including floor and ceiling effects, item-total correlations, and item-level validity correlations with the criterion of HUI3 overall utility score. Item-level 24-h test–retest reliability was estimated for ordinal items using weighted kappa with Fleiss–Cohen quadratic weights. Confirmatory factor analysis (CFA) of the hypothesized one-factor model was performed using MPLUS software (8th Edition)13 with the nonlinear probit link for ordinal categorical items and weighted least square mean and variance adjusted (WLSMV) estimation. The following CFA fit indices and thresholds were used for indicating good fit14: root mean square error of approximation (RMSEA) < 0.06, comparative fit index (CFI) > 0.95, and standardized root mean square residual (SRMR) < 0.08. The chi-square goodness-of-fit test was also conducted.
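As an illustration of the item-level test–retest statistic named above, the following is a minimal sketch of weighted kappa with quadratic (Fleiss–Cohen) disagreement weights for a single 0–3 item. It is not the study's analysis code; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, n_categories=4):
    """Weighted kappa with quadratic (Fleiss-Cohen) weights.

    ratings_a, ratings_b: equal-length sequences of integer scores in
    0..n_categories-1, e.g. one item at baseline and at 24-h retest.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(ratings_a)
    k = n_categories
    observed = Counter(zip(ratings_a, ratings_b))  # joint cell counts
    marg_a = Counter(ratings_a)                    # marginal counts, occasion 1
    marg_b = Counter(ratings_b)                    # marginal counts, occasion 2

    obs_disagreement = 0.0
    exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic disagreement weight: 0 on the diagonal, 1 at the extremes.
            weight = ((i - j) ** 2) / ((k - 1) ** 2)
            obs_disagreement += weight * observed[(i, j)] / n
            exp_disagreement += weight * (marg_a[i] / n) * (marg_b[j] / n)
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - obs_disagreement / exp_disagreement


# Hypothetical ratings for one item on two occasions.
baseline = [0, 1, 2, 3, 1, 0]
retest = [0, 1, 2, 3, 2, 0]
print(round(quadratic_weighted_kappa(baseline, retest), 3))
```

Quadratic weights penalize a two-category discrepancy four times as heavily as a one-category discrepancy, which suits ordinal response options such as never/sometimes/often/always.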

Concurrent criterion validity was assessed by using linear regression to test the association between the baseline SymTrak total scale score (independent variable) and the baseline HUI3 overall HRQOL utility score (dependent variable) while adjusting for covariates. The standardized regression coefficient (STB) was reported as an effect size. Cronbach’s alpha was used to assess internal consistency reliability. The absolute-agreement intra-class correlation coefficient (ICC) was used to assess scale score test–retest reliability, while specifying occasions as a random effect.
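For readers less familiar with the internal consistency statistic, a minimal sketch of Cronbach's alpha from complete item-level data is shown below. It assumes no missing responses and uses sample variances; the function name and the toy data are hypothetical and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for complete data.

    item_scores: list of respondents, each a list of k item scores
    (k >= 2, at least two respondents, no missing values).
    """
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs the same k >= 2 items")
    columns = list(zip(*item_scores))                  # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy example: 5 hypothetical respondents, 3 items scored 0-3.
toy = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises as items covary more strongly relative to their individual variances, so a short scale that spans distinct domains would be expected to yield a somewhat lower alpha than its longer parent scale.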

Sensitivity to change was examined using baseline and 3-month data. The within-group mean change and standardized response mean (SRM) effect size (mean change divided by SD of change) were computed for SymTrak total scores, separately for each of three “reliable” HRQOL change groups. These HRQOL criterion groups (declined, stable, or improved) were defined based on ± 1 standard error of measurement (SEM) in the HUI3 overall utility score. The relevant 1 SEM value (0.089) for HUI3 for this analysis was calculated using the temporal stability approach, based on 24-h test–retest reliability and the SD of 3-month change scores.2 A general linear model (GLM) was used to compare SymTrak change scores across the three groups. If the GLM omnibus F test (for the overall difference between all three HUI3 change groups) was significant, all pairwise differences (e.g., declined vs stable) were tested using Fisher’s protected least significant difference method, which controls experiment-wise type I error at 0.05 when the number of groups is only three.15 All tests were two-sided with an alpha of 0.05.
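The grouping and effect size described above can be sketched as follows. This is an illustration only: the 0.089 SEM value comes from the text, but the inclusive boundary rule (a change of exactly ± 1 SEM counts as declined or improved) and the sign convention (change = follow-up minus baseline) are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev

HUI3_SEM = 0.089  # 1 SEM for the HUI3 overall utility score, as reported in the text


def classify_hui3_change(baseline, followup, sem=HUI3_SEM):
    """Assign a patient to a HUI3 change group using +/- 1 SEM.

    Assumption: boundaries are inclusive (change <= -SEM is 'declined',
    change >= +SEM is 'improved'); the text does not state the tie rule.
    """
    change = followup - baseline
    if change <= -sem:
        return "declined"
    if change >= sem:
        return "improved"
    return "stable"


def standardized_response_mean(baseline_scores, followup_scores):
    """SRM = mean change divided by the SD of change.

    Assumption: change is computed as follow-up minus baseline.
    """
    changes = [f - b for b, f in zip(baseline_scores, followup_scores)]
    return mean(changes) / stdev(changes)


# Hypothetical HUI3 utilities for three patients.
print(classify_hui3_change(0.70, 0.60))  # drop of 0.10 exceeds 1 SEM
print(classify_hui3_change(0.70, 0.75))  # change of 0.05 is within 1 SEM
print(classify_hui3_change(0.50, 0.70))  # gain of 0.20 exceeds 1 SEM

# Hypothetical SymTrak totals at baseline and 3 months for one change group.
print(round(standardized_response_mean([5, 6, 7, 8], [6, 8, 8, 10]), 2))
```

Because the SRM divides by the SD of change rather than the baseline SD, it reflects how consistent the change is within a group, not only its average size.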

RESULTS

Participant Characteristics

Demographic characteristics of patients and caregivers have been previously reported.2 Briefly, patients had a median age of 73 (range, 65 to 95), and 73% were women. Patients and caregivers were diverse demographically (e.g., 49% African American). Most caregivers were either a child (43%) or a spouse/partner (36%).

Content Validity

Content validity for the development of SymTrak-23 is described elsewhere.2 The process of shortening a scale should be undertaken with domain experts to guide the narrowing of content and to assess redundancy and usefulness for the intended purpose.16,17 Eight items (4 original and 4 that each contain bundled content from 2 original SymTrak-23 items) were considered ideal candidates by our content experts (all authors) for an abbreviated scale because each item (one for each domain) has the content validity for capturing 8 important domains: the SPADE symptom domains (sleep disturbance, pain, anxiety, depression, and low energy), cognition, mobility, and vision/hearing problems (Table 1, bolded items).

Table 1 SymTrak Item-Level Psychometrics (Baseline, Patients, N = 400)

Specifically, the SPADE symptom domains represent five of the most prevalent, chronic, disabling, and undertreated symptom domains; they also tend to co-occur in both the general population18 and in clinical practice.18,19,20,21,22,23,24,25,26 Cognitive impairment is an important 6th domain because of its prevalence in older adults, comorbidity and adverse effects on other illnesses, impact on poor adherence to prescribed therapies, and worse survival.11 Mobility-related functioning (7th domain) has important health and social consequences in multimorbidity populations and has value as a predictor of nursing home placement and mortality.11 The falling or tripping item had a high floor effect, a common limitation when measuring physical functioning;11 therefore, the more general and prevalent item, walking or moving around, was selected. Finally, the 8th domain (vision or hearing loss) comprises two sensory impairments that are prevalent in geriatric primary care and are highly treatable,27 often comorbid,28 and associated with poorer HRQOL.12,28,29

Each of the 4 bundled items included the essential content from 2 original SymTrak-23 items according to established principles.30 The score for each bundled item was calculated as the average of its two constituent item scores and then rounded to an integer (i.e., 0–3). In a sensitivity analysis, using the maximum produced similar psychometric results.
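The bundled-item scoring rule can be sketched as below, alongside the maximum-based alternative used in the sensitivity analysis. The text does not say how an average ending in .5 was rounded, so rounding halves up is an assumption here; the function names and example responses are hypothetical.

```python
import math


def bundle_mean(score_a, score_b):
    """Average two 0-3 item scores and round to an integer.

    Assumption: halves round up (e.g., 0.5 -> 1); the source text does not
    specify the tie rule, and Python's built-in round() would instead round
    halves to the nearest even integer.
    """
    return math.floor((score_a + score_b) / 2 + 0.5)


def bundle_max(score_a, score_b):
    """Alternative rule from the sensitivity analysis: take the larger score."""
    return max(score_a, score_b)


def symtrak8_total(single_items, bundled_pairs, bundle=bundle_mean):
    """Sum 4 single items and 4 bundled items into a 0-24 total score."""
    if len(single_items) != 4 or len(bundled_pairs) != 4:
        raise ValueError("expected 4 single items and 4 bundled pairs")
    return sum(single_items) + sum(bundle(a, b) for a, b in bundled_pairs)


# Hypothetical responses: 4 original items plus 4 pairs of constituent items.
singles = [1, 0, 2, 3]
pairs = [(0, 3), (2, 2), (3, 3), (0, 0)]
print(symtrak8_total(singles, pairs))                     # mean-based bundling
print(symtrak8_total(singles, pairs, bundle=bundle_max))  # max-based bundling
```

The two rules differ only when the constituent scores are at least two categories apart, which is consistent with the similar psychometric results reported for both.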

Item-Level Reliability and Validity

Item-level psychometric performance is shown for patient-reported data in Table 1. The 8 bolded items (i.e., SymTrak-8), initially selected based on content validity, demonstrated psychometrics comparable to those of the non-selected items, specifically for test–retest reliability, item correlations with the original 23-item total, and criterion validity correlations with the HUI3 overall utility score (Table 1). The 4 bundled items demonstrated similar or better results for test–retest reliability, item-total correlations, and criterion validity (i.e., correlations with HUI3) compared to their original component items (Table 1). Importantly, bundled items exhibited lower ceiling effects and markedly lower floor effects than their original items. Results were similar for caregiver-reported data (Online Appendix 1).

Factorial Validity

The one-factor CFA model fit the 8 items reasonably well (Table 2). Perfect fit is often rejected by the chi-square test, as it was here, in non-small samples due to adequate power to detect minor misfit.14 The magnitude of misfit was acceptable based on fit indices.14 Two fit indices (CFI, SRMR) met or exceeded their “good fit” thresholds, and RMSEA was near its < 0.06 threshold. The loadings were above 0.45 for all items (except for the “vision or hearing” item for which loadings were above 0.30) for both patient- and caregiver-reported data (Table 2). MPLUS simulations, based on realistic threshold parameters estimated from the data, revealed power > 97% for one-factor CFA to detect 0.40 population loadings for both patient- and caregiver-reported data. The scree plot indicated that a single dominant factor explains SymTrak-8’s inter-item correlations reasonably well, supporting the validity of using the SymTrak-8 total score as an overall measure of symptoms and functional impairments (Fig. 1).

Table 2 Confirmatory Factor Analysis (CFA) of One-Factor Model
Fig. 1 Factor analysis scree plot.

Criterion Validity

The SymTrak-8 total score was a strong cross-sectional predictor of overall HRQOL utility (HUI3), with results comparable to the SymTrak-23 total score (Table 3). After adjusting for patients’ baseline demographics (analysis 1), a 1 SD increase in SymTrak-8 and SymTrak-23 was associated with a 0.83 SD and 0.84 SD decrease in HUI3 utility score, respectively. After also adjusting for baseline medical comorbidities and medications (analysis 2), the abbreviated SymTrak remained nearly as strongly associated with HUI3 as the original SymTrak (STB = −0.81 and −0.84, respectively). Results for SymTrak-8 and SymTrak-23 were also strong for caregiver-reported data, with comparable strength for the two scales (Table 3). The SymTrak-8 total score had an approximately linear relationship with the HUI3 overall HRQOL utility score for both patients and caregivers (Online Appendix 2).

Table 3 SymTrak as a Concurrent Predictor of the Health Utility Index (HUI3) Criterion

Scale Distribution Features and Reliability

Very few participants scored the lowest (0.3 to 2.5%) or highest (0%) possible total score for SymTrak-8 or SymTrak-23, indicating no problems with floor or ceiling effects (Table 4). The percentage of respondents missing any item was low (i.e., ≤ 4%). The SymTrak-8 total score was approximately normally distributed (Online Appendix 3). Coefficient alpha for SymTrak-8 was 0.74 and 0.76 for patient- and caregiver-reported scores, respectively, and test–retest reliability was high, with ICCs of 0.83 and 0.87 (Table 4).

Table 4 SymTrak Scale Score Distribution Features and Reliability

Convergent Validity

The Pearson correlation between SymTrak-8 and SymTrak-23 total scores was 0.94, separately for caregivers and patients. Because the two scales contain overlapping item content, the Pearson correlation was also calculated between the SymTrak-8 total score and the “remaining” total score, the latter formed by summing 11 items from SymTrak-23 after excluding the 12 items that contributed content to SymTrak-8. As expected, the correlation between the brief and parent scales was reduced after excluding overlapping items but was still of moderate-to-high magnitude (0.75, separately for patients and caregivers). The relationships were markedly linear between SymTrak-8 total and both the SymTrak-23 total and the “remaining” total (Online Appendix 4). A generally monotonic decreasing pattern of overall symptom severity (measured by SymTrak-8 or SymTrak-23 total score) was observed across general health ratings (Online Appendix 5).

Sensitivity to Change

The SRM represents the magnitude of change in SymTrak-8 (and SymTrak-23 for comparative purposes) over 3 months (Table 5). The SRM was close to 0, as hypothesized, for the HUI3 Stable group, and in the anticipated negative and positive direction, respectively, for the groups of patients who declined or improved in HUI3-based HRQOL.

Table 5 Sensitivity to Change of SymTrak Scales

For patient- and caregiver-reported data, the change scores for both SymTrak-8 and SymTrak-23 were sensitive to detecting overall differences between the three HRQOL change groups (Table 5, omnibus F test, p < 0.001). For pairwise differences, the scales performed comparably, in general. When self-reported, both scales markedly differentiated the declined and improved groups (p < 0.0001), and also significantly (p < 0.05) distinguished the other two pairwise group differences. When proxy-reported by caregivers, both SymTrak-8 and SymTrak-23 significantly differentiated the declined and improved groups, and the stable and improved groups. However, there was one nuanced difference between the abbreviated and original scales. Caregiver-reported SymTrak-23 demonstrated stronger separation (p = 0.003) of the declined and stable groups than the marginal separation observed for caregiver-reported SymTrak-8 (p = 0.057; Table 5).

DISCUSSION

The SymTrak-8 Patient and Caregiver Forms (Online Appendices 6 and 7) have identical items and differ only in the opening stem. Consistent with SymTrak-23 findings,2 a good fit to the hypothesized one-factor model was confirmed for SymTrak-8. The total score for SymTrak-8, like that of SymTrak-23,2 was approximately normally distributed and showed an adequate range as well as negligible floor and ceiling effects. Internal consistency and test–retest reliability were also good (> 0.70)31 for SymTrak-8, although, as expected, both were slightly higher for SymTrak-23.

Regarding convergent validity, SymTrak-8 serves as a brief and reasonable approximation to its parent scale; they explain 88% of the variance in each other’s total score (r = 0.94) and their relationship is remarkably linear. The moderate-to-high correlation (r = 0.75) between SymTrak-8 total and the “remaining” total of SymTrak-23 supports construct validity. Specifically, the magnitude of shared variance (56%) suggests adequate conceptual similarity, which supports the use of SymTrak-8, while the amount of non-shared variance (44%) indicates sufficient uniqueness, which justifies SymTrak-23 for capturing additional information when response burden is not a concern. It should be noted that SymTrak-23 is itself relatively short compared to traditional instruments that capture multiple domains of symptoms and functional impairments.
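The shared-variance figures quoted above follow from squaring the Pearson correlation. A brief sketch of that arithmetic is given below, using the two correlations reported in the text; the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two scores share, given their Pearson r."""
    return r ** 2


comparisons = [
    ("SymTrak-8 vs SymTrak-23 total", 0.94),
    ("SymTrak-8 vs remaining 11-item total", 0.75),
]
for label, r in comparisons:
    shared = shared_variance(r)
    # Print shared and non-shared variance as whole percentages.
    print(f"{label}: shared {shared:.0%}, non-shared {1 - shared:.0%}")
```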

HUI3-based criterion validity was previously reported for SymTrak-23 after adjusting for patient demographics.2 The present analyses found that both SymTrak-8 and SymTrak-23 demonstrate excellent HUI3-related criterion validity even after also adjusting for comorbidities and medications. SymTrak-8’s monotonic relationship with physical and emotional general health ratings was consistent with findings for SymTrak-23.2 Because the total score for both scales is a unidimensional measure of overall symptom burden and functional impairment, an analysis of even greater relevance, not previously published for SymTrak-23, is the relationship between these SymTrak total scores and the composite general health rating, for which both scales demonstrated strong monotonic relationships. This provides further support for the construct validity of both SymTrak-8 and SymTrak-23.

The SymTrak-8 total score also revealed good sensitivity to change. Its effect size (i.e., SRM) and significance for detecting HUI3-based HRQOL change were similar to those of SymTrak-23. However, SymTrak-23 was slightly more sensitive to detecting declining versus stable HRQOL when the patients’ symptoms were reported by caregivers, making SymTrak-23 possibly preferred over SymTrak-8 when longitudinal monitoring is captured through proxies, especially during periods of suspected declining health status and when the longer scale is not burdensome to proxy respondents.

Multimorbidity, an alternative term for MCCs, has been an area of increasing research.32,33,34,35 Particularly relevant in older adults, multimorbidity causes substantial impairment as well as increased health care utilization and costs.36,37,38 The substantial adverse impact of SPADE and other symptoms, as well as physical and cognitive functional impairments, which are captured by SymTrak-8, has been demonstrated among patients with multimorbidity in numerous studies.39,40,41,42,43,44,45,46,47,48,49,50,51,52,53 Interventions to optimize care in patients with multimorbidity have been recently emphasized,54,55 and to this end, SymTrak-8 and SymTrak-23 may also be useful.

A strength of this study was that we assessed several different aspects of reliability and validity.16 In particular, when scales are intended for longitudinal application, sensitivity to change is an essential source of validity evidence to be evaluated.16,56 In addition, we used confirmatory rather than exploratory factor analysis, which only 20% of published articles on shortening scales do.16 Test–retest reliability is also often not assessed because of the challenges of rapidly re-administering measures, and when retest data are available, published retest subsamples are often smaller (e.g., n = 30) than those in our study (120 patients and 60 caregivers).

Limitations

Generalizability of SymTrak-8 and SymTrak-23 should be studied for younger (age < 65) patients with MCCs. Sensitivity to change for these abbreviated and parent scales should be investigated over a longer interval than 3 months to allow more time for (1) responsiveness to treatments prescribed during routine primary care or (2) accumulating deleterious effects of MCCs. Responsiveness to interventions in randomized controlled trials for populations with multimorbidity would further substantiate sensitivity to change.

All scales in this study were researcher-administered by telephone. Importantly, psychometrics were also strong for patient- and caregiver-reported SymTrak-23 when it was self-administered in clinics by paper and pencil during the pilot study, including high Cronbach’s alpha, high usability ratings, and brief administration time (average of 3 min).1 SymTrak-8 was developed using the same data set used to develop and validate SymTrak-23.1,2 The psychometrics of SymTrak-8 and SymTrak-23 should be further investigated in an independent sample.

Research and Clinical Implications

The SymTrak-8 total score, like that of SymTrak-23,2 is approximately normally distributed and has a remarkably linear relationship when used in linear regression to predict the HUI3-based HRQOL overall utility score. These are advantages in parametric models for satisfying normality when SymTrak total score is used as an outcome measure (dependent variable) or for satisfying linearity when SymTrak total score is used as an independent variable to predict HRQOL utility. Future research could determine whether SymTrak’s linear relationship with HUI3 holds with other HRQOL utility questionnaires such as the EQ-5D.57,58

Although SymTrak-23 can serve as the full measure in certain research and clinical settings, SymTrak-8 may have a broader reach in busy primary care practice settings, as well as research studies, when (1) response burden is a concern or (2) the aggregate effect of symptoms and deficits is either a secondary outcome or a covariate. Moreover, its sensitivity to change can be valuable in monitoring treatment outcomes in trials or practice. The cumulative effect of symptoms and functional impairments tapped by SymTrak total score (8- or 23-item) is relevant to intervention research and clinical practice among multimorbidity populations, given that some common treatments tend to synergistically affect multiple conditions and symptoms simultaneously.20,21,59,60