INTRODUCTION

Despite peaking in 2010, opioid analgesic prescribing remains at historically high levels.1,2 Long-term opioid prescribing is associated with increased risk of opioid-related harms including overdose death.3,4 In an attempt to mitigate this risk, clinicians have adopted urine drug tests (UDTs) as a monitoring strategy for patients prescribed opioids for chronic pain. Several opioid prescribing guidelines recommend UDT, including the Centers for Disease Control and Prevention (CDC) Guideline for Prescribing Opioids for Chronic Pain and the Department of Veterans Affairs and Department of Defense Clinical Practice Guideline for Opioid Therapy for Chronic Pain.5,6,7 However, these recommendations are based largely on expert opinion. Two systematic reviews identified only weak evidence to support use of UDT as a risk mitigation strategy.8,9

UDTs can detect potential substance misuse, including diversion, through identification of non-prescribed substances or lack of detection of a prescribed medication. However, commonly used immunoassay UDTs are challenging to interpret, and clinicians often make mistakes in interpreting them.10,11,12 For example, running out of medication early and infrequent use of a medication prescribed as needed may both result in non-detection on UDT. Another scenario is the incomplete cross-reactivity of certain medications with immunoassay tests as exemplified by inconsistent detection of oxycodone via opiate immunoassays. In prior studies, 25–54% of patients prescribed opioids for chronic pain had UDT results concerning for misuse or diversion.13,14,15,16 These studies dichotomized UDT results as being concerning for misuse or diversion or not, which fails to recognize and quantify the clinical uncertainty and complexity associated with UDT interpretation.

Our objective was to adjudicate UDT results as clinically not concerning, uncertain, or concerning for substance misuse among a cohort of patients prescribed opioids for chronic pain; uncertain results are non-diagnostic and require additional information or testing for interpretation. We analyzed data from a cluster-randomized trial of an intervention to improve adherence to opioid prescribing guidelines.

METHODS

Study Design and Data Source

We conducted a retrospective cohort study using data from Transforming Opioid Prescribing in Primary Care (TOPCARE), a cluster-randomized trial of a multicomponent intervention to improve adherence to opioid prescribing guidelines. The trial protocol and main study results have been described previously; this study was not a pre-planned secondary analysis.17,18 The study took place from January 2014 through March 2016 at four safety-net clinics in Boston, Massachusetts. Primary care clinicians (n=53) were randomized to the intervention arm (n=25), consisting of nurse care management, an electronic registry, academic detailing, and web-based decision support tools, or to the control arm, which received web-based decision support tools only. Nurse care managers collaborated with clinicians in the intervention arm to conduct assessments and monitoring. Nurse care managers facilitated UDT collection and interpretation and documented patient-reported substance use behaviors that are inconsistently identified in routine clinical care.19 We abstracted clinical data from the electronic health record (EHR). The Boston University Medical Campus Institutional Review Board approved this study.

Cohort Selection

We included patients who (i) were aged 18–89 years, (ii) had a primary care clinician randomized to the intervention arm, (iii) were receiving long-term opioid therapy for chronic pain, and (iv) received at least one UDT in the year following provider randomization. Long-term opioid therapy was defined as three or more opioid prescriptions written at least 3 days apart over a 6-month period in the 1 year prior to or following provider randomization. We excluded patients with malignancy other than non-melanoma skin cancer identified by ICD-9 diagnosis codes (Appendix Table 1) and three or more hematology/oncology visits in the year prior to or following randomization. A schematic of time windows to identify inclusion, exclusion, and baseline characteristics is included as Appendix Figure 1.

Outcome

The analytic sample comprised all UDTs of patients meeting study inclusion criteria in the year following randomization. We included eight immunoassay tests commonly performed at participating institutions during the study period: amphetamine, barbiturates, benzodiazepines, buprenorphine, cocaine, methadone, opiates, and oxycodone. Immunoassay tests, referred to as presumptive or screening tests, use antibodies to identify presence of substances in urine above a prespecified threshold. The detection thresholds for immunoassay tests, which varied by study site, are specified in Appendix Table 2. The primary study outcome was the categorization of each UDT result as not concerning, uncertain, or concerning through expert adjudication, as detailed below.

UDT Adjudication

We adjudicated UDT results in a two-stage process. First, we compared UDT results with those expected from reviewing the prescription history. We expected UDT to detect prescribed substances starting on the date written through the number of days supplied plus an additional 2 days to account for continued detection of substances in urine after most recent use. If the UDT result was discordant with the expected result, the result was flagged for further review. For example, a positive oxycodone test result for an individual with an active oxycodone prescription was not flagged for further review. Notably, oxycodone is a semi-synthetic opioid that may or may not cross-react with opiate immunoassays. Thus, if a patient’s UDT was positive for both oxycodone and opiates and the patient had an active oxycodone prescription but no other opioid that would react with the opiate assay, the opiate assay result was flagged for further adjudication. If no UDTs from a given sample were flagged, results were adjudicated as not concerning.
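The first-stage comparison can be illustrated with a minimal sketch. It is not the study's actual code: the `Prescription` record, the function names, and the inclusive handling of window boundaries are assumptions made for illustration. Only the 2-day extension after the supply ends comes from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

GRACE_DAYS = 2  # extra days of expected detection after the supply ends (from the text)


@dataclass
class Prescription:
    assay: str           # immunoassay expected to detect the drug, e.g. "oxycodone"
    date_written: date
    days_supplied: int


def expected_positive(assay, collected, prescriptions):
    """True if any prescription detected by this assay covers the collection date."""
    for rx in prescriptions:
        # Assumption: the window runs from the date written through
        # date written + days supplied + 2 days, inclusive.
        window_end = rx.date_written + timedelta(days=rx.days_supplied + GRACE_DAYS)
        if rx.assay == assay and rx.date_written <= collected <= window_end:
            return True
    return False


def flag_discordant(results, collected, prescriptions):
    """Return the assays whose observed result differs from the expected result.

    results maps assay name -> True (positive) or False (negative).
    An empty list corresponds to a sample adjudicated as not concerning.
    """
    return [
        assay
        for assay, positive in results.items()
        if positive != expected_positive(assay, collected, prescriptions)
    ]
```

Under these assumptions, a sample that is positive for both oxycodone and opiates from a patient with only an active oxycodone prescription returns `["opiates"]`, which mirrors the cross-reactivity scenario described above: the opiate result goes to second-stage review rather than being classified automatically.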

During the second stage, we abstracted detailed clinical data via chart review for those UDTs flagged for further review. These data included the following: (i) result of UDT immunoassays; (ii) date, strength, quantity, and number of days supplied of up to two opioid, benzodiazepine, barbiturate, and amphetamine prescriptions written on or within the 180 days prior to the UDT collection date; (iii) all other medications; (iv) UDT collection information, including the reason for visit, patient-reported last use of medication, and any provider comments about urine collection; (v) follow-up notes and discussion of UDT results; (vi) most recent and up to two subsequent UDT results; (vii) collateral information including dosage changes, pill counts, active or past substance use disorder diagnoses, and other relevant clinical documentation within 6 months before or after the UDT; and (viii) results of definitive UDTs.

Two study physicians with expertise in chronic pain and addiction (ML and RC) reviewed abstracted data for each UDT and independently assessed each immunoassay test result as not concerning, uncertain, or concerning. We categorized UDT results as not concerning if the result was concordant with use of a prescribed medication as directed. If the UDT result was discordant, we assigned an outcome of concerning if supporting clinical data suggested misuse or diversion, and uncertain if the clinical data were inconclusive. The two reviewers discussed discordant categorizations to reach consensus. We developed and continuously updated an adjudication document to guide outcome classification for specific scenarios and consulted a third study physician (JL) when specific guidance was unclear. We periodically updated the adjudication guide with input from the entire study team.

Other Variables

We abstracted demographic data, including age, sex, race/ethnicity, and insurance from the EHR. We identified comorbidities via visit diagnoses or the problem list in the 12 months preceding the study period using ICD-9-CM diagnosis codes. We included diagnoses previously found to be associated with opioid misuse categorized in four groups: alcohol use disorder, substance use disorder, tobacco use disorder, and mental health diagnoses (Appendix Table 1).20,21,22 We calculated the mean daily morphine milligram equivalents (MME) for opioid prescriptions written in the 90 days prior to the start of the study period using standard conversion factors from the CDC.23
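As an illustration of the MME calculation, the following sketch sums the morphine-equivalent content of each prescription and averages it over the 90-day window. The tuple layout, the function name, and the choice to divide by the full 90 days are assumptions for illustration; the conversion factors shown are a small subset of commonly cited oral factors and should be checked against the CDC table before any real use.

```python
# Illustrative subset of oral MME conversion factors (assumption: verify
# against the current CDC conversion table before use).
MME_FACTOR = {"morphine": 1.0, "hydrocodone": 1.0, "oxycodone": 1.5}

WINDOW_DAYS = 90  # look-back window described in the text


def mean_daily_mme(prescriptions):
    """Mean daily MME over the look-back window.

    prescriptions: iterable of (drug, strength_mg, quantity) tuples for
    prescriptions written during the window.
    """
    total_mme = sum(
        MME_FACTOR[drug] * strength_mg * quantity
        for drug, strength_mg, quantity in prescriptions
    )
    return total_mme / WINDOW_DAYS
```

For example, 90 tablets of oxycodone 10 mg written in the window contribute 1.5 × 10 × 90 = 1,350 MME, or 15 MME per day under this averaging assumption.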

Analysis

We present descriptive baseline demographic and clinical characteristics for the cohort. We used Cohen’s kappa to measure interrater reliability of the two physician adjudicators for each immunoassay test. We present outcomes for each UDT stratified by combinations of target drug and test result. We assigned each subject a mutually exclusive category summarizing UDT outcomes over the 1-year study period as not concerning if all UDTs were not concerning, concerning if at least one UDT was concerning, and uncertain if at least one UDT was uncertain, but none was concerning. We conducted a multivariable logistic regression to identify baseline characteristics associated with having at least one concerning UDT in the 1-year intervention period using generalized estimating equations to account for repeated measures within individuals. We conducted analyses in R (R Core Team, 2020), and the regression in SAS version 9.4 using PROC GENMOD (SAS Institute).
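Two of the analytic steps above lend themselves to a short sketch: the patient-level summary category and Cohen’s kappa. The analyses in this study were run in R and SAS; the Python below is only an illustration, and the function names and string labels are assumptions.

```python
from collections import Counter


def summarize_patient(udt_outcomes):
    """Collapse a patient's UDT outcomes over the year into one category.

    Concerning takes precedence over uncertain, which takes precedence
    over not concerning, so the categories are mutually exclusive.
    """
    if "concerning" in udt_outcomes:
        return "concerning"
    if "uncertain" in udt_outcomes:
        return "uncertain"
    return "not concerning"


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical ratings of the same items.

    Raises ZeroDivisionError when chance agreement equals 1, i.e. when
    both raters assign a single identical category to every item.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    return (observed - expected) / (1 - expected)
```

A patient with outcomes `["not concerning", "uncertain", "concerning"]` is summarized as concerning, while `["not concerning", "uncertain"]` is summarized as uncertain, matching the hierarchy described above.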

RESULTS

We identified 638 patients who met inclusion criteria; 49% were over age 55 years and 48% were female (Table 1). The sample had racial and ethnic diversity: 42% of patients were non-Hispanic White, 39% non-Hispanic Black, and 8% Hispanic. A majority (60%) of patients had a mental health diagnosis, 17% a substance use diagnosis, and 14% an alcohol use diagnosis. Most patients had Medicaid (39%) or Medicare (38%) as their primary insurance.

Table 1 Baseline Demographic and Clinical Characteristics of Study Patients (n=638)

In the 90 days prior to the start of the study period, 12.4% of patients received more than 100 mg mean daily MME. During the intervention year, oxycodone was the most commonly prescribed opioid, received by 82% of patients, followed by morphine and hydrocodone, received by 16% and 14%, respectively (Appendix Table 3). Just over one-fourth of patients received a benzodiazepine (27%), and a minority of patients received amphetamines (1.6%), barbiturates (1.6%), or buprenorphine (0.2%).

The analytic sample consisted of 2,218 UDT samples. Patients had a median of three UDT samples (interquartile range 2, 4). We identified 1,009 UDT samples (45%) with a UDT result discordant from prescription status that were flagged for further review. Definitive UDT results and time of last opioid dose were available for 25% and 38% of adjudicated UDTs, respectively. Interrater reliability of adjudication was categorized as moderate or better (kappa ≥ 0.60) for 14 of 17 immunoassay test/result combinations (Appendix Table 4).24 Interrater reliability was weak for negative opiate and oxycodone tests (kappa 0.49 and 0.59 respectively) reflecting challenges in interpreting negative test results for patients prescribed oxycodone.

We summarized the consensus criteria for UDT adjudication in Table 2. As an example, if a UDT did not detect a prescribed substance, we considered that result uncertain unless we had additional data from the EHR to support potential misuse or diversion within 6 months before or after the date of the UDT. Such data included patients reporting they used more medications than directed, medications were diverted or stolen, or the patient did not adhere to monitoring via urine drug tests or pill counts.

Table 2 Operational Definition and Illustrative Examples of UDT Outcome Classification as Not Concerning, Uncertain, and Concerning by UDT Test Result

Over the 1-year study period, 235 patients (37%) had at least one concerning UDT, and an additional 222 (35%) had at least one uncertain UDT (Fig. 1). Similar proportions of patients had concerning positive UDTs detecting a non-prescribed substance (156, 24%) and concerning negative UDTs that did not detect a prescribed substance (147, 23%); 68 (11%) had both concerning positive and concerning negative UDTs.

Figure 1

Patients’ UDT results summarized as concerning (one or more concerning UDT), uncertain (one or more uncertain, but none concerning), or not concerning (all not concerning) for all UDT results and by UDT result: a concerning positive result identifies presence of a substance without a prescription, and a concerning negative result identifies lack of detection of a prescribed substance.

UDT outcomes varied by immunoassay and immunoassay test result (Table 3). For the cocaine immunoassay and for the immunoassays targeting amphetamine, barbiturate, buprenorphine, and methadone (medications that were less commonly prescribed), more than 95% of tests were not concerning. For these five immunoassays, the majority of test results were negative and more than 99% of negative test results were not concerning. Positive test results were less common, but more likely to be identified as concerning or uncertain.

Table 3 Adjudicated UDT Results by Immunoassay Result, n (%)

For immunoassay tests targeting the more commonly prescribed medications (oxycodone, opiates, and benzodiazepines), results were more mixed (Table 3). We identified only 3% of positive oxycodone test results as concerning, but 32% of negative oxycodone tests as concerning. For opiate tests, 6% of positive tests were concerning, compared with 3% of negative tests. For benzodiazepine tests, 18% of positive tests were concerning, compared with 2% of negative tests.

In multivariable analysis, several patient characteristics were associated with having a concerning UDT (Table 4). Patients aged 18–34 years were more likely than those over 65 to have one or more concerning UDTs (adjusted odds ratio [AOR] 4.8 [95% confidence interval (CI) 1.9–12.1]). Patients with mental health diagnoses (AOR 1.6 [95% CI 1.1–2.3]) and substance use disorder diagnoses (AOR 2.3 [95% CI 1.5–3.6]) were more likely to have a concerning UDT. We did not find an association between higher MME and odds of a concerning UDT.

Table 4 Association of Baseline Patient Characteristics with Having Any Concerning UDT Result, Any Concerning Positive Result, and Any Concerning Negative Result Over 1 Year (Results Presented Are Adjusted Odds Ratios and 95% Confidence Intervals)

Predictors of identifying a non-prescribed substance (concerning positive UDT) and not detecting a prescribed substance (concerning negative UDT) differed in some ways (Table 4). Substance use disorder diagnosis was associated with detection of a non-prescribed substance (AOR 3.5 [95% CI 2.2–5.5]), but not failure to detect a prescribed substance (AOR 1.3 [95% CI 0.8–2.1]). Having a mental health diagnosis was associated with non-detection of a prescribed substance (AOR 2.1 [95% CI 1.3–3.3]) but not detection of a non-prescribed substance (AOR 1.4 [95% CI 0.9–2.2]).

DISCUSSION

In a sample of 638 patients receiving opioids for chronic non-cancer pain, we identified 37% with one or more concerning UDT results suggestive of misuse or diversion over a 1-year period. We identified an additional 35% with one or more uncertain UDT results. These data suggest that UDTs may often provide actionable information to monitor patients prescribed opioids for chronic pain. The high frequency of uncertain UDTs highlights that these test results are not definitive; clinicians should consider UDTs as only one component of a comprehensive patient-centered approach to safer opioid prescribing.

Our results build on prior studies in several ways. First, we analyzed detailed clinical data, including nurse care manager documentation of patient-reported use behaviors at the time of and follow-up to UDT collection. We found that additional clinical context, including patient-reported use behaviors, was important for adjudicating results. For example, when prescribed opioids were not detected on UDT, some patients reported intermittent low-volume use, which we adjudicated as uncertain, and some endorsed running out of medication early, which we adjudicated as concerning. Second, we included uncertain as an adjudication category to identify the clinical uncertainty inherent to UDTs.

The opiate and oxycodone UDTs that target frequently prescribed medications in our cohort had the highest yield in identifying concerning results, most often for absence of the prescribed medication. Benzodiazepines and cocaine were the most common assays to identify concerning use of a non-prescribed substance. UDT panels should be periodically reviewed and updated to reflect local prescribing and substance use patterns. This strategy would permit detection of emerging threats such as fentanyl, a synthetic opioid, not detected by standard opiate immunoassay tests.25 Removing UDTs for substances with low local use patterns may reduce unintended consequences associated with false positive tests.

The adjudication process led to the development and refinement of operational definitions for how to interpret potentially ambiguous UDT results. While adjudication concordance between reviewers in our study was good, it was not uncommon for the reviewing clinicians with expertise in pain and addiction to have different initial categorizations of a UDT result that needed reconciliation. Errors in UDT interpretation are common.10,11 In one recent study, 28% of providers documented UDT interpretations that were discordant from expert laboratory toxicologist interpretation.26 Another underappreciated challenge in UDT interpretation is the detection threshold, the drug concentration at which a UDT will be reported as positive. Notably, immunoassay test detection thresholds varied by clinical site in our study, meaning results from the same sample could vary by site.

Whether or not UDTs improve patient outcomes remains unclear. This study supports that UDTs may identify substance misuse or diversion, but also confirms that UDT interpretation is challenging. Misinterpretation of UDT results may negatively impact clinical decision-making — either from failing to identify concerning behaviors or incorrectly interpreting a result as a concerning behavior. Decision support tools may improve UDT interpretation; however, once identified, evidence-based approaches to respond to concerning use behaviors are lacking. An expert panel recommended clinicians discuss concerning use behaviors with patients and consider actions to mitigate risk.27 Strategies may include more frequent monitoring with UDTs or pill counts, shorter prescription length, or referral for substance use treatment. Tapering opioid therapy when risks outweigh benefits should be managed cautiously due to increasing evidence of harms from suicide or overdose associated with abrupt opioid discontinuation.28,29

Our study has several limitations. First, data were not prospectively collected for UDT adjudication. However, the availability of the nurse care manager as part of the TOPCARE trial led to more documentation than is typical. Second, we did not have access to Prescription Drug Monitoring Program (PDMP) data for this study due to limited access for research purposes in Massachusetts. Some uncertain test results may have been resolved with verification of prescriptions from providers from other health care systems available in the PDMP. Third, our findings may not be generalizable to other clinical settings, geographic areas, or time periods with different patterns of opioid analgesic prescribing or illicit substance use. The TOPCARE trial was conducted at a safety-net academic medical center and affiliated community health centers that serve patients with higher baseline rates of substance use than other health systems. Although patients did not provide consent for, and were not directly informed of, the TOPCARE trial, their interaction with the nurse care manager may have affected their substance use behaviors. Further, the study was contemporaneous with the emergence of fentanyl in the illicit opioid supply, but fentanyl was not yet included in routine UDT at these institutions. Fourth, comorbidities identified via diagnosis codes from problem lists or clinical encounters may not distinguish between active versus past conditions, and providers may use codes for opioid use, abuse, or dependence to document physiologic opioid dependence rather than an opioid use disorder. Finally, this study focused on commonly available immunoassay UDTs and findings do not inform use of more expensive definitive UDTs based on gas or liquid chromatography that may provide more specific information.

In conclusion, in a large cohort of patients receiving opioids for chronic pain, 1 in 3 patients had UDT results concerning for misuse or diversion in a year-long period. An additional 1 in 3 patients had one or more UDT results adjudicated as uncertain, highlighting the clinical uncertainty associated with UDTs. From these data, UDTs appear to provide actionable data for monitoring patients prescribed opioids for chronic pain. However, their effectiveness in mitigating opioid-related harms is yet to be determined.