INTRODUCTION

Type 1 diabetes mellitus (T1DM) is an autoimmune disease that gradually destroys beta cells in the pancreas, leading to absolute insulin deficiency.1 Insulin replacement therapy mimics beta cell function by replacing both basal and prandial insulin secretion.2 Basal insulin replacement can be achieved with intermediate-acting insulin (e.g., isophane insulin [Neutral Protamine Hagedorn; NPH], zinc insulin [Lente]),3 or with long-acting or ultra-long-acting insulin analogues (e.g., glargine, detemir,4 degludec).5 Regular human insulin, insulin analogues, and their biosimilar counterparts are complex biological molecules produced by similar manufacturing processes.6 Structurally identical to their reference insulin analogues, biosimilar basal insulins were intended to function similarly to them6,7,8,9,10 and were welcomed into the market for their potential cost-savings.11 However, biosimilar manufacturing has also been associated with differences in pharmacokinetic and pharmacodynamic properties, thereby potentially influencing insulin efficacy and safety,6,7,8,9,10 and the impact of these differences is unclear. We aimed to update our prior systematic review,12 now including biosimilars, to evaluate the comparative efficacy and safety of ultra-long-, long-, and intermediate-acting insulins compared with each other and with biosimilar insulin.

METHODS

Protocol

Policy-makers from Health Canada and the Canadian Agency for Drugs and Technologies in Health (CADTH) commissioned the review, which was informed by the World Health Organization (WHO) insulin access initiative.13

A protocol was prepared using the Preferred Reporting Items for Systematic Review and Meta-analysis Protocols (PRISMA-P)14 and the Cochrane Handbook15 and registered with PROSPERO (CRD42017077051). Results are reported using the PRISMA-NMA16 and International Society for Pharmacoeconomics and Outcomes Research (ISPOR-NMA) tool.17

Eligibility Criteria

Patients

Adults (≥ 16 years of age) with T1DM for any duration of time, excluding animal studies.

Interventions

Ultra-long-/long-/intermediate-acting basal/bolus type of insulin therapy, with basal (taken between meals) and bolus (taken at mealtime) administered separately. Bolus insulin had to be specified and administered to all participants in all intervention and control groups. Bolus insulin included rapid- or short-acting insulin, while basal insulin included ultra-long, long- and intermediate-acting insulin. Insulin pumps or pre-mixed insulin preparations (e.g., long- and short-acting insulin combined) were excluded.

Comparator(s)/control

Ultra-long-/long-/intermediate-acting insulin, biosimilar insulin, no treatment.

Primary outcomes

Efficacy: glycemic control (glycated hemoglobin [A1c], FPG).

Secondary outcomes

Efficacy: all-cause mortality, diabetes-related morbidity (macrovascular, microvascular), health-related quality of life. Safety: weight change, hypoglycemia (all-cause, serious, minor, nocturnal), incident cancer, total adverse events (AEs), serious AEs, dropouts due to AEs.

Study designs

Experimental (randomized controlled trials [RCTs], non-randomized controlled trials, quasi-randomized trials), quasi-experimental (interrupted time series, controlled before and after studies), cohort studies.

Data Sources and Searches

Literature searches were developed by an experienced information scientist and peer-reviewed by a second using Peer Review of Electronic Search Strategies (PRESS)18 and executed in MEDLINE, EMBASE, and Cochrane Central Register of Controlled Trials (CENTRAL) (inception until March 27, 2019). Grey (i.e., difficult to locate/unpublished) literature was identified19 by searching the following: public health websites, drug regulatory websites, conference abstracts, and clinical trial registries. Reference lists of previous reviews and included studies were scanned. No restrictions on date, duration, or language were imposed.

Study Selection

Pairs of reviewers independently screened literature search results using Synthesi.SR software.20 A calibration exercise was conducted on 50 titles/abstracts and 80% agreement was achieved. Two team members independently screened remaining citations with 90% agreement. Two calibration exercises were completed using 20 eligible full-text articles each, until 65% agreement was achieved. Remaining full-text articles were screened independently by two reviewers with 81% agreement. Conflicts were resolved through discussion or by a third reviewer.

Quality Assessment

Risk-of-bias (ROB) of included studies was appraised independently by two reviewers using the Cochrane ROB tool for RCTs, Cochrane Effective Practice and Organization of Care (EPOC) tool for non-randomized controlled trials, and the Newcastle-Ottawa Scale for cohort studies. Conflicts were resolved by discussion or by a third reviewer.

Data Items and Abstraction

Data were abstracted on study characteristics, patient characteristics, and outcome results (e.g., A1c) including their definitions for the longest duration of follow-up.15 A draft data abstraction form was created and a calibration exercise was completed for five studies. Subsequently, two reviewers independently abstracted relevant outcome data for each included study. A third reviewer resolved conflicts. Authors were emailed for missing data or data clarifications.

Data Analysis

Pairwise Meta-Analysis

Due to the small number of studies available for meta-analyses (MAs),21 fixed-effect MAs were conducted for direct pairwise comparisons of treatments using the odds ratio (OR) effect measure for dichotomous data and the mean difference (MD) for continuous data,15 and adjusted for the effect of the bolus covariate (rapid versus short) using meta-regression.15
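As an illustration of the pooling step described above, the following is a minimal sketch of an inverse-variance fixed-effect meta-analysis. It is not the Stata code used in the review: the function names and the study-level numbers are hypothetical, and the bolus-type meta-regression adjustment is not included. For odds ratios, the same function would be applied to log ORs and the result exponentiated.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def confidence_interval(estimate, se, z=1.96):
    """Two-sided 95% CI under a normal approximation."""
    return estimate - z * se, estimate + z * se


# Hypothetical study-level mean differences in A1c (%) and their standard errors
mds = [-0.20, -0.10, -0.15]
ses = [0.10, 0.05, 0.08]

pooled_md, pooled_se = fixed_effect_pool(mds, ses)
low, high = confidence_interval(pooled_md, pooled_se)
print(f"Pooled MD {pooled_md:.2f} (95% CI: {low:.2f} to {high:.2f})")
```

Each study is weighted by the inverse of its variance, so more precise studies contribute more to the pooled estimate.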

For a cross-over RCT to contribute outcome data, the trial had to account for the paired nature (i.e., repeated, subject-level measurements) of the study design in conducting and reporting the study-specific effect estimate and its corresponding measure of uncertainty (e.g., confidence interval [CI], standard error),22 which was verified by a statistician (ZB). For studies with dichotomous outcomes that reported zero events in one treatment arm, Stata23 automatically added 0.5 to the numerator and one to the denominator. Studies reporting that participants in both treatment arms experienced 0% events or 100% events were excluded from the analysis.15 Heterogeneity was assessed using the I² statistic.24
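The zero-event handling and the heterogeneity statistic can be sketched as follows. This is an illustrative reconstruction, not the Stata routine itself: it assumes the correction adds 0.5 to every cell of the 2x2 table (so each arm gains 0.5 events and 1 participant), and it computes I-squared from Cochran's Q. Function names and counts are hypothetical.

```python
import math


def log_odds_ratio(events_a, n_a, events_b, n_b, correction=0.5):
    """Return (log OR, SE) for arm A versus arm B.

    When any cell of the 2x2 table is zero, `correction` is added to every
    cell (assumed reading of the correction described in the text).
    """
    if events_a == events_b == 0 or (events_a == n_a and events_b == n_b):
        raise ValueError("0% or 100% events in both arms: study is excluded")
    cells = [events_a, n_a - events_a, events_b, n_b - events_b]
    if min(cells) == 0:
        cells = [x + correction for x in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def i_squared(estimates, std_errors):
    """I-squared (%) derived from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical trial: no events among 50 patients in arm A, 4 of 50 in arm B
log_or, se = log_odds_ratio(0, 50, 4, 50)
print(f"OR {math.exp(log_or):.3f}, SE(log OR) {se:.3f}")
```

Studies with no events, or only events, in both arms raise an error here, mirroring their exclusion from the analysis.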

NMA

For a connected network with the number of studies exceeding the number of treatment nodes, a network diagram was drawn and random-effects network meta-analysis (NMA) was performed.25, 26 Treatment nodes27, 28 were selected by clinicians to capture the major insulin class, origin of the insulin, and administration frequency. Two sets of analyses were conducted: one based on basal insulin class and the other based on insulin class/origin/frequency (Table 1). A common within-network estimate for the heterogeneity parameter across comparisons was estimated with the restricted maximum likelihood (REML) method. The transitivity assumption was assessed by examining distributions of treatment effect modifiers across comparisons.25, 29,30,31 Global consistency was assessed between direct and indirect evidence across the entire network using the design-by-treatment interaction model.32, 33 Statistically significant global inconsistency led to local consistency being assessed via the loop-specific method.34 Treatment effect estimates, 95% CIs, and 95% predictive intervals (PrI) were calculated.16 To assess the presence of small-study effects, a comparison-adjusted funnel plot was drawn per outcome.29, 35 Within each plot, comparison-adjusted treatment effect estimates were ordered from earliest to most recent according to Health Canada/Food and Drug Administration approval dates. Analyses were performed in Stata version 15.1 using the metan, metareg, mvmeta, and network commands.23, 35, 36 Additional analysis methods are available in Appendix Methods 1.

Table 1 List of Basal Insulin Analogues Included in the Review

Role of the Funding Source

The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit for publication.

RESULTS

Literature Search

The literature search identified 21,346 titles/abstracts, of which 1170 were potentially relevant (Fig. 1). Sixty-five unique studies and 13 companion reports were included (Appendix File 1); 27 and 37 studies were included in the basal insulin class and insulin class/origin/frequency analyses, respectively. Six37,38,39,40,41,42 non-English studies and two protocols43, 44 were included. Authors of 89 studies were emailed and responses were received for 18 studies; one45 provided additional data for the analysis.

Figure 1 PRISMA flow diagram

Patient Characteristics

Across the included studies, sample sizes ranged from eight to 749, with a total of 14,200 patients. The proportion of females ranged from 0 to 100%, average age ranged from 23 to 54 years, average baseline A1c ranged from 7 to 10%, average body mass index (BMI) ranged from 22 to 28, and duration of T1DM ranged from 8 to 27 years (Table 2, Appendix Table 1).

Table 2 Study, Patient, Intervention, and Outcome Characteristics

Study Characteristics

Publication years of the 65 included studies ranged from 1984 to 2018. Sixty-four studies were RCTs, of which 41 (64%) were parallel RCTs and 23 (36%) were cross-over RCTs. One study was a non-randomized controlled trial. Most of the studies took place across multiple centers in Europe and North America, with few studies from low-and-middle-income countries (LMICs) (Table 2, Appendix Table 2).

ROB and Quality Appraisal Results

A score of unclear/high ROB was given for the majority of RCTs regarding allocation concealment (75%), blinding of participants and personnel (78%), blinding of outcome assessment (44%), incomplete outcome data (28%), selective reporting (63%), and “other” bias (e.g., funding bias, 92%) (Appendix Table 3, Appendix Figure 1). A single non-randomized controlled trial was assessed using the Cochrane EPOC ROB Tool, which scored unclear for 7/9 items and high ROB for random sequence generation and incomplete outcome data (Appendix Table 4).

NMA Results

All statistically significant NMA results for each outcome are provided (Tables 3 and 4). Table 3 summarizes all treatment comparisons and outcomes and indicates whether each result was statistically significant. Unless otherwise noted, the type of insulin was human. All results, whether statistically significant or not, can be found in Appendix Data 1, Appendix Tables 5-17, and Appendix Figures 2-9.

Table 3 Summary of pooled clinical outcomes
Table 4 Statistically significant treatment comparisons

Primary Outcomes

A1c

For basal insulin class, NMA for the A1c outcome included 25 RCTs and 8327 patients. Long-acting insulin had a greater A1c reduction compared to intermediate-acting insulin (MD -0.14, 95% CI: -0.22 to -0.06) (Appendix Tables 5-8, Appendix Figures 3-4). Ultra-long-acting insulin showed a numerically greater A1c reduction than intermediate-acting insulin, but the difference was not statistically significant (MD -0.08, 95% CI: -0.25 to 0.10), and it did not differ significantly from long-acting insulin (MD 0.06, 95% CI: -0.10 to 0.22).
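Throughout the results, statistical significance can be read directly from the 95% CI: a mean difference is significant when its interval excludes 0, and an odds ratio when its interval excludes 1. The short sketch below applies this rule to the three class-level A1c estimates reported above; the function name is hypothetical and the rule assumes a two-sided test at the 5% level.

```python
def excludes_null(lower, upper, null_value=0.0):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# (MD, lower, upper) for the basal insulin class A1c comparisons in the text
a1c_results = {
    "long-acting vs intermediate-acting": (-0.14, -0.22, -0.06),
    "ultra-long-acting vs intermediate-acting": (-0.08, -0.25, 0.10),
    "ultra-long-acting vs long-acting": (0.06, -0.10, 0.22),
}

for comparison, (md, low, high) in a1c_results.items():
    verdict = "significant" if excludes_null(low, high) else "not significant"
    print(f"{comparison}: MD {md:+.2f} ({low:.2f} to {high:.2f}) -> {verdict}")
```

For odds ratios, the same function would be called with `null_value=1.0`.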

For a specific type of insulin, NMA was conducted on the A1c outcome and included 34 RCTs, 11,654 patients, and 9 treatment nodes (Appendix Table 14, Appendix Figure 7). The transitivity assumption was verified and there was no evidence of inconsistency (Appendix Table 13, Appendix Figure 6), yet there might be bias associated with small-study effects (Appendix Figure 8). There were 36 treatment comparisons and the following demonstrated a statistically significant difference:

Intermediate-acting (human) once a day (od) was inferior to:

1. Intermediate-acting (human) twice a day (bid) (MD 0.11, 95% CI: 0.01 to 0.21)

Intermediate-acting (human) four times a day (qid) was inferior to:

2. Intermediate-acting (animal and human), bid (MD 0.31, 95% CI: 0.06 to 0.56)
3. Intermediate-acting (human), od (MD 0.32, 95% CI: 0.12 to 0.53)
4. Intermediate-acting (human), bid (MD 0.43, 95% CI: 0.24 to 0.63)

Long-acting (human) od was superior to:

5. Intermediate-acting (animal and human), bid (MD -0.19, 95% CI: -0.36 to -0.02)
6. Intermediate-acting (human), od (MD -0.18, 95% CI: -0.27 to -0.08)
7. Intermediate-acting (human), qid (MD -0.50, 95% CI: -0.68 to -0.32)

Long-acting (human) bid was superior to:

8. Intermediate-acting (animal), bid (MD -1.27, 95% CI: -2.53 to -0.01)
9. Intermediate-acting (animal and human), bid (MD -0.19, 95% CI: -0.37 to -0.01)
10. Intermediate-acting (human), od (MD -0.18, 95% CI: -0.29 to -0.07)
11. Intermediate-acting (human), bid (MD -0.07, 95% CI: -0.13 to -0.01)
12. Intermediate-acting (human), qid (MD -0.50, 95% CI: -0.70 to -0.30)

Long-acting (biosimilar) od was superior to:

13. Intermediate-acting (human), od (MD -0.14, 95% CI: -0.27 to -0.01)
14. Intermediate-acting (human), qid (MD -0.46, 95% CI: -0.66 to -0.26)

Ultra-long-acting (human) od was superior to:

15. Intermediate-acting (human), od (MD -0.14, 95% CI: -0.28 to -0.01)
16. Intermediate-acting (human), qid (MD -0.47, 95% CI: -0.67 to -0.26)
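The count of 36 treatment comparisons quoted before the list above follows from the number of treatment nodes: a network with k nodes has k(k-1)/2 possible pairwise comparisons. A small sketch (node labels are placeholders, not the actual treatments):

```python
from itertools import combinations


def n_pairwise_comparisons(n_nodes):
    """Number of distinct treatment pairs in a network with n_nodes nodes."""
    return n_nodes * (n_nodes - 1) // 2


# Placeholder labels standing in for the nine A1c treatment nodes
nodes = [f"node_{i}" for i in range(1, 10)]
pairs = list(combinations(nodes, 2))

print(len(pairs))                 # 36
print(n_pairwise_comparisons(9))  # 36
```

The same formula gives 28, 21, and 15 comparisons for networks with 8, 7, and 6 nodes, respectively.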

Several sensitivity analyses (Appendix Table 15) were conducted to examine the impact of imputing missing standard deviations (SDs) on the results, controlling for studies involving Lente insulin or cross-over RCTs. This resulted in the exclusion of 7 to 8 trials for each sensitivity analysis. The direction of the effect was maintained; however, all statistically significant pairwise treatment comparisons reported above were no longer statistically significant, likely because of the small number of remaining trials. Sub-group analyses (Appendix Table 16) were conducted for bolus type, follow-up duration, study design, ROB associated with random sequence generation and allocation concealment, age, proportion of women, duration of diabetes, and A1c level (mild: <8%, severe: ≥8%); none of the results was statistically significant.

Fasting Plasma Glucose (FPG)

For basal insulin class, NMA for the FPG outcome included 21 RCTs, 7685 patients, and three treatment nodes. Long-acting insulin had a greater FPG reduction compared to intermediate-acting insulin (MD -1.03, 95% CI: -1.33 to -0.73), as did ultra-long-acting insulin (MD -1.45, 95% CI: -2.12 to -0.79). The difference between ultra-long-acting and long-acting insulin was not statistically significant (MD -0.42, 95% CI: -1.02 to 0.18) (Appendix Tables 6-7, Appendix Figures 3-4).

For a specific type of insulin, NMA was conducted on the FPG outcome and included 29 RCTs, 10,290 patients, and 8 treatment nodes (Appendix Table 14, Appendix Figure 7). The transitivity assumption was verified and there was no evidence of inconsistency (Appendix Table 13, Appendix Figure 6), yet there might be bias associated with small-study effects (Appendix Figure 8). There were 28 treatment comparisons and the following demonstrated a statistically significant difference:

Long-acting (human) od was superior to:

1. Intermediate-acting (human), od (MD -1.15, 95% CI: -1.79 to -0.50)
2. Intermediate-acting (human), bid (MD -1.26, 95% CI: -1.65 to -0.87)
3. Intermediate-acting (human), qid (MD -0.90, 95% CI: -1.79 to -0.01)
4. Long-acting (human), bid (MD -0.44, 95% CI: -0.81 to -0.06)

Long-acting (human) bid was superior to:

5. Intermediate-acting (human), bid (MD -0.82, 95% CI: -1.20 to -0.44)

Long-acting (biosimilar) od was superior to:

6. Intermediate-acting (human), od (MD -1.05, 95% CI: -1.95 to -0.16)
7. Intermediate-acting (human), bid (MD -1.16, 95% CI: -1.91 to -0.42)

Ultra-long-acting (human) od was superior to:

8. Intermediate-acting (human), od (MD -1.44, 95% CI: -2.31 to -0.58)
9. Intermediate-acting (human), bid (MD -1.55, 95% CI: -2.22 to -0.89)
10. Intermediate-acting (human), qid (MD -1.20, 95% CI: -2.26 to -0.13)
11. Long-acting (human), bid (MD -0.73, 95% CI: -1.36 to -0.11)

Secondary Outcomes

Weight Change

For basal insulin class, NMA was conducted on weight change with 15 RCTs, 5908 patients, and three treatment nodes. Long-acting insulin reduced weight gain compared to intermediate-acting insulin (MD -0.70, 95% CI: -1.08 to -0.32) (Appendix Table 6, Appendix Figure 3). Ultra-long-acting insulin did not differ significantly from intermediate-acting insulin (MD -0.53, 95% CI: -1.25 to 0.18) or from long-acting insulin (MD 0.17, 95% CI: -0.44 to 0.77). Four studies were removed in a sensitivity analysis because of potential bias associated with small-study effects; in that analysis, ultra-long-acting insulin was statistically superior to intermediate-acting insulin (MD -0.80, 95% CI: -1.29 to -0.32) (Appendix Table 7, Appendix Figure 4).

For a specific type of insulin, NMA was conducted on weight change with 20 RCTs, 7938 patients, and 7 treatment nodes (Appendix Table 14, Appendix Figure 7). The transitivity assumption was verified and there was no evidence of inconsistency (Appendix Table 13, Appendix Figure 6), yet there might be bias associated with small-study effects (Appendix Figure 8). There were 21 treatment comparisons and the following demonstrated a statistically significant difference:

Long-acting (human) od was inferior to:

1. Long-acting (human), bid (MD 0.58, 95% CI: 0.05 to 1.10)

Long-acting (human) bid was superior to:

2. Intermediate-acting (human), od (MD -1.22, 95% CI: -2.11 to -0.32)
3. Intermediate-acting (human), bid (MD -0.86, 95% CI: -1.23 to -0.48)
4. Long-acting (biosimilar), od (MD -0.90, 95% CI: -1.67 to -0.12)

Major or Serious Hypoglycemia

For basal insulin class, NMA was conducted on the major or serious hypoglycemia outcome (defined in Appendix Table 12) with 16 RCTs, 6900 patients, and three treatment nodes. Long-acting insulin was associated with a reduced incidence of major or serious hypoglycemic episodes compared to intermediate-acting insulin (OR 0.63, 95% CI: 0.51 to 0.79) (Appendix Tables 6-7, Appendix Figures 3-4). Ultra-long-acting insulin did not differ significantly from intermediate-acting insulin (OR 0.71, 95% CI: 0.43 to 1.17) or from long-acting insulin (OR 1.12, 95% CI: 0.71 to 1.77).

For a specific type of insulin, NMA was conducted on major or serious hypoglycemia with 20 RCTs, 8240 patients, and 6 treatment nodes (Appendix Table 14, Appendix Figure 7). The transitivity assumption was verified and there was no evidence of inconsistency (Appendix Table 13, Appendix Figure 6), yet there might be bias associated with small-study effects (Appendix Figure 8). There were 15 treatment comparisons and the following demonstrated a statistically significant difference:

Long-acting (human) od was associated with a lower incidence when compared to:

1. Intermediate-acting (human), od (OR 0.61, 95% CI: 0.40 to 0.94)
2. Intermediate-acting (human), bid (OR 0.56, 95% CI: 0.39 to 0.80)

Long-acting (human) bid was associated with a lower incidence when compared to:

3. Intermediate-acting (human), bid (OR 0.70, 95% CI: 0.52 to 0.93)

Nocturnal Hypoglycemia

For basal insulin class, NMA was conducted for nocturnal hypoglycemia (defined in Appendix Table 12) with 13 RCTs, 5423 patients, and three treatment nodes. Long-acting insulin (OR 0.74, 95% CI: 0.58 to 0.94) and ultra-long-acting insulin (OR 0.64, 95% CI: 0.41 to 0.99) lowered the incidence of nocturnal hypoglycemic episodes compared to intermediate-acting insulin (Appendix Tables 6-7, Appendix Figures 3-4). The difference between ultra-long-acting and long-acting insulin was not statistically significant (OR 0.86, 95% CI: 0.60 to 1.24).
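Because the basal insulin class networks have three nodes, the reported class-level point estimates can be checked for coherence: under the consistency assumption, the ultra-long-acting versus intermediate-acting estimate should equal the sum of the ultra-long-acting versus long-acting and long-acting versus intermediate-acting estimates (on the log scale for odds ratios). The sketch below runs this check on the point estimates reported in the text, allowing for rounding to two decimals. Function names are hypothetical, and the check does not reproduce the confidence intervals, which depend on covariances not reported here.

```python
import math


def implied_md(md_ab, md_bc):
    """MD for A vs C implied by A vs B and B vs C under consistency."""
    return md_ab + md_bc


def implied_or(or_ab, or_bc):
    """OR for A vs C implied by A vs B and B vs C (additive on the log scale)."""
    return math.exp(math.log(or_ab) + math.log(or_bc))


# (ultra-long vs long, long vs intermediate, reported ultra-long vs intermediate)
md_results = {
    "A1c": (0.06, -0.14, -0.08),
    "FPG": (-0.42, -1.03, -1.45),
    "weight change": (0.17, -0.70, -0.53),
}
or_results = {
    "major or serious hypoglycemia": (1.12, 0.63, 0.71),
    "nocturnal hypoglycemia": (0.86, 0.74, 0.64),
}

for outcome, (ul_l, l_i, reported) in md_results.items():
    print(f"{outcome}: implied MD {implied_md(ul_l, l_i):.2f}, reported {reported:.2f}")

for outcome, (ul_l, l_i, reported) in or_results.items():
    print(f"{outcome}: implied OR {implied_or(ul_l, l_i):.2f}, reported {reported:.2f}")
```

In all five outcomes the implied and reported values agree to two decimal places.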

For a specific type of insulin, NMA was conducted on nocturnal hypoglycemia with 16 RCTs, 6318 patients, and 6 treatment nodes (Appendix Table 14, Appendix Figure 7). The transitivity assumption was verified and there was no evidence of inconsistency (Appendix Table 13, Appendix Figure 6). However, there might be bias associated with small-study effects (Appendix Figure 8). Across 15 treatment comparisons, only one was statistically significant: long-acting insulin administered bid was associated with a lower incidence than intermediate-acting insulin administered bid (OR 0.61, 95% CI: 0.43 to 0.87).

Other Secondary Outcomes

No statistically significant results were found across treatment comparisons where NMA and/or MA was done for the following outcomes: mortality, any vascular complications, microvascular complications, macrovascular complications, quality of life, all-cause hypoglycemia, minor or mild hypoglycemia, incident cancers, any AEs, serious AEs, and dropouts due to AEs.

Rank-Heat Plots

Across basal insulin class NMAs, the results suggest that long-acting insulin has the greatest likelihood of being the most effective and safest (Appendix Figure 5). Across the insulin class/origin/frequency NMAs, the results suggest that long-acting biosimilar insulin administered once daily has the greatest likelihood of being the most effective and safest (Appendix Figure 9).

DISCUSSION

For basal insulin classes, long-acting insulin was superior to intermediate-acting insulin across the outcomes including A1c, FPG, weight, major or serious hypoglycemia, and nocturnal hypoglycemia. In addition, ultra-long-acting insulin was statistically superior to intermediate-acting insulin for FPG and nocturnal hypoglycemia. For fasting blood glucose, long-acting od was superior to long-acting bid and ultra-long-acting od was superior to long-acting bid. For weight change, long-acting od was inferior to long-acting bid and long-acting bid was superior to long-acting biosimilar od. These results are inconsistent with recent clinical practice guidelines that recommend ultra-long-acting insulin,2 likely because the guidelines included patients with T1DM and type 2 diabetes.46 We included the same two studies looking at T1DM,47, 48 as well as an additional 10 studies.49,50,51,52,53,54,55,56,57,58 The rank-heat plots suggest that long-acting insulin had the greatest likelihood of being the most effective and safest compared with intermediate-acting insulin and ultra-long-acting insulin. For the specific types of insulin, long-acting biosimilar od had the greatest likelihood of being the most effective and safest according to the rank-heat plot. Only one statistically significant difference was observed between the biosimilar insulin and human/animal insulin, which was for weight change, and is inconsistent with previous research on biosimilars.7

In LMICs, food insecurity might hamper the regular alternation of food needed when NPH is used to provide the basal concentration of insulin. NPH is associated with more severe hypoglycemia events. Furthermore, access to glucagon, a high-cost product, in case of severe hypoglycemia, is limited in many low-resource settings.59 Long-acting analogues of insulin, even when administered od, allow for more flexibility in the number and timing of meals and can be associated with better compliance when compared to NPH. The WHO set a target to reduce the risk of premature noncommunicable disease death by 25% by 2025.60 To achieve this, the focus must now be on the implementation strategies of insulins (and other interventions), and reducing uncertain access related to affordability and prices. Various mechanisms like bulk purchasing contracts, biosimilar availability, increasing tender competition and identifying reasonable rebates can protect financial stability for both patients and countries.

Our systematic review has several strengths, including following the Cochrane Handbook15 and ISPOR guidance.17 However, our results should be interpreted with caution, as many of the NMAs had a small number of included studies and many had high ROB on several items. Other limitations include that we were unable to abstract data for cardiovascular-related mortality and healthcare utilization and that our results may have been impacted by the inclusion of a large number of cross-over studies, animal bolus insulin studies, and assumption that the bolus insulin was human if it was not specified in the study.

In conclusion, ultra-long-acting and long-acting insulin were superior to intermediate-acting insulin. Furthermore, long-acting od is more effective than long-acting bid and ultra-long-acting od is more effective than long-acting bid for fasting blood glucose. For weight change, long-acting od was less effective than long-acting bid and long-acting bid was more effective than long-acting biosimilar od. Our results can be used by patients and healthcare providers to tailor their choice of insulin treatment according to a desired outcome. To attain the WHO goal, policy-makers must activate policies supporting access to insulins by making them accessible and affordable.