
Introduction

Gestational diabetes mellitus (GDM), defined as glucose intolerance first recognised during pregnancy, affects up to 14% of pregnancies worldwide [1, 2]. Although the cause remains uncertain, GDM is suspected to arise from the diminished capacity of the pancreas to produce sufficient insulin and impaired insulin action related to pregnancy. GDM pregnancy increases maternal complications [3] and infants of mothers with GDM are at significantly higher risk of obesity, dyslipidaemia and type 2 diabetes [4]. While maternal glucose tolerance generally returns to normal after delivery, GDM is associated with persistent long-term metabolic dysfunction and elevated risk of overt diabetes [5]. Up to 50% of women with GDM may progress to type 2 diabetes within 5–10 years postpartum [6, 7]. These women develop type 2 diabetes at a relatively younger age (e.g. <40 years) than the general population and have a higher risk of cardiovascular disease, non-alcoholic fatty liver, renal disease and early mortality [8,9,10,11,12,13,14,15]. The underlying cause of the transition from GDM to type 2 diabetes and the accurate prediction of this transition are therefore critical.

The ADA recommends that all women with GDM undergo screening for type 2 diabetes via a 2 h 75 g OGTT at 6–12 weeks postpartum followed by subsequent screening every 1–3 years via fasting plasma glucose (FPG) measurement and 2 h 75 g OGTT [16]. The discriminating power (AUC) of 2 h plasma glucose in the OGTT is at best 65–77% across studies [17,18,19]. Moreover, the compliance with ADA recommendations among this group for screening via an OGTT is very low (~19%) in many settings [19, 20]. This low compliance could in part be due to the time-consuming and/or unpleasant nature of the tests or healthcare system limitations [19, 21,22,23,24]. A simplified and more accurate prognostic test would be desirable to reclassify glucose tolerance after pregnancy and predict future type 2 diabetes progression following GDM pregnancy.

It is well known that the elevation in blood glucose in type 2 diabetes occurs long after the underlying metabolic changes that promote disease development. Thus, discovery-based metabolomics is considered a promising approach for both the early prediction and the identification of underlying pathways of future type 2 diabetes onset. This methodology has led to the identification of several biomarkers for future type 2 diabetes incidence [25,26,27]. Our group previously identified metabolic biomarkers of subsequent type 2 diabetes onset among women with recent GDM enrolled in the Study of Women, Infant Feeding and Type 2 Diabetes after GDM (SWIFT) prospective cohort [19]. Using clinical variables combined with metabolic biomarkers, including lipid species, we developed a simple four-structure metabolic signature—phosphatidylcholine (PC) aeC40:5, hexoses, branched-chain amino acids (BCAAs) and sphingomyelin (SM) (OH)C14:1—that predicted type 2 diabetes incidence with 83% discrimination power (AUC) in a nested pair-matched (1:1) case–control study of 244 SWIFT participants, where 12% of 1010 women with GDM progressed to type 2 diabetes within about 2 years post-delivery [19]. A smaller nested case–control study of metabolomics (lipidomics), targeting >300 lipid species in blood samples taken from 104 women with GDM at 12 weeks post-delivery, of whom 21 (20%) progressed to type 2 diabetes within 12 years, showed 83.6% accuracy in type 2 diabetes prediction based on three lipids—phosphatidylethanolamine (PE) P-36:2, phosphatidylserine (PS) 38:4 and cholesteryl ester (CE) 20:4—in combination with six other risk factors (age, BMI, pregnancy fasting glucose, postpartum fasting glucose, total triacylglycerols [TAGs] and total cholesterol) [28]. These promising findings provide evidence that novel metabolite markers combined with other factors can facilitate the prediction of type 2 diabetes risk.

Metabolomic studies can also be used to illuminate the pathophysiology of type 2 diabetes and its progression. Both stearoylcarnitine and BCAA levels increased in those who developed type 2 diabetes [29, 30], possibly linked to impaired pancreatic beta cell function [31]. Several specialised lipid metabolites (sphingomyelins [SMs], phosphatidylcholines [PCs] and lysophosphatidylcholines [LPCs]) were inversely associated with type 2 diabetes risk [32]. Our previous metabolomics study in the SWIFT cohort of women with GDM also showed decreased levels of several specialised lipid metabolites (sphingolipids and PCs) in the transition from GDM to type 2 diabetes [19]. These lipid metabolites are known core components of cell membranes and may be linked to type 2 diabetes progression [32, 33].

There is substantial evidence to suggest that lipid imbalances both predict and cause type 2 diabetes. Despite the apparent links between lipid biosynthesis, metabolism and beta cell dysfunction leading to type 2 diabetes, the role of lipids has been understudied with respect to diabetes risk. Herein, we used lipidomics to screen a broad spectrum of lipid metabolites in relation to subsequent type 2 diabetes development. This lipidomic study sought to identify lipid biomarkers and putative early-stage pathophysiology that may predict and influence future progression to type 2 diabetes in women after GDM pregnancy.

Methods

Study population

The prospective SWIFT cohort enrolled a racially and ethnically diverse group of 1035 women with GDM (age 20–45 years) who delivered singleton pregnancies at ≥35 weeks of gestation at Kaiser Permanente Northern California (KPNC) hospitals between 2008 and 2011 [34, 35]. Each participant provided informed consent at the in-person examination at 6–9 weeks postpartum (baseline) before collection of blood specimens from a 2 h 75 g OGTT, completion of surveys, anthropometric and body composition measurements, and annual in-person follow-up examinations for 2 years. The KPNC Institutional Review Board approved the study protocol. The study recruitment, selection criteria, methodologies and other detailed information have been described previously [34,35,36]. At each 2 h 75 g OGTT, trained research staff collected fasting blood samples and processed and stored plasma samples at −80°C for future studies.

Study design

For this study, we selected the incident diabetes cases among Hispanic and Asian groups, and pair-matched (1:1.5) them to control women without progression to diabetes during the 2 year follow-up by age (±2 years), race and ethnicity (completely matched), pre-pregnancy BMI (±0.96 kg/m2) and glucose tolerance at 6–9 weeks postpartum (completely matched). We selected only matched pairs of Hispanic (n = 90) and Asian (n = 50) women to ensure homogeneity of race and ethnic groups. The nested case–control design with pair-matching greater than 1:1 does not allow direct comparisons of incidence rates among the ethnic and racial groups for this subset analysis. The fasting plasma samples were collected from these 140 women at the baseline examination (at 6–9 weeks postpartum), all confirmed not to have type 2 diabetes at the baseline exam via the 2 h 75 g OGTT. Details of the SWIFT prospective cohort design and follow-up are published elsewhere [30, 37,38,39,40]. Women who progressed to type 2 diabetes during the 2 year follow-up period (n = 55), termed here the ‘follow-up’ time point, are referred to as ‘case’. Women who did not develop type 2 diabetes during the follow-up period (n = 85) are referred to as ‘control’ (Fig. 1). Please see electronic supplementary materials (ESM) Methods for details.
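The matching criteria above can be sketched as a simple eligibility check. This is a minimal illustration only: the `Participant` record, the field names and the example participants are hypothetical, and the actual matching procedure is the one described in ESM Methods. Only the caliper widths (±2 years, ±0.96 kg/m2) and the exact-match variables come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical record holding only the matching variables."""
    pid: str
    age: float               # years
    ethnicity: str           # e.g. "Hispanic" or "Asian"
    bmi: float               # pre-pregnancy BMI, kg/m2
    glucose_tolerance: str   # category at 6-9 weeks postpartum


AGE_CALIPER = 2.0    # +/- 2 years (from the text)
BMI_CALIPER = 0.96   # +/- 0.96 kg/m2 (from the text)


def is_eligible_match(case: Participant, control: Participant) -> bool:
    """Exact match on ethnicity and glucose tolerance, caliper match on age and BMI."""
    return (
        case.ethnicity == control.ethnicity
        and case.glucose_tolerance == control.glucose_tolerance
        and abs(case.age - control.age) <= AGE_CALIPER
        and abs(case.bmi - control.bmi) <= BMI_CALIPER
    )


def eligible_controls(case: Participant, pool: list) -> list:
    """Return every control in the pool that satisfies all matching criteria."""
    return [c for c in pool if is_eligible_match(case, c)]


# Illustrative participants (invented values, not study data).
example_case = Participant("case-1", 31, "Hispanic", 27.0, "normal")
example_pool = [
    Participant("ctrl-1", 32, "Hispanic", 27.5, "normal"),  # eligible
    Participant("ctrl-2", 35, "Hispanic", 27.5, "normal"),  # age outside caliper
    Participant("ctrl-3", 31, "Asian", 27.0, "normal"),     # ethnicity differs
    Participant("ctrl-4", 30, "Hispanic", 28.5, "normal"),  # BMI outside caliper
]
example_matches = [c.pid for c in eligible_controls(example_case, example_pool)]
```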

Fig. 1

The schematic flow diagram of the study design. This was a nested case–control study within the SWIFT study, a prospective cohort of 1035 women diagnosed with GDM and followed up to 2 years postpartum. A total of 140 women were selected out of the 1035 SWIFT participants. These women did not have type 2 diabetes mellitus (T2DM) at 6–9 weeks postpartum (study baseline) based on 2 h 75 g OGTT. Of the 140 selected, 55 women were diagnosed as having T2DM, via 2 h 75 g OGTTs, within 2 years post baseline. This group was termed as ‘case’. The remaining 85 women did not develop T2DM based on the results of the 2 h 75 g OGTTs within 2 years post baseline. This group was termed ‘control’ (non-T2DM). The fasting plasma from the baseline examination was used for LC-MS-based targeted lipidomics aimed at finding the relation in terms of a predictive signature and the earlier stage pathophysiology of T2DM prospectively within the 2 year follow-up period

Targeted lipid profiling (targeted-lipidomics analysis)

Fasting plasma samples collected at 6–9 weeks postpartum during the SWIFT study were sent to Metabolon (Morrisville, NC, USA) for a single-blind targeted-lipidomics analysis of 1100 lipid species on each plasma sample. For details of lipidomics see ESM Methods.

Data preparation and statistical analysis of the quality of the final dataset

A stringent protocol was followed to prepare the final dataset, which was further scrutinised for quality in terms of the presence of confounding factors and the certainty of the class separation through principal component analysis (PCA) and a partial least squares-discriminant analysis (PLS-DA), respectively, using MetaboAnalyst 3.0 (https://www.metaboanalyst.ca/) in default setting (e.g. tenfold cross-validation). For details of this protocol, see ESM Methods.

Differential expression analysis and pathway analysis

A non-parametric test (Wilcoxon–Mann–Whitney test, α value set at p < 0.05) followed by multiple comparisons with false discovery rate (FDR) analysis (α value set at p < 0.05) was carried out to identify the differentially expressed lipid metabolites between the case and control. These differentially expressed lipid metabolites were used for the pathway analysis by adopting two approaches: (1) a direct approach where differentially expressed lipid metabolites were used in both over-representation pathway analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa Laboratories, Kyoto, Japan) pathways and metabolite set enrichment pathways (MSEP) analysis and (2) an in silico approach where the interacting proteins with the differentially expressed lipid metabolites were used. All analyses were carried out using one of the following platforms (or a combination of them): MetaboAnalyst 3.0, MBrole 2.0 (Madrid, Spain), and String 10.5 platforms (https://string-db.org). For details of the pathway analyses, see ESM Methods.
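The multiple-comparison step can be illustrated with a short sketch. The text does not name the FDR procedure, so the Benjamini–Hochberg adjustment used here is an assumption, and the p values in the example are invented rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumption: the FDR analysis in the text is a BH-type step-up procedure.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p values for four hypothetical lipid species.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
# A lipid is called differentially expressed when its adjusted value is < 0.05.
significant = [q < 0.05 for q in fdr]
```

With a threshold of 0.05 applied to the adjusted values, a lipid is retained only if it survives the correction, which is stricter than the unadjusted test alone.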

Predictive analytics

The biomarker analysis module of MetaboAnalyst 3.0 was used for univariate receiver operating characteristic (ROC) analysis. In the multivariate ROC analysis, the stepwise (both ways) multiple logistic regression (MLR) was carried out in RStudio (Boston, MA, USA) using the ‘glm’ function under the removal of data redundancy protocol and significant contributor calculation (R-script is available in ESM Methods). Machine learning analyses were carried out through WEKA 3.8 (University of Waikato, Hamilton, NZ). The final classifier was further optimised for balancing between the chance of data overfitting, higher ROC possibility and F-score (a measurement of a test’s accuracy based on precision and sensitivity). Optimisation was carried out by applying K-fold cross-validation, confidence threshold 1.0 and binary output selection. A series of cross-validation up to K = 100 was conducted to test the stress tolerability of the signature. Forty-five-fold cross-validation (K = 45) was chosen as per the ‘one standard error’ rule for final reporting. A high confidence threshold (1.0) ensures the proper cleaning of bias from the final signature. Binary output selection further protects the signature from data overfitting and bias selection. The discriminating power of ROC analysis is presented in the form of an AUC. See ESM Methods for details.
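The selection of K by the ‘one standard error’ rule can be sketched as follows. This is a minimal illustration, assuming the rule is applied by taking the smallest K whose mean misclassification error lies within one standard error of the lowest error; the text does not spell out this detail. The error values in the example are invented for illustration and are not those of the study.

```python
def one_standard_error_choice(results):
    """Pick K by a one-standard-error rule.

    results maps K -> (mean_error, standard_error).
    Assumption: the preferred model is the smallest K whose mean error is
    no more than one standard error above the minimum mean error.
    """
    best_k = min(results, key=lambda k: results[k][0])
    best_err, best_se = results[best_k]
    threshold = best_err + best_se
    return min(k for k, (err, _se) in results.items() if err <= threshold)


# Invented cross-validation summary: K -> (mean error, standard error).
cv_summary = {
    20: (0.140, 0.030),
    45: (0.100, 0.020),
    85: (0.085, 0.020),   # lowest error, threshold = 0.105
}
chosen_k = one_standard_error_choice(cv_summary)
```

In this toy summary the lowest error occurs at K = 85, but K = 45 is within one standard error of it and is therefore preferred as the less aggressively tuned setting.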

In vivo and in vitro functional studies

Animal care

C57BL/6J male mice were obtained from Charles River (Sherbrooke, QC, Canada) at the age of 8 weeks for both in vivo and in vitro islet studies. Mice were housed in the Division of Comparative Medicine facility, University of Toronto. All mouse procedures and maintenance were conducted in compliance with protocols approved by the Animal Care Committee at the University of Toronto and the guidelines of the Canadian Council of Animal Care.

Intraperitoneal injections and monitoring

The mice were injected intraperitoneally with either 1 mg kg−1 day−1 fumonisin B1 (FB1) (Cayman, Ann Arbor, MI, USA) or vehicle (DMSO–saline [154 mmol/l NaCl]) for 3 weeks. Weight gain and blood glucose were monitored weekly.

Insulin tolerance test and IPGTT

Both ITTs and GTTs were conducted using standard protocols that are described elsewhere [41].

Sphingolipid profiling and insulin staining of pancreas

After 3 weeks of treatment, mice were euthanised to collect plasma and pancreatic tissue. Plasma samples (n = 3) were subjected to sphingolipid profiling through LC-MS/MS at the Analytical Facility for Bioactive Molecules, SickKids, Toronto. The pancreases (n = 7) were fixed for insulin staining by using the standard protocol [37] of the Centre for Phenogenomics (TCP), Sinai Health System Institute, Toronto. The 40× images of pancreatic slices were produced at TCP and analysed by Aperio ImageScope software package (Wetzlar, Germany).

In vitro glucose-stimulated insulin secretion

Glucose-stimulated insulin secretion (GSIS) was assessed, as previously described [41], in both Min6 K8 cells (a gift from S. Seino [Kobe University, Kobe, Japan] and J. Miyazaki [Osaka University, Suita, Japan]) and isolated male murine C57BL/6 islets in vitro after treatment with either 1 μmol/l FB1 or 50 nmol/l myriocin (Cayman, Ann Arbor, MI, USA) for 24 h.

Results

Baseline sociodemographic and clinical characteristics of participants

This nested pair-matched case–control study included a subset of 140 Asian and Hispanic women from the SWIFT cohort (Fig. 1). Sociodemographic and clinical characteristics of case and control groups are summarised in Table 1. There were no statistically significant differences observed in either pre-pregnancy or baseline (6–9 weeks postpartum) BMI, total energy intake or physical activity. Baseline FPG (p < 0.01), 2 h plasma glucose (p < 0.001), fasting insulin (p < 0.01) and fasting TAG (p < 0.05) measurements and median HOMA-IR (p < 0.01) were significantly higher in the type 2 diabetes case group. Women in the case group were more likely than matched control participants to have been treated with insulin or oral medications during pregnancy and to have a family history of diabetes.

Table 1 Prenatal and baseline (6–9 weeks postpartum) characteristics of incident type 2 diabetes cases and matched control (no diabetes) within 2 years post baseline among women with GDM

Statistical analysis of the quality of the final dataset from lipidomics

The final dataset was composed of 626 detectable lipid metabolites. The unsupervised PCA showed two major principal components, with the first comprising 32.8% of the total study population and the second comprising 11.8%. Since the lipidomic analysis was performed at baseline before the earliest diagnosis, it would be overly optimistic to get a higher value for the major principal components. Other components were small contributors in the separation of the study population (ESM Fig. 1a). The supervised PLS-DA, where the groups were pre-identified as control and case, showed distinguishable separation and was presented in a two-dimensional score plot (ESM Fig. 1b). A cross-validation analysis determined that the performance of PLS-DA had a 63% and 64% accuracy for these two clusters, respectively, based on R2 and Q2 (ESM Fig. 1c). Furthermore, the empirical Bayes estimation (here with 1000 random permutations) was applied to confirm that the distinct separation between the two groups found in PLS-DA was not due to random chance. The empirical p value was significant (0.014; ESM Fig. 1d), indicating that the separation was true for 986 times out of 1000. The distribution of (quantile) normalised and log2-transformed data is shown in ESM Fig. 1e.
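The relationship between the 1000 permutations and the empirical p value of 0.014 can be illustrated with a short sketch. It assumes the empirical p value is the fraction of label permutations whose separation statistic is at least as large as the observed one. The separation statistic used by MetaboAnalyst for PLS-DA is not reproduced here; an absolute difference in group means stands in for it, and the data are invented.

```python
import random


def empirical_p_value(observed, permuted_stats):
    """Fraction of permuted statistics that are >= the observed statistic."""
    exceed = sum(1 for s in permuted_stats if s >= observed)
    return exceed / len(permuted_stats)


def mean_difference(values, labels):
    """Stand-in separation statistic: absolute difference in group means."""
    case = [v for v, lab in zip(values, labels) if lab == 1]
    ctrl = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))


def permutation_test(values, labels, n_perm=1000, seed=0):
    """Shuffle the class labels n_perm times and return the empirical p value."""
    rng = random.Random(seed)
    observed = mean_difference(values, labels)
    shuffled = list(labels)
    permuted_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted_stats.append(mean_difference(values, shuffled))
    return empirical_p_value(observed, permuted_stats)


# 14 of 1000 permutations reaching the observed separation gives p = 0.014,
# i.e. the observed separation was exceeded in 14 runs and held in 986.
p_from_counts = empirical_p_value(0.5, [0.6] * 14 + [0.1] * 986)

# Toy data with clearly separated groups (invented values).
toy_values = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_p = permutation_test(toy_values, toy_labels)
```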

Univariate and multivariate ROC analysis and predictive capability of metabolites to predict future type 2 diabetes

The strategies for predictive biomarker discovery are illustrated in Fig. 2a. FPG, HOMA-IR and 2 h post-load glucose in 75 g OGTT are frequently used for diagnostic purposes and their values for cases vs controls already showed a significant difference at baseline (p < 0.01, p < 0.01 and p < 0.001, respectively). In addition, the total fasting TAG levels were significantly higher in cases vs controls (p < 0.05). However, the ROC-AUCs of FPG, HOMA-IR, 2 h glucose and total fasting TAGs were 0.64, 0.65, 0.71 and 0.61 respectively (Fig. 2b–e, ROC analyses) in classic univariate ROC analyses. These low AUC values indicated a relatively weak ability to predict type 2 diabetes. Although mean differences were statistically significant (p < 0.01, p < 0.01, p < 0.001 and p < 0.05, respectively) (Fig. 2b–e, box plots), low AUC scores led to limitations. Each lipid metabolite was also subjected to classic univariate ROC analysis to find the lipid metabolite with the highest predictive capability for future type 2 diabetes status. Among all lipid metabolites, TAG 54:0-FA 16:0 scored the highest AUC of 0.69 (Fig. 2f). Although its mean difference for cases vs controls was statistically significant (p < 0.001) (Fig. 2f, box plot), its relatively low ROC-AUC score indicated weak predictability. The low ROC-AUC of TAG 54:0-FA 16:0 was in part due to high heterogeneity in the distribution of its concentration within the population. The low AUCs in univariate ROC analyses suggested that one analyte-based diagnostic would not be the best approach to predict type 2 diabetes incidence.
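To make the univariate AUC values concrete, the sketch below computes a ROC-AUC as the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. This rank-based formulation is a standard equivalent of the area under the ROC curve; it is not necessarily how MetaboAnalyst computes it internally, and the glucose values are invented.

```python
def roc_auc(case_values, control_values):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Invented fasting glucose values (mmol/l) for three cases and three controls.
example_cases = [6.0, 5.8, 5.2]
example_controls = [5.0, 5.5, 5.9]
example_auc = roc_auc(example_cases, example_controls)  # 6 of 9 pairs correct
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why significant mean differences can still coexist with AUCs in the 0.6 to 0.7 range when the two distributions overlap heavily.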

Fig. 2

Predictive signatures/biomarkers for progression to type 2 diabetes. (a) Schematic flow diagram of the predictive signatures/biomarkers. (b) Univariate ROC analysis and box plot for FPG. The FPG value at 5.7 mmol/l (red circle) is the optimal cut-off for the mean AUC 0.64 within the 95% CI. (c) Univariate ROC analysis and box plot for HOMA-IR. The HOMA-IR value at −0.17 (red circle) is the optimal cut-off and provides the mean AUC 0.65 within the 95% CI. (d) Univariate ROC analysis and the box plot for 2 h post-load glucose in 75 g OGTT (2 h Glu). The 2 h glucose value at 6.58 mmol/l (red circle) is the optimal cut-off and provides the mean AUC 0.71 within the 95% CI. (e) Univariate ROC analysis and box plot for total fasting TAGs (T-TAG). The T-TAG value at 1.12 mmol/l (red circle) is the optimal cut-off and provides the mean AUC 0.61 within the 95% CI. (f) Univariate ROC analysis and box plot for the top AUC exhibiting lipid metabolite TAG 54:0-FA 16:0. The value at −0.03 mmol/l (red circle) is the optimal cut-off and provides the mean AUC 0.69 within the 95% CI. In the box plots (b–f), the distribution of population (case and control) based on FPG, HOMA-IR, 2 h glucose, T-TAG and TAG 54:0-FA 16:0 is shown, with the y-axis in mmol/l, except for HOMA-IR (unitless). The bottom and top of the box are the Q1 and Q3 (25th and 75th percentile), respectively, and the central band is the median (Q2 or 50th percentile). The bottom whisker is located within 1.5 IQR of the lower quartile, and the upper whisker is located within 1.5 IQR of the upper quartile. Outliers are shown outside the whiskers. The red line in each box plot shows the point that separates the whole population into two groups, case and control, to provide maximum class separation. A two-tailed, paired t test was carried out for each comparison; unadjusted p values: *p<0.05, **p<0.01, ***p<0.001 vs control. (g) In stepwise MLR with clinical variables, the signature with three variables (2 h glucose, FPG and family history of diabetes) provides the mean AUC 77%. (h) In stepwise MLR with lipid metabolites, the signature with 12 variables (lipids, shown on the right) provides the mean AUC 84%

Since type 2 diabetes is a multifactorial disease, multivariate analyses could have better strength in predicting future type 2 diabetes onset. Thus, a popular multivariate ROC analysis, stepwise multiple (both ways) logistic analysis [38, 39], was carried out here to select a signature panel (containing multiple variables) to improve the discrimination power (AUC). In the stepwise MLR analysis with both statistically significant biochemical clinical variables (FPG, 2 h glucose, HOMA-IR and total TAG) and clinical factors (family history of diabetes and type of GDM treatment), a panel of three clinical variables (FPG, 2 h glucose and family history of diabetes) produced an AUC of 77% (95% CI 69%, 85%) (Fig. 2g). In the stepwise MLR analysis with lipids, a panel of 12 lipid metabolites produced an AUC of 84% (95% CI 77%, 90%) (Fig. 2h).

The predictive signatures/biomarkers in machine learning approach and comparison with other methods

The artificial intelligence-assisted machine learning algorithms were further employed using WEKA 3.8 to find a predictive signature with a better predictability than the multivariate signature panel. The highest ROC-AUC was found in the filtered classifier algorithm. The ROC-AUC of this panel was 0.92 for both case and control participants (Fig. 3a, b) with 91% accuracy (Fig. 3e). It revealed a predictive signature consisting of seven lipid metabolites with a decision tree having 17 nodes (branching points) and nine leaves (decision points) (Fig. 3c). Although both biochemical and historical clinical variables (total TAGs, FPG, 2 h glucose, HOMA-IR, family history of diabetes and type of GDM treatment) were evaluated with the lipid dataset, they did not appear in the predictive signature, indicating the superior predictive power of lipid metabolites over these clinical variables as well as matching variables (age, race/ethnicity and BMI) in this nested case–control study sample. This signature was validated through a rigorous cross-validation protocol, where a 45-fold cross-validation was selected by adopting one standard error calculation (Fig. 3d). The K = 45 cross-validated model showed no significant difference in misclassification errors in comparison with the K = 20- to 100-fold cross-validated models, having relatively lower standard mean errors and no overfitting due to being outside of the saturation of accuracy (K = 60 to 90). K = 85 cross-validation, which produced the lowest misclassification errors (or highest accuracy), was the most over-fitted model. The K = 45 cross-validated model was further optimised under confidence threshold 1.0 and binary output selection criteria. Altogether, this ensured the signature did not suffer from data overfitting and bias selection. The comparison among the best signatures found using different approaches is summarised in Fig. 3e. Comparisons were made in terms of accuracy, sensitivity, specificity, precision and AUC. The machine learning approach-derived signature had an AUC of 0.92, an accuracy of 91% (correctly predicted 127 out of 140 participants), a sensitivity of 87% (correctly predicted 48 cases out of 55) and a specificity of 93% (predicted 79 controls correctly out of 85).
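The reported accuracy, sensitivity and specificity follow directly from the counts given above, as the sketch below shows. The confusion-matrix cells are derived from the text (48 of 55 cases and 79 of 85 controls correctly predicted). The precision and F-score printed by this code are computed from those same counts for illustration and are not quoted from Fig. 3e.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        # F-score: harmonic mean of precision and sensitivity.
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


# 55 cases: 48 correct (TP), 7 missed (FN); 85 controls: 79 correct (TN), 6 wrong (FP).
metrics = classification_metrics(tp=48, fn=7, tn=79, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```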

Fig. 3

The machine learning approach in predictive signature discovery. (a, b) ROC curve for type 2 diabetes (T2DM) cases (a) and control participants (b) in the filtered classifier algorithm. The mean AUC was 0.92 for both case (a) and control (b) within the 95% CI. (c) The decision tree generated from the filtered classifier algorithm. (d) The selection of cross-validation through the ‘one standard error’ rule where K=45 was selected. (e) Comparison table for the top biomarkers found using the different approaches

Differential expression and putative pathway analysis based on lipidomics

A total of 75 lipid metabolites were significantly differentially expressed between the case and control groups (Table 2). The putative pathway analysis (Fig. 4a) involved both a direct approach (based on differentially expressed lipids) and an in silico approach (based on the interacting putative proteins of the differentially expressed lipids). In the case group, 46 lipid metabolites were significantly upregulated and 29 were significantly downregulated (Fig. 4b). The significantly upregulated lipid metabolites were predominantly TAG lipid species whereas the significantly downregulated lipid metabolites consisted of CE, ceramide (Cer), NEFA, lactosylceramide (LCer), LPC, lysophosphatidylethanolamine (LPE), PE and SM lipid species (Fig. 4b). The volcano plot for all lipid metabolites and heat map for the differentially expressed lipid metabolites are presented in ESM Fig. 2a, b. The volcano plot showed a subtle fold change between the two groups at this stage before type 2 diabetes development. The heat map of the differentially expressed lipid metabolites showed the heterogeneity across the studied population.

Table 2 Significantly altered lipids
Fig. 4

The putative pathway analysis for the development of type 2 diabetes. (a) Schematic flow diagram of the putative pathway analysis. (b) The distribution of the differentially expressed lipid species (75) within the final dataset (626); the bar graphs show the binary logarithm of fold changes (case/control) of all significant metabolites with ± SEM. (c) Pathway analysis: metabolite set enrichment (MSE) analysis based on FDR <0.05 (−log10 of FDR >1.3) and KEGG pathway analysis based on FDR <0.05 (−log10 of FDR >1.3). Red bars, upregulation; green bars, downregulation. HCer, hexosylceramide

To identify lipid pathways associated with altered lipid metabolites, KEGG pathway analysis was carried out. A significant downregulation of sphingolipid metabolism (FDR 0.009) and upregulation of fatty acid biosynthesis (FDR 0.005) (Fig. 4c) was observed. To understand the predicted consequence of such modulation, metabolite set enrichment analysis was performed. The analysis identified the upregulation of α-linolenic acid and linoleic acid metabolism (FDR 0.002) as the predicted net consequence of upregulated fatty acid biosynthesis (Fig. 4c). The lipid metabolites belonging to the different pathways identified are summarised in ESM Fig. 3a. Upregulated fatty acid synthesis was identified on the basis of significantly higher concentrations of myristic acid (C14:0), palmitic acid (C16:0), stearic acid (C18:0) and oleic acid (C18:1). The discovery of upregulated α-linolenic acid and linoleic acid metabolism was based on the significantly higher concentrations of linoleic acid (C18:2), dihomo-γ-linolenic acid (C20:3), eicosapentaenoic acid (C20:5) and docosahexaenoic acid (C22:5). In the case of downregulated sphingolipid metabolism, a number of significantly decreased ceramides [Cer(16:0), Cer(20:0), Cer(22:0) and Cer(24:1)], lactosylceramides [LCer(16:0), LCer(24:1)] and sphingomyelin [SM(20:1)] species were identified. The specific alterations in these pathways were linked to increased type 2 diabetes risk (ESM Fig. 3a).

Using an in silico approach employing KEGG pathway mapping (ESM Fig. 3b), we identified the upregulation of specific inflammation pathways (loci-1) and the downregulation of sphingolipid metabolism and related pathways (loci-4) as the dominant changes associated with future type 2 diabetes status. Loci-2, the upregulated fatty acid biosynthesis, was found between the connectomes of loci-1 and loci-4. Additionally, the downregulated glycosylphosphatidylinositol (GPI) anchor biosynthesis (loci-3) represents an island locus. GPI proteins are essential for Cer-remodelling and transportation of Cers from the endoplasmic reticulum to the Golgi apparatus where glycosphingolipids and sphingomyelins are formed [40].

In vivo inhibition of sphingolipid metabolism

Our population-based lipidomics data indicate that a number of Cers, SMs and LCers are significantly downregulated years before type 2 diabetes onset (Fig. 4b), suggesting that the downregulation of sphingolipid metabolism could be in part responsible for the future onset of type 2 diabetes among women with previous GDM. To investigate this possibility, we inhibited sphingolipid metabolism pharmacologically. FB1, a pharmacological inhibitor of sphingolipid biosynthesis, was used to induce overall downregulation of sphingolipid metabolism in C57BL/6 mice (n ≥ 14). Due to the very short half-life of FB1 (liver 4.07 h, kidney 7.07 h, plasma 3.15 h [42]), our treatment could only transiently block sphingolipid metabolism. This transient downregulation of sphingolipid metabolism was chosen to depict the very early stage of type 2 diabetes pathophysiology. Figure 5a illustrates the sphingolipid metabolism pathway as a target of these inhibitors, with FB1 (1 mg/kg) being delivered intraperitoneally to mice as depicted in Fig. 5b. Serum samples were collected at the end of the treatment and sphingolipid species were profiled by MS (n = 3 per group). The FB1-treated mice showed significant accumulation of sphingosine (So) species So(d18:1) (Fig. 5c, d). In the SWIFT cohort lipidomics study, four Cers, namely Cer(16:0), Cer(20:0), Cer(22:0) and Cer(24:1), were found to be significantly downregulated. In the FB1-treated mice, although levels of these four lipid metabolites decreased, the decrease was statistically significant only for Cer(16:0) (Fig. 5e).

Fig. 5

In vivo functional studies. (a) Schematic flow diagram of the sphingolipid metabolism pathway showing targets of FB1 (pharmacological inhibitor). (b) The in vivo study design (n≥14): the control group of mice was injected with vehicle while the treatment group was injected with FB1 (1 mg/kg) daily. Every week, the weight gain and the FPG were monitored. At the end of the third week, GTT and ITT were performed. Finally, all mice were euthanised to collect whole pancreases and plasma. (c) So concentration in control and FB1-treated mice (n=3). (d) Representative chromatogram of So. (e) Comparison of the four Cer species found to significantly differ in the SWIFT cohort (values were mean-centred [n=3] and divided by the SD of each variable). In the boxplots (c, e), the bottom and top of the box are the Q1 and Q3 (25th and 75th percentile), respectively, and the central band is the median (Q2 or 50th percentile). The bottom whisker is located within 1.5 IQR of the lower quartile, and the upper whisker is located within 1.5 IQR of the upper quartile. (f) GTT single time point comparison between control (black line) and FB1 group (green line) at the end of 3 weeks treatment (n≥7). (g) ITT single time point comparison between control (black line) and FB1 group (green line) at the end of 3 weeks treatment (n≥7); inset shows AUC (mmol/l × min). (h, i) Representative insulin-stained pancreas (5 μm thickness, longitudinally sectioned through the pancreatic head-to-tail axis) from control (h) and FB1-treated mice (i); scale bars, 3 mm; insets show ×40 magnification. (j) Insulin-positive area in pancreases of control and FB1-treated mice (n≥5). A two-tailed, unpaired t test was carried out for each comparison. Data are presented as mean ± SEM; unadjusted p values: *p<0.05 vs control

Effects of downregulation of sphingolipid metabolism on glucose homeostasis

At the end of the 3 weeks of treatment, mice (n ≥ 14) were evaluated for weight gain, FPG and fasting insulin, and a GTT and an ITT were performed. No significant differences were observed between control and treatment groups for weight gain, FPG and fasting insulin (ESM Fig. 4a–c). During the GTT, no difference in blood glucose was observed when comparing control and FB1-treated mice (Fig. 5f). During the ITT, the treatment group (FB1) showed overall reduced responsiveness to insulin in comparison with the control group, most notably (significant) during the later stages of the ITT (Fig. 5g). Interestingly, the islets in the pancreas of FB1-treated mice (n ≥ 5) displayed a small but significant reduction in the insulin-positive area compared with the control mouse islets (Fig. 5h–j).
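The AUC summary reported for the tolerance tests (in mmol/l × min) can be illustrated with a trapezoidal calculation. The text does not state how the AUC was computed, so the trapezoidal rule is an assumption, and the sampling times and glucose values below are invented rather than taken from Fig. 5.

```python
def trapezoid_auc(times_min, glucose_mmol_l):
    """Area under a glucose-time curve by the trapezoidal rule (mmol/l x min)."""
    if len(times_min) != len(glucose_mmol_l):
        raise ValueError("times and glucose readings must have the same length")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        area += dt * (glucose_mmol_l[i] + glucose_mmol_l[i - 1]) / 2.0
    return area


# Invented tolerance-test curve: time after injection (min) and glucose (mmol/l).
example_times = [0, 15, 30, 60]
example_glucose = [8.0, 6.0, 5.0, 6.0]
example_area = trapezoid_auc(example_times, example_glucose)
```

A larger area during an ITT indicates that glucose stayed higher after the insulin injection, which is one way to express reduced insulin responsiveness as a single number.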

Pancreatic beta cell function in vitro in response to sphingolipid metabolism downregulation

To assess the effects of downregulated sphingolipid metabolism on beta cell function and insulin secretion more directly, murine (C57BL/6) islets and Min6 K8 cells were treated in vitro with either FB1 (1 μmol/l) or a second inhibitor, myriocin (50 nmol/l), and GSIS was assessed (Fig. 6). In Min6 K8 cells, both inhibitors significantly decreased GSIS without affecting basal (low glucose) insulin secretion (Fig. 6a–d). The inhibitors also significantly decreased insulin secretion in response to cell depolarisation with KCl (Fig. 6e, g) and decreased total insulin content in Min6 K8 cells (Fig. 6f, h). In murine islets, both inhibitors significantly decreased GSIS (Fig. 6j, l). Moreover, myriocin caused a significant increase in basal insulin secretion (Fig. 6k). In murine islets, neither KCl-stimulated insulin secretion nor total insulin content was significantly altered by either treatment (data not shown).

Fig. 6

GSIS studies in vitro. (a–h) In Min6 K8 cells, FB1 treatment (green) did not alter basal (LG) insulin secretion (a) but significantly decreased GSIS (high glucose [HG]-stimulated) (b). Myriocin treatment (pink) did not alter basal insulin secretion (c) but significantly decreased GSIS (d). FB1 treatment significantly decreased KCl-stimulated insulin secretion (e) and total insulin (f). Myriocin treatment significantly decreased both KCl-stimulated insulin secretion (g) and total insulin (h). In Min6 K8 cells, 0 mmol/l glucose was used for LG and 10 mmol/l glucose was used in HG stimulation. For KCl stimulation, 25 mmol/l KCl was added to HG solution. (i–l) In murine islets, FB1 treatment significantly decreased both basal insulin secretion (i) and GSIS (j). Myriocin treatment significantly increased basal insulin secretion (k) and significantly decreased GSIS (l). In murine islets, 2.8 mmol/l glucose was used for LG and 16.7 mmol/l glucose was used in HG stimulation. For KCl stimulation, 25 mmol/l KCl was added to HG solution. Vehicle included 0.04% (v/v) DMSO for FB1 treatments (blue) or 0.0001 (v/v) DMSO for myriocin treatments (white). Data are presented as mean ± SEM (n=3 for FB1 in Min6 cells, n=5 for myriocin in Min6 cells, n≥6 for FB1 in C57BL/6 murine islets, n=3 for myriocin in C57BL/6 murine islets). A two-tailed, unpaired t test was carried out for each comparison (unadjusted p values: *p<0.05, **p<0.01, ***p<0.001 vs vehicle)

Discussion

By employing artificial intelligence-based machine learning, we identified a predictive signature with an overall discriminating power (AUC) of 0.92 and 91% accuracy. This accuracy is not achieved at the expense of either sensitivity (87%) or specificity (93%). It is better than that provided by well-known clinical diagnostics, including fasting glucose, 2 h post-load glucose in a 75 g OGTT, HOMA-IR, family history of diabetes and type of GDM treatment, as well as that reported in some recently published metabolomics-based diagnostic studies [19, 43, 44]. Moreover, unlike other signatures [19, 28], our predictive signature does not rely on clinical variables, since case and control participants were matched on early postpartum glucose tolerance (normal or impaired), age and BMI to reduce confounding of metabolite prediction by these clinical risk factors. A further strength of the signature is the 45-fold cross-validation under a high confidence threshold (1.0) with binary output, which together minimise the chance of data overfitting and selection bias. This protocol supports the reliability of the signature in making a predictive decision for any unknown blood sample. However, the signature applies specifically to Hispanic and Asian women and to the prediction of early progression to type 2 diabetes within 2 years following GDM pregnancy. Only two racial and ethnic groups were selected for this study, to achieve sample homogeneity. In future, these analyses may be extended to other racial and ethnic groups in the SWIFT cohort, in order to test the signature's ability to predict progression to overt diabetes after GDM pregnancy over a much longer follow-up period of 10 years.
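
For readers less familiar with these performance measures, the relationship between sensitivity, specificity and accuracy can be sketched from a binary confusion matrix. The Python function below is an illustrative sketch only; its name and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: cases (progressed to type 2 diabetes) classified correctly/incorrectly.
    tn/fp: controls (did not progress) classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                # proportion of cases detected
    specificity = tn / (tn + fp)                # proportion of controls correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # proportion of all calls that are correct
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not study data).
metrics = classification_metrics(tp=8, fn=2, tn=18, fp=2)
```

With these hypothetical counts the function returns a sensitivity of 0.80, a specificity of 0.90 and an accuracy of approximately 0.87. Accuracy is a weighted average of sensitivity and specificity, with weights given by the proportions of cases and controls in the evaluated sample.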

For the first time in a population-based study, we identified downregulation of sphingolipid metabolism as an antecedent early-stage event in women with previous GDM who developed type 2 diabetes (Fig. 4), together with other known pathways (e.g. upregulated fatty acid biosynthesis and upregulated α-linolenic acid and linoleic acid metabolism). Downregulated sphingolipid metabolism was identified based on a number of significantly downregulated nodes in the pathway (Table 2 and Fig. 4b). By contrast, several cross-sectional clinical studies have shown that Cer levels (a single upstream node of the whole pathway) are higher in obese individuals with type 2 diabetes [45, 46]. Those studies evaluated obesity as a covariate in their analyses, whereas in this study obesity was controlled by pair-matching of BMI between groups. Moreover, we employed a prospective postpartum GDM cohort, leaving open the possibility that changes in some nodes of sphingolipid metabolism arise only after disease onset.

To understand the role of sphingolipid metabolism in the early-stage pathophysiology of type 2 diabetes, we used FB1 to inhibit de novo sphingolipid biosynthesis transiently in mice without high-fat diet intervention. The in vivo studies showed that transient inhibition of sphingolipid metabolism had no significant effect on insulin sensitivity in the treatment group, except in the late phase of the ITT (indicating disrupted hepatic glucose uptake and/or high gluconeogenesis). However, this modulation of sphingolipid metabolism appeared to reduce pancreatic beta cell area. Further studies are required to determine whether this impairment of insulin biosynthesis will eventually lead to glucose intolerance in the long term.

The role of downregulated sphingolipid metabolism in overt type 2 diabetes phenotypes has been studied. Park et al [47] showed that Cer synthase 2 null mice with impaired synthesis of sphingolipids C22-24 develop glucose intolerance due to abrogated Akt phosphorylation of the insulin receptor in the liver. Alexaki et al [48] showed that adipocyte-specific Sptlc1-knockout mice exhibit insulin resistance with age-dependent loss of adipose tissue, increased macrophage infiltration and tissue fibrosis. Furthermore, Lee et al [49] showed that adipocyte-specific Sptlc2-knockout mice display systemic insulin resistance and hyperglycaemia. Taken together with our observations, chronic sphingolipid metabolism downregulation could thus potentially interfere with liver, muscle, adipose and beta cell function, contributing to type 2 diabetes onset.

The inhibition of sphingomyelin synthase in INS-1 beta cells significantly reduced insulin exocytosis [50]. Kavishwar and Moore [51] identified sphingolipid patches on the surfaces of pancreatic beta cells as a predictor of their functional capacity; the patches decreased in diabetes, suggesting the importance of sphingolipids in this cell type. In this study, both FB1 and myriocin decreased GSIS. Moreover, myriocin treatment yielded significantly increased basal insulin secretion in murine islets. Furthermore, downregulation of sphingolipid metabolism reduced insulin content. Although both FB1 and myriocin showed similar effects on GSIS in vitro, potential noise from off-target effects of these two inhibitors cannot be ruled out. Stanford et al [52] reported similar results (i.e. decreased GSIS in Min6 cells and murine islets) after inhibiting specific components of sphingolipid metabolism. Recently, Ye et al [53] showed that, during diet-induced obesity, mice with pancreatic beta cell-specific knockout of LDL receptor-related protein 1 (a pleiotropic mediator of cholesterol, insulin, energy metabolism and other cellular processes) were unable to compensate beta cell function, partly because of downregulation of sphingolipid metabolism. Therefore, downregulated sphingolipid metabolism may play a causal role in pancreatic beta cell dysfunction.