Introduction

Chronic liver disease can be caused by a wide spectrum of etiologies including viral hepatitis, metabolic disease, and autoimmune disorders. The most common chronic liver disease, non-alcoholic fatty liver disease (NAFLD), is estimated to affect 11–46% of the US population [1, 2].

Regardless of the etiology of liver disease, presence and severity of hepatic fibrosis are the most important prognostic factors used to determine the urgency of treatment and predict disease progression and clinical outcomes [3]. In addition, individuals with advanced fibrosis will require screening for HCC and endoscopic screening for esophageal varices [4, 5].

Liver biopsy is the gold standard for diagnosing hepatic fibrosis. However, biopsy is associated with potential complications such as hemorrhage. The invasive nature of biopsy also limits repeated performance at multiple follow-up visits after interventions, making comparison of treatment effect difficult. In addition, since biopsy only samples a small fragment of the entire liver, a discordance rate up to 24% has been reported [6]. As a result, development and verification of a noninvasive approach to detect hepatic fibrosis are needed.

Compared with other modalities such as serum markers or ultrasound-based transient elastography, MR elastography (MRE) has been demonstrated to have very high accuracy for detection and staging of liver fibrosis at both 1.5T and 3T [7]. MR elastography assesses tissue stiffness by imaging mechanical shear wave propagation using motion-encoding gradients [8]. Using a cutoff value of 2.93 kPa, MRE was shown to have 98% sensitivity and 99% specificity for detecting stage ≥ F1 fibrosis [9]. MRE has also shown high diagnostic accuracy (AUC 0.90, 95% CI 0.84–0.94) for differentiating early fibrosis (F1–F2) from advanced fibrosis (F3–F4) [10]. Similarly, it has shown high accuracy for differentiating METAVIR F0–F1 (mild fibrosis) from METAVIR F2–F4 (moderate to severe/clinically significant fibrosis) [11, 12]. However, lower accuracy (AUC 0.70, 95% CI 0.59–0.81) was recently reported for children and young adults, with significantly lower accuracy in patients with steatosis (AUC 0.53, 95% CI 0.35–0.71) [13].

Most published MR elastography studies use a phase contrast gradient-recalled-echo (GRE) sequence, with a reported success rate of 95% [14]. While the GRE-MRE sequence is robust and well verified, it has a few limitations [15]. First, serial breath-holds (one per slice, typically 4 × 19 s) are required. Second, higher failure rates of the GRE sequence are observed in patients with fatty liver disease and/or iron overload. The iron-mediated T2* effects on magnetic field homogeneity are more prominent at 3T, likely leading to the high reported GRE-MRE failure rates of up to 15% for MR elastography performed at 3T [16].

An alternative to the GRE sequence is spin-echo (SE) echo-planar imaging (EPI). The use of a spin-echo pulse makes the echo dependent on T2 instead of T2*, thereby decreasing the sensitivity to iron overload and increasing the signal-to-noise ratio [17]. Several studies comparing GRE and SE-EPI MR elastography at 1.5T have reported equivalent liver stiffness measurements and lower failure rates in SE-EPI sequences [15, 17]. However, very few published data exist comparing GRE-MRE and SE-EPI MRE sequences at 3 Tesla [16, 18].

The liver stiffness (LS) in MR elastography can be measured with a single small circular region of interest (ROI), by averaging multiple small ROIs, or with a Freehand ROI covering the entire liver on the stiffness map. In our own experience, choosing the best ROI to report the liver stiffness can often be challenging given the heterogeneity of values in the stiffness maps. Lee et al. showed that mean LS measurements obtained from multiple ROIs had high intra- and inter-observer agreement (ICC 0.945) [19]. Toguchi et al. reported that small circular ROI measurements at 1.5T MR elastography had higher accuracy than Freehand ROI for detection of significant fibrosis (F2+) [20]. No published data exist on the optimal cutoff values for differentiating varying stages of fibrosis using Mean, Maximum, and Freehand ROI stiffness measurements.

Our aims were to (1) compare 2D-GRE and 2D-SE-EPI sequences to measure liver stiffness at 3 Tesla correlating with pathology and (2) evaluate inter-observer variability and accuracy of varying ROI measurements to assess stiffness measurements for both sequences.

Materials and methods

This retrospective single-center IRB-approved, HIPAA-compliant study included 387 consecutive patients older than 18 years with known or suspected liver disease who underwent MR elastography of liver using both 2D-GRE and prototype 2D-SE-EPI sequences between May 2017 and December 2017.

MRI and MR elastography

The liver MRI was performed on a 3 T system (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany) using the standard spine and body matrix coils. The MR elastography utilized an acoustic driver system (Resoundant, Rochester, MN) with the passive driver placed at the patient’s right upper quadrant, inducing shear waves at 60 Hz. Both 2D-GRE and 2D-SE-EPI sequences were acquired in each patient. The MRE parameters are summarized in Table 1. Data acquired with either sequence were processed by the same inversion algorithm. The prototype SE-EPI MRE implementation employed a modified background masking algorithm.

Table 1 MR Elastography pulse sequence parameters

Liver stiffness measurement

The liver stiffness was independently measured by an experienced radiology attending and a PGY2 radiology resident (Fig. 1). Three different methods were used for each patient: (1) Mean liver stiffness (LS), calculated by averaging 3 ROIs placed in the right lobe; (2) Maximum LS, calculated by placing an ROI of at least 3 cm² in the area of the right lobe with coherent wave propagation, away from the liver edge, devoid of vessels, bile ducts and focal lesions; and (3) Freehand LS, calculated by placing a hand-drawn ROI in the entire measurable liver on the stiffness map. The corresponding magnitude images, anatomy images, and wave images were simultaneously evaluated to avoid areas just beneath the driver, bile ducts, blood vessels, and any other regions with incoherent wave propagation for maximum and freehand measurements as well. Measurements were restricted to areas where the confidence parameter of the stiffness reconstruction was above 95%.
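As an illustration only, the ROI-based measurements described above can be sketched in a few lines of Python. The sketch assumes the stiffness map (kPa) and the confidence map (0 to 1) are available as 2D nested lists and that each ROI is given as a list of (row, column) pixel coordinates. The function names and data layout are hypothetical and do not represent the scanner or vendor software used in the study.

```python
from statistics import mean

# Assumption: confidence is stored as a fraction, so "above 95%" becomes > 0.95.
CONFIDENCE_THRESHOLD = 0.95


def roi_stiffness(stiffness, confidence, roi):
    """Mean stiffness (kPa) over the ROI pixels that pass the confidence check.

    Returns None when no pixel in the ROI exceeds the confidence threshold.
    A single-ROI measurement (Maximum LS or Freehand LS) uses this directly.
    """
    values = [stiffness[r][c] for r, c in roi
              if confidence[r][c] > CONFIDENCE_THRESHOLD]
    return mean(values) if values else None


def mean_ls(stiffness, confidence, rois):
    """'Mean LS': average of the per-ROI stiffness values (3 ROIs in the study)."""
    per_roi = [roi_stiffness(stiffness, confidence, roi) for roi in rois]
    valid = [v for v in per_roi if v is not None]
    return mean(valid) if valid else None


# Toy 3 x 3 maps; each row is treated as one small ROI.
stiffness = [[2.0, 2.2, 9.0], [2.4, 2.6, 3.0], [3.2, 3.4, 3.6]]
confidence = [[0.99, 0.98, 0.50], [0.97, 0.96, 0.99], [0.99, 0.90, 0.99]]
rois = [[(r, c) for c in range(3)] for r in range(3)]
print(round(mean_ls(stiffness, confidence, rois), 2))
```

In this toy example the 9.0 kPa pixel is ignored because its confidence is only 0.50, which mirrors how low-confidence regions are excluded from the reported stiffness.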

Fig. 1

Liver stiffness (LS) measurement on the stiffness map for GRE MRE (a) and SE-EPI MRE (b). Left: Mean LS, measured by averaging 3 ROIs placed in the right lobe; Middle: Maximum LS, measured by placing an ROI in the right lobe; Right: Freehand LS, measured by placing a hand-drawn ROI in the entire measurable liver avoiding the left lobe and area just underneath the driver

Pathological correlation

Ninety-seven patients in this study cohort underwent liver biopsy. After reviewing the pathology reports, the fibrosis stage for each patient was recorded and used as the reference to compare the diagnostic performance of SE-EPI MRE and GRE MRE.

Statistical analysis

Restricted maximum likelihood estimation of variance components in a random-effects model was used to estimate the intra-subject (inter-reader) and inter-subject (intra-reader) components of the overall variance in each measure. The estimated components were used to estimate the intra-class correlation (ICC) and intra-subject coefficient of variation (CV) to assess the inter-reader variation (reproducibility) of each measure. A Bland–Altman analysis was used to estimate the mean intra-subject difference between readers (bias), computed for each subject as the value from reader 1 minus the value from reader 2, and the 95% limits of agreement (LoA). The LoA define the interval expected to contain 95% of the differences between the readers' measurements for any one new randomly selected subject. A paired-sample t test was used to compare readers in terms of the measures they provided for the same patient; a significant result would imply that the bias was significantly different from zero.
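The reproducibility measures above can be illustrated with a minimal Python sketch. It is not the SAS analysis used in the study: as a stated simplification, it replaces the REML fit with one-way random-effects ANOVA (method-of-moments) variance components for two readers per subject, and it uses the conventional 1.96 multiplier for the limits of agreement. All function names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def variance_components(reader1, reader2):
    """Return (between-subject variance, within-subject variance).

    ANOVA estimates for a balanced design with k = 2 readers per subject;
    a simplified stand-in for the REML estimates described in the text.
    """
    n, k = len(reader1), 2
    subject_means = [(a + b) / 2 for a, b in zip(reader1, reader2)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # With two readers, each subject's within-subject sum of squares is (a - b)^2 / 2.
    ms_within = sum((a - b) ** 2 / 2 for a, b in zip(reader1, reader2)) / (n * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)
    return var_between, ms_within


def icc_and_cv(reader1, reader2):
    """Intra-class correlation and intra-subject coefficient of variation."""
    var_between, var_within = variance_components(reader1, reader2)
    grand_mean = mean(reader1 + reader2)
    icc = var_between / (var_between + var_within)
    cv = sqrt(var_within) / grand_mean
    return icc, cv


def bland_altman(reader1, reader2):
    """Bias (reader 1 minus reader 2) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(reader1, reader2)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy stiffness values (kPa) from two readers for four subjects.
reader1 = [2.0, 3.0, 4.0, 5.0]
reader2 = [2.2, 2.8, 4.4, 5.2]
icc, cv = icc_and_cv(reader1, reader2)
bias, lower, upper = bland_altman(reader1, reader2)
print(f"ICC={icc:.3f} CV={cv:.1%} bias={bias:.2f} LoA=({lower:.2f}, {upper:.2f})")
```

The sketch also shows why ICC and CV can diverge: ICC depends on how large the between-subject spread is relative to reader disagreement, whereas CV expresses reader disagreement relative to the mean stiffness alone.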

Spearman rank correlations were used to assess the association of fibrosis stage with the results from each reader.
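Because fibrosis stage is an ordinal variable with many tied values, the rank correlation must handle ties. The following Python sketch shows one common approach, assigning average ranks to ties and then computing the Pearson correlation of the ranks. It is an illustration with hypothetical function names and toy data, not the SAS procedure used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data: fibrosis stage (F0-F4) and one reader's stiffness (kPa).
stage = [0, 1, 1, 2, 3, 4]
stiffness_kpa = [2.1, 2.5, 2.4, 3.0, 3.9, 5.2]
print(round(spearman(stage, stiffness_kpa), 3))
```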

To assess the diagnostic utility of the measures for predicting high-stage fibrosis (variously defined as F1–F4, F2–F4 and F3–F4), ROC analyses were conducted. The Youden index was used to identify the cutoff value of each measure that was optimal for declaring patients test-positive for high-stage fibrosis. Since reader-specific cutoff values would have negligible clinical utility, the data from the two readers were combined for the purpose of the ROC analysis in order to determine a single reader-independent optimal cutoff. The accuracy of each measure was characterized in terms of area under the ROC curve (AUC) and the sensitivity and specificity, pooled over the two readers, that were achieved using the same optimal criterion for both readers.
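The pooled ROC analysis can be illustrated with a short Python sketch. It assumes that pooling means concatenating both readers' measurements against the same biopsy labels, that a patient is test-positive when stiffness is at or above the cutoff, and that the Youden index is J = sensitivity + specificity − 1. The function names and toy values are hypothetical; the study itself used SAS.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    Every observed score is tried as a cutoff; the first maximum is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy data: 1 = high-stage fibrosis on biopsy, one label per patient.
high_stage = [0, 0, 0, 1, 1]
reader1 = [2.1, 2.7, 3.3, 3.1, 4.4]  # stiffness in kPa
reader2 = [2.4, 2.9, 3.0, 3.8, 5.2]

# Pool the two readers so that one reader-independent cutoff is found.
scores = reader1 + reader2
labels = high_stage * 2
print(roc_auc(scores, labels), youden_cutoff(scores, labels))
```

Pooling in this way treats the two readings of each patient as separate observations, which yields a single cutoff but does not account for the correlation between readings of the same patient.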

All statistical tests were conducted at the two-sided 5% significance level using SAS 9.3 software (SAS Institute, Cary, NC).

Results

Demographics

The study population consisted of 387 patients with known or suspected chronic liver disease, including 215 males (mean age 53.6 years) and 172 females (mean age 53.9 years). The common known etiologies included Hepatitis C, Hepatitis B, NAFLD, and alcoholic liver disease (Fig. 2). There were 11 patients with R2* values greater than 80 s−1, indicating hepatic iron overload [21].

Fig. 2

Fibrosis etiologies and indications for MR elastography studies in the study cohort

Failure rate

Technical failure was defined as no liver pixel having a confidence value greater than 95%. This included cases in which the background masking algorithm erroneously removed the liver. The technical failure rate was 2.8% for 2D-SE-EPI (11/387) and 4.1% for 2D-GRE (16/387), as summarized in Table 2. The most common known cause of GRE-MRE failure was a high R2* value (> 80 s−1 in 9 patients), which resulted in vanishing confidence (3 patients, Fig. 3a), failed background masking (5 patients, Fig. 3b), or both (1 patient). Three patients had failed background masking unrelated to iron overload (R2* < 80 s−1). Two patients with failed GRE MRE had a severely atrophic liver (Fig. 3c) that possibly altered the wave propagation, resulting in poor SNR. GRE MRE in 1 patient failed because of interposed lung resulting in inadequate wave propagation. Ten out of 11 cases of SE-EPI MRE failure were due to severe fat ghosting (Fig. 3d). The cause of failure could not be determined in 2 patients (with GRE in 1 patient and with SE-EPI in 1 patient).

Table 2 Reasons for failed GRE and EPI sequences
Fig. 3

Representative failed cases. a A case of failed GRE MRE in a patient with iron overload (R2* = 219 s−1) with vanishing confidence (left), and successful EPI sequence (right). b A case of failed background masking in GRE MRE (left) but not in EPI MRE (right) in a patient with minimal iron overload (R2* = 100 s−1). c Coronal T1-weighted contrast-enhanced image in a patient with severely atrophic liver (no iron overload) and failure of GRE sequence (not shown), possibly due to altered wave propagation. d A case of successful GRE sequence (left) and failed EPI sequence (right) due to fat ghosting

Reproducibility

Intra-class correlation (ICC) analysis (Table 3) showed excellent inter-reader agreement: ICC = 0.89 (Mean), 0.94 (Maximum), and 0.84 (Freehand) for GRE; ICC = 0.84 (Mean), 0.94 (Maximum), and 0.84 (Freehand) for SE-EPI. Maximum ROI showed the highest intra-class correlation. The intra-subject coefficient of variation (CV) for almost all measures was higher than 15% (Table 3), indicating significant inter-reader variations. The bias between the two independent readers was small for Maximum ROI (0.13 kPa or 3.3% of average LS for GRE and 0.10 kPa or 2.6% of average LS for SE-EPI) and Mean ROI (−0.24 kPa or 6.6% of average LS for GRE and −0.22 kPa or 5.8% of average LS for SE-EPI), but higher for Freehand ROI (0.61 kPa or 18% of average LS for GRE and 0.65 kPa or 18% of average LS for SE-EPI). There was also strong intra-reader correlation between measurements by SE-EPI and GRE sequences (Fig. 4). The strongest correlation was for Reader 1 in Mean ROI measurement (r = 0.96). The other measurements all had correlations in the range of 0.90–0.92.

Table 3 The overall mean, the CV and ICC as measures of reproducibility, the estimate of mean bias, p value from the t test of the bias, and the upper and lower 95% limits of agreement from a Bland–Altman analysis
Fig. 4

Scatter plots and best fit of liver stiffness by GRE MRE and SE-EPI MRE measured by reader 1 (blue) and reader 2 (red) using Mean ROI (left), Maximum ROI (middle), and Freehand ROI (right)

Pathological correlation

The Spearman analysis (Table 4) showed similar correlation between fibrosis stage and the results from each reader for all measures except the Mean ROI measurement of the SE-EPI sequence, for which Reader 2 demonstrated higher correlation (0.47 for Reader 1 vs. 0.62 for Reader 2). The measurements with the most consistently high correlation were Freehand ROI for GRE (0.62–0.63) and Mean ROI for GRE (0.61–0.62).

Table 4 Spearman correlation between fibrosis stage and the results from each reader

Accuracy, sensitivity and specificity

ROC AUC analysis showed similar accuracy between different measurement methods in SE-EPI and GRE MRE (Fig. 5 and Table 5). AUCs for differentiating early fibrosis (F0–F2) from advanced fibrosis (F3–F4) for Mean, Maximum, and Freehand measurements were 0.87, 0.83, and 0.88 for GRE MRE, and 0.81, 0.75, and 0.83 for SE-EPI MRE. Based on the ROC curve, the optimal cutoffs for Mean, Maximum, and Freehand measurements for differentiating early (F0–F2) from advanced fibrosis (F3–F4) were 3.7 kPa, 4.2 kPa, and 3.0 kPa for GRE MRE, and 3.7 kPa, 3.5 kPa, and 3.1 kPa for SE-EPI MRE (Table 5). Sensitivity and specificity of both variants are also summarized in Table 5.

Fig. 5

ROC analysis of MRE performance compared with liver biopsy pathology, using Freehand, Maximum and Mean ROI to differentiate F0 from F1 to F4 (left), F0–F1 from F2 to F4 (middle) and F0–F2 from F3 to F4 (right)

Table 5 Numbers of patients that were reference-standard-positive (N+) and reference-standard-negative (N−) for each outcome (i.e., high-stage fibrosis according to each definition), the AUC achieved by each measure as a predictor of each outcome, the optimal reader-independent criterion for classifying patients as test-positive for each outcome, and the sensitivity and specificity (pooled over readers) achieved using the optimal threshold

Overall, GRE MRE demonstrated higher accuracy than SE-EPI MRE in the assessment of most fibrosis stages. Both GRE and SE-EPI MRE showed higher accuracy for more advanced fibrosis than for lower-stage fibrosis. The most sensitive measurement for differentiating early (F0–F2) from advanced fibrosis (F3–F4) was the Freehand stiffness measurement on GRE MRE (0.84). The most specific measurement for differentiating early from advanced fibrosis was Maximum ROI stiffness measurement on GRE MRE (0.92).

Discussion

In this study, we evaluated the feasibility, technical failure rate, and accuracy of 2D-GRE and 2D-SE-EPI MR elastography on 387 patients at 3 Tesla. Our results show that the failure rate for SE-EPI MRE (2.8%) was lower than that for GRE MRE (4.1%) at 3T. Both rates are significantly lower than previously reported failure rates of up to 15.3% at 3T [22]. Of the patients with failed GRE MRE, 56% (9/16) had hepatic iron overload. These results are comparable to the previously published literature, which reports up to 70% of failures attributed to iron overload [14]. GRE MRE failed in 30% (9/30) of patients with R2* values greater than 80 s−1. All 9 of these patients had successful SE-EPI MRE. These results are comparable to the previously published literature at 1.5T, which shows that SE-EPI MRE succeeds in patients with failed GRE MRE due to iron overload [17]. The SE-EPI acquisition is less sensitive to T2* decay and hence less prone to failure in iron overload. There were two patients with R2* of 98 s−1 and 100 s−1 (indicating minimal hepatic iron overload) in whom the SE-EPI sequence failed but the GRE sequence did not. However, the SE-EPI sequence failed in these patients due to failure of fat suppression (Fig. 3d).

The relatively higher failure rates of GRE MRE in cases of iron overload might be attributed to the longer TE of 20 ms, which may cause significant signal decay. Garteiser et al. described fractional encoding for GRE MRE, which reduced TE to 6.9 ms at 3T [23]. While TE is reduced, fractional encoding comes with the disadvantage of decreased encoding efficiency [24]. Recently, Guenthner et al. reported refinements that further improved fractional encoding in GRE MRE [24], which may potentially lower the failure rates of GRE MRE. Of note, fractional encoding can also be combined with SE-EPI MRE. The comparison of GRE and SE-EPI MRE utilizing fractional encoding in a clinical setting needs further investigation.
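To give a sense of scale for the TE dependence, the following Python sketch evaluates a simple mono-exponential T2* decay model, S(TE)/S0 = exp(−R2* · TE). The model is an assumption made for illustration only (it ignores fat-water interference, noise floor, and sequence-specific effects), and the function name is hypothetical. The TE and R2* values are those quoted in this article.

```python
from math import exp


def gre_signal_fraction(r2_star_per_s, te_ms):
    """Fraction of signal remaining at echo time TE, assuming mono-exponential T2* decay."""
    return exp(-r2_star_per_s * te_ms / 1000.0)


# R2* of 80 s^-1 is the iron-overload threshold used here; 219 s^-1 is the case in Fig. 3a.
for r2_star in (80, 219):
    for te_ms in (20.0, 6.9):
        fraction = gre_signal_fraction(r2_star, te_ms)
        print(f"R2* = {r2_star} 1/s, TE = {te_ms} ms: {fraction:.1%} of signal remains")
```

Under this simplified model, about 20% of the signal remains at TE = 20 ms for R2* = 80 s−1, compared with about 58% at TE = 6.9 ms. For R2* = 219 s−1, only about 1% remains at TE = 20 ms, which is consistent with the vanishing confidence observed in such cases.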

The high failure rates associated with iron overload in MRE can be overcome by other noninvasive imaging techniques that are not affected by susceptibility artifact. Ultrasound elastography has been reported to achieve accuracy of 0.78–0.99 to differentiate ≥ F2 fibrosis in NAFLD patients [25]. However, it is limited by the presence of ascites, high body mass index, and multiple confounding factors including technical variations [25]. Liver stiffness can also be evaluated by tagged MRI, which measures liver tissue strain induced by nearby cardiac motion [26]. Mannelli et al. demonstrated that cirrhotic patients have significantly lower liver strain than healthy controls [27]. However, MR tagging requires substantial postprocessing, and the optimal strain cutoff for different fibrosis stages is not yet clear.

While the prototype SE-EPI MRE used the same inversion algorithm as GRE MRE, it employed an improved background masking algorithm. The background masking serves to restrict stiffness calculation within tissues. The improvement in masking algorithm likely led to the absence of reconstruction failures in SE-EPI MRE, compared to 8 cases with failed background masking in GRE-MRE (5 with R2* > 80 s−1 and 3 with R2* < 80 s−1). The new masking algorithm has in the meantime been integrated into the GRE-MRE product. Another advantage of the SE-EPI sequence is the shorter acquisition time (4 slices in a single breath hold, at a shorter breath-hold time of 11 s) decreasing risk of motion artifacts. However, none of the GRE MRE failures in this study were due to respiratory motion.

Three of our patients with failed GRE MRE had poor wave propagation; two of these had severely atrophic liver (Fig. 3c), which possibly resulted in altered wave propagation and poor SNR as proposed by Kim et al. [18]. While other factors such as interposition of fat, lung, and colon, as well as driver position, are expected to affect both GRE and SE-EPI MRE, none of these patients had failure of SE-EPI MRE. This is possibly related to better base SNR of the SE-EPI sequence, and better motion encoding with the settings used here. In contrast, 10 out of 11 cases with failed SE-EPI MRE were likely due to a mis-adjustment of the water center frequency. Interestingly, SE-EPI MRE failed on follow-up MR elastography performed 3 months later in one patient with severe obesity, while GRE MRE succeeded.

The Mean, Maximum, and Freehand ROI stiffness measurements demonstrated high reproducibility in measuring hepatic stiffness, with high intra-class correlation for both the GRE and SE-EPI variants. The relatively high coefficient of variation might result from the large sample size in this study, which enables detection of small but significant variations between readers. Of note, Freehand ROI measurement demonstrated significant bias of up to 18% of average liver stiffness measured in both GRE and SE-EPI MRE (Table 3). One possible reason is that Freehand ROI measurement requires inclusion of all measurable liver parenchyma. Excluding regions with incoherent wave propagation (see Methods section) while drawing the largest possible ROI can be challenging and heavily dependent on the reader’s experience. In contrast, the biases for Maximum ROI (2.6–3.3% of average LS) and Mean ROI (5.8–6.6% of average LS) were found to be relatively small. Therefore, both the Maximum and Mean ROI measurements are highly reproducible methods for reporting stiffness, regardless of the expertise of the reader. Lee et al. reported high intra- and inter-observer agreement in liver stiffness measurements by using Mean ROI measurements (six ROIs) with high intra-class correlation (ICC: 0.945) [19]. In our study, the Maximum ROI measurement demonstrated the highest inter-observer agreement compared with the average of 3 ROIs (ICC 0.94 vs 0.84–0.89).

The accuracy of Mean, Maximum, and Freehand ROI measurements for liver stiffness was compared using ROC analysis. AUC for detection of significant fibrosis was similar across all three measurements, and comparable to previously published literature [10, 13]. For all three measurements, GRE MRE demonstrated higher accuracy than SE-EPI MRE (0.88, 0.83, and 0.87 vs 0.83, 0.75, and 0.81 for Freehand, Maximum, and Mean ROI). Accuracy for Freehand measurements was marginally higher compared to Mean and Maximum measurements for both variants (0.88 and 0.83 for GRE and SE-EPI, respectively). In contrast, Toguchi et al. reported better accuracy for small-ROI measurements compared to Freehand ROIs for detection of significant fibrosis (F2+), with AUC of 0.93 vs 0.89 [20]. Freehand ROI measurements, however, have slightly higher inter-reader variability, likely dependent on the expertise of the reader.

Most importantly, in our study, there were significant differences in the optimum cutoff values determined by the Youden index for differentiating varying degrees of fibrosis for Mean, Maximum, and Freehand measurements. The optimum cutoff values for Freehand stiffness measurements for differentiating early (F0–F2) from advanced fibrosis (F3–F4) were lower (3.0 and 3.1 kPa for GRE and SE-EPI MRE, respectively). In comparison, the optimal cutoff for Mean stiffness measurements for differentiating early from advanced fibrosis for both sequences was 3.7 kPa. The cutoffs identified in this study are lower than those in most previously reported studies. For instance, Ichikawa reported 4.0 kPa as the optimal cutoff for differentiating early (F0–F2) from advanced (F3–F4) fibrosis [28]. Kim et al., in their study performed at 3T, also reported low optimal cutoffs for identifying clinical cirrhosis (3.03 kPa for 2D-GRE MRE and 3.06 kPa for 2D-SE-EPI MRE) [18].

Using 4.2 kPa as the optimal cutoff, Maximum ROI stiffness measurements on GRE MRE had the highest specificity (92%) but low sensitivity (63.7%). Mean ROI stiffness measurements for both sequences had high specificity (85.2% for GRE and 88.8% for SE-EPI) but relatively low sensitivity (73.9% and 61.2%, respectively). Sensitivity and specificity for Freehand ROI stiffness measurements were 84.4% and 77.3% for GRE-MRE and 71.9% and 80% for SE-EPI MRE.

There are several limitations of this study. First, the majority of the patients did not undergo liver biopsy. Second, studies with low signal-to-noise ratio, such as those with less than 30% liver within the confidence map as defined by Mariappan et al. [17], were not excluded from the study cohort. Third, we did not compare the image quality and size of the confidence region for the SE-EPI and GRE sequences. Fourth, pathological fibrosis stage was not re-reviewed by an expert pathologist. High inter- and intra-observer variability has been reported in pathological staging of liver biopsy specimens with ICC up to 0.91 [29, 30, 31].

In summary, we observed high inter-reader agreement on measured hepatic stiffness for both 2D-GRE and 2D-SE-EPI MR elastography. Failure rates of both the 2D-SE-EPI MRE and 2D-GRE MRE sequences at 3T are comparable to those at 1.5T, and significantly lower than what was previously reported. 2D-SE-EPI MRE is a quicker, single-breath-hold sequence with a lower failure rate than the 2D-GRE MRE sequence, especially in patients with iron overload. 2D-GRE MRE had a slightly higher accuracy for staging liver fibrosis than 2D-SE-EPI MRE. The optimal cutoff values for differentiating early from advanced fibrosis are lower for ‘Freehand’ measurements compared to the ‘Maximum’ and ‘Mean’ stiffness measurements.