
1 Introduction

Magnetic resonance imaging (MRI) is a medical imaging modality used to guide diagnosis and treatment planning. For this purpose, the images (slices) must be segmented in order to detect and characterize lesions, and to visualize and quantify the severity of the pathology. Medical specialists interpret these images subjectively, based on their experience and knowledge; in other words, they perform a manual segmentation. This task is long, painstaking and subject to human variability.

In most cases, brain MRIs do not have well-defined boundaries between their constituent elements; in addition, they include non-soft tissues as well as artifacts that can hinder segmentation. Despite these inherent difficulties, numerous automatic algorithms and techniques have been introduced in the state of the art. Among the approaches designed exclusively to segment brain tissues, those based on the fuzzy clustering paradigm and its variants stand out [3, 4, 6, 11]. For the same purpose, hybrid methods that combine different machine learning paradigms and optimization algorithms have also been presented, e.g. [7, 8, 10]. Methods designed to segment brain tumors or other abnormalities have been introduced as well, among which one can refer to [1, 2]. For this task, the proposals based on Deep Learning are the most recent and report the best results. Most of these proposals achieve high performance on brain magnetic resonance images. Nevertheless, our analysis showed that most of them suffer from one or more of the following limitations: the need for training, reliance on special handcrafted features (local or global), sensitivity to initialization, many parameters that require tuning, several processing stages, and a design restricted to T1-weighted brain MRI images, among others.
In this paper, we concentrate on brain tissue segmentation. In contrast with the methods mentioned above, the proposal has the following features: (1) it is able to segment MRIs with different relaxation times, such as T1, T2, T1ce and Flair; (2) it does not require the initialization of any parameter, such as the number of regions into which the slice will be segmented; (3) it does not require any preprocessing stage to improve the segmentation quality of each slice; and (4) it does not need several processing stages to increase its performance.

The rest of this paper is organized as follows. In Sect. 2, a brief theoretical explanation of Deep Learning and the required layers is given. The parallel architecture of Convolutional Neural Networks is introduced in detail in Sect. 3. The experimental setup is described in Sect. 4, and experimental results and a comparative analysis with other current methods in the literature are presented in Sect. 5. In the final section, conclusions are drawn and future work is outlined.

2 Background

2.1 Convolutional Deep Neural Networks

Deep architectures are neural networks that share a common basic property: they process information through hierarchical layers in order to learn representations and features from data at increasing levels of complexity. Several variants exist, each of which has found success in specific domains. In this regard, Convolutional Deep Neural Networks (CNNs) stand out in most computer vision tasks. A CNN is a feedforward neural network with several types of special layers; typically, it has convolutional layers interspersed with spatial pooling layers, as well as fully connected layers like those of a standard multi-layer neural network. The leading role is played by the convolutional layers, since they detect local features at different positions in the input feature maps by means of learnable kernels.

An explicit mathematical formulation of the layers used in most conventional models is given in [12]. Let \(x\in \mathbb {R}^{H\times W \times D}\) be the input map, \(f\in \mathbb {R}^{H'\times W' \times D \times D''}\) a bank of multi-dimensional filters, b the biases and \(y \in \mathbb {R}^{H''\times W'' \times D''}\) the output. The output is given as:

$$\begin{aligned} y_{i''j''d''}=b_{d''}+\sum _{i'=1}^{H'}\sum _{j'=1}^{W'}\sum _{d'=1}^{D}f_{i'j'd'd''}\times x_{S_{h}(i''-1)+i'-P_{h}^{-},S_{w}(j''-1)+j'-P_{w}^{-},d'}, \end{aligned}$$
(1)

where \(y_{i''j''d''}\) is the feature map resulting from the convolution, and \(b_{d''}\) is the bias added to the convolution of the filter \(f_{i'j'd'd''}\) with the input neurons x. In addition, \(\left( P_{h}^{-},P_{h}^{+},P_{w}^{-}, P_{w}^{+} \right) \) stand for the top, bottom, left and right paddings and \(\left( S_{h},S_{w} \right) \) are the subsampling strides of the output array. In order to obtain features that are non-linear transformations of the input, an elementwise non-linearity is applied to the convolution result by means of an activation function. Modern choices include the Rectified Linear Unit (ReLU), Leaky ReLU and Exponential Linear Units (ELU), among others; classical ones include step, sigmoid and tanh. To obtain a baseline accuracy it is convenient to use the standard ReLU (or its Leaky ReLU variant), which is defined simply as:

$$\begin{aligned} y_{ijd}=\max \left\{ 0,x_{ijd}\right\} . \end{aligned}$$
(2)
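
As an illustration of Eqs. (1) and (2), the following is a minimal NumPy sketch, not the implementation used in this work. The function names `conv2d` and `relu`, the argument layout and the toy input are assumptions made for the example; the loops mirror the sums over \(i'\), \(j'\) and \(d'\) directly and are not optimized.

```python
import numpy as np

def conv2d(x, f, b, stride=(1, 1), pad=(0, 0, 0, 0)):
    """Direct, loop-based evaluation of Eq. (1).

    x: input map, shape (H, W, D)
    f: filter bank, shape (Hf, Wf, D, Dout)
    b: biases, shape (Dout,)
    stride: (S_h, S_w); pad: (top, bottom, left, right) zero padding.
    """
    s_h, s_w = stride
    p_top, p_bottom, p_left, p_right = pad
    # Zero-pad the two spatial dimensions only.
    xp = np.pad(x, ((p_top, p_bottom), (p_left, p_right), (0, 0)))
    h_f, w_f, _, d_out = f.shape
    h_out = (xp.shape[0] - h_f) // s_h + 1
    w_out = (xp.shape[1] - w_f) // s_w + 1
    y = np.empty((h_out, w_out, d_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[i * s_h:i * s_h + h_f, j * s_w:j * s_w + w_f, :]
            # Sum over i', j', d' for every output channel d'', then add the bias.
            y[i, j, :] = np.tensordot(patch, f, axes=([0, 1, 2], [0, 1, 2])) + b
    return y

def relu(x):
    """Elementwise non-linearity of Eq. (2)."""
    return np.maximum(0, x)

# Toy example: one 4x4 single-channel map, one 3x3 all-ones filter, bias -50.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
f = np.ones((3, 3, 1, 1))
y = conv2d(x, f, np.array([-50.0]))
print(y[..., 0].tolist())        # [[-5.0, 4.0], [31.0, 40.0]]
print(relu(y)[..., 0].tolist())  # [[0.0, 4.0], [31.0, 40.0]]
```

The negative response in the top-left position is clipped to zero by the ReLU, while the positive responses pass through unchanged.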

Most of the time, a convolution layer is followed by a spatial pooling layer. A pooling layer takes the feature map produced by the convolution layer and condenses it: it takes small regions of the map and applies an operation to each of them, usually the maximum (Max-Pooling). This operator computes the maximum response of each feature channel in a \(H'\times W'\) patch as follows:

$$\begin{aligned} y_{i''j''d}=\max _{1\le i'\le H',1\le j' \le W'} x_{i''+i'-1,j''+j'-1,d} , \end{aligned}$$
(3)

resulting in an output \(y \in \mathbb {R}^{H''\times W'' \times D}\), similar to the convolution operator. For the segmentation process, the so-called deconvolution layer is used. It aims to reconstruct the input while maintaining a connectivity pattern compatible with the convolution. Mathematically, it is given as:

$$\begin{aligned} y_{i''j''d''}=\sum _{d'=1}^{D} \sum _{i'=0}^{q(H',S_{h})} \sum _{j'=0}^{q(W',S_{w})}&f_{1+S_{h}i'+m(i''+P_{h}^{-},S_{h}), 1+S_{w}j'+m(j''+P_{w}^{-},S_{w}),d'',d'} \times \nonumber \\&\quad x_{1-i'+q(i''+P_{h}^{-},S_{h}), 1-j'+q(j''+P_{w}^{-},S_{w}),d'} , \end{aligned}$$
(4)

where \(\left( S_{h},S_{w} \right) \) are the vertical and horizontal input upsampling factors, \(\left( P_{h}^{-}, P_{h}^{+},P_{w}^{-},P_{w}^{+} \right) \) are the output crops, and x and f are zero-padded as needed in the calculation.
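
The pooling and deconvolution operators can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the names `max_pool` and `conv_transpose` are hypothetical, the pooling function adds a stride parameter (Eq. (3) is the stride-1 case), and the transposed convolution is written in its scatter-add form with all output crops set to zero, which should produce the same result as the gather form of Eq. (4) in that case.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Channel-wise max-pooling; Eq. (3) corresponds to stride=1."""
    h, w, d = x.shape
    h_out = (h - size) // stride + 1
    w_out = (w - size) // stride + 1
    y = np.empty((h_out, w_out, d), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size, :]
            y[i, j, :] = window.max(axis=(0, 1))
    return y

def conv_transpose(x, f, stride=2):
    """Transposed convolution as a scatter-add, without cropping.

    x: (H, W, D); f: (Hf, Wf, Dout, D), with the channel order of Eq. (4).
    """
    h, w, _ = x.shape
    h_f, w_f, d_out, _ = f.shape
    y = np.zeros(((h - 1) * stride + h_f, (w - 1) * stride + w_f, d_out))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a weighted copy of the filter on the output.
            y[i * stride:i * stride + h_f,
              j * stride:j * stride + w_f, :] += f @ x[i, j, :]
    return y

x = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool(x)                                # 4x4 -> 2x2
restored = conv_transpose(pooled, np.ones((2, 2, 1, 1)))  # 2x2 -> 4x4
print(pooled[..., 0].tolist())  # [[5.0, 7.0], [13.0, 15.0]]
print(restored.shape)           # (4, 4, 1)
```

With a \(2\times 2\) all-ones filter and an upsampling factor of 2, each pooled value is simply copied into a \(2\times 2\) block, so the spatial size lost by pooling is recovered while the fine detail is not; in a trained network the filter weights are learned instead.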

2.2 U-Net

U-Net is a fully convolutional neural network model originally designed to perform binary segmentation [9]; that is, to separate the main object from the background of the image. The network is divided into two parts. In the first part, the images are downsampled by means of convolution operations with a \(3\times 3\) kernel, each followed by a rectified linear unit (ReLU), and a \(2\times 2\) max-pooling layer. The second part consists of layers of deconvolution and convolution with a \(2\times 2\) kernel; finally, the output corresponds to the specific class of objects to be segmented. The U-Net model is shown graphically in Fig. 1.

Fig. 1. U-Net model.
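
To make the contracting and expanding paths concrete, the sketch below traces the side length of the feature maps through a U-Net. It is an assumption-laden illustration: the helper `unet_spatial_trace` is hypothetical, the depth of four pooling steps and the two \(3\times 3\) convolutions per stage follow the original U-Net design, and the text above does not state whether padded convolutions are used, so both cases are covered.

```python
def unet_spatial_trace(size, depth=4, padded=True):
    """Return the feature-map side length after each U-Net stage.

    Each stage applies two 3x3 convolutions. The contracting path then
    halves the size with 2x2 max-pooling, and the expanding path doubles
    it with a 2x2 up-convolution. With padded=False every 3x3 convolution
    trims 2 pixels, as in the original U-Net design.
    """
    shrink = 0 if padded else 4  # two unpadded 3x3 convolutions lose 4 pixels
    trace = []
    for _ in range(depth):       # contracting path
        size -= shrink
        trace.append(size)
        if size % 2:
            raise ValueError("odd size before 2x2 pooling")
        size //= 2
    size -= shrink               # bottleneck
    trace.append(size)
    for _ in range(depth):       # expanding path
        size = size * 2 - shrink
        trace.append(size)
    return trace

print(unet_spatial_trace(240))                # [240, 120, 60, 30, 15, 30, 60, 120, 240]
print(unet_spatial_trace(572, padded=False))  # [568, 280, 136, 64, 28, 52, 100, 196, 388]
```

With padded convolutions the output mask has the same size as the input slice, which is convenient when the binary masks of several networks have to be compared pixel by pixel.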

3 Parallel Architecture of CNNs for MRI Segmentation

3.1 Proposed Scheme

Conventionally, it may be assumed that the following five regions can be found in an MRI slice: (1) White Matter (WM), (2) Gray Matter (GM), (3) Cerebrospinal Fluid (CSF), (4) Abnormalities (ABN) and (5) Background. Nevertheless, depending on the slice, not all regions may be present, and the extent of each region varies. Given the complexity that this entails, most methods in the state of the art work only with the central slices of medical studies, mainly because the regions in these slices are better delimited and therefore easier to segment.

To address this issue, a parallel architecture of CNNs is introduced to automatically recognize and segment the soft tissues in every slice of a whole medical study. The proposal is depicted in Fig. 2; it basically comprises four U-Net models, each trained to work on a specific soft tissue. The operation of the proposed scheme is quite intuitive: first, any slice of a study is fed into the system; then, each U-Net model performs a binary segmentation. That is, each network has to identify the pixels that correspond to the tissue for which it was trained, and must therefore be able to segment it. After that, the binary segmented images are merged to obtain the final segmentation. Two remarks must be made: (1) Different tissues appear depending on the slice number; if the input image does not contain a specific tissue, the U-Net in charge of segmenting it returns the background label for the whole image. (2) If the study corresponds to a healthy patient, there will be no abnormality or tumor; as in the previous remark, the result should be the background label. This adaptive capacity allows the proposed scheme to segment all slices of a complete medical study automatically and without human assistance.

Fig. 2. Proposed parallel architecture of CNNs.
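
The merging step can be sketched as follows. The text does not specify the fusion rule, so everything here is an assumption made for illustration: each network is taken to output a per-pixel probability map, a pixel receives the label of the most confident network, and pixels on which no network exceeds a threshold are labelled as background. The label codes in `LABELS` and the function name `merge_binary_masks` are likewise hypothetical.

```python
import numpy as np

# Label codes assumed for this sketch (not specified in the text).
LABELS = {"background": 0, "CSF": 1, "GM": 2, "WM": 3, "ABN": 4}

def merge_binary_masks(prob_maps, threshold=0.5):
    """Fuse per-tissue probability maps into a single label image.

    prob_maps: dict mapping tissue name -> (H, W) array in [0, 1],
    one entry per U-Net. Assumed rule: the most confident network wins;
    if no network exceeds `threshold`, the pixel is background.
    """
    names = list(prob_maps)
    stack = np.stack([prob_maps[n] for n in names], axis=0)  # (n_nets, H, W)
    winner = stack.argmax(axis=0)
    label_of = np.array([LABELS[n] for n in names])
    merged = label_of[winner]
    merged[stack.max(axis=0) < threshold] = LABELS["background"]
    return merged

# Toy 2x2 slice from a healthy subject: the ABN network returns only background.
maps = {
    "WM":  np.array([[0.9, 0.1], [0.2, 0.0]]),
    "GM":  np.array([[0.3, 0.8], [0.1, 0.0]]),
    "CSF": np.array([[0.0, 0.1], [0.7, 0.0]]),
    "ABN": np.zeros((2, 2)),
}
print(merge_binary_masks(maps).tolist())  # [[3, 2], [1, 0]]
```

Under this rule, a network whose tissue is absent from the slice contributes nothing to the final image, which matches the behaviour described in the two remarks above.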

4 Experimental Setup

4.1 Data

In this paper, two databases specialized in brain magnetic resonance imaging were considered. From BrainWeb [13], a normal anatomical model and one with abnormalities were used for training, while another normal model was used in the validation stage; each of them has 101 images with a size of \(256\times 256\) pixels and 8-bit depth. For a realistic and objective evaluation of our proposal, tests were carried out with BraTS 2017 [5], which consists of 210 medical studies with Glioblastoma, 75 with Glioma and 47 more without classification. Each study has MRIs in the T1, T1ce, T2 and Flair modalities, as well as their respective ground-truth images. For each modality there are 155 8-bit images with a size of \(240\times 240\) pixels.

4.2 Tuning

In order to accelerate the training of the four neural networks required to segment the different tissues, all low- and medium-level feature maps of the originally trained U-Net were transferred to each network, and only the high-level ones were trained. Moreover, it is well known that data augmentation is essential to teach the network the desired invariance and robustness properties when only a few training samples are available. In our particular case, BrainWeb was used as the source of training data. This repository has only one study with 101 soft tissue images, so it was necessary to augment the data. Table 1 summarizes all the operations carried out for this purpose. Scaling covers 3 different image sizes; rotation yields 120 possible images for an angular step of \({3}^\circ \); and translation considers 4 quadrants plus the untranslated image. This gives \(101\times 3\times 120\times 5 = 181{,}800\) images to train each neural network, for which the respective ground-truth images were also required. During the training phase, several preliminary tests were carried out to tune the hyperparameters of each network. To obtain the best results in the test phase, the following settings are suggested: (a) color depth of 8 bits, (b) TIFF image format, (c) Adaptive Moment Estimation (ADAM) optimization method, (d) 1000 epochs and (e) learning rate of 0.001.

Table 1. Data augmentation summary.
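
The size of the augmented training set can be checked by enumerating the parameter grid. The counts (3 scales, 120 rotations, 5 translations, 101 source images) come from the text; the concrete scale factors and the quadrant names below are placeholders assumed only for the enumeration.

```python
from itertools import product

N_SOURCE_IMAGES = 101                      # BrainWeb slices (from the text)
scales = [0.5, 1.0, 1.5]                   # 3 sizes; the actual factors are assumed
angles = range(0, 360, 3)                  # 3-degree steps give 120 rotations
shifts = ["none", "q1", "q2", "q3", "q4"]  # untranslated image plus 4 quadrants

# One augmented image per (scale, angle, shift) combination and source slice.
variants = list(product(scales, angles, shifts))
total = N_SOURCE_IMAGES * len(variants)
print(len(variants), total)  # 1800 181800
```

Each source slice therefore yields 1,800 variants, and the same transformation has to be applied to its ground-truth image so that image and label stay aligned.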

4.3 Evaluation

In order to evaluate the segmentation performance and robustness quantitatively and objectively, three metrics were considered in this study. To measure the segmentation accuracy, we used the Misclassification Ratio (MCR), which is given by:

$$\begin{aligned} MCR = \dfrac{misclassified\ pixels}{overall\ number\ of\ pixels} \times 100 \end{aligned}$$
(5)

where the values range from 0 to 100, and a lower value means a better segmentation. The Dice Similarity Coefficient is used to quantify the overlap between the segmented results and the ground-truth; it is expressed in terms of true positives (TP), false positives (FP) and false negatives (FN) as:

$$\begin{aligned} Dice = \dfrac{2\cdot TP}{2\cdot TP + FP + FN} \end{aligned}$$
(6)

where \(TP + FP + TN + FN\) equals the number of brain tissue pixels in a brain MR image. For this metric, a higher value means better agreement with the ground-truth. In addition to the metrics above, the Intersection-Over-Union (IOU) metric was also considered. It is defined by:

$$\begin{aligned} IOU = \dfrac{TP}{TP + FP + FN} \end{aligned}$$
(7)

The IOU metric takes values in [0, 1] with a value of 1 indicating a perfect segmentation.
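
The three metrics of Eqs. (5) to (7) can be computed from a predicted and a ground-truth label image as in the following sketch. The function names are hypothetical, Dice and IOU are evaluated per tissue label, and returning a perfect score when a label is absent from both images is an assumption of this example rather than something stated in the text.

```python
import numpy as np

def mcr(pred, truth):
    """Misclassification ratio in percent, Eq. (5)."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return 100.0 * np.count_nonzero(pred != truth) / truth.size

def dice_iou(pred, truth, label):
    """Dice, Eq. (6), and IOU, Eq. (7), for one tissue label."""
    p = np.asarray(pred) == label
    t = np.asarray(truth) == label
    tp = np.count_nonzero(p & t)    # labelled in both images
    fp = np.count_nonzero(p & ~t)   # labelled only in the prediction
    fn = np.count_nonzero(~p & t)   # labelled only in the ground-truth
    denom = tp + fp + fn
    if denom == 0:
        # Label absent from both images: treated as perfect (assumption).
        return 1.0, 1.0
    return 2 * tp / (2 * tp + fp + fn), tp / denom

truth = [[1, 1], [2, 0]]
pred = [[1, 2], [2, 0]]
print(mcr(pred, truth))          # 25.0
print(dice_iou(pred, truth, 1))  # Dice 2/3, IOU 1/2
```

In this toy case one of four pixels is mislabelled, so the MCR is 25%; for label 1 there is one true positive and one false negative, which gives a Dice of 2/3 and an IOU of 1/2.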

5 Results and Discussion

The performance of the proposed scheme (identified as PA-CNNs for convenience) was compared with other methods mentioned in the introductory section: the Chaotic Firefly Integrated Fuzzy C-Means (C-FAFCM) [4], Discrete Cosine Transform Based Local and Nonlocal FCM (DCT-LNLFCM) [11], Generalized Rough Intuitionistic Fuzzy C-Means (GRIFCM) [6] and Particle Swarm Optimization - Kernelized Fuzzy Entropy Clustering with Spatial Information and Bias Correction (PSO-KFECSB) [10]. All of them were implemented in the MATLAB R2018a environment, while ours was implemented with CUDA, cuDNN, TensorFlow and Keras, that is, conventional frameworks and libraries for Deep Learning, and run on an Nvidia Titan X GPU.

5.1 Segmentation of a Simulated BrainWeb Study

In this experiment, a full study was simulated using the BrainWeb database (consisting of 181 images). The parameters were set as follows: T1 modality, normal phantom, \(3\%\) noise level and \(20\%\) intensity non-uniformity. A quantitative comparison in terms of MCR, Dice and IOU is summarized in Table 2. The results reveal that the proposed scheme achieves a better segmentation quality than all compared methods. This is mainly because the parallel architecture is robust to noise at the pre-established level. To exemplify the results visually, slice_071 was taken as a sample. As can be seen in Fig. 3, in the presence of Gaussian noise all comparative methods were affected by loss and gain of information, which directly impacts their quantitative results. In contrast, the proposed scheme obtained the result most similar to the ground-truth, which confirms its balance between quantitative and qualitative results.

Table 2. Average performance on BrainWeb study.

Fig. 3. BrainWeb segmentation results.

Table 3. Average performance on BraTS-2017 study.

Fig. 4. BraTS17_2013_10_1 segmentation results.

5.2 Segmentation of a Real BraTS-2017 Study

A convincing way to assess the true performance of the proposed method is to apply it to tissue segmentation of real brain magnetic resonance images. In this regard, the second experiment concerns the segmentation of images in the T1, T1ce, T2 and Flair modalities taken from the BraTS-2017 database; specifically, the Glioblastoma Brats17_2013_10_1 study. The quantitative evaluation was done with the metrics established above, and a summary is presented in Table 3. The numerical results reveal a superior performance of the proposed segmentation method in all the metrics considered and in all modalities.

A sample image and the segmentations provided by all algorithms evaluated in this experiment are depicted in Fig. 4. It can be seen that only the proposed algorithm was able to segment images in the different modalities. All the other methods suffered from loss of information in the segmented regions, and in some cases they were not even able to segment the images into the 4 established regions. In the BraTS challenge, the primary task is Multimodal Brain Tumor Segmentation; in this regard, a good segmentation of the images in these modalities can guarantee the identification and segmentation of brain tumors. The results obtained by the proposed algorithm and depicted in Fig. 4 make it possible to affirm its ability to detect abnormalities in the brain, unlike the comparative methods.

6 Conclusions and Future Improvements

In this paper, a parallel architecture of Convolutional Neural Networks was presented. The experiments carried out on simulated and real images allow us to sustain the following qualities: (1) it is able to identify and segment the regions of an MRI without prior specification of the regions, that is, it carries out identification and segmentation autonomously; (2) it is able to segment images without prior processing and in different modalities or exposure times such as T1, T1ce, T2 and Flair; (3) it is robust to most artifacts in this type of magnetic resonance imaging of the brain; (4) it is able to generalize, that is, although it has been trained with a simulated database, it is capable of segmenting real images. As future work, the networks will be trained using the BraTS database, which is expected to increase the performance of the proposed architecture and to allow it to specifically target brain tumors.