1 Introduction

Diatoms are a group of unicellular algae present in a great variety of aquatic environments. The total number of species is estimated at more than 200,000, although only about 10,000 have been described so far. Since diatoms adapt to their environment, they can be used as natural water quality indicators in environmental studies [5].

Diatoms are formed by two thecae that fit together to create a capsule known as a frustule. The frustule is made of silica and, depending on its shape, diatoms are classified as centric (rounded frustule) or pennate (elongated frustule). Diatoms reproduce both asexually and sexually. In the asexual stage, the frustule separates into its two valves, and each one regenerates the missing half, originating two diatoms of different sizes. This progressive size variation over generations is what is called the life cycle. After several generations, the valve size cannot decrease any further, which triggers sexual reproduction: at this point, the cell forms auxospores that give rise to new full-size algae.

Traditionally, the task of identifying diatoms in samples from different aquatic environments has been carried out by biologists. They usually look for morphometric features (length, width, shape) and frustule ornamentation such as the striae density, and identification is made by comparing against previously described specimens [2]. Doing this task manually involves different challenges due to inter-species similarities and intra-species dissimilarities arising from the various stages of the life cycle.

Different attempts to automate this process have been made [3, 4, 22]. The task is challenging due to several factors, such as the vast number of diatom species, the similarities between them, and the life-cycle-related changes in shape and texture. Some researchers [21] used shape descriptors based on Legendre polynomials and principal component analysis (PCA) to identify the Cymbella cistula species. Others [20] applied PCA to the Fourier descriptors extracted from the contour of the Tabellaria group. There are also recent studies on the application of different classification methodologies and the consideration of different image features such as texture, geometry, morphology and their combination [3]. Convolutional neural networks (CNNs) have also been applied with success to a high number of taxa [22]. However, the main source of errors comes from the misclassification of algae due to their life cycle.

In this paper, we present an extension of the work presented in [24], adding two different contributions to the previous work. The main novelty resides, on the one hand, in increasing the number of classes from 8 to 14 and, on the other hand, in considering a different approach that uses CNNs to classify the diatoms. CNNs have recently been applied to the taxonomic identification of diatoms with 99.51% accuracy over 80 species [22]. However, the dataset used by those authors contains an average of 100 samples per taxon before applying any data augmentation technique. Given the known need for relatively large training datasets when training architectures such as AlexNet or GoogleNet from scratch, we propose to use transfer learning, either as a fine-tuning strategy over the complete model or by fixing the convolutional layers as a feature extractor and retraining the last part of the network [30]. In both cases, the networks are initialized with the weights of their corresponding architectures previously trained on ImageNet. In this work, ResNet18, AlexNet, VGG11, SqueezeNet1.0, DenseNet121, and InceptionV3 have been compared. Finally, a comparison is presented between the results obtained with a traditional image identification workflow (i.e., image preprocessing, segmentation, feature extraction, dimensionality reduction, and classification) and with CNNs.

2 Materials and Methods

2.1 Database

The database used in this work consists of 1085 diatom images belonging to 14 different classes, distributed as shown in Table 1.

Table 1. Number of images per taxa.

2.2 Traditional Image Classification

The first step is image segmentation and contour extraction. Then, three different sets of features are extracted to describe the segmented image and its contour. After that, all the features undergo a dimensionality reduction process. Finally, a classifier is applied to this reduced set of features. The method is described more extensively in [24].

A. Segmentation and Contour Extraction. Semi-automatic global thresholding based on the Otsu method and morphological operations was used. In this process, a few images were manually discarded due to inhomogeneous illumination and noise.
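
As an illustration, the following is a minimal sketch of this step, assuming OpenCV 4.x, a dark diatom on a brighter background, and placeholder values for the file name and structuring element size (none of these details are specified in the paper):

```python
import cv2

# Load a sample image in grayscale ("diatom.png" is a placeholder name).
img = cv2.imread("diatom.png", cv2.IMREAD_GRAYSCALE)

# Global Otsu thresholding; THRESH_BINARY_INV assumes a dark object
# on a brighter background.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Morphological opening and closing to remove noise and fill small holes.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Keep the largest external contour as the diatom outline
# (OpenCV 4.x returns (contours, hierarchy)).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea)
```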

B. Feature Extraction. Three different descriptors have been used to describe the images. Elliptical Fourier descriptors (EFD) model the diatom contour while Gabor filters and phase congruency (PC) descriptors characterize the diatom ornamentation.

  • Elliptical Fourier descriptors. The method to calculate EFD is described in [16]. It starts with a contour image and calculates the Freeman chain code. Then the x, y projections of the chain code are calculated. Finally, the Fourier coefficients are obtained from these projections (a minimal sketch of this computation is given after this list). It was empirically determined that the first 30 coefficients are sufficient to represent the contour.

  • Phase congruency descriptors. Phase congruency builds on the fact that all Fourier components are in phase where signal features occur, i.e., at corners, edges, and textures of the images. PC descriptors are calculated as in [28]. Starting from the phase congruency maximum (M) and minimum (m) moment images (described in [14]), which combine the phase congruency information of each orientation, the mean and standard deviation were calculated for both images. A total of 4 phase congruency descriptors are obtained.

  • Gabor filter descriptors. Gabor-based descriptors are calculated with the same method as in [3], initially described in [6]. First, the log-Gabor filters are calculated as shifted Gaussians for different orientations and scales. These filters are applied to the images, and then the first- and second-order statistics are obtained for every sub-band.
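
The sketch referenced in the EFD item above, using the third-party pyefd package, which implements the Kuhl–Giardina formulation followed by [16]; the synthetic elliptical contour merely stands in for a real segmented diatom outline:

```python
import numpy as np
from pyefd import elliptic_fourier_descriptors

# Synthetic closed contour standing in for a segmented diatom outline.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
contour = np.stack([80 * np.cos(t), 30 * np.sin(t)], axis=1)  # (N, 2) points

# First 30 harmonics; normalize=True makes the coefficients invariant
# to rotation and scale.
coeffs = elliptic_fourier_descriptors(contour, order=30, normalize=True)
efd_features = coeffs.flatten()  # 30 harmonics x 4 coefficients each
```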

C. Dimensionality Reduction. After feature extraction, a total of 223 features were obtained; therefore, dimensionality reduction is needed. For this purpose, Linear Discriminant Analysis (LDA) [7] was selected, as it was shown to enhance classification results over other techniques such as PCA. LDA projects the feature space into a new, smaller subspace that maximizes the separation between classes. With this supervised method, the original 223-dimensional space is reduced to \(N-1\) dimensions, where N is the number of classes in the dataset (\(N=14\) in this work, i.e., 13 dimensions).
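
A minimal scikit-learn sketch of this reduction, using random stand-in data with the stated dimensions:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Stand-in data: 1085 samples, 223 features, 14 class labels.
X = np.random.rand(1085, 223)
y = np.random.randint(0, 14, size=1085)

# Supervised projection onto at most N - 1 = 13 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=13)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (1085, 13)
```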

D. Classification. In machine learning, a classifier can be defined as a function that takes the values of different features of a sample and outputs a prediction of the class to which the sample belongs [23]. In [24], different supervised and unsupervised classifiers were tested. Among the tested algorithms, Hierarchical Agglomerative Clustering [25] was chosen as it achieved the best results on the proposed dataset. Hierarchical clustering is a machine learning algorithm for clustering unlabeled data points. It produces a set of nested clusters organized as a hierarchical tree that can be visualized using a dendrogram, and these clusters may correspond to meaningful taxonomies, e.g., diatom taxa. Initially, every single observation is a separate cluster. Then a distance function between clusters is computed, and the closest clusters are merged. The algorithm finishes once the predefined number of clusters is reached.
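
A minimal scikit-learn sketch of this classifier; note that the distance and linkage criteria of [24] are not restated here, so Ward linkage below is an assumption:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in for the LDA-reduced feature matrix.
X_reduced = np.random.rand(1085, 13)

# Merge clusters until the predefined number (14 taxa) is reached.
clusterer = AgglomerativeClustering(n_clusters=14, linkage="ward")
labels = clusterer.fit_predict(X_reduced)
```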

2.3 Deep Learning

The number of images in this dataset is reasonable for traditional machine learning methods but far from the amount required by deep learning techniques, as explained in [22]. This requirement can be lowered to around 100 samples per class by using transfer learning techniques [8]. However, most of the classes have fewer samples than that, and the available number is further reduced by partitioning the dataset into training, validation and test sets. To deal with this problem, we added a data augmentation step that performs:

  1. Horizontal flip

  2. Vertical flip

  3. Random rotation between 0\(^\circ \) and 90\(^\circ \)

A combination of these three transformations is randomly applied each time a batch is requested during training. After this process, images are resized to the network input size, i.e., 224 \(\times \) 224 pixels. Figure 1 shows some examples of this process.
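
A sketch of how such an augmentation and resizing pipeline could look with torchvision; the exact composition and parameters used in the experiments are not reported, so this is illustrative only:

```python
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomHorizontalFlip(),             # 1. horizontal flip
    T.RandomVerticalFlip(),               # 2. vertical flip
    T.RandomRotation(degrees=(0, 90)),    # 3. rotation between 0 and 90 degrees
    T.Resize((224, 224)),                 # network input size; aspect ratio lost
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet channel statistics
                std=[0.229, 0.224, 0.225]),
])
```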

Fig. 1. Data augmentation examples. Note that after size normalization the aspect ratio of the original images is not preserved, which has a negative effect on the learning process and reduces the final classification accuracy.

Fig. 2. Scheme of the AlexNet network tested. Source: http://alexlenail.me/NN-SVG/AlexNet.html

Since image classification is a common task, several classification network architectures have been proposed in the literature. In this case, we have tested ResNet18 [9], AlexNet [15] (see Fig. 2), VGG11 [26], SqueezeNet1.0 [12], DenseNet121 [10], and InceptionV3 [27]. To deal with convergence problems due to the low number of samples per class, two transfer learning techniques have been applied. One of them is fine-tuning, in which a pre-trained model is used to initialize the network and then all the weights are adjusted during training. The other consists of using the convolutional layers as a feature extractor and training only the last part of the architecture. In all cases, the model weights were initialized with those of the corresponding models pre-trained on ImageNet, since this has been demonstrated to be successful on a wide range of transfer tasks [11]. Therefore, ImageNet is only used to learn good general-purpose features as a starting point for our diatom classification task.
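
A minimal PyTorch sketch of both strategies for one of the tested architectures (DenseNet121). Other architectures expose their classification head under different attribute names (e.g., model.fc for ResNet), and the freezing policy shown is the generic recipe, not necessarily the exact training setup used here:

```python
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 14
FEATURE_EXTRACTOR = False  # True: freeze the convolutional layers

# Initialize with ImageNet weights (newer torchvision uses weights=...).
model = models.densenet121(pretrained=True)

if FEATURE_EXTRACTOR:
    # Feature-extractor mode: convolutional weights stay fixed.
    for param in model.parameters():
        param.requires_grad = False

# Replace the 1000-class ImageNet head with a 14-class head; this new
# layer is freshly initialized and trained in both strategies.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)
```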

The dataset was split into three parts to train and evaluate the models: 80% of the images were used for training, while the remaining 20% was divided into validation (10%) and test (10%) sets. This was repeated 10 times following a 10-fold cross-validation scheme, and data augmentation was applied after this division. As with the pre-trained models, the inputs were normalized per RGB channel using the ImageNet mean m and standard deviation \(\sigma \): \((m=0.485, \sigma =0.229)\), \((m=0.456, \sigma =0.224)\) and \((m=0.406, \sigma =0.225)\) for the R, G and B channels respectively.
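
A sketch of this splitting scheme with scikit-learn; per-class stratification and the random seeds are assumptions, as the paper does not specify them:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

y = np.random.randint(0, 14, size=1085)  # stand-in class labels
indices = np.arange(len(y))

# Each of the 10 folds holds out 10% for testing; 1/9 of the remaining
# 90% is then split off for validation, giving an 80/10/10 partition.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_val_idx, test_idx in skf.split(indices, y):
    train_idx, val_idx = train_test_split(
        train_val_idx, test_size=1 / 9,
        stratify=y[train_val_idx], random_state=0)
```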

3 Results

Two different experiments were carried out with the dataset. In the first, the images were analyzed with a traditional image classification scheme, obtaining a classification accuracy of 99.7%. In the second, different CNNs were tested, with DenseNet achieving the best result at 99.07% accuracy.

Figure 3 is a representation of the clusters using the t-Distributed Stochastic Neighbor Embedding (t-SNE) [17] algorithm to reduce the dimension of the feature vector. In this figure, it can be observed that the separation of the clusters allows each cluster to be identified with a different class. Although not all clusters are perfectly differentiated in this 2-D representation, the classification results, where only 3 observations were misclassified, indicate that the classes are well separated in the 13-dimensional reduced feature space.
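
A minimal sketch of this kind of visualization; the perplexity value is a common default guess, not a setting reported in the paper:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X_reduced = np.random.rand(1085, 13)     # stand-in LDA-reduced features
y = np.random.randint(0, 14, size=1085)  # stand-in class labels

# Project the 13-D features to 2-D for visualization only.
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(X_reduced)
plt.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap="tab20", s=8)
plt.show()
```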

Figure 4 shows the confusion matrix with the correctly identified samples and the errors produced by the classifier. In addition to classification accuracy, different objective metrics were calculated to assess the clustering performance [13, 29]. These metrics measure the similarity with the ground truth (Rand index), the similarity between elements of the same cluster (Silhouette), the agreement between the cluster assignment and the ground-truth classes (Adjusted Mutual Information), whether a cluster contains only members of the same class (Homogeneity), and whether all the members of the same class are assigned to the same cluster (Completeness). Table 2 shows the values of these metrics; values close to 1 indicate that the clusters are separated and well defined.
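
All of these metrics are available in scikit-learn; a sketch with stand-in arrays follows (the adjusted variants of the Rand index and mutual information are used below, which may differ from the exact variants computed in the paper):

```python
import numpy as np
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             completeness_score, homogeneity_score,
                             silhouette_score)

X_reduced = np.random.rand(1085, 13)          # stand-in features
y_true = np.random.randint(0, 14, size=1085)  # ground-truth taxa
y_pred = np.random.randint(0, 14, size=1085)  # cluster assignments

print("Rand (adjusted):", adjusted_rand_score(y_true, y_pred))
print("AMI:", adjusted_mutual_info_score(y_true, y_pred))
print("Homogeneity:", homogeneity_score(y_true, y_pred))
print("Completeness:", completeness_score(y_true, y_pred))
print("Silhouette:", silhouette_score(X_reduced, y_pred))  # uses the features
```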

Fig. 3. Representation of the data using the t-SNE algorithm for visualization. After dimensionality reduction with LDA, the data form well-separated clusters.

Fig. 4. Confusion matrix of the classification results obtained using hierarchical agglomerative clustering. Elements on the main diagonal represent correct identifications, while the remaining elements are errors.

Table 2. Clustering metrics.

Tables 3 and 4 show the accuracies of the CNN models on the test set. All architectures obtained better results with fine-tuning than when used as a feature extractor, with an average accuracy difference of around 11% between the two techniques. DenseNet, ResNet and VGG are the architectures that provide the highest accuracy. Among them, DenseNet shows the best results, with 99.07% of the samples correctly classified and only one image misclassified. When the convolutional layers are used as a feature extractor, SqueezeNet provides the best results with an accuracy of 93.52%. The differences between the two transfer learning techniques may be caused by the dissimilarity between diatoms and the classes of the ImageNet dataset; it is therefore reasonable to obtain better results when the weights of the feature extractor are adjusted to the new dataset.

Table 3. Fine-tuning results.
Table 4. CNN as a feature extractor results.

Regardless of the model used, the average per-class accuracies show that the most challenging classes for both techniques are Nitzschia amphibia, Sellaphora blackfordensis, and Sellaphora pupula. Nitzschia amphibia is often classified as Gomphonema minutum; misclassification between them may be caused by the similarity of their lateral views, as shown in Fig. 5(a)–(b). On the other hand, Sellaphora blackfordensis and Sellaphora pupula are often misclassified as Sellaphora capitata and Sellaphora auldreekie, respectively. The confusion between these classes is most likely caused by their high overall similarity (Fig. 5(c)–(f)).

Fig. 5. Common misclassifications of the CNN models. Nitzschia amphibia (a) is sometimes classified as Gomphonema minutum (b), Sellaphora blackfordensis (c) as Sellaphora capitata (d), and Sellaphora pupula (e) as Sellaphora auldreekie (f).

4 Discussion

This work pursued two main purposes as a sequel to the work previously presented in [24]. On the one hand, to test the method described for diatom life cycle classification on a larger dataset with more classes; on the other hand, to test deep learning CNNs on the same dataset and compare them with the results obtained with traditional classification algorithms.

With the new dataset (14 classes), a 99.7% accuracy was obtained with classical methods, a result similar to the 98% obtained with the smaller dataset (8 classes) in [24].

Despite the good results obtained for 14 classes, the dataset can still be considered small. The absence of a loss of precision when additional classes were included needs to be corroborated with a significantly larger number of classes (e.g., 50–100) together with a sufficiently high number of samples per class. This would be a more realistic situation, in which a higher number of diatom species coexist in the same ecosystem.

Convolutional neural networks correctly classified 99.07% of the samples in the best scenario and 65.74% in the worst case. Concerning per-class accuracies, three classes (Nitzschia amphibia, Sellaphora pupula and Sellaphora blackfordensis) proved the most difficult to classify regardless of the learning technique. The best results were obtained using the fine-tuning strategy, i.e., adjusting all the weights, whereas the worst results were obtained using the first layers of the pre-trained models as fixed feature extractors. This may be caused by the differences between the application domains: models trained on ImageNet learn to classify instances of categories such as animals or everyday objects, and diatoms are very different from those. Therefore, using such models as fixed feature extractors does not provide the features needed for diatom classification. In contrast, models pre-trained on ImageNet can generalize well to other classification problems after some adjustment.

5 Conclusions

Increasing the number of classes in the dataset and, consequently, the number of images has not decreased the accuracy of the method based on image descriptors and a traditional classifier; it remains close to 99%. Moreover, the results obtained using deep learning also reach high classification rates. Although the dataset is small for training a CNN from scratch to classify diatoms according to taxa, a transfer learning procedure was applied, obtaining 99.07% of samples correctly classified. Of the two proposed techniques, fine-tuning (adjusting all the network weights) achieves the best performance, since diatoms differ from the objects of the categories commonly used to pre-train the architectures.