1 Volcanic Seismology and Pattern Recognition

Because of its geographical location in the Pacific Ring of Fire, Chile has intense volcanic activity throughout its territory. The “Observatorio Vulcanológico de los Andes Sur” (OVDAS) is the state agency responsible for continuously monitoring and recording forty-three of the most active Chilean volcanoes. This monitoring is mainly seismological, as seismicity is closely related to volcanic activity [1]. The most common seismic events are Tremor (TR) and Long Period (LP) events, which are related to sustained and transitory fluid flow through the volcanic conduits, respectively, while Volcano Tectonic (VT) events are related to rock failure inside the volcanic structure [2]. In recent years, automatic analysis of these events has grown rapidly. The task is not trivial, because each volcano has a particular behavior. In many studies, the problem is addressed in two stages: extraction of signal parameters (features) and classification.

Several approaches have been used to address the classification problem of volcanic seismicity, including Support Vector Machines [3], Self-Organizing Maps [4], Hidden Markov Models (HMM) [5], Multi-Layer Perceptrons (MLP) [6] and Gaussian Mixture Models (GMM) [7], among others. These studies show the diversity of techniques available for implementing the classification stage.

Although these works achieved good results, further improvement is necessary to reach acceptable performance in real-time and online applications. One approach is to improve the quality of the events’ descriptors (features); in this sense, it is important to account for the specificities of each volcano. The feature extraction process defines which information is extracted from the signals to facilitate their discrimination. The volcano literature presents many features: linear predictive coding and a signal parameterization to extract information from the waveform can be found in [8]. In [9], autocorrelation functions, obtained using the Fast Fourier Transform, represent the spectral content, and sums and ratios of the amplitude were used. In other works, the wavelet transform allows multiband analysis of the frequencies [3, 10]. Cepstral coefficients were extracted in [11]. The phase information of the events was included in the analysis in [12], and the amplitude statistics (mean, standard deviation, skewness and kurtosis), together with the power of the events, were studied in [13].

In addition, many methods have been applied in the literature to perform feature selection on volcanic data [7], such as the discriminant method [14], principal component analysis [15] and genetic algorithms [6].

The main motivation of this study is to improve the discrimination capacity of the classifiers by improving the quality of the features. To this end, a large number of features was extracted from the events and a selection process based on a genetic algorithm was performed to determine the best feature combination for each class.

2 Materials and Methods

Figure 1 depicts the block diagram of the proposed method. The signal database contains events already classified and segmented by experts. The automatic classification stage discriminates between events according to a subset of features extracted from the data. The optimal subset is found using the validation error of the classifier as the main fitness criterion: this error is used by the genetic algorithm to select new feature subsets in its search for the best one.

Fig. 1.

(a) General working structure of the proposed method and (b) “one versus all” structure of the classification step.

2.1 Classified Database

The Llaima volcano is located in the Araucanía Region (38° 41’S–71° 44’W). The signals were obtained from the LAVE station, located 6 km from the crater; only the Z component was considered. The records belong to the period 2010 to 2013. The events considered were of type LP, TR and VT. A contrast group, named “other” (OT), was created to contain events that did not belong to the first groups (such as tectonic earthquakes, noise and ice breaking). The database had 769 events: 296 LP, 173 TR, 134 VT and 166 OT. The events were stored in files of variable length, according to their duration. Thus, in this work, the features were calculated from variable-length windows covering the entire events.

2.2 Feature Extraction

Most of the extracted features come from the volcanic seismology literature; however, some belong to other processing domains, such as speech. The features marked in gray in Table 1 are introduced as new features in this study.

Table 1. Overview of the features extracted from each volcano’s event

All the extracted features were linearly normalized to [−1, 1] and stored in a matrix of 769 rows (one per event) and 63 columns (one per feature), following the order presented in Table 1. Each feature is explained next.
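As a minimal sketch of the normalization step (in Python with numpy; the paper's implementation is in Matlab and the function name here is ours), each column of the feature matrix can be rescaled independently:

```python
import numpy as np

def normalize_features(X):
    """Linearly rescale each feature column to the range [-1, 1].

    X: (n_events, n_features) matrix; 769 x 63 in this paper.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid /0
    return 2.0 * (X - col_min) / span - 1.0
```

Per-column scaling keeps features with very different physical units (energy, duration, statistical moments) comparable for the SVM.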

Energy:

measures the size of the event and is calculated as the sum of the squared samples [3].

LTA/STA:

It is the ratio of a long-time-window average (LTA) to a short-time-window average (STA) of the signal [9]. This ratio is calculated along the signal; when it exceeds a threshold, it may mark the beginning of a seismic event. Here it was calculated at the beginning of the event.
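A minimal sketch of this ratio at the event onset could look as follows; the window lengths are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def lta_sta_ratio(x, fs, lta_sec=10.0, sta_sec=1.0):
    """LTA/STA ratio at the beginning of an event.

    Averages are taken over |x|; lta_sec and sta_sec are
    assumed window lengths (seconds), fs the sampling rate (Hz).
    """
    x = np.abs(np.asarray(x, dtype=float))
    n_lta = int(lta_sec * fs)
    n_sta = int(sta_sec * fs)
    lta = x[:n_lta].mean()   # long window average
    sta = x[:n_sta].mean()   # short window average
    return lta / sta
```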

STA WAV1-7:

The STA indexes, calculated at the beginning of the event, were also calculated in the spectrum of 7 frequency bands, thus obtaining 7 features: STA WAV1 to STA WAV7. The considered frequency bands are presented in Table 2.

Table 2. Frequency bands for the feature extractions.

WAV1-7 Energy:

This is the relative energy per wavelet band [3]. It is obtained by decomposing the event into the 7 bands of Table 2, using a Daubechies-5 mother wavelet. The percentage of energy is calculated as the ratio of the energy of the band to the energy of the whole event.
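The idea can be sketched as follows. To keep the example self-contained, a Haar wavelet is substituted for the paper's Daubechies-5, and the 7 bands are taken as 6 detail bands plus the final approximation (the actual band edges depend on the sampling rate, cf. Table 2):

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                 # pad to even length
        x = np.append(x, x[-1])
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def relative_band_energies(x, levels=6):
    """Relative energy per wavelet band (detail bands + final approximation)."""
    x = np.asarray(x, dtype=float)
    total = float(np.sum(x ** 2))
    energies = []
    a = x
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        energies.append(np.sum(d ** 2))   # energy of this detail band
    energies.append(np.sum(a ** 2))       # residual approximation band
    return np.array(energies) / total
```

Because the Haar transform is orthonormal, the relative energies sum to one (up to padding effects), so each band directly gives the percentage of the event's energy.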

Variance, Skewness and Kurtosis:

The variance, skewness and kurtosis are extracted from the events in the time domain [6]. The variance is the expected value of the squared deviation from the mean of the event; the skewness measures the asymmetry of the event’s amplitude distribution; and the kurtosis measures its sharpness. Together, these features describe the shape of the event.
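These three statistics can be written compactly (a numpy sketch; the function name is ours, and the population-moment convention is an assumption):

```python
import numpy as np

def shape_statistics(x):
    """Variance, skewness and kurtosis of an event in the time domain."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    var = np.mean((x - mu) ** 2)            # population variance
    std = np.sqrt(var)
    skew = np.mean(((x - mu) / std) ** 3)   # asymmetry of the distribution
    kurt = np.mean(((x - mu) / std) ** 4)   # sharpness (3 for a Gaussian)
    return var, skew, kurt
```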

WAV4 Variance, WAV4 Skewness and WAV4 Kurtosis:

These features are calculated for the 5th frequency band (from 1.56 to 3.13 Hz).

Pitch:

This feature is mainly used in speech processing but is applied here to the seismic events. It corresponds to the fundamental frequency, the lowest frequency at which the source vibrates.
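The paper does not specify its pitch estimator; one common choice, shown here as a sketch, is autocorrelation peak picking within a plausible frequency range (the fmin/fmax bounds are our assumptions):

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=0.5, fmax=20.0):
    """Estimate the fundamental frequency (Hz) by autocorrelation peak picking."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags >= 0
    lag_min = int(fs / fmax)                            # shortest period allowed
    lag_max = min(int(fs / fmin), len(ac) - 1)          # longest period allowed
    lag = lag_min + np.argmax(ac[lag_min:lag_max])      # strongest periodicity
    return fs / lag
```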

First Circular Moment:

reflects the behavior of the phase of the volcanic signals [12]. The phase is obtained using the Hilbert transform and mapped to the range [0, 2π). This feature is mainly used in biomedical signal processing.
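A sketch of this feature follows; the analytic signal is built via the FFT (equivalent to applying the Hilbert transform), and returning both the resultant length and mean direction of the first circular moment is our choice, since the paper does not say which part it uses:

```python
import numpy as np

def first_circular_moment(x):
    """First circular moment of the instantaneous phase of an event."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)                 # analytic-signal frequency weights
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)
    phase = np.angle(analytic) % (2 * np.pi)   # phase in [0, 2*pi)
    m1 = np.mean(np.exp(1j * phase))           # complex first moment
    return np.abs(m1), np.angle(m1)            # resultant length, mean direction
```

A narrow-band signal sweeps its phase uniformly, giving a resultant length near zero, while a signal with concentrated phase gives a length near one.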

Entropy:

is a measure of the uncertainty of a distribution and, equivalently, of the “quantity of information” contained in a signal.
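As a sketch, the Shannon entropy of the event's amplitude distribution can be estimated from a histogram; the bin count is an illustrative choice, since the paper does not specify how the distribution is estimated:

```python
import numpy as np

def shannon_entropy(x, n_bins=64):
    """Shannon entropy (bits) of the amplitude distribution of an event."""
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0*log(0) is taken as 0
    return -np.sum(p * np.log2(p))
```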

LPC 1-5:

Linear predictive coding (LPC) is a tool used mostly in audio and speech signal processing to represent the spectral envelope of a signal in compressed form [8]. It models a signal as a linear combination of its N previous samples, weighted by coefficients, plus a prediction error. This work considered only the first 5 LPC coefficients.

Maximum Amplitude, Mean and Median:

The maximum amplitude is the maximum value of the segmented event [3, 6]. The mean is obtained from the absolute value of the event, and the median is the median of its samples. All three are extracted in the time domain.

DCT1-13 and LOGDCT1-13:

The discrete cosine transform (DCT) is mainly used in audio and image processing. Like the discrete Fourier transform (DFT), the DCT expresses a signal as a sum of sinusoids with different frequencies and amplitudes; unlike the DFT, which uses complex exponentials, the DCT uses only cosines. The DCT compacts the energy in the transform domain, concentrating most of the information in a few coefficients. 13 DCT and 13 log-DCT coefficients were extracted from the events.
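The energy-compaction property can be illustrated with a direct (orthonormal) DCT-II in numpy; for a smooth, damped oscillation, the first 13 coefficients retain almost all of the energy:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a signal (cosine basis only, unlike the DFT)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    C = np.cos(np.pi * (n + 0.5) * k / N)   # cosine basis, row k = frequency k
    X = 2.0 * (C @ x)
    X[0] *= np.sqrt(1.0 / (4 * N))          # orthonormal scaling
    X[1:] *= np.sqrt(1.0 / (2 * N))
    return X

# Energy compaction on a smooth, damped oscillation (Parseval holds,
# so coefficient energy equals signal energy).
t = np.linspace(0.0, 1.0, 256)
sig = np.exp(-4 * t) * np.sin(2 * np.pi * 3 * t)
X = dct2(sig)
ratio13 = np.sum(X[:13] ** 2) / np.sum(X ** 2)
```

The 13-coefficient energy ratio is close to one for such signals, which is why a short DCT vector is a compact descriptor of the event's spectral envelope.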

Duration:

The duration is extracted from the segmented event in the time domain, counting the number of samples from the start to the end of the event. The automatic extraction of this feature is not trivial, as it requires a start and end detection system. Such a system is being developed in parallel work, so for the present study this characteristic was manually defined by an OVDAS analyst.

Mean of the 5 Frequency Peaks:

The events are transformed using the FFT [3, 6]. The 5 highest peaks are detected and their mean is calculated to obtain this feature.
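Interpreting "mean of the peaks" as the mean of the peak frequencies (one plausible reading of the feature's name), a sketch could look like this:

```python
import numpy as np

def mean_top5_peaks(x, fs):
    """Mean frequency (Hz) of the 5 highest peaks of the magnitude spectrum."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # indices of local maxima of the magnitude spectrum
    peaks = np.where((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1
    top5 = peaks[np.argsort(spec[peaks])[-5:]]   # 5 highest local maxima
    return freqs[top5].mean()
```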

Skewness Envelope and Kurtosis Envelope:

The envelope of a signal describes its external form, that is, the slow variations of its amplitude in the time domain. It is obtained as the absolute value of the Hilbert (analytic) transform of the signal. In this work, the kurtosis and skewness of the envelope were used as shape features.

2.3 Classification

The classification step was performed using support vector machines (SVM). An SVM tackles a classification problem by nonlinearly mapping the input data, through functions called kernels, into a high-dimensional feature space where a linear hyperplane separates the two classes [16]. One of the most widely used kernels is the RBF (Gaussian) kernel, which was applied here.

Training an SVM requires adjusting two hyperparameters: σ, which defines the width of the Gaussian used in the RBF kernel, and c, which determines the trade-off between the complexity of the model and the error tolerated.
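The role of σ can be made concrete with the kernel matrix itself (a numpy sketch; the function name and the exp(-||·||²/(2σ²)) parameterization are our conventions):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2*sigma^2)).

    sigma is the width hyperparameter tuned alongside the SVM cost c.
    """
    X1 = np.atleast_2d(X1)
    X2 = np.atleast_2d(X2)
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)                     # squared Euclidean distances
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```

A small σ makes the kernel sharply local (each support vector influences only nearby events), while a large σ smooths the decision boundary; c then controls how many training errors are tolerated.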

Since the SVM is a two-class classifier, a “one vs. all” structure was implemented using four classifiers (one per class). The training process applied a 2-fold cross-validation strategy, chosen because of the high computational cost of the feature selection strategy. The validation error of each trained classifier is obtained as the mean of the validation errors of the two folds.
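The "one vs. all" decision can be sketched as follows; the rule for unassigned events (positive for several classifiers, or for none) follows the description in Sect. 3, while the function name and score convention are ours:

```python
import numpy as np

CLASSES = ["LP", "TR", "VT", "OT"]

def one_vs_all_decision(scores):
    """Combine four binary classifier outputs into a final label.

    scores: 4 decision values, one per class; positive = "belongs to my class".
    An event flagged positive by more than one classifier, or by none,
    is left unassigned.
    """
    positive = np.asarray(scores) > 0
    if positive.sum() == 1:
        return CLASSES[int(np.argmax(positive))]
    return "Not Assigned"
```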

To evaluate the performance of each classifier, a contingency table was built and four statistical indices were calculated: sensitivity (Se), specificity (Sp), exactitude (Ex, i.e. accuracy) and error (Er), obtained from Eqs. 1 to 4.

$$ Se = \frac{TP}{TP + FN}; $$
(1)
$$ Sp = \frac{TN}{TN + FP}; $$
(2)
$$ Ex = \frac{TP + TN}{n}; $$
(3)
$$ Er = \frac{FP + FN}{n} $$
(4)

where TP (true positives) is the number of events of the class correctly classified as positive; TN (true negatives) is the number of events correctly classified as negative; FP (false positives) and FN (false negatives) are the numbers of events classified erroneously; and n is the total number of events.
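Equations 1 to 4 translate directly into code (the function name is ours):

```python
def classifier_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, exactitude and error from a contingency table."""
    n = tp + tn + fp + fn
    se = tp / (tp + fn)          # Eq. 1: sensitivity
    sp = tn / (tn + fp)          # Eq. 2: specificity
    ex = (tp + tn) / n           # Eq. 3: exactitude (accuracy)
    er = (fp + fn) / n           # Eq. 4: error
    return se, sp, ex, er
```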

2.4 Feature Selection

A genetic algorithm (GA) [17] performed the search for the best feature subset. The chromosome of each individual was a string of 63 bits, where a “1” indicated the presence of the corresponding feature (numbered as in Table 1) and a “0” its absence. Each generation had 64 individuals and the maximum number of generations was 200. Each of the first 63 individuals of the initial population had exactly one of the 63 features enabled (an identity matrix); the last individual (number 64) had all features set to “1”. Elitism was set to 20% and mutation to 10%.

The Vasconcelos GA [18] was used, since in previous works it reached better solutions than the traditional GA [6]. Its main characteristic is that it maintains a highly diverse population along the generations, because the selection and crossover operators combine the genetic information of good and bad solutions, while good individuals survive from one generation to the next through elitism.

The performance of each individual was measured by the validation error of the classifier, trained with the features retrieved by its chromosome. In addition to the validation error, chromosomes with a high number of features were penalized. Therefore, the performance of each individual is given by Eq. 5:

$$ \text{Performance} = \text{Validation Error} + \frac{\text{Number of Features}}{63} \times 25\% $$
(5)
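Equation 5 and the initial population described above can be sketched directly (a Python version; the paper's GA runs in Matlab):

```python
import numpy as np

def fitness(chromosome, validation_error):
    """Penalized performance of an individual (Eq. 5, lower is better).

    chromosome: 63-element 0/1 array; validation_error in [0, 1].
    """
    n_features = int(np.sum(chromosome))
    return validation_error + (n_features / 63.0) * 0.25

# Initial population as described: 63 one-feature individuals (identity
# matrix) plus one individual with every feature enabled.
population = np.vstack([np.eye(63, dtype=int), np.ones((1, 63), dtype=int)])
```

The 25% cap on the penalty keeps the validation error dominant: a full 63-feature chromosome pays at most 0.25, so the GA prefers smaller subsets only when their accuracy is comparable.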

3 Results

The simulations were carried out in Matlab (MathWorks), on an Intel Core™ i5 CPU at 3 GHz with 8 GB of RAM. The results of the best classifiers after the 200 generations are presented in Table 3, which shows the contingency table and the performance of the best classifier for each class, together with the values of the hyperparameters and the features selected for the best models.

Table 3. Results of the best classifiers after 200 generations.

The “Not Assigned” events are those classified as positive by more than one classifier or as negative by all classifiers. Thus, the results presented in Table 3 may improve if a confidence step, able to produce an output even when the classifiers disagree, is added.

4 Discussion and Conclusions

In a previous work [3], the authors showed that the features selected for classifying the seismic events of the Villarrica volcano [6] also performed well for the Llaima volcano; however, the global performance decreased from 90% to 80%. This paper studies new features drawn from the seismic and speech processing areas and evaluates their impact on the discrimination of four groups of events from the Llaima volcano: LP, TR, VT and a contrast group, OT.

A feature selection was necessary to define which of the 63 extracted features led to the best performance of the classifiers. It is essential that the set of features representing the signals is appropriate to the classification problem, so that it divides the space into decision regions associated with the classes to be distinguished. In addition, the number of features has to be reduced to limit the complexity of the classifier; this is why a feature selection process is needed, and the variety of methods applied to it in the literature demonstrates the complexity of this task. It is also important to observe that features that are not individually relevant may become so when used in combination with others, while features that are individually relevant may not be useful due to possible redundancies when combined with others. This is why we applied a strategy that evaluates subsets of features: the classification performance was the objective function and the GA performed the search for the best subset. The resulting classifiers reached performances above 95%, improving on the results of previous works.

LP and VT are more difficult to classify than TR, which needed only two features to reach a high sensitivity. The energy of the low frequencies, from 0.78 to 6.25 Hz, provides the most important discriminators for LP and VT, while the energy of the 1.56 to 3.13 Hz band is the best discriminator for TR. It is interesting to note the importance of the duration feature as a good descriptor for LP and OT events, and that the DCT, a feature from the speech domain, proved important for discriminating the OT group.

The results obtained in this work are very promising, as the addition of new features improved the classification performance. Future work should continue the study of new features and other selection methods. The performances reached in the offline experiments were considered satisfactory by the OVDAS experts.