
1 Introduction

In the modern world, machine learning has become one of the most promising and widely studied fields of science, mainly because of its universal applicability to almost any data-related problem. One example of such an area is bioinformatics [3, 4, 6, 10], which produces enormous amounts of data on gene expression in different organisms. These data could potentially make it possible to determine which parts of DNA are responsible for some visible change in an individual, or for reactions to a particular change of environment. The main problem with such data is the huge number of features and the relatively small number of objects. Because of the high-dimensional space, it is very hard to build a model that generalizes such data well. Furthermore, many features in such datasets have nothing in common with the results, so they should be treated as noise.


In this case, it seems logical to somehow select the most relevant features and to train a classifier on these only. This idea is implemented in the area of machine learning known as feature selection. There are three main approaches to feature selection: filter methods, based on statistical measures of individual features or feature subsets; wrapper methods, based on subspace search with the classifier's result as an optimization measure; and embedded methods, which use the inner properties of the classifier [12].
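As an illustration of the filter approach, the following minimal Python sketch (our own, not from the paper) scores each feature with a univariate statistic and keeps the k best; filter_select and abs_pearson are hypothetical names:

import numpy as np

def filter_select(X, y, score_fn, k):
    """Rank features with a univariate statistic and keep the k best.
    X: (n_objects, n_features) data matrix, y: class labels."""
    scores = np.array([score_fn(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]   # indices of the k top-scored features

def abs_pearson(feature, y):
    """One possible per-feature statistic: absolute Pearson correlation."""
    return abs(np.corrcoef(feature, y)[0, 1])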

The main advantage of filter methods is their speed. As a result, they are frequently used for preprocessing, and the resulting feature subsets are then passed to a wrapper or embedded method. This is especially important for bioinformatics, where the number of features in a dataset sometimes reaches tens or hundreds of thousands.

These days, many machine learning algorithms use ensembling [1, 4, 8]. The MeLiF algorithm [13] applies this idea to feature selection: it builds a linear combination of basic filters that selects the most relevant features. A structural property of MeLiF is that it can easily be modified to work in a concurrent or distributed manner. In this research, we implemented a parallel version of MeLiF called MeLiF+ and achieved a significant speedup without losing selection quality.
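Schematically, the idea is to score features by a weighted sum of the basic filters' scores (a sketch under our own naming; the algorithm itself is described in Sect. 2):

import numpy as np

def ensemble_scores(filter_scores, weights):
    """filter_scores: (m, n_features) matrix, row i holds the per-feature
    scores of basic filter i (normalized to a common scale);
    weights: (m,) coefficients of the linear combination that MeLiF tunes."""
    return np.asarray(weights) @ np.asarray(filter_scores)  # combined score per feature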

The remainder of the paper is organized as follows: the MeLiF algorithm is described in Sect. 2, the parallelization scheme is proposed in Sect. 3, the experimental setup and the quality measures used are outlined in Sect. 4, and finally the experimental results are presented in Sect. 5.

2 MeLiF

The algorithm treats certain linear combinations of basic filters as starting points. It has been observed in experiments that the best choice of starting points is the following: (1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1) – only one basic filter matters at the beginning – and (1, 1, ..., 1) – all basic filters are equal at the beginning. The algorithm iterates over the starting points and tries to shift each coordinate value by small constants \(+\delta \) and \(-\delta \), the grid spacing for each point. If one of the applied changes succeeds, i.e., the quality measure at the shifted point is greater than the current maximum, the algorithm moves to that point and restarts the search from its first coordinate. If all coordinates have been shifted by \(+\delta \) and \(-\delta \) and no quality improvement is observed, the algorithm stops.

[Algorithm listing: MeLiF pseudocode (not reproduced)]
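The descent step can be summarized by the following minimal Python rendering of the procedure above (identifiers such as evaluate and the value of delta are ours, not from the paper):

import numpy as np

def melif_descent(start, evaluate, delta=0.1):
    """Greedy coordinate descent of MeLiF from one starting point.
    evaluate(point) returns the quality measure: here, the F1 score of a
    classifier trained on the top-N features ranked by the linear
    combination with weights `point`."""
    best = np.asarray(start, dtype=float)
    best_q = evaluate(best)
    improved = True
    while improved:                 # a full sweep without improvement stops the search
        improved = False
        for i in range(len(best)):
            for step in (+delta, -delta):
                cand = best.copy()
                cand[i] += step
                q = evaluate(cand)
                if q > best_q:      # success: move and restart from the first coordinate
                    best, best_q, improved = cand, q, True
                    break
            if improved:
                break
    return best, best_q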

Then, for each point obtained during coordinate descent, the algorithm computes the value of the resulting linear combination of basic filters for each feature in the dataset. The results are sorted, and the algorithm selects the N best features. It then runs a classifier on this feature subset only. The obtained result is saved for comparison with other points and for caching, which reduces the running time by reusing the values of already visited points.
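A point evaluation with caching might look as follows (a sketch; classify_f1 stands for an assumed helper that cross-validates the classifier on a given feature subset and returns its \(F_1\) score):

import numpy as np

def make_evaluate(filter_scores, classify_f1, n_selected=100):
    """Builds an evaluation function with a cache of visited points.
    filter_scores: (m, n_features) matrix of basic filter scores."""
    cache = {}
    def evaluate(point):
        key = tuple(np.round(point, 6))     # visited-point cache key
        if key not in cache:
            combined = np.asarray(point) @ filter_scores
            top = np.argsort(combined)[::-1][:n_selected]  # N best features
            cache[key] = classify_f1(top)   # classifier quality on this subset
        return cache[key]
    return evaluate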

3 MeLiF+

We propose the following improvements to the MeLiF method: each starting point is processed in a separate thread, with the global maximum maintained through a synchronization point. Moreover, the evaluate submethod is run concurrently for \(+\delta \) and \(-\delta \), and the better point is selected after both results are retrieved. We show that this not only improves the algorithm's performance on a multicore system, but also usually improves feature selection quality.
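A sketch of this scheme in Python is given below (our own rendering, not the authors' Java/WEKA implementation; the shared global maximum with a synchronization point is simplified here to a final reduction over per-thread results):

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def melif_plus(starting_points, evaluate, delta=0.1, workers=None):
    # evaluate must be thread-safe (e.g., guard its cache with a lock).
    def descend(start):
        best = np.asarray(start, dtype=float)
        best_q = evaluate(best)
        improved = True
        with ThreadPoolExecutor(max_workers=2) as inner:
            while improved:
                improved = False
                for i in range(len(best)):
                    cands = [best.copy(), best.copy()]
                    cands[0][i] += delta                   # +delta shift
                    cands[1][i] -= delta                   # -delta shift
                    qs = list(inner.map(evaluate, cands))  # both shifts concurrently
                    j = int(np.argmax(qs))
                    if qs[j] > best_q:                     # take the better of the two
                        best, best_q, improved = cands[j], qs[j], True
                        break
        return best, best_q

    with ThreadPoolExecutor(max_workers=workers) as pool:  # one task per starting point
        results = list(pool.map(descend, starting_points))
    return max(results, key=lambda r: r[1])                # best point over all threads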

This has the following explanation: the original MeLiF algorithm is greedy, so it assumes that if every point it steps into is a local optimum, the resulting point will be the global optimum; the ability to look at both deltas simultaneously allows the algorithm to select the better local optimum. Also, since starting points are processed in parallel, one thread can find a local optimum first, which causes the other threads to stop their work even if further descent would lead to a better result. This can change the selection result for better or worse (both cases are presented in Sect. 5), but the experiments show that on average MeLiF+ results are better.

4 Experiments

As the classifier we used SVM [5] from the WEKA library [14], with a polynomial kernel and soft margin parameter \(C = 1\). To improve stability, we used 5-fold cross-validation. The number of selected features was fixed: \(N = 100\). To compare our method with the original one, we used the \(F_1\) score [11] of the SVM classifier. As we wanted to know how much our method differs from the original one in terms of the space search strategy, we calculated the z-score for each dataset.
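For reference, the \(F_1\) score is the harmonic mean of precision P and recall R:

$$\begin{aligned} F_1 = \frac{2 \cdot P \cdot R}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \end{aligned}$$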

We ran our experiments on a machine with the following characteristics: a 32-core AMD Opteron 6272 CPU @ 2.1 GHz and 128 GB RAM. We used \(N = 2\cdot p\cdot f = 50\) threads, where p is the number of starting points and f is the number of folds used for cross-validation (here \(p = 5\) and \(f = 5\)).

As basic filters, we used Spearman Rank Correlation (SPC), Symmetric Uncertainty (SU), Fit Criterion (FC) and Value Difference Metric (VDM) [2, 9]. For each dataset, we executed MeLiF and MeLiF+ and stored their working time and points with the best classification result.

We used 50 datasets of different sizes: 33 datasets were taken from the Gene Expression Omnibus, 5 from the Kent Ridge Bio-Medical Dataset repository, 5 from the RSCTC’2010 Discovery Challenge, 4 from the Broad Institute Cancer Program Data Sets, and 3 from the Feature Selection Datasets at Arizona State University. Some datasets were multi-labeled, so we split them into several derivative binary datasets using the commonly used one-versus-all technique. We then excluded datasets that contained too few instances of one of the classes. After that, we applied standard feature scaling and discretized all features to 11 different values from −5 to 5.
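The preprocessing step can be sketched as follows (one plausible reading of the description above; the paper's exact binning may differ):

import numpy as np

def preprocess(X):
    """Standard feature scaling, then discretization to the 11 integer
    levels -5..5 (rounding a clipped z-score; assumed binning)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance per feature
    return np.clip(np.round(Xs), -5, 5).astype(int)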

5 Results

Table 1 contains the experimental results. All datasets are sorted by their total size, i.e., the product of the number of features and the number of objects. In the \(F_1\) score comparison of MeLiF and MeLiF+, the better result for each dataset is highlighted in grey; equal results are not highlighted. Runtime is given in seconds. The last column provides the z-score.

[Table rendered as an image; per-dataset values not reproduced]
Table 1. MeLiF in comparison with MeLiF+

As can be seen from the table, MeLiF+ is always at least 3 times faster than MeLiF, and for some datasets the difference reaches 6 times. Although MeLiF and MeLiF+ achieve almost the same \(F_1\) scores, the z-score shows a difference in their behavior on 15 datasets. On 36 datasets they performed equally, on 11 datasets the new algorithm outperformed the original one, and only in 5 cases did MeLiF+ produce worse results than the original MeLiF algorithm.

6 Conclusion

The proposed parallelization scheme made the algorithm work 5.5 times faster on average without affecting selection quality. Unfortunately, in this research we did not achieve a linear speedup because the maximum number of points processed in parallel is fixed. In future work, we plan to use a thread pool limited only by the testing system and to achieve linear speed growth by using an exploration and exploitation strategy [7] to spread the search points across the search space. This should also lead to a considerable increase in the optimized measure.