Abstract
Semi-supervised learning methods create models from a few labeled instances and a large number of unlabeled instances. They are a good option in scenarios where unlabeled data is abundant and labeling instances is expensive, as is the case for most Web applications. This paper proposes a semi-supervised self-training algorithm called Ant-Labeler. Self-training algorithms use a supervised learning algorithm to iteratively learn a model from the labeled instances and then use this model to classify the unlabeled instances. The instances that receive labels with high confidence are moved from the unlabeled to the labeled set, and this process is repeated until a stopping criterion is met, such as all unlabeled instances having been labeled. Ant-Labeler uses an ant colony optimization (ACO) algorithm as the supervised learning method in the self-training procedure to generate interpretable rule-based models, which are used as an ensemble to ensure accurate predictions. The pheromone matrix is reused across different executions of the ACO algorithm to avoid rebuilding the models from scratch every time the labeled set is updated. Results showed that the proposed algorithm obtains better predictive accuracy than three state-of-the-art algorithms in roughly half of the datasets on which it was tested, and that the smaller the number of labeled instances, the better Ant-Labeler performs.
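The generic self-training loop described in the abstract can be illustrated with a minimal Python sketch. This is not the Ant-Labeler implementation: the base learner here is a hypothetical nearest-centroid classifier standing in for the ACO rule inducer, and the names `NearestCentroid`, `self_train`, `threshold`, and `max_iters`, as well as the softmax-style confidence score, are assumptions made only for illustration. The sketch also retrains from scratch in every round and so does not reproduce the pheromone reuse.

```python
import math


class NearestCentroid:
    """Hypothetical stand-in base learner (NOT the paper's ACO rule inducer)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One centroid (mean vector) per class.
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_with_confidence(self, x):
        # Assumed confidence: softmax over negative Euclidean distances.
        weights = {}
        for label, centroid in self.centroids.items():
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, centroid)))
            weights[label] = math.exp(-dist)
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unl, make_learner=NearestCentroid,
               threshold=0.8, max_iters=20):
    """Generic self-training: move confidently labeled instances to the labeled set."""
    X_lab, y_lab, X_unl = list(X_lab), list(y_lab), list(X_unl)
    model = make_learner().fit(X_lab, y_lab)
    for _ in range(max_iters):
        if not X_unl:            # stopping criterion: nothing left to label
            break
        remaining, moved = [], 0
        for x in X_unl:
            label, confidence = model.predict_with_confidence(x)
            if confidence >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                moved += 1
            else:
                remaining.append(x)
        X_unl = remaining
        if moved == 0:           # stopping criterion: no confident predictions
            break
        # Retrain on the enlarged labeled set (from scratch in this sketch).
        model = make_learner().fit(X_lab, y_lab)
    return model, X_lab, y_lab, X_unl
```

Calling `self_train` with two labeled points, one per class, and a handful of unlabeled points returns the final model together with the enlarged labeled set and any instances that never reached the confidence threshold. Any learner exposing `fit` and `predict_with_confidence` could be passed as `make_learner`.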
Notes
Refer to Otero et al. (2013) for more details on the cAnt-Miner\(_{\mathrm {PB}}\) algorithm.
No results for APSSC with 70 and 100 % of labeled data are reported, as the KEEL implementation was not able to generate results for these data configurations.
The running times were observed on a Xeon 2.4 GHz machine with 3.5 GB of RAM.
References
Alcalá-Fdez, J., Sánchez, L., García, S., del Jesus, M., Ventura, S., Garrell, J., et al. (2009). KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 13(3), 307–318.
Angus, D. (2009). Niching for ant colony optimisation. In A. Lewis, S. Mostaghim, & M. Randall (Eds.), Biologically-inspired optimisation methods, studies in computational intelligence (Vol. 210, pp. 165–188). Heidelberg: Springer.
Arcanjo, F. L., Pappa, G. L., Bicalho, P. V., Meira, W. Jr., & da Silva, A. S. (2011). Semi-supervised genetic programming for classification. In Proceedings of the 13th annual conference on genetic and evolutionary computation (GECCO 2011) (pp. 1259–1266). ACM.
Bache, K., & Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml
Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In Proceedings of the 11th annual conference on computational learning theory (COLT ’98) (pp. 92–100). ACM.
Chapelle, O., Schölkopf, B., & Zien, A. (Eds.). (2010). Semi-supervised learning. Cambridge: MIT Press.
Davidson, I., & Ravi, S. (2005). Clustering with constraints: Feasibility issues and the k-means algorithm. In Proceedings of the 2005 SIAM international conference on data mining (SDM05) (pp. 201–211). SIAM.
Freitas, A. A. (2002). Data mining and knowledge discovery with evolutionary algorithms. New York: Springer.
Ginestet, C. (2009). Semisupervised learning for computational linguistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(3), 694–694.
Halder, A., Ghosh, S., & Ghosh, A. (2010). Ant based semi-supervised classification. In Swarm intelligence: 7th international conference (ANTS 2010) (vol. 6234, pp. 376–383). Springer, LNCS.
Halder, A., Ghosh, S., & Ghosh, A. (2013). Aggregation pheromone metaphor for semi-supervised classification. Pattern Recognition, 46(8), 2239–2248.
Hsu, C., & Lin, C. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
Joachims, T. (1999). Transductive inference for text classification using support vector machines. In Proceedings of the 16th international conference on machine learning (ICML ’99) (pp. 200–209). Morgan Kaufmann.
Kasabov, N., & Pang, S. (2003). Transductive support vector machines and applications in bioinformatics for promoter recognition. In Proceedings of the 2003 international conference on neural networks and signal processing (ICNNSP) (pp. 1–6). IEEE.
Koutra, D., Ke, T. Y., Kang, U., Chau, D. H., Pao, H. K. K., & Faloutsos, C. (2011). Unifying guilt-by-association approaches: Theorems and fast algorithms. In Machine learning and knowledge discovery in databases: European conference (ECML PKDD 2011) (vol. 6912, pp. 245–260). Springer, LNCS.
Li, M., & Zhou, Z. H. (2007). Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 37(6), 1088–1098.
Li, Y. F., & Zhou, Z. H. (2011). Towards making unlabeled data never hurt. In Proceedings of the 28th international conference on machine learning (ICML ’11) (pp. 1081–1088). ACM.
Martens, D., Baesens, B., & Fawcett, T. (2011). Editorial survey: Swarm intelligence for data mining. Machine Learning, 82(1), 1–42.
Olmo, J. L., Luna, J. M., Romero, J. R., & Ventura, S. (2010). An automatic programming ACO-based algorithm for classification rule mining. In Trends in practical applications of agents and multiagent systems: 8th international conference on practical applications of agents and multiagent systems, advances in intelligent and soft computing (vol. 71, pp. 649–656). Springer.
Otero, F., Freitas, A., & Johnson, C. (2008). \(c\)Ant-Miner: An ant colony classification algorithm to cope with continuous attributes. In Ant colony optimization and swarm intelligence: 6th international conference (ANTS 2008) (vol. 5217, pp. 48–59). Springer, LNCS.
Otero, F., Freitas, A., & Johnson, C. (2013). A new sequential covering strategy for inducing classification rules with ant colony algorithms. IEEE Transactions on Evolutionary Computation, 17(1), 64–76.
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1), 1–39.
Tong, S., & Chang, E. (2001). Support vector machine active learning for image retrieval. In Proceedings of the 9th ACM international conference on multimedia (MULTIMEDIA ’01) (pp. 107–118). ACM.
Triguero, I., García, S., & Herrera, F. (2015). Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study. Knowledge and Information Systems, 42(2), 245–284.
Wang, J., Zhao, Y., Wu, X., & Hua, X. S. (2008). Transductive multi-label learning for video concept detection. In Proceedings of the 1st ACM international conference on multimedia information retrieval (MIR ’08) (pp. 298–304). ACM.
Wang, J., Jebara, T., & Chang, S. F. (2013). Semi-supervised learning using greedy max-cut. Journal of Machine Learning Research, 14(1), 771–800.
Xu, X., Lu, L., He, P., Ma, Y., Chen, Q., & Chen, L. (2013). Semi-supervised classification with multiple ants maximal spanning tree. In Proceedings of IEEE/WIC/ACM international joint conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) (pp. 315–320). IEEE.
Zhao, B., Wang, F., & Zhang, C. (2008). CutS3VM: A fast semi-supervised SVM algorithm. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD) (pp. 830–838). ACM.
Zhou, Z. H., & Li, M. (2005). Tri-Training: Exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1529–1541.
Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. San Rafael: Morgan & Claypool.
Acknowledgments
The authors would like to thank the anonymous reviewers and the associate editor for their valuable comments and suggestions. This work was partially supported by the following Brazilian Research Support Agencies: CNPq, FAPEMIG, and CAPES.
Author information
Additional information
Julio Albinati and Samuel E. L. Oliveira have contributed equally to this work.
About this article
Cite this article
Albinati, J., Oliveira, S.E.L., Otero, F.E.B. et al. An ant colony-based semi-supervised approach for learning classification rules. Swarm Intell 9, 315–341 (2015). https://doi.org/10.1007/s11721-015-0116-8
DOI: https://doi.org/10.1007/s11721-015-0116-8