Abstract
In this paper, a new genetic algorithm (GAR-SD\(^{+}\)) for subgroup discovery tasks is described. Its main feature is that it works with both discrete and continuous attributes without prior discretization: the ranges of numeric attributes are obtained during the rule induction process itself. In this way, the intervals are chosen to be the most suitable ones for maximizing the quality measures. An experimental study was carried out to verify the performance of the method. GAR-SD\(^{+}\) was compared with other subgroup discovery methods on several measures (number of rules, number of attributes, significance, unusualness, support and confidence). For subgroup discovery tasks, GAR-SD\(^{+}\) obtained good results compared with existing algorithms.
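The abstract combines two ideas: numeric conditions expressed as intervals fixed during rule induction, and rules scored by measures such as support, confidence, unusualness and significance. As a minimal sketch, assuming the measure definitions commonly used in the subgroup discovery literature (support as the true positive rate, confidence as precision, unusualness as weighted relative accuracy, significance as the likelihood-ratio statistic), which may differ in detail from the paper's own, the Python below scores a rule whose numeric conditions are intervals. The names `Rule`, `quality` and `tune_interval` are hypothetical, and `tune_interval` is a simple mutation-based hill climber standing in for GAR-SD\(^{+}\)'s genetic algorithm, not a reproduction of it.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule representation: IF conditions THEN class == target."""
    # attribute -> (low, high) closed interval for a numeric attribute,
    # or a single required value for a discrete attribute.
    conditions: dict
    target: str

    def covers(self, row: dict) -> bool:
        for attr, cond in self.conditions.items():
            value = row[attr]
            if isinstance(cond, tuple):  # numeric interval, no prior discretization
                low, high = cond
                if not low <= value <= high:
                    return False
            elif value != cond:  # discrete equality test
                return False
        return True


def quality(rule: Rule, rows: list, class_attr: str = "class") -> dict:
    """Common subgroup discovery measures (assumed definitions, see lead-in)."""
    n_s = len(rows)
    covered = [r for r in rows if rule.covers(r)]
    n_cond = len(covered)
    n_target = sum(1 for r in rows if r[class_attr] == rule.target)
    n_tp = sum(1 for r in covered if r[class_attr] == rule.target)

    support = n_tp / n_target if n_target else 0.0  # true positive rate
    confidence = n_tp / n_cond if n_cond else 0.0   # precision of the rule
    # Unusualness as weighted relative accuracy: p(Cond) * (p(Target|Cond) - p(Target))
    unusualness = (n_cond / n_s) * (confidence - n_target / n_s) if n_cond else 0.0

    # Significance as the likelihood-ratio statistic over all class values:
    # 2 * sum_k n(k, Cond) * ln(n(k, Cond) / (n(k) * p(Cond)))
    significance = 0.0
    if n_cond:
        p_cond = n_cond / n_s
        for k in {r[class_attr] for r in rows}:
            n_k = sum(1 for r in rows if r[class_attr] == k)
            n_k_cond = sum(1 for r in covered if r[class_attr] == k)
            if n_k_cond:  # 0 * ln(0) is taken as 0
                significance += n_k_cond * math.log(n_k_cond / (n_k * p_cond))
        significance *= 2

    return {"support": support, "confidence": confidence,
            "unusualness": unusualness, "significance": significance}


def tune_interval(rule: Rule, attr: str, rows: list, steps: int = 200, seed: int = 0):
    """Mutation-based hill climbing on one interval (a stand-in, not GAR-SD+'s GA)."""
    rng = random.Random(seed)
    values = sorted({r[attr] for r in rows})  # candidate bounds come from the data
    best, best_score = rule, quality(rule, rows)["unusualness"]
    for _ in range(steps):
        low, high = best.conditions[attr]
        if rng.random() < 0.5:
            low = rng.choice(values)   # mutate the lower bound
        else:
            high = rng.choice(values)  # mutate the upper bound
        if low > high:
            low, high = high, low
        candidate = Rule({**best.conditions, attr: (low, high)}, best.target)
        score = quality(candidate, rows)["unusualness"]
        if score > best_score:  # keep only improving mutations
            best, best_score = candidate, score
    return best, best_score


rows = [{"age": a, "class": "yes" if 30 <= a <= 40 else "no"} for a in range(20, 60)]
best, score = tune_interval(Rule({"age": (20, 59)}, "yes"), "age", rows)
print(best.conditions, round(score, 4))
```

On this toy data, where the class is "yes" exactly for ages 30 to 40, a rule covering every example has unusualness 0, and the highest achievable value is 11/40 × (1 − 11/40) ≈ 0.199, reached only by the interval [30, 40]. The interval bounds are drawn from observed values during the search itself, so no discretization step precedes it; a genuine genetic algorithm would instead evolve a population of rules and could weigh several measures at once.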
Acknowledgments
This work was partially funded by the Regional Government of Andalusia (Junta de Andalucía, Grant Number TIC-7629) and the Spanish Ministry of Economy and Competitiveness (Grant Number TIN2013-47153-C3-2-R).
Ethics declarations
Conflict of interest
The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Pachón, V., Mata, J. & Domínguez, J.L. Searching for the most significant rules: an evolutionary approach for subgroup discovery. Soft Comput 21, 2609–2618 (2017). https://doi.org/10.1007/s00500-015-1961-5