
Readable and Accurate Rulesets with ORGA

  • Conference paper
Parallel Problem Solving from Nature – PPSN X (PPSN 2008)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5199)


Abstract

A key task for data mining is to produce accurate and descriptive models. 'Human-readable' models are often necessary to enable understanding, potentially leading to further insight, and also to foster trust in the user. Rules or decision trees (if not too numerous or large) are readable, unlike, for example, SVM models. However, descriptiveness and accuracy normally conflict; a challenge is to find algorithms that achieve both high accuracy and high readability. We introduce ORGA (Optimized Ripper using Genetic Algorithm), which hybridizes evolutionary search with the RIPPER ruleset algorithm. RIPPER is effective at producing accurate and readable rulesets, and we show that ORGA provides significant further improvement. Overall, ORGA outperforms a suitable set of comparison algorithms, including implementations of RIPPER, C4.5 and PART. On a majority of the datasets, ORGA's outperformance of the other algorithms is spectacular, and it is rarely dominated in terms of both accuracy and readability.
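The closing claim, that ORGA is "rarely dominated in terms of both accuracy and readability", relies on Pareto dominance over two criteria. The sketch below illustrates that check in Python. It assumes readability is measured by the number of rules in a ruleset (fewer rules meaning more readable); this abstract does not state the paper's actual readability measure, and the names `ModelScore`, `dominates` and `dominated_by` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    name: str
    accuracy: float  # higher is better
    num_rules: int   # hypothetical readability proxy: fewer rules = more readable


def dominates(a: ModelScore, b: ModelScore) -> bool:
    """True if `a` is no worse than `b` on both criteria and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.num_rules <= b.num_rules
    strictly_better = a.accuracy > b.accuracy or a.num_rules < b.num_rules
    return no_worse and strictly_better


def dominated_by(target: ModelScore, others: list[ModelScore]) -> list[str]:
    """Names of the models in `others` that Pareto-dominate `target`."""
    return [other.name for other in others if dominates(other, target)]


if __name__ == "__main__":
    # Illustrative placeholder scores only, not results from the paper.
    small = ModelScore("model_a", accuracy=0.90, num_rules=5)
    weak = ModelScore("model_b", accuracy=0.85, num_rules=8)
    large = ModelScore("model_c", accuracy=0.95, num_rules=12)
    print(dominated_by(weak, [small, large]))   # ['model_a']
    print(dominated_by(small, [weak, large]))   # []
```

In this toy setting, a more accurate model with more rules (`model_c`) neither dominates nor is dominated by a smaller, less accurate one (`model_a`); that incomparability is exactly the accuracy/readability trade-off the abstract describes.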




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Daud, M.N.R., Corne, D. (2008). Readable and Accurate Rulesets with ORGA. In: Rudolph, G., Jansen, T., Beume, N., Lucas, S., Poloni, C. (eds) Parallel Problem Solving from Nature – PPSN X. PPSN 2008. Lecture Notes in Computer Science, vol 5199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87700-4_86


  • DOI: https://doi.org/10.1007/978-3-540-87700-4_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87699-1

  • Online ISBN: 978-3-540-87700-4

  • eBook Packages: Computer Science, Computer Science (R0)
