
Noise elimination in inductive concept learning: A case study in medical diagnosis

  • Conference paper
Algorithmic Learning Theory (ALT 1996)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1160)


Abstract

Compression measures used in inductive learners, such as measures based on the MDL (Minimum Description Length) principle, provide a theoretically justified basis for grading candidate hypotheses. Compression-based induction is also well suited to handling noisy data. This paper shows that a simple compression measure can be used to detect noisy examples. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the remaining examples. Separating noise detection from hypothesis formation has the advantage that noisy examples do not influence hypothesis construction, unlike most standard approaches to noise handling, in which the learner typically tries to avoid overfitting the noisy example set. The noise elimination method is applied to the early diagnosis of rheumatic diseases, a problem known to be difficult both because of its nature and because of imperfections in the dataset. The method is evaluated by applying the noise elimination algorithm in conjunction with the CN2 rule induction algorithm and comparing the resulting performance with earlier results obtained by CN2 alone in this diagnostic domain.
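The abstract describes a two-stage scheme: first detect and eliminate suspect examples, then induce a hypothesis from the cleaned training set only. The sketch below illustrates that detect-then-eliminate pipeline; it is not the paper's algorithm. In particular, the paper's compression-based noise detector is replaced here by a simple stand-in (an example is flagged as noisy if an out-of-fold baseline classifier misclassifies it), and CN2 is replaced by scikit-learn's DecisionTreeClassifier, since CN2 is not available in standard libraries.

```python
# Minimal sketch of a detect-then-eliminate learning pipeline.
# Assumptions (not from the paper): a cross-validated misclassification
# proxy stands in for the compression-based noise detector, and a
# decision-tree learner stands in for CN2.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def flag_noisy_examples(X, y, cv=5, random_state=0):
    """Return a boolean mask of training examples suspected to be noisy."""
    probe = DecisionTreeClassifier(max_depth=3, random_state=random_state)
    # Out-of-fold predictions: each example is predicted by a model
    # that never saw it during training.
    oof_pred = cross_val_predict(probe, X, y, cv=cv)
    return oof_pred != y


def learn_after_noise_elimination(X, y):
    """Detect noisy examples, drop them, then build the final hypothesis."""
    noisy = flag_noisy_examples(X, y)
    X_clean, y_clean = X[~noisy], y[~noisy]
    final_model = DecisionTreeClassifier(random_state=0)
    final_model.fit(X_clean, y_clean)  # hypothesis built from cleaned data only
    return final_model, noisy


if __name__ == "__main__":
    # Synthetic data with roughly 10% class noise injected via flip_y.
    X, y = make_classification(n_samples=300, n_features=10,
                               flip_y=0.10, random_state=0)
    model, noisy = learn_after_noise_elimination(X, y)
    print(f"flagged {noisy.sum()} of {len(y)} examples as noisy")
```

The design point this sketch shares with the paper is the separation of concerns: the noise detector decides which examples to discard, and the final learner never sees them, so it does not have to trade off fitting against noise avoidance.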


References

  1. Cestnik, B. and Bratko, I. (1991). On estimating probabilities in tree pruning. In Proc. 5th European Working Session on Learning, pages 138–150. Springer, Berlin.

  2. Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4):261–283.

  3. Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proc. 5th European Working Session on Learning, pages 151–163. Springer, Berlin.

  4. Džeroski, S., Cestnik, B., and Petrovski, I. (1993). Using the m-estimate in rule induction. Journal of Computing and Information Technology, 1:37–46.

  5. Fayyad, U.M. and Irani, K.B. (1992). On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8:87–102.

  6. Gamberger, D. (1995). A minimization approach to propositional inductive learning. In Proc. 8th European Conference on Machine Learning, pages 151–160. Springer, Berlin.

  7. Gamberger, D. and Lavrač, N. (1996). Noise elimination in inductive learning. Technical report IJS-DP-7400, J. Stefan Institute, Ljubljana.

  8. Karalič, A. and Pirnat, V. (1990). Machine learning in rheumatology. Sistemica, 1(2):113–123.

  9. Kononenko, I. and Bratko, I. (1991). Information-based evaluation criterion for classifier's performance. Machine Learning, 6(1):67–80.

  10. Kononenko, I. and Kukar, M. (1995). Machine learning for medical diagnosis. In Proc. Workshop on Computer-Aided Data Analysis in Medicine, US Scientific Publishing, IJS-SP-95-1, Ljubljana.

  11. Lavrač, N., Džeroski, S., Pirnat, V., and Križman, V. (1993). The utility of background knowledge in learning medical diagnostic rules. Applied Artificial Intelligence, 7:273–293.

  12. Lavrač, N. and Džeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood, Chichester.

  13. Lavrač, N., Gamberger, D., and Džeroski, S. (1995). An approach to dimensionality reduction in learning from deductive databases. In Proc. 5th International Workshop on Inductive Logic Programming. Katholieke Universiteit Leuven.

  14. Lavrač, N., Džeroski, S., and Bratko, I. (1996). Handling imperfect data in inductive logic programming. In L. De Raedt (ed.), Advances in Inductive Logic Programming, pages 48–64. IOS Press, Amsterdam.

  15. Michalski, R., Carbonell, J., and Mitchell, T., editors (1983). Machine Learning: An Artificial Intelligence Approach, volume I. Tioga, Palo Alto, CA.

  16. Michalski, R., Mozetič, I., Hong, J., and Lavrač, N. (1986). The multi-purpose incremental learning system AQ15 and its testing application on three medical domains. In Proc. Fifth National Conference on Artificial Intelligence, pages 1041–1045. Morgan Kaufmann, San Mateo, CA.

  17. Michie, D., Spiegelhalter, D.J., and Taylor, C.C., editors (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood, Chichester.

  18. Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4(2):227–243.

  19. Mingers, J. (1989). An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3(4):319–342.

  20. Muggleton, S., editor (1992). Inductive Logic Programming. Academic Press, London.

  21. Muggleton, S., Srinivasan, A., and Bain, M. (1992). Compression, significance and accuracy. In Proc. 9th International Conference on Machine Learning, pages 338–347. Morgan Kaufmann, San Mateo, CA.

  22. Pirnat, V., Kononenko, I., Janc, T., and Bratko, I. (1989). Medical analysis of automatically induced rules. In Proc. 2nd European Conference on Artificial Intelligence in Medicine, pages 24–36. Springer, Berlin.

  23. Quinlan, J.R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies, 27(3):221–234.

  24. Quinlan, J.R. (1990). Learning logical definitions from relations. Machine Learning, 5(3):239–266.

  25. Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14:465–471.



Editor information

Setsuo Arikawa, Arun K. Sharma


Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gamberger, D., Lavrač, N., Džeroski, S. (1996). Noise elimination in inductive concept learning: A case study in medical diagnosis. In: Arikawa, S., Sharma, A.K. (eds) Algorithmic Learning Theory. ALT 1996. Lecture Notes in Computer Science, vol 1160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61863-5_47


  • DOI: https://doi.org/10.1007/3-540-61863-5_47


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61863-8

  • Online ISBN: 978-3-540-70719-6

  • eBook Packages: Springer Book Archive
