Impurity Measurement in Selecting Decision Node Tree that Tolerate Noisy Cases

  • Conference paper
  • First Online:
Recent Advances in Information and Communication Technology 2017 (IC2IT 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 566))


Abstract

In recent years, recommending an appropriate attribute for a binary decision tree under unusual circumstances, such as training or testing with noisy attributes, has become an increasingly challenging research problem. Most traditional impurity measurements have never been tested for how well they tolerate such noisy cases. Consequently, this paper studies and proposes an impurity measurement that can accurately evaluate the goodness of a binary decision tree node split under noisy conditions. To verify that the classification accuracy of a decision tree built with the proposed measurement is preserved, an experiment comparing it against traditional impurity measures was conducted. The results show that the accuracy of the proposed measurement in classifying under noisy cases is acceptable.
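For context on the comparison the abstract describes, the traditional impurity measures most commonly used to score a binary node split are Gini impurity and entropy. The sketch below illustrates how a split's weighted impurity is computed with either measure and how a single flipped (noisy) label changes the score; it is a minimal illustration of the standard measures only, not the paper's proposed measurement, and the helper names are ours:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy impurity: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, measure):
    """Weighted impurity of a binary split; lower means a better split."""
    n = len(left) + len(right)
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)

# A perfectly pure split versus the same split with one corrupted label.
clean_left, clean_right = [0, 0, 0, 0], [1, 1, 1, 1]
noisy_left = [0, 0, 0, 1]  # one label flipped by noise

print(split_impurity(clean_left, clean_right, gini))  # 0.0 (pure split)
print(split_impurity(noisy_left, clean_right, gini))  # 0.1875
```

A single mislabeled example already raises the split's Gini score from 0.0 to 0.1875, which is why a greedy tree builder using these measures can be steered toward worse splits by label noise, the problem the paper addresses.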



Author information


Correspondence to Benjawan Srisura.


Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Srisura, B. (2018). Impurity Measurement in Selecting Decision Node Tree that Tolerate Noisy Cases. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_2

  • DOI: https://doi.org/10.1007/978-3-319-60663-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60662-0

  • Online ISBN: 978-3-319-60663-7

  • eBook Packages: Engineering (R0)
