Abstract
In recent years, selecting an appropriate splitting attribute for a binary decision tree under unusual circumstances, such as training or testing with noisy attributes, has become an increasingly challenging research problem, since most traditional impurity measures have never been evaluated for how well they tolerate such noisy cases. This paper therefore studies and proposes an impurity measure that accurately evaluates the goodness of a binary decision tree node split under noisy conditions. To verify that the classification accuracy of decision trees built with the proposed measure is preserved, an experiment comparing it against traditional impurity measures was conducted. The results show that the accuracy of the proposed measure in classifying under noisy cases is acceptable.
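The proposed measure itself is not reproduced in this abstract; as background, the traditional impurity measures it is compared against (Gini impurity and entropy, as used in CART and ID3/C4.5) score a binary split by the weighted impurity of the two child nodes. A minimal sketch, with illustrative function names of my own choosing:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p_i * log2 p_i)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, measure=gini):
    """Weighted impurity of a binary split; a lower value means a better split."""
    n = len(left) + len(right)
    w_left, w_right = len(left) / n, len(right) / n
    return w_left * measure(left) + w_right * measure(right)

# A split that perfectly separates the classes has zero weighted impurity.
print(split_impurity(["a", "a"], ["b", "b"]))  # → 0.0
```

Noisy (mislabeled or corrupted) attribute values perturb these class counts, which is why the paper studies how tolerant such split scores are to noise.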
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Srisura, B. (2018). Impurity Measurement in Selecting Decision Node Tree that Tolerate Noisy Cases. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-60662-0
Online ISBN: 978-3-319-60663-7
eBook Packages: Engineering (R0)