Impurity Measurement in Selecting Decision Node Tree that Tolerate Noisy Cases

  • Conference paper
  • First Online:
Recent Advances in Information and Communication Technology 2017 (IC2IT 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 566))


Abstract

In recent years, recommending an appropriate attribute for a binary decision tree under unusual circumstances, such as training or testing with noisy attributes, has become an increasingly challenging research problem. Most traditional impurity measurements have never been tested for how well they tolerate such noisy cases. Consequently, this paper studies and proposes an impurity measurement that can accurately evaluate the goodness of a binary decision tree node split under noisy conditions. To verify that the classification accuracy of a decision tree built with the proposed measurement is preserved, an experiment comparing it against traditional impurity measures was conducted. The results show that the accuracy of the proposed measurement in classifying under noisy cases is acceptable.
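For context on the comparison the abstract describes, the traditional impurity measures most commonly used to score a binary node split are Gini impurity and entropy. The sketch below illustrates how a split's weighted impurity is computed with either measure and how a single flipped (noisy) label changes the score; it is a minimal illustration of the standard measures only, not the paper's proposed measurement, and the helper names are ours:

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy impurity: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, measure):
    """Weighted impurity of a binary split; lower means a better split."""
    n = len(left) + len(right)
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)

# A perfectly pure split versus the same split with one corrupted label.
clean_left, clean_right = [0, 0, 0, 0], [1, 1, 1, 1]
noisy_left = [0, 0, 0, 1]  # one label flipped by noise

print(split_impurity(clean_left, clean_right, gini))  # 0.0 (pure split)
print(split_impurity(noisy_left, clean_right, gini))  # 0.1875
```

A single mislabeled example already raises the split's Gini score from 0.0 to 0.1875, which is why a greedy tree builder using these measures can be steered toward worse splits by label noise, the problem the paper addresses.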



Author information


Correspondence to Benjawan Srisura.


Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Srisura, B. (2018). Impurity Measurement in Selecting Decision Node Tree that Tolerate Noisy Cases. In: Meesad, P., Sodsee, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2017. IC2IT 2017. Advances in Intelligent Systems and Computing, vol 566. Springer, Cham. https://doi.org/10.1007/978-3-319-60663-7_2

  • DOI: https://doi.org/10.1007/978-3-319-60663-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60662-0

  • Online ISBN: 978-3-319-60663-7

  • eBook Packages: Engineering (R0)
