GMDH-Based Outlier Detection Model in Classification Problems

Journal of Systems Science and Complexity

Abstract

In many practical classification problems, datasets contain a portion of outliers, which can greatly degrade the performance of the constructed models. To address this issue, we apply the group method of data handling (GMDH) neural network to outlier detection and build a GMDH-based outlier detection (GOD) model. The model first performs feature selection on the training set L using a GMDH neural network. A new training set L′ is then obtained by mapping L onto the selected key feature subset. Next, a linear regression model is fitted on L′ by ordinary least squares estimation. The model then eliminates one randomly chosen sample from L′ at a time and rebuilds the linear regression model each time. Finally, outliers are detected by calculating Cook’s distance for each sample. Experiments are conducted on four customer classification datasets. The results show that the GOD model can effectively eliminate outliers and that it generally performs significantly better than five existing outlier detection models. This indicates that eliminating outliers can effectively enhance the classification accuracy of the trained classification model.
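The last two steps of the abstract (refitting the regression with one sample left out, then scoring each sample by Cook's distance) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the GMDH feature selection has already produced the reduced feature matrix, it uses the standard closed form of Cook's distance (which equals the leave-one-out definition without refitting n times), and the function name `cooks_distance`, the synthetic data, and the 4/n cutoff are illustrative assumptions, since the abstract does not state a threshold.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every sample for an OLS fit of y on X.

    X is assumed to hold only the already selected features (the set L'
    in the abstract); an intercept column is added here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept
    p = A.shape[1]                             # number of fitted parameters

    # Ordinary least squares fit and residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages h_ii = diagonal of the hat matrix, via a reduced QR of A.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q ** 2, axis=1)

    s2 = resid @ resid / (n - p)               # residual variance estimate
    # Closed form: D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
    return resid ** 2 / (p * s2) * h / (1.0 - h) ** 2


# Synthetic data (hypothetical): a linear signal with one corrupted sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=50)
y[7] += 5.0                                    # inject an outlier at index 7

d = cooks_distance(X, y)
# The 4/n cutoff is a common rule of thumb, not a value taken from the paper.
flagged = np.flatnonzero(d > 4.0 / len(y))
print("most influential sample:", int(np.argmax(d)))
print("flagged samples:", flagged.tolist())
```

Using the closed form avoids one refit per removed sample; a literal leave-one-out loop gives the same distances at higher cost.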

Author information

Corresponding authors

Correspondence to Jin Xiao or Jing Huang.

Additional information

This research was partly supported by the Major Project of the National Social Science Foundation of China under Grant No. 18VZL006; the National Natural Science Foundation of China under Grant Nos. 71571126 and 71974139; the Excellent Youth Foundation of Sichuan Province under Grant No. 20JCQN0225; the Tianfu Ten-thousand Talents Program of Sichuan Province; the Excellent Youth Foundation of Sichuan University under Grant No. sksyl201709; the Leading Cultivation Talents Program of Sichuan University; the Teacher and Student Joint Innovation Project of Business School of Sichuan University under Grant No. LH2018011; and the 2018 Special Project for Cultivation and Innovation of New Academic, Qian Platform Talent under Grant No. 5772-012.

This paper was recommended for publication by Editor WANG Shouyang.

About this article

Cite this article

Xie, L., Jia, Y., Xiao, J. et al. GMDH-Based Outlier Detection Model in Classification Problems. J Syst Sci Complex 33, 1516–1532 (2020). https://doi.org/10.1007/s11424-020-9002-6

  • DOI: https://doi.org/10.1007/s11424-020-9002-6