
Theory of Outlier Ensembles

Outlier Ensembles

Abstract

Outlier detection is an unsupervised problem in which labels are not available with the data records (Aggarwal, Outlier Analysis, 2017 [2]). As a result, it is generally more challenging to design ensemble analysis algorithms for outlier detection than for supervised problems such as classification. In particular, methods that require labels in intermediate steps of the algorithm cannot be generalized to outlier detection; boosting [14, 15], which reweights training points according to their labeled errors, is a classic example.
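
To make the contrast with label-dependent methods concrete, here is a minimal sketch of an ensemble whose components never touch labels. It is an illustration, not the chapter's own algorithm: each member scores every point by its distance to the k-th nearest neighbor within a random subsample (in the spirit of subsampling [32] and bagging [5]), and the standardized member scores are averaged. The function name `subsampled_knn_ensemble`, the 50% subsample fraction, and the z-score normalization are assumptions chosen for illustration; NumPy is assumed to be available.

```python
import numpy as np


def subsampled_knn_ensemble(X, k=5, n_members=25, frac=0.5, seed=0):
    """Hypothetical helper: average k-NN outlier scores over random subsamples.

    No labels are used at any step; larger returned values mean "more outlying".
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n <= k:
        raise ValueError("need more than k points")
    # Subsample size: a fraction of the data, but large enough that every
    # point still has k neighbors after its own entry is excluded.
    m = min(n, max(k + 1, int(frac * n)))
    rng = np.random.default_rng(seed)
    total = np.zeros(n)
    for _ in range(n_members):
        idx = rng.choice(n, size=m, replace=False)
        # Distances from every point to every member of the subsample: shape (n, m).
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        # A point that was drawn into the subsample must not count itself.
        d[idx, np.arange(m)] = np.inf
        # Member score = distance to the k-th nearest neighbor in the subsample.
        s = np.sort(d, axis=1)[:, k - 1]
        # Standardize each member so that no single member dominates the average.
        total += (s - s.mean()) / (s.std() + 1e-12)
    return total / n_members


# Usage: 200 clustered points plus one far-away point (index 200).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])
scores = subsampled_knn_ensemble(X, k=5)
print(int(np.argmax(scores)))
```

Each member sees a different random subsample, so averaging is intended to reduce the variance of the final score; the only inputs are the unlabeled points themselves.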

Theory helps us to bear our ignorance of facts.

George Santayana

Notes

  1. If there are errors in the feature values, they will also be reflected in the hypothetically ideal (but unobserved) outlier scores. For example, if an outlier arises from a measurement error rather than from an application-specific cause, this will also be reflected in the ideal but unobserved scores.

  2. It is noteworthy that the most popular outlier detectors are distance-based. These detectors are lazy learners in which the test point itself is never included among its own k nearest neighbors at prediction time. Therefore, these learners are essentially out-of-sample methods: they do not include the test point within the model (albeit in a lazy way).

  3. In practice, such unsupervised methods are never used in these kinds of real-life scenarios. This example is purely illustrative; it gives a concrete view of how the bias-variance trade-off works.
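
Notes 1 and 3 rest on a bias-variance view of outlier scores. The derivation below is a sketch under assumed notation, not necessarily the chapter's own symbols: y_i is the ideal (unobserved) score of point X_i, f(X_i, D) is the score that a detector built on the randomly drawn data set D assigns to X_i, and \bar{f}_i = E_D[f(X_i, D)]. Treating each y_i as fixed (consistent with Note 1, since feature errors are already absorbed into the ideal scores), the expected mean-squared error splits into a squared-bias term and a variance term:

```latex
\begin{aligned}
E_{\mathcal{D}}[\mathrm{MSE}]
  &= \frac{1}{n}\sum_{i=1}^{n} E_{\mathcal{D}}\!\left[\bigl(y_i - f(X_i,\mathcal{D})\bigr)^2\right] \\
  &= \frac{1}{n}\sum_{i=1}^{n} E_{\mathcal{D}}\!\left[\bigl((y_i - \bar{f}_i) + (\bar{f}_i - f(X_i,\mathcal{D}))\bigr)^2\right] \\
  &= \underbrace{\frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - \bar{f}_i\bigr)^2}_{\text{squared bias}}
   + \underbrace{\frac{1}{n}\sum_{i=1}^{n} E_{\mathcal{D}}\!\left[\bigl(f(X_i,\mathcal{D}) - \bar{f}_i\bigr)^2\right]}_{\text{variance}}
\end{aligned}
```

The cross term vanishes because y_i - \bar{f}_i does not depend on D and E_D[\bar{f}_i - f(X_i, D)] = 0. Ensembles that average over randomized detectors mainly aim to shrink the variance term.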

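The distinction in Note 2 can be seen in a few lines. The sketch below is illustrative only (the function name `knn_distance` is hypothetical, distances are brute-force, and NumPy is assumed): it scores each point by its distance to its k-th nearest neighbor within the same data set, either counting the point as its own neighbor (in-sample) or excluding it, as a lazy distance-based detector does (out-of-sample).

```python
import numpy as np


def knn_distance(X, k, include_self):
    """Distance from each row of X to its k-th nearest neighbor among the rows of X."""
    X = np.asarray(X, dtype=float)
    # Brute-force pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if not include_self:
        # Out-of-sample scoring: a point is never its own neighbor.
        np.fill_diagonal(d, np.inf)
    # After sorting each row, column k-1 holds the k-th smallest distance.
    return np.sort(d, axis=1)[:, k - 1]


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
print(knn_distance(X, 1, include_self=True))   # in-sample: zero self-distance wins
print(knn_distance(X, 1, include_self=False))  # out-of-sample
```

With self-inclusion and k = 1, every score is 0, so the isolated point (10, 10) is indistinguishable from the rest. Excluding the point itself gives the three clustered points a score of 1 and the isolated point a score of sqrt(181), about 13.45.
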
References

  1. C. C. Aggarwal. Outlier Ensembles: Position Paper, ACM SIGKDD Explorations, 14(2), pp. 49–58, December, 2012.

  2. C. C. Aggarwal. Outlier Analysis, Second Edition, Springer, 2017.

  3. C. C. Aggarwal and P. S. Yu. Outlier Detection in Graph Streams. IEEE ICDE Conference, 2011.

  4. C. C. Aggarwal and S. Sathe. Theoretical Foundations and Algorithms for Outlier Ensembles, ACM SIGKDD Explorations, 17(1), June 2015.

  5. L. Breiman. Bagging Predictors. Machine Learning, 24(2), pp. 123–140, 1996.

  6. L. Breiman. Random Forests. Machine Learning, 45(1), pp. 5–32, 2001.

  7. G. Brown, J. Wyatt, R. Harris, and X. Yao. Diversity creation methods: a survey and categorisation. Information Fusion, 6(1), pp. 5–20, 2005.

  8. R. Bryll, R. Gutierrez-Osuna, and F. Quek. Attribute Bagging: Improving Accuracy of Classifier Ensembles by using Random Feature Subsets. Pattern Recognition, 36(6), pp. 1291–1302, 2003.

  9. P. Buhlmann and B. Yu. Analyzing bagging. Annals of Statistics, pp. 927–961, 2002.

  10. P. Buhlmann. Bagging, Subagging and Bragging for Improving Some Prediction Algorithms. Recent advances and trends in nonparametric statistics, Elsevier, 2003.

  11. A. Buja and W. Stuetzle. Observations on bagging. Statistica Sinica, 16(2), 323, 2006.

  12. M. Denil, D. Matheson, and N. De Freitas. Narrowing the Gap: Random Forests In Theory and in Practice. ICML Conference, pp. 665–673, 2014.

  13. T. Dietterich. Ensemble Methods in Machine Learning. First International Workshop on Multiple Classifier Systems, 2000.

  14. Y. Freund and R. Schapire. A Decision-theoretic Generalization of Online Learning and Application to Boosting. Computational Learning Theory, 1995.

  15. Y. Freund and R. Schapire. Experiments with a New Boosting Algorithm. ICML Conference, pp. 148–156, 1996.

  16. J. Friedman. On Bias, Variance, 0/1 Loss, and the Curse-of-Dimensionality. Data Mining and Knowledge Discovery, 1(1), pp. 55–77, 1997.

  17. S. Geman, E. Bienenstock, and R. Doursat. Neural Networks and the Bias/Variance Dilemma. Neural Computation, 4(1), pp. 1–58, 1992.

  18. T. K. Ho. Random decision forests. Third International Conference on Document Analysis and Recognition, 1995. Extended version appears in IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), pp. 832–844, 1998.

  19. T. K. Ho. Nearest Neighbors in Random Subspaces. Lecture Notes in Computer Science, Vol. 1451, pp. 640–648, Proceedings of the Joint IAPR Workshops SSPR'98 and SPR'98, 1998. http://link.springer.com/chapter/10.1007/BFb0033288

  20. R. Kohavi and D. H. Wolpert. Bias plus variance decomposition for zero-one loss functions, ICML Conference, 1996.

  21. E. Kong and T. Dietterich. Error-Correcting Output Coding Corrects Bias and Variance. Proceedings of the Twelfth International Conference on Machine Learning, pp. 313–321, 1995.

  22. A. Lazarevic and V. Kumar. Feature Bagging for Outlier Detection, ACM KDD Conference, 2005.

  23. F. T. Liu, K. M. Ting, and Z.-H. Zhou. Isolation Forest. ICDM Conference, 2008. Extended version appears in: ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1), 3, 2012.

  24. R. Michalski, I. Mozetic, J. Hong, and N. Lavrac. The Multi-Purpose Incremental Learning System AQ15 and its Testing Applications to Three Medical Domains, Proceedings of the Fifth National Conference on Artificial Intelligence, pp. 1041–1045, 1986.

  25. S. Rayana and L. Akoglu. Less is More: Building Selective Anomaly Ensembles with Application to Event Detection in Temporal Graphs. SDM Conference, 2015.

  26. S. Rayana and L. Akoglu. Less is More: Building Selective Anomaly Ensembles. ACM Transactions on Knowledge Discovery from Data, to appear, 2016.

  27. L. Rokach. Pattern classification using ensemble methods, World Scientific Publishing Company, 2010.

  28. M. Salehi, C. Leckie, M. Moshtaghi, and T. Vaithianathan. A Relevance Weighted Ensemble Model for Anomaly Detection in Switching Data Streams. Advances in Knowledge Discovery and Data Mining, pp. 461–473, 2014.

  29. G. Seni and J. Elder. Ensemble Methods in Data Mining: Improving Accuracy through Combining Predictions. Synthesis Lectures in Data Mining and Knowledge Discovery, Morgan and Claypool, 2010.

  30. R. Tibshirani. Bias, Variance, and Prediction Error for Classification Rules, Technical Report, Statistics Department, University of Toronto, 1996.

  31. G. Valentini and T. Dietterich. Bias-variance Analysis of Support Vector Machines for the Development of SVM-based Ensemble Methods. Journal of Machine Learning Research, 5, pp. 725–774, 2004.

  32. A. Zimek, M. Gaudet, R. Campello, and J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles, KDD Conference, 2013.

  33. Z.-H. Zhou. Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC Press, 2012.

Author information

Correspondence to Charu C. Aggarwal.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Aggarwal, C.C., Sathe, S. (2017). Theory of Outlier Ensembles. In: Outlier Ensembles. Springer, Cham. https://doi.org/10.1007/978-3-319-54765-7_2

  • DOI: https://doi.org/10.1007/978-3-319-54765-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54764-0

  • Online ISBN: 978-3-319-54765-7

  • eBook Packages: Computer Science, Computer Science (R0)
