
Quick Real-Boost with: Weight Trimming, Exponential Impurity, Bins, and Pruning

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9692)


Abstract

The central point of attention in this paper is weight trimming, a technique known for speeding up boosted learning procedures at a typically negligible loss of accuracy. Recently, Appel et al. proposed an elegant algorithm that applies weight trimming under AdaBoost and prunes some features using a special error bound, while simultaneously guaranteeing the same outcome (an ensemble of trees with exactly the same parameters) as training without trimming; thus, no loss of training accuracy occurs. In this paper, we supplement Appel's idea with a suitable extension for real boosting. We prove that this extension gives the same outcome guarantees, both for stumps and for trees. Additionally, we analyze the complexity of Appel's idea and show that in some cases it may lead to computational losses.
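
As a rough illustration of the plain (non-guaranteed) form of weight trimming discussed above, the sketch below keeps only the examples that jointly carry a chosen fraction of the total weight mass in a given boosting round; the 99% threshold, the function name trim_indexes, and all other identifiers are illustrative assumptions, not the paper's algorithm.

import numpy as np

def trim_indexes(weights, mass_fraction=0.99):
    # Keep the smallest group of examples whose weights jointly cover the
    # requested fraction of the total weight mass; the rest are skipped
    # when fitting the weak learner in the current round.
    order = np.argsort(weights)[::-1]          # heaviest examples first
    cumulative = np.cumsum(weights[order])
    k = np.searchsorted(cumulative, mass_fraction * weights.sum()) + 1
    return order[:k]

# Illustrative usage with random boosting weights:
weights = np.random.default_rng(0).exponential(size=1000)
weights /= weights.sum()
active = trim_indexes(weights)
print(len(active), "of", len(weights), "examples kept for this round")

Fitting the weak learner only on the retained examples is where the speed-up comes from.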


Notes

  1. For simplicity, assume the data indexes are mapped to successive integers, so that there are no indexing holes and the differences \(m_j-m_{j-1}\) reflect the sizes of the portions.
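     For instance (with illustrative numbers, not taken from the paper), if the boundary indexes are \(m_0=0\), \(m_1=128\), \(m_2=352\), then the successive portions contain \(m_1-m_0=128\) and \(m_2-m_1=224\) examples.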

References

  1. Appel, R., et al.: Quickly boosting decision trees – pruning underachieving features early. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), vol. 28, pp. 594–602. JMLR Workshop and Conference Proceedings (2013)


  2. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156. Morgan Kaufmann, San Francisco (1996)


  3. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997)


  4. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Ann. Stat. 28(2), 337–407 (2000)


  5. Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics: A Foundation for Computer Science, 2nd edn. Addison-Wesley Longman Publishing Co. Inc, Boston (1994)


  6. Schapire, R.E.: The strength of weak learnability. Mach. Learn. 5, 197–227 (1990)


  7. Schapire, R.E., Singer, Y.: Improved boosting using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)



Author information

Correspondence to Przemysław Klęsk.

A Proof of Outcome Guarantee for ‘quick’ Tree Growing Procedure with Exponential Impurity

Proof

(Theorem 2). First, let us define the Gini error for the n-subset:

$$\begin{aligned} \varepsilon _{\mathop {{\scriptscriptstyle {\text {gini}}}}\limits _{n}}=\frac{1}{Z_{n}}\sum _l Z_{\rho _l}^{}\, 2\,\frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\left( 1-\frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\right) , \end{aligned}$$
(21)

where \(Z_{\rho _l}^{+}=\sum _{i\in \rho _l}w_i[y_i=+1]\), \(Z_{\rho _l}^{-}=\sum _{i\in \rho _l}w_i[y_i=-1]\) are the probability masses of the two classes within leaf \(\rho _l\). Let us write two representations (22), (23) for the products (mass times Gini error) that will be useful later on:

$$\begin{aligned} Z_{n}\varepsilon _{\mathop {{\scriptscriptstyle {\text {gini}}}}\limits _{n}}&=\sum _l \left( Z_{\rho _l}^{}-\frac{\left( Z_{\rho _l}^{+}\right) ^2}{Z_{\rho _l}^{}}-\frac{\left( Z_{\rho _l}^{-}\right) ^2}{Z_{\rho _l}^{}}\right) , \end{aligned}$$
(22)
$$\begin{aligned}&=\sum _l Z_{\rho _l}^{} \left( 1{-}\left( \frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\right) ^2{-}\left( 1{-}\frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\right) ^2\right) =\sum _l Z_{\rho _l}^{} \underbrace{2 \frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\left( 1{-}\frac{Z_{\rho _l}^{+}}{Z_{\rho _l}^{}}\right) }_{\varepsilon _{\mathop {{\scriptscriptstyle {\text {gini}}}}\limits _{\rho _{l}}}}. \end{aligned}$$
(23)
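
As a quick side check, each summand in (22) and (23) reduces to the same per-leaf quantity, using only \(Z_{\rho _l}^{}=Z_{\rho _l}^{+}+Z_{\rho _l}^{-}\):

$$\begin{aligned} Z_{\rho _l}^{}-\frac{\left( Z_{\rho _l}^{+}\right) ^2}{Z_{\rho _l}^{}}-\frac{\left( Z_{\rho _l}^{-}\right) ^2}{Z_{\rho _l}^{}} = \frac{\left( Z_{\rho _l}^{+}+Z_{\rho _l}^{-}\right) ^2-\left( Z_{\rho _l}^{+}\right) ^2-\left( Z_{\rho _l}^{-}\right) ^2}{Z_{\rho _l}^{}} = \frac{2\,Z_{\rho _l}^{+}Z_{\rho _l}^{-}}{Z_{\rho _l}^{}} = Z_{\rho _l}^{}\,\varepsilon _{\mathop {{\scriptscriptstyle {\text {gini}}}}\limits _{\rho _l}}. \end{aligned}$$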

We now write out the exponential error and show its connection to the Gini error.

(24)
(25)

We aim at showing that the mass times error product over the m-subset is bounded by that over the n-subset: if one removes from a leaf the examples that are not in the m-subset, but keeps the tree parameters fixed, then the mass times error product must decrease or stay unchanged. Recall that \(\rho _l=u_l\cup \bar{u_l}\) and \(n>m\); see definitions (14). Let us observe the square of this product, using the representation from (25) for the leaf error.

(26)

Note the similarity of the last expression to a Gini representation (22).

We shall now expand (26), taking advantage of the following lemma (for a straightforward algebraic proof, see [1]), which holds for either class label \(y\in \{-1,+1\}\):

$$\begin{aligned} -\frac{\left( Z_{\rho _l}^{y}\right) ^2}{Z_{\rho _l}^{}} = -\frac{\left( Z_{u_l}^{y}+Z_{\bar{u_l}}^{y}\right) ^2}{Z_{u_l}^{}+Z_{\bar{u_l}}^{}} \geqslant -\frac{\left( Z_{u_l}^{y}\right) ^2}{Z_{u_l}^{}}-\frac{\left( Z_{\bar{u_l}}^{y}\right) ^2}{Z_{\bar{u_l}}^{}}. \end{aligned}$$
(27)
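
For completeness, one direct way to verify the lemma: write \(a=Z_{u_l}^{y}\), \(b=Z_{\bar{u_l}}^{y}\), \(c=Z_{u_l}^{}\), \(d=Z_{\bar{u_l}}^{}\) (all non-negative, with \(c,d>0\)); then

$$\begin{aligned} \frac{a^2}{c}+\frac{b^2}{d}-\frac{(a+b)^2}{c+d} = \frac{a^2d(c+d)+b^2c(c+d)-cd(a+b)^2}{cd(c+d)} = \frac{(ad-bc)^2}{cd(c+d)}\geqslant 0, \end{aligned}$$

which, multiplied by \(-1\), gives exactly (27).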

Therefore, we have:

(28)

The equality step comes from grouping odd and even terms using representations (22). Hence, finally, the claimed inequality follows. \(\quad \Box \)


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Klęsk, P. (2016). Quick Real-Boost with: Weight Trimming, Exponential Impurity, Bins, and Pruning. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds.) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol. 9692. Springer, Cham. https://doi.org/10.1007/978-3-319-39378-0_51


  • DOI: https://doi.org/10.1007/978-3-319-39378-0_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39377-3

  • Online ISBN: 978-3-319-39378-0

