Abstract
Principal component analysis (PCA) is widely used to analyze high-dimensional data, but it is very sensitive to outliers. Robust PCA methods seek fits that are unaffected by the outliers and can therefore be trusted to reveal them. FastHCS (high-dimensional congruent subsets) is a robust PCA algorithm suitable for high-dimensional applications, including cases where the number of variables exceeds the number of observations. After detailing the FastHCS algorithm, we carry out an extensive simulation study and three real data applications, the results of which show that FastHCS is systematically more robust to outliers than state-of-the-art methods.
References
Björck, Å., Golub, G.H.: Numerical methods for computing angles between linear subspaces. Math. Comput. 27(2), 579–594 (1973)
Christensen, B.C., Houseman, E.A., Marsit, C.J., Zheng, S., Wrensch, M.R., Wiemels, J.L., Nelson, H.H., Karagas, M.R., Padbury, J.F., Bueno, R., Sugarbaker, D.J., Yeh, R., Wiencke, J.K., Kelsey, K.T.: Aging and environmental exposure alter tissue-specific DNA methylation dependent upon CpG Island context. PLoS Genet. 5(8), e1000602 (2009)
Croux, C., Ruiz-Gazen, A.: High breakdown estimators for principal components: the projection-pursuit approach revisited. J. Multivar. Anal. 95, 206–226 (2005)
Donoho, D.L.: Breakdown properties of multivariate location estimators. Ph.D. Qualifying Paper Harvard University (1982)
Debruyne, M., Hubert, M.: The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness. Stat. Probab. Lett. 79(3), 275–282 (2009)
Deepayan, S.: Lattice: Multivariate Data Visualization with R. Springer, New York (2008)
Dyrby, M., Engelsen, S.B., Nørgaard, L., Bruhn, M., Lundsberg Nielsen, L.: Chemometric quantitation of the active substance in a pharmaceutical tablet using near infrared (NIR) transmittance and NIR FT Raman spectra. Appl. Spectrosc. 56(5), 579–585 (2002)
Hubert, M., Rousseeuw, P.J., Vanden Branden, K.: ROBPCA: a new approach to robust principal components analysis. Technometrics 47, 64–79 (2005)
Hubert, M., Rousseeuw, P., Vakili, K.: Shape bias of robust covariance estimators: an empirical study. Stat. Pap. 55(1), 15–28 (2014)
Jensen, D.R.: The structure of ellipsoidal distributions, II. Principal components. Biom. J. 28, 363–369 (1986)
Jolliffe, I.T.: Principal Component Analysis, 2nd edn. Springer, New York (2002)
Krzanowski, W.J.: Between-groups comparison of principal components. J. Am. Stat. Assoc. 74(367), 703–707 (1979)
Li, G., Chen, Z.: Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J. Am. Stat. Assoc. 80, 759–766 (1985)
Locantore, N., Marron, J.S., Simpson, D.G., Tripoli, N., Zhang, J.T., Cohen, K.L.: Robust principal component analysis for functional data. Test 8(1), 1–73 (1999)
Maronna, R.A., Yohai, V.J.: The behavior of the Stahel-Donoho Robust multivariate estimator. J. Am. Stat. Assoc. 90(429), 330–341 (1995)
Maronna, R.: Principal components and orthogonal regression based on Robust scales. Technometrics 47, 264–273 (2005)
Maronna, R.A., Martin, R.D., Yohai, V.J.: Robust Statistics: Theory and Methods. Wiley, New York (2006)
Muirhead, R.J.: Aspects of Multivariate Statistical Theory. Wiley, New York (1982)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2014)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Schmitt, E., Öllerer, V., Vakili, K.: The finite sample breakdown point of PCS. Stat. Probab. Lett. 94, 214–220 (2014)
Seber, G.A.F.: Matrix Handbook for Statisticians. Wiley Series in Probability and Statistics. Wiley, New York (2008)
Stahel, W.: Breakdown of Covariance Estimators. Research Report 31, Fachgruppe für Statistik, E.T.H. Zürich (1981)
Todorov, V., Filzmoser, P.: An object-oriented framework for Robust multivariate analysis. J. Stat. Softw. 32, 1–47 (2009)
Tyler, D.E.: Finite sample breakdown points of projection based multivariate location and scatter statistics. Ann. Stat. 22(2), 1024–1044 (1994)
Vakili, K., Schmitt, E.: Finding multivariate outliers with FastPCS. Comput. Stat. Data Anal. 69, 54–66 (2014)
Van Breukelen, M., Duin, R.P.W., Tax, D.M.J., Den Hartog, J.E.: Handwritten digit recognition by combined classifiers. Kybernetika 34, 381–386 (1998)
Wu, W., Massart, D.L., de Jong, S.: The Kernel PCA algorithms for wide data. Part I: theory and algorithms. Chemom. Intell. Lab. Syst. 36, 165–172 (1997)
Yohai, V.J., Maronna, R.A.: The maximum bias of Robust covariances. Commun. Stat. Theory Methods 19, 2925–2933 (1990)
Acknowledgments
The authors wish to acknowledge the helpful comments from two anonymous referees and the editor which improved this paper.
Electronic supplementary material
Appendices
Appendix 1: Vulnerability of the I-index to orthogonal outliers
Throughout this appendix, let \(\pmb Y\) be an \(n\times p\) data matrix of uncontaminated observations drawn from a rank q distribution \(\mathscr {F}\), with q an integer satisfying \(2<q<\min (p,n)\). However, we do not observe \(\pmb Y\) but an \(n\times p\) (potentially) corrupted data matrix \(\pmb Y^{\varepsilon }\) that consists of \(g<n\) observations from \(\pmb Y\) and \(c=n-g\) arbitrary values, with \(\varepsilon =c/n\) denoting the (unknown) rate of contamination. Throughout, \(h=\lceil (n+q+1)/2\rceil \) and the PCA estimates \((\pmb t^I, \pmb L_q^I,\pmb P_q^I)\) are defined as in Sect. 2, with \((\pmb L_q^I)_j,1\leqslant j\leqslant q\) denoting the j-th diagonal entry of \(\pmb L_q^I\).
We will consider the finite sample breakdown point (Donoho 1982) in the context of PCA, following Li and Chen (1985):
Equation (11) defines the so-called finite sample explosion breakdown point and Eq. (12) the so-called finite sample implosion breakdown point of PCA estimates \((\pmb t, \pmb L_q,\pmb P_q)\), and the general finite sample breakdown point is \(\varepsilon ^*_n = \min (\varepsilon _1, \varepsilon _2)\).
The following assumptions [as per, for example, Tyler (1994)] all pertain to the original, uncontaminated, data set \(\pmb Y\). We will consider the case where the point cloud formed by \(\pmb Y\) lies in general position in \(\mathbb {R}^q\). The following definition of general position is adapted from Rousseeuw and Leroy (1987):
Definition 1
General position in \(\mathbb {R}^q\). \(\pmb Y\) is in general position in \(\mathbb {R}^q\) if no more than q points of \(\pmb Y\) lie in any \((q-1)\)-dimensional affine subspace. For q-dimensional data, this means that there are no more than q points of \(\pmb Y\) on any hyperplane, so that any \(q+1\) points of \(\pmb Y\) always determine a q-simplex with non-zero volume.
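Definition 1 can be checked numerically. The following sketch (our own illustration, not code from the paper) tests whether every set of \(q+1\) points spans a non-degenerate q-simplex, i.e. whether its q edge vectors have full rank:

```python
# Numeric check of Definition 1: Y (n x q) is in general position in R^q if no
# q+1 of its rows lie in a (q-1)-dimensional affine subspace.
from itertools import combinations

import numpy as np


def in_general_position(Y, tol=1e-10):
    """True if every q+1 rows of Y determine a q-simplex with non-zero volume."""
    n, q = Y.shape
    for idx in combinations(range(n), q + 1):
        pts = Y[list(idx)]
        diffs = pts[1:] - pts[0]  # the q edge vectors of the candidate simplex
        if np.linalg.matrix_rank(diffs, tol=tol) < q:
            return False          # degenerate simplex: points share a hyperplane
    return True


rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 3))   # continuous data lie in general position a.s.
print(in_general_position(Y))     # True

Y_bad = Y.copy()
Y_bad[:4, 2] = 0.0                # force q+1 = 4 points onto a 2-dim plane
print(in_general_position(Y_bad)) # False
```

The exhaustive loop over \(\binom{n}{q+1}\) subsets is only practical for small n, but it mirrors the definition directly.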
The I-index is shift invariant so that, w.l.o.g., we only consider cases where the good observations are centered at the origin. Throughout, we will also assume that the members of \(\pmb Y\) are bounded:
for some bounded scalar \(U_0\) depending only on the uncontaminated observations and that the uncontaminated observations contain no duplicates:
Theorem 1
The implosion breakdown point, \(\varepsilon _2(\pmb t^I, \pmb L_q^I,\pmb P_q^I)\), is \((n-h+1)/n\).
Proof
If at least h rows of \(\pmb Y^{\varepsilon }\) are in general position in \(\mathbb {R}^q\), any subset of h observations will contain at least \(q+1\) observations in general position. This guarantees that the qth eigenvalue corresponding to any h-subset is non-zero (Seber 2008). Thus, it follows that \(\varepsilon _2(\pmb t^I, \pmb L_q^I,\pmb P_q^I)=(n-h+1)/n\). \(\square \)
1.1 Finite sample explosion breakdown of \((\pmb t^I, \pmb L_q^I,\pmb P_q^I)\)
Denote by \(\pmb z\in \mathbb {R}^p\) an outlying entry of \(\pmb Y^{\varepsilon }\) and set \(\pmb z^m=||\pmb z\pmb P_0^m||\). The only outliers capable of causing explosion breakdown must satisfy:
for some bounded scalars \(U_1\) and \(U_2\) depending only on the uncontaminated observations.
Proof
Suppose that the outliers do not satisfy Eq. (13) so that \(\max _{i}||\pmb y_i^{\varepsilon }||\leqslant U_1\), but that the PCA estimates \((\pmb t^I, \pmb L_q^I,\pmb P_q^I)\) break down. This leads to a contradiction since
Therefore, for a contaminated h-subset to cause explosion breakdown, the outliers must satisfy Eq. (13).
Assume that an outlier \(\pmb z\) does not satisfy Condition (14). Schmitt et al. (2014) showed that any h subset \(H^m\) indexing \(\pmb z\) will have an unbounded value of \(I(H^m,\pmb S_0^m)\) if and only if \(\pmb z^m\) is unbounded. But for the uncontaminated data, it holds that
so if the contaminated data set \(\pmb Y^{\varepsilon }\) contains at least h entries from the original data matrix \(\pmb Y\), then it is always possible to construct a subset \(H^l\) of entries of \(\pmb Y\) for which \(I(H^l,\pmb S_0^l)\) is bounded, so that \(H^m\) will never be selected over \(H^l\). \(\square \)
Appendix 2: The finite sample breakdown point of FastHCS
In this appendix, we derive the finite sample breakdown point of FastHCS. Define \(\pmb Y\), \(\pmb Y^\varepsilon \) and \(\varepsilon ^*_n\) as in Appendix 1. Recall that
where \(H^-=H^{PP} \setminus H^{I}\). Then, if \(D(\pmb Y^\varepsilon ,H^I,H^{PP})>0\) or if \(\displaystyle \max _{j=1}^q{\mathop {{{\mathrm{var}}}}\limits _{i \in H^{-}}}(\pmb y^\varepsilon _i\pmb P^{PP}_j)=0\) then the final FastHCS estimates are based on \(H^{PP}\). Otherwise, they are based on \( H^{I}\).
Lemma 1
If \(||\pmb y^\varepsilon _i||>U_1\) and \(\varepsilon < (n-1)/2n\), then \(i \notin H^{\bullet }\).
Proof
Debruyne and Hubert (2009) showed that the population breakdown point of \((\pmb t^{PP}, \pmb L_q^{PP},\pmb P_q^{PP})\) is 50 %, which corresponds to a finite sample breakdown point of \((n-1)/2n\). Consequently, \(H^{PP}\) will not index any data point for which \(||\pmb y^\varepsilon _i||>U_1\). Since \(H^{\bullet }\) indexes the overlap between \(H^I\) and \(H^{PP}\), if \(||\pmb y^\varepsilon _i||>U_1\), then \(i \notin H^{\bullet }\).
Lemma 2
When \(\pmb Y\) is in general position, \(n>q>2\), and \(\varepsilon < \varepsilon _1 = (n-1)/2n\), then \((\pmb L_q^I)_1 <\infty \).
Proof
We will proceed by showing that both denominators in Eq. (17) are bounded, while of the two numerators only the one depending on \(H^{PP}\) is bounded.
Lemma 1 implies there exists a fixed constant \(U_4\) such that
for any orthogonal matrix \(\pmb P\). Similarly, since the projection pursuit approach has a breakdown point of \((n-1)/2n\), there exists a fixed \(U_5\) such that
As a consequence of (18) and (19), there exists a fixed constant \(U_6\) such that:
Next, note that
[Equation (22) follows from Appendix 1, Theorem 1], so that
is not bounded from above. Conversely, \((\pmb t^{PP}, \pmb L_q^{PP},\pmb P_q^{PP})\) has an explosion breakdown point of \((n-1)/2n\), so that there exists a fixed \(U_8\) such that:
From Eq. (20) and the unboundedness of (23) it follows that the left-hand side in Eq. (17) is unbounded. However, by Eqs. (20) and (24), the right-hand side of Eq. (17) is bounded from above so that in cases where outliers cause explosion breakdown of \((\pmb t^I, \pmb L_q^I,\pmb P_q^I)\), criterion (17) will select \(H^* = H^{PP}\). Since the breakdown point of \((\pmb t^{PP}, \pmb L_q^{PP},\pmb P_q^{PP})\) is \((n-1)/2n\), we have that \(\varepsilon _1 = (n-1)/2n\). \(\square \)
Lemma 3
When \(\pmb Y\) is in general position, \(n>q>2\), and \(\varepsilon < \varepsilon _2 = (n-h+1)/n\), then \( (\pmb L_q^I)_q > 0\).
Proof
By Appendix 1, Theorem 1, we have that the implosion breakdown point of \((\pmb t^I, \pmb L_q^I,\pmb P_q^I)\) is \((n-h+1)/n\). The implosion breakdown point of \((\pmb t^{PP}, \pmb L_q^{PP},\pmb P_q^{PP})\) is \((n-1)/2n\), which is higher, so it follows that \(\varepsilon _2=(n-h+1)/n\).
Theorem 2
For \(n>p+1>2\), the finite sample breakdown point of \(\pmb L_q\) is \(\varepsilon ^*_n=(n-h+1)/n\).
Proof
The finite sample breakdown point of \(\pmb L_q\) is \(\min (\varepsilon _1, \varepsilon _2)\). Given Lemmas 2 and 3, \(\min ((n-1)/2n, (n-h+1)/n) = (n-h+1)/n\). \(\square \)
Appendix 3: Measures of dissimilarity for robust PCA fits
The objective of the simulation studies in Sect. 3.3 is to measure how much the fitted PCA parameters \((\pmb t,\pmb L_q^{},\pmb P_q^{})\) obtained by four robust PCA methods deviate from the true \((\pmb \mu ^u,\pmb \varLambda _q^{u},\pmb \varPi _q^{u})\) when they are exposed to outliers. One way to compare PCA fits is with respect to their eigenvectors, as in the maxsub criterion (Björck and Golub 1973):
where \(\lambda _q(\pmb D_q)\) is the smallest eigenvalue of the matrix \( \pmb D_q^{}=\pmb \varPi _q^\top \pmb P_q^{}\pmb P_q^\top \pmb \varPi _q^{}\). The maxsub has an appealing geometrical interpretation as it represents the maximum angle between a vector in \(\pmb \varPi _q\) and the vector most parallel to it in \(\pmb P_q\). However, it does not exhaustively account for the dissimilarity between two sets of eigenvectors. As an alternative to the maxsub, Krzanowski (1979) proposes the total dissimilarity:
which is an exhaustive measure of dissimilarity for orthogonal matrices. Furthermore, because \(\sum _{j=1}^q\lambda _j(\pmb D_q)={{\mathrm{Tr}}}(\pmb D_q)\) and \(|\pmb D_q|=1\) (Krzanowski 1979), it is readily seen that (25) is a measure of sphericity of \(\pmb D_q\) [it is proportional to the likelihood ratio test statistic for non-sphericity of \(\pmb D_q\) (Muirhead 1982, pp. 333–335)]. However, note that (25) forfeits the geometric interpretation enjoyed by the maxsub.
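The maxsub criterion can be sketched directly from the text: form \(\pmb D_q^{}=\pmb \varPi _q^\top \pmb P_q^{}\pmb P_q^\top \pmb \varPi _q^{}\) and take the largest principal angle. The exact displayed formula is not reproduced above, so the form \(\arccos \sqrt{\lambda _q(\pmb D_q)}\) used below is an assumption consistent with the surrounding description:

```python
# Sketch of the maxsub criterion (Björck and Golub 1973) as described in the
# text: the maximum angle between a vector in Pi_q and the vector most
# parallel to it in P_q. The formula arccos(sqrt(lambda_min(D_q))) is assumed,
# since the displayed equation is not reproduced in the text.
import numpy as np


def maxsub(Pi_q, P_q):
    """Largest principal angle (radians) between the column spaces of Pi_q and P_q."""
    D_q = Pi_q.T @ P_q @ P_q.T @ Pi_q
    lam_min = np.linalg.eigvalsh(D_q)[0]  # smallest eigenvalue of D_q
    return np.arccos(np.sqrt(np.clip(lam_min, 0.0, 1.0)))


# Identical subspaces give angle 0; orthogonal subspaces give angle pi/2.
Pi = np.eye(4)[:, :2]      # span{e1, e2}
P_same = np.eye(4)[:, :2]
P_orth = np.eye(4)[:, 2:]  # span{e3, e4}
print(maxsub(Pi, P_same))  # 0.0
print(maxsub(Pi, P_orth))  # pi/2
```

Note that maxsub depends only on the smallest eigenvalue of \(\pmb D_q\), which is exactly why it is not exhaustive: the remaining \(q-1\) eigenvalues are ignored.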
In any case, measures of dissimilarity based solely on eigenvectors, such as the maxsub or sumsub, necessarily fail to account for bias in the estimation of the eigenvalues. This is problematic when used to evaluate robust fits because it is possible for outliers to exert substantially more influence on \(\pmb L_q\) than on \(\pmb P_q\). An extreme example is given by the so-called good leverage type of contamination in which the outliers lie on the subspace spanned by \(\pmb \varPi _q\) so that even the classical PCA estimate (whose eigenvalues can be made arbitrarily bad by such outliers) will have low values of \(\text {maxsub}(\pmb P_q)\).
In contrast, we are interested in an exhaustive measure of dissimilarity; one that summarizes the effects of the outliers on all the parameters of the PCA fit into a single number, so that the algorithms can be ranked in terms of total dissimilarity. To construct such a measure, it is logical to base it on \(\pmb \varSigma _q^u=\pmb \varPi _q^{u}\pmb \varLambda _q^{u}(\pmb \varPi _q^{u})^{\top }\) and its estimate \(\pmb V_q=\pmb P_q^{}\pmb L_q^{}\pmb P_q^{\top }\) because they contain all the parameters of the fitted model. For our purposes, one need only consider the effects of outliers on \(\pmb G_q=|\pmb V_q|^{-1/q}\pmb V_q\), the shape component of \(\pmb V_q\) (Hubert et al. 2014). This is because to rank the observations in a contaminated sample in terms of their true outlyingness (and thus reveal the outliers), it is sufficient to estimate the shape component of \(\pmb \varSigma _q^u\) correctly. Consequently, an exhaustive measure of dissimilarity between \(\pmb G_q\) and \(\pmb \varGamma _q=|\pmb \varSigma _q|^{-1/q}\pmb \varSigma _q\) is given by \(\phi ((\pmb \varGamma ^u_q)^{-1/2}\pmb G_{q}(\pmb \varGamma ^u_q)^{-1/2})\), where \(\phi \) is any measure of non-sphericity of its argument. In practice several choices of \(\phi \) are possible, the simplest being the condition number of \(\pmb W\), defined as the ratio of the largest to the smallest eigenvalue of \(\pmb W\) (Maronna and Yohai 1995), explaining the definition of \(\text {bias}(\pmb V_q)\).
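The bias measure just described can be sketched as follows (an illustration under the stated definitions, not the paper's own code; the helper names are ours):

```python
# Sketch of the shape-based bias measure: take the shape components
# G_q = |V_q|^(-1/q) V_q and Gamma_q = |Sigma_q|^(-1/q) Sigma_q, whiten G_q by
# Gamma_q^(-1/2), and use the condition number as the non-sphericity measure phi.
import numpy as np


def shape(S):
    """Shape component of a q x q positive definite matrix (unit determinant)."""
    q = S.shape[0]
    return S / np.linalg.det(S) ** (1.0 / q)


def bias(V_q, Sigma_q):
    """Condition number of Gamma^(-1/2) G Gamma^(-1/2); 1 means no shape bias."""
    G, Gamma = shape(V_q), shape(Sigma_q)
    # symmetric inverse square root of Gamma via its eigendecomposition
    w, U = np.linalg.eigh(Gamma)
    Gamma_inv_half = U @ np.diag(w ** -0.5) @ U.T
    W = Gamma_inv_half @ G @ Gamma_inv_half
    ev = np.linalg.eigvalsh(W)
    return ev[-1] / ev[0]  # ratio of largest to smallest eigenvalue


Sigma = np.diag([4.0, 2.0, 1.0])             # illustrative "true" Sigma_q
print(bias(Sigma, Sigma))                    # ~1.0: shape recovered exactly
print(bias(np.diag([8.0, 2.0, 1.0]), Sigma)) # 2.0: first eigenvalue inflated
```

Because both matrices are reduced to their shape components first, an estimate that merely rescales \(\pmb \varSigma _q^u\) by a constant also scores 1, consistent with the argument that only shape matters for ranking outlyingness.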
Schmitt, E., Vakili, K. The FastHCS algorithm for robust PCA. Stat Comput 26, 1229–1242 (2016). https://doi.org/10.1007/s11222-015-9602-5