Abstract
We propose a novel supervised dimensionality reduction method named local tangent space discriminant analysis (TSD), which is capable of utilizing the geometrical information of tangent spaces. The proposed method seeks an embedding space where the local manifold structure of the data belonging to the same class is preserved as much as possible, while marginal data points with different class labels are better separated. Moreover, TSD admits an analytic solution and can be naturally extended to nonlinear dimensionality reduction through the kernel trick. Experimental results on multiple real-world data sets demonstrate the effectiveness of the proposed method.
Notes
That’s because there are only \(k_1+1\) examples as the inputs of local PCA.
This protein sequence data set is available at http://www.ebi.ac.uk/uniprot.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Projects 61370175 and 61075005, and by the Shanghai Knowledge Service Platform Project (No. ZF1213).
Appendix 1: Detailed Derivation of S
In order to determine S, we decompose (10) into three additive terms and then examine their separate contributions to S.
Term One
where \(D^w\) is a diagonal degree matrix with \(D_{ii}^w = \sum _{j=1}^n W_{ij}^w\), and \(L^w = D^w-W^w\) is the graph Laplacian matrix. Hence \(S_1 = 2 (D^w-W^w) = 2L^w\), and term one contributes the \(X S_1 X^{\top }\) part of (14).
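The construction of \(S_1\) can be sketched in a few lines of NumPy; the within-class weight matrix below is randomly generated for illustration, and the size \(n\) is a hypothetical choice rather than anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric within-class weight matrix W^w (n x n),
# randomly generated for illustration only.
n = 5
W = rng.random((n, n))
W = (W + W.T) / 2

# Diagonal degree matrix D^w with D^w_ii = sum_j W^w_ij.
D = np.diag(W.sum(axis=1))

# Graph Laplacian L^w = D^w - W^w, and S_1 = 2 L^w.
L = D - W
S1 = 2 * L
```

As a sanity check, a graph Laplacian always annihilates the constant vector, so `S1 @ np.ones(n)` is numerically zero.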
Term Two Define \(B_{ji}=T_{\varvec{x}_j}^{\top } (\varvec{x}_i-\varvec{x}_j)\), then
where we have defined matrices \(\{H_j\}_{j=1}^n\) with \(H_j=\sum _{i=1}^n W_{ij}^w B_{ji} B_{ji}^{\top }\).
Now suppose we define a block diagonal matrix \(S_3\) sized \(mn \times mn\) with block size \(m \times m\). Set the (i, i)-th block \((i=1,\ldots ,n)\) of \(S_3\) to be \(H_i\). Then the resultant \(S_3\) is the contribution of term two to S in (14).
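As a sketch of this construction, the following NumPy snippet assembles \(B_{ji}\), \(H_j\), and the block diagonal \(S_3\); the sizes \(n, d, m\), the data \(X\), the tangent bases \(T_{\varvec{x}_j}\), and the weights \(W^w\) are all randomly generated assumptions, not the paper's actual quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n points in R^d, m-dimensional tangent spaces.
n, d, m = 4, 6, 2
X = rng.random((d, n))     # data matrix, one column per point
T = rng.random((n, d, m))  # T[j]: a (d x m) tangent basis at x_j
W = rng.random((n, n))     # within-class weights W^w

# B_{ji} = T_{x_j}^T (x_i - x_j), an m-vector for each pair (j, i).
def B(j, i):
    return T[j].T @ (X[:, i] - X[:, j])

# H_j = sum_i W^w_{ij} B_{ji} B_{ji}^T, an m x m matrix.
H = [sum(W[i, j] * np.outer(B(j, i), B(j, i)) for i in range(n))
     for j in range(n)]

# S_3: mn x mn block diagonal matrix whose (i, i)-th m x m block is H_i.
S3 = np.zeros((m * n, m * n))
for i in range(n):
    S3[i * m:(i + 1) * m, i * m:(i + 1) * m] = H[i]
```

Note that each \(H_j\) is a nonnegative combination of outer products, so \(S_3\) is symmetric positive semidefinite by construction.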
Term Three Define vectors \(\{F_j\}_{j=1}^n\) with \(F_j=\sum _{i=1}^n W_{ij}^w B_{ji}\), then term three can be rewritten as:
From this expression, we can derive the formulation of \(S_2\); the \(S_2^{\top }\) in (14) is simply its transpose.
Suppose we define two block matrices \(S_2^1\) and \(S_2^2\) sized \(n\times mn\) each where the block size is \(1\times m\), and \(S_2^2\) is a block diagonal matrix. Set the (i, j)-th block \((i,j=1,\ldots ,n)\) of \(S_2^1\) to be \(-W_{ij}^w B_{ji}^{\top }\), and the (i, i)-th block \((i=1,\ldots ,n)\) of \(S_2^2\) to be \(F_{i}^{\top }\). Then, term three can be rewritten as: \( \varvec{t}^{\top } X (S_2^1 + S_2^2) \varvec{w} + \varvec{w}^{\top } (S_2^1 + S_2^2)^{\top } X^{\top } \varvec{t}\). It is clear that \(S_2=S_2^1 + S_2^2\).
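The assembly of \(S_2^1\), \(S_2^2\), and \(S_2\) can be sketched analogously; as before, all sizes and data below are randomly generated assumptions used only to illustrate the block layout:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and data, for illustration only.
n, d, m = 4, 6, 2
X = rng.random((d, n))     # data matrix, one column per point
T = rng.random((n, d, m))  # T[j]: a (d x m) tangent basis at x_j
W = rng.random((n, n))     # within-class weights W^w

# B_{ji} = T_{x_j}^T (x_i - x_j); F_j = sum_i W^w_{ij} B_{ji}.
B = lambda j, i: T[j].T @ (X[:, i] - X[:, j])
F = [sum(W[i, j] * B(j, i) for i in range(n)) for j in range(n)]

# S_2^1: n x mn, with (i, j)-th 1 x m block equal to -W^w_{ij} B_{ji}^T.
S21 = np.zeros((n, m * n))
for i in range(n):
    for j in range(n):
        S21[i, j * m:(j + 1) * m] = -W[i, j] * B(j, i)

# S_2^2: block diagonal n x mn, with (i, i)-th 1 x m block equal to F_i^T.
S22 = np.zeros((n, m * n))
for i in range(n):
    S22[i, i * m:(i + 1) * m] = F[i]

S2 = S21 + S22
```

A quick consistency check: summing the j-th column block of \(S_2^1\) over its rows gives \(-F_j^{\top }\), which follows directly from the definitions of \(B_{ji}\) and \(F_j\).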
Zhou, Y., Sun, S. Local Tangent Space Discriminant Analysis. Neural Process Lett 43, 727–744 (2016). https://doi.org/10.1007/s11063-015-9443-4