Abstract
In functional linear regression, parameter estimation involves solving a problem that is not necessarily well posed. The problem has points of contact with a range of methodologies, including statistical smoothing, deconvolution and projection onto finite-dimensional subspaces. We discuss the standard approach based explicitly on functional principal component analysis. In this approach, however, the choice of the number of basis components remains somewhat subjective and is not always properly discussed and justified. In this work we discuss inferential properties of least squares estimation in this context with different choices of projection subspaces, and we study the asymptotic behaviour as the dimension of the subspaces increases.
References
Bache K, Lichman M (2013) UCI machine learning repository. https://archive.ics.uci.edu/ml/index.html. Accessed 27 Aug 2015
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Chiou JM, Müller HG, Wang JL, Carey JR (2003) A functional multiplicative effects model for longitudinal data, with application to reproductive histories of female medflies. Stat Sin 13:1119–1133
Cuevas A, Febrero M, Fraiman R (2002) Linear functional regression: the case of fixed design and functional response. Can J Stat 30(2):285–300
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Hastie T, Mallows C (1993) A discussion of A statistical view of some chemometrics regression tools by I. E. Frank and J. H. Friedman. Technometrics 35:140–143
Hawkins T (1977) Weierstrass and the theory of matrices. Arch Hist Exact Sci 17(2):119–163
Horváth L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Jacques J, Preda C (2014) Model-based clustering for multivariate functional data. Comput Stat Data Anal 71:92–106
Koch I, Hoffman P, Marron JS (2013) Proteomics profiles from mass spectrometry. Electron J Stat 8(2):1703–1713
Larsen F, van den Berg F, Engelsen S (2006) An exploratory chemometric study of NMR spectra of table wines. J Chemom 20(5):198–208
Marx BD, Eilers PH (1996) Generalized linear regression on sampled signals with penalized likelihood. In: Forcina A, Marchetti GM, Hatzinger R, Galmacci G (eds) Statistical modelling. Proceedings of the 11th international workshop on statistical modelling, Orvieto
Melas V, Pepelyshev A, Shpilev P, Salmaso L, Corain L, Arboretti R (2014) On the optimal choice of the number of empirical Fourier coefficients for comparison of regression curves. Stat Pap. doi:10.1007/s00362-014-0619-1
Osborne BG, Fearn T, Miller AR, Douglas S (1984) Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit dough. J Sci Food Agric 35:99–105
R Development Core Team (2009) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org. Accessed 27 Aug 2015
Ramsay JO, Silverman BW (2005) Functional data analysis, 2nd edn. Springer, New York
Wang G, Zhou J, Wu W, Chen M (2015) Robust functional sliced inverse regression. Stat Pap. doi:10.1007/s00362-015-0695-x
Acknowledgments
The authors wish to thank Piercesare Secchi for stimulating and essential discussions about topics covered by this paper.
Appendices
Appendix 1: Formal characterization of the sub-space E
This section focuses on computing explicitly the following quantities introduced in Sect. 3:
- (1) the orthonormal basis of E, \(\{\varphi _{k}^{E};\,k=1,\ldots ,d\}\);
- (2) the multivariate projection matrix \(P:\mathbb {R}^{d}\rightarrow \mathbb {R}^{d}\) that transforms the basis coefficients of elements of D into the basis coefficients of their projections in E;
- (3) the functional projection operator \(\pi :D\rightarrow E\subseteq S\), i.e., the projection of D onto S.
Let us consider point (1). First, project the basis \(\{\varphi _{k}^{D};\,k=1,\ldots ,d\}\) of D onto S, obtaining a \(\dim (S)\times d\) matrix A, where \([A]_{ij}=\langle \varphi _{i}^{S},\,\varphi _{j}^{D}\rangle .\) Note that A has infinitely many rows if \(\dim (S)=\infty .\) The projections of the basis of D onto S are then the d functions \(A^{T}\varvec{\varphi ^{S}(t)},\) which form a basis for E. These functions are linearly independent because \(\varphi _{1}^{D},\ldots ,\varphi _{d}^{D}\) are and \(D\cap S^{\perp }=\{0\}\). If a linear combination of the projections vanished, the same combination of \(\varphi _{1}^{D},\ldots ,\varphi _{d}^{D}\) would lie in \(D\cap S^{\perp }.\) To turn \(A^{T}\varvec{\varphi ^{S}(t)}\) into an orthonormal basis for E, we orthonormalize these functions, obtaining
\[ \varvec{\varphi ^{E}(t)}=V_{S}\,D_{D}^{-1/2}\,V_{D}^{T}\,A^{T}\varvec{\varphi ^{S}(t)}, \tag{26} \]
where \(D_{D}\) and \(V_{D}\) are the eigenvalue and eigenvector matrices of \(A^{T}A\) (\(A^{T}AV_{D}=V_{D}D_{D}\)), and \(V_{S}\) is an arbitrary \(d\times d\) orthonormal matrix that allows the basis of E to be changed. Without loss of generality, we can take \(V_{S}=I_{d}.\) Note that, except for \(V_{S},\) the basis \(\varvec{\varphi ^{E}(t)}\) does not depend on the choice of the bases \(\varvec{\varphi ^{D}(t)}\) and \(\varvec{\varphi ^{S}(t)}.\) The eigenvalues in \(D_{D}\) are all strictly positive because \(A^{T}A\) has full rank, being the Gram matrix of the linearly independent functions \(A^{T}\varvec{\varphi ^{S}(t)}\). Moreover, for an orthonormal basis of D, these eigenvalues are all less than or equal to one, since A represents the orthogonal projection onto S, which does not increase norms.
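The construction of point (1) can be checked numerically in a finite-dimensional stand-in for \(L^{2}(T)\). The minimal Python/NumPy sketch below assumes that the orthonormalization has the form \(\varvec{\varphi ^{E}(t)}=V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\varvec{\varphi ^{S}(t)}\), which is consistent with the definitions of \(D_{D}\), \(V_{D}\) and \(V_{S}\) above. The following are hypothetical choices made for illustration only:
- the discretization, with \(\mathbb {R}^{N}\) in place of \(L^{2}(T)\), S spanned by coordinate vectors and D spanned by random orthonormal vectors;
- the names `orthonormal_basis_of_E`, `Phi_S`, `Phi_D` and `Phi_E`.

```python
import numpy as np

def orthonormal_basis_of_E(A, V_S=None):
    """Return (C, eigvals) with phi^E = C phi^S, assuming
    phi^E = V_S D_D^{-1/2} V_D^T A^T phi^S and A^T A V_D = V_D D_D."""
    d = A.shape[1]
    V_S = np.eye(d) if V_S is None else V_S        # w.l.o.g. V_S = I_d
    eigvals, V_D = np.linalg.eigh(A.T @ A)          # eigen-structure of A^T A
    if np.any(eigvals <= 1e-12):
        raise ValueError("A^T A is singular: D meets the orthogonal complement of S")
    C = V_S @ np.diag(eigvals ** -0.5) @ V_D.T @ A.T
    return C, eigvals

# Hypothetical finite-dimensional setting: R^N stands in for L^2(T),
# S is spanned by the first m canonical vectors, D by d random orthonormal vectors.
rng = np.random.default_rng(0)
N, m, d = 12, 5, 3
Phi_S = np.eye(N)[:, :m]                              # orthonormal basis of S (columns)
Phi_D, _ = np.linalg.qr(rng.standard_normal((N, d)))  # orthonormal basis of D (columns)
A = Phi_S.T @ Phi_D                                   # [A]_ij = <phi_i^S, phi_j^D>
C, eigvals = orthonormal_basis_of_E(A)
Phi_E = Phi_S @ C.T                                   # columns are phi_1^E, ..., phi_d^E
```

Because the basis of D is orthonormal here, the values in `eigvals` should lie in \((0,1]\), in line with the remarks above. The columns of `Phi_E` should be orthonormal and span the projection of D onto S.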
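Now, consider point (2). P maps the coefficients of an element of D with respect to \(\varvec{\varphi ^{D}(t)}\) to the coefficients of its projection with respect to \(\varvec{\varphi ^{E}(t)}.\) Hence, from (26), it can be written as
\[ P=\langle \varvec{\varphi ^{E}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle =V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\langle \varvec{\varphi ^{S}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle ; \]
since \(\langle \varvec{\varphi ^{S}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle =A\) and \(V_{D}^{T}A^{T}A=D_{D}V_{D}^{T},\) we obtain
\[ P=V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}A=V_{S}D_{D}^{1/2}V_{D}^{T}. \tag{27} \]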
Note that, using (27), we can rewrite (26) as
\[ \varvec{\varphi ^{E}(t)}=(P^{T})^{-1}A^{T}\varvec{\varphi ^{S}(t)}. \]
Then, from the coefficient estimate in E given by (9), we obtain the coefficient estimate in D as \(\varvec{\widehat{\beta }^{D}_{n}}=P^{-1}\varvec{\widehat{\beta }^{E}_{n}},\) and finally the functional estimate \(\widehat{\beta }^{D}_{n}(t)=(\varvec{\widehat{\beta }^{D}_{n}})^{T}\varvec{\varphi ^{D}(t)}.\) This coincides with the solution of (7).
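Point (2) can be sketched in the same kind of discretized setting. This is an illustrative assumption, not the authors' code. The sketch takes \(V_{S}=I_{d}\) and the form \(P=D_{D}^{1/2}V_{D}^{T}\). That form follows from \(P=\langle \varvec{\varphi ^{E}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle \) together with the identities \(\langle \varvec{\varphi ^{S}(t)},\,(\varvec{\varphi ^{D}(t)})^{T}\rangle =A\) and \(V_{D}^{T}A^{T}A=D_{D}V_{D}^{T}\) quoted above. The code checks that P produces the E-coefficients of the projection, and that \(P^{-1}\) maps them back to the D-coefficients. All variable names are hypothetical.

```python
import numpy as np

# Hypothetical discretized setting (R^N in place of L^2(T)).
rng = np.random.default_rng(1)
N, m, d = 12, 5, 3
Phi_S = np.eye(N)[:, :m]                              # orthonormal basis of S
Phi_D, _ = np.linalg.qr(rng.standard_normal((N, d)))  # orthonormal basis of D
A = Phi_S.T @ Phi_D                                   # [A]_ij = <phi_i^S, phi_j^D>
eigvals, V_D = np.linalg.eigh(A.T @ A)                # A^T A V_D = V_D D_D

# Orthonormal basis of E (V_S = I_d), stored as columns in R^N.
Phi_E = Phi_S @ (np.diag(eigvals ** -0.5) @ V_D.T @ A.T).T

# Assumed closed form of the projection matrix with V_S = I_d.
P = np.diag(np.sqrt(eigvals)) @ V_D.T

# Take some g in D, project it onto S, and read its coefficients in the basis of E.
b_D = rng.standard_normal(d)          # coefficients of g w.r.t. phi^D
g = Phi_D @ b_D
pi_g = Phi_S @ (Phi_S.T @ g)          # orthogonal projection of g onto S
b_E = Phi_E.T @ pi_g                  # coefficients of pi(g) w.r.t. phi^E

# Back from E to D, as in the text: beta^D = P^{-1} beta^E.
b_D_back = np.linalg.solve(P, b_E)
```

Solving with `np.linalg.solve` rather than forming \(P^{-1}\) explicitly is a standard numerical choice. P is invertible because all eigenvalues of \(A^{T}A\) are strictly positive.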
Finally, consider point (3). Using the projection matrix P we can define the functional operator \(\pi \) as follows
for any \(g\in D.\) Then, using (27) we can easily obtain
Note that \(\pi \) does not depend on the choice of bases of \(S,\,D\) and E. Using (28), once the coefficient estimate in E has been obtained from (9), we can immediately compute the functional estimate \(\widehat{\beta }^{E}_{n}(t)=(\varvec{\widehat{\beta }^{E}_{n}})^{T}\varvec{\varphi ^{E}(t)},\) and then the functional estimate in D, i.e., \(\widehat{\beta }^{D}_{n}=\pi ^{-1}(\widehat{\beta }^{E}_{n}).\) Here \(\pi \) is invertible on D because its kernel is \(D\cap S^{\perp }=\{0\}\).
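The following derivation shows one way to see why \(\pi \) does not depend on the bases. It is written under two assumptions. First, \(\pi (g)=(P\mathbf {b})^{T}\varvec{\varphi ^{E}(t)}\) for \(g=\mathbf {b}^{T}\varvec{\varphi ^{D}(t)}\in D\), where the coefficient vector \(\mathbf {b}\) is notation introduced only for this sketch. Second, \(\varvec{\varphi ^{E}(t)}=V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\varvec{\varphi ^{S}(t)}\) and \(P=V_{S}D_{D}^{1/2}V_{D}^{T}\):

```latex
\begin{aligned}
\pi(g)(t) &= \mathbf{b}^{T}P^{T}\boldsymbol{\varphi}^{E}(t)
  = \mathbf{b}^{T}V_{D}D_{D}^{1/2}V_{S}^{T}\,V_{S}D_{D}^{-1/2}V_{D}^{T}A^{T}\boldsymbol{\varphi}^{S}(t) \\
 &= \mathbf{b}^{T}A^{T}\boldsymbol{\varphi}^{S}(t)
  = \sum_{i}\langle g,\,\varphi_{i}^{S}\rangle\,\varphi_{i}^{S}(t).
\end{aligned}
```

The derivation uses three facts: \(V_{S}^{T}V_{S}=I_{d}\); \(V_{D}V_{D}^{T}=I_{d}\); and \((A\mathbf {b})_{i}=\sum _{j}\langle \varphi _{i}^{S},\,\varphi _{j}^{D}\rangle b_{j}=\langle \varphi _{i}^{S},\,g\rangle \). Under these assumptions, \(\pi \) is the restriction to D of the orthogonal projection onto S, which depends on S alone.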
Appendix 2: Increasing information property
In this section, we discuss an interesting property concerning the behavior of the eigenvalues of the covariance matrix when its dimension increases.
Let \(\{M^{(n)}=[m^{(n)}_{ij}],\,n\ge 1\}\) be a sequence of symmetric matrices such that, for each \(n\ge 1,\,M^{(n)}\) is an \(n\times n\) matrix and, for \(n\ge 2,\) \(m^{(n)}_{ij}=m^{(n-1)}_{ij}\) for any \(i,\,j\le n-1.\) In other words, \(M^{(n-1)}\) is obtained from \(M^{(n)}\) by deleting the last row and column. The eigenvalues are real, since the matrices are symmetric, and they interlace according to the following classical result due to Cauchy.
Theorem 1
(See Hawkins 1977, p. 125) For the nested sequence \((M^{(n)})_{n}\) of matrices given above, denote by \(\{\lambda ^{n}_{k};\,k=1,\ldots ,n\}\) the ordered eigenvalues of \(M^{(n)}.\) Then, for any \(n\ge 1,\)
A direct consequence of the previous theorem is
This result is applied in Sect. 4.2, where \(M^{(n)}\) is the covariance matrix of the random vector \((\langle X,\,\varphi _{1}\rangle ,\ldots ,\langle X,\,\varphi _{n}\rangle ).\) In this context, a direct interpretation of (29) is that the variance of X projected onto a subspace does not decrease when further components are added.
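Assume the eigenvalues are listed in decreasing order, \(\lambda ^{n}_{1}\ge \cdots \ge \lambda ^{n}_{n}\). Cauchy's interlacing theorem then states that \(\lambda ^{n+1}_{k}\ge \lambda ^{n}_{k}\ge \lambda ^{n+1}_{k+1}\) for \(k=1,\ldots ,n\). In particular, for fixed k, \(\lambda ^{n}_{k}\) is non-decreasing in n. The short NumPy sketch below checks this numerically on nested covariance matrices. The "scores" standing in for \(\langle X,\,\varphi _{k}\rangle \) are hypothetical random data, not data from the paper.

```python
import numpy as np

def nested_eigenvalues(M):
    """Eigenvalues, in decreasing order, of every leading principal submatrix M^(n) of M."""
    return [np.sort(np.linalg.eigvalsh(M[:n, :n]))[::-1] for n in range(1, M.shape[0] + 1)]

def interlaces(lam_n, lam_next, tol=1e-10):
    """Check lam_next[k] >= lam_n[k] >= lam_next[k + 1] for every k (decreasing order)."""
    return all(lam_next[k] + tol >= lam_n[k] >= lam_next[k + 1] - tol for k in range(len(lam_n)))

rng = np.random.default_rng(2)
n_max = 6
# Hypothetical scores <X, phi_k>, k = 1..n_max, for 200 simulated curves X.
scores = rng.standard_normal((200, n_max)) @ rng.standard_normal((n_max, n_max))
M = np.cov(scores, rowvar=False)   # M^(n) is the top-left n x n block of M
lams = nested_eigenvalues(M)
```

Each element of `lams` corresponds to one matrix \(M^{(n)}\) of the nested sequence. Consecutive elements should satisfy `interlaces` for every n.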
Appendix 3: Simulation settings
The settings of the simulation study presented in Sect. 3 are the following.
The data \(X_{i}(t)\) and the regression coefficient \(\beta (t)\) belong to the Hilbert space \(L^{2}(T),\) where \(T= [-1,\,1]\) is a closed interval.
For each \(i = 1,\ldots ,n,\) where n is the sample size (\(n=500\) in our examples),
where \(\{\theta _{k}^{X}(t)\} \equiv \{1/\sqrt{2}\} \bigcup \{\cos {(\pi k t)},\,k = 1,2,\ldots \}\). The coefficients \(\alpha _{j}\) are sampled from the uniform distribution \(\text {Unif}_{[-10,\,10]}\), with \(\eta _{1} = 0.01\) and \(\eta _{j} = 1/j\) for \(j > 1\). Finally, \(J_{i}\) is a subset of size Z of the integers from 1 to \(2Z,\) where Z is a Poisson random variable, \(Z \sim {\mathcal {P}}(\lambda ).\) We set \(\lambda = 10.\)
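The sampling ingredients listed above can be sketched with the hypothetical helpers below (`theta_X`, `eta`, `sample_J` are our names, not the authors'). The sketch reproduces only what the text states:
- the basis \(\{\theta _{k}^{X}\}\);
- the weights \(\eta _{j}\);
- the uniform draws \(\alpha _{j}\);
- \(J_{i}\) as a random subset of size Z of \(\{1,\ldots ,2Z\}\) with \(Z\sim {\mathcal {P}}(10)\).

How these pieces combine into \(X_{i}(t)\) is not assumed here. The 0-based indexing of \(\theta _{k}^{X}\) is also an assumption of the sketch.

```python
import numpy as np

def theta_X(k, t):
    """Basis functions from the text; index 0 is the constant 1/sqrt(2) (assumed indexing),
    index k >= 1 is cos(pi k t). All of them are even functions on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return np.full_like(t, 1 / np.sqrt(2)) if k == 0 else np.cos(np.pi * k * t)

def eta(j):
    """eta_1 = 0.01 and eta_j = 1/j for j > 1, as stated in the text."""
    return 0.01 if j == 1 else 1.0 / j

def sample_J(rng, lam=10):
    """Draw Z ~ Poisson(lam), then a random subset of size Z of {1, ..., 2Z}."""
    Z = rng.poisson(lam)
    if Z == 0:
        return np.array([], dtype=int)
    return np.sort(rng.choice(np.arange(1, 2 * Z + 1), size=Z, replace=False))

rng = np.random.default_rng(3)
alpha = rng.uniform(-10, 10, size=50)   # alpha_j ~ Unif[-10, 10]; 50 draws is an arbitrary choice
J = sample_J(rng)
```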
Given a function \(\beta (t) \in L^{2}(T),\) the scalar responses \(y_{1},\ldots ,y_{n}\) are generated as \(y_{i} = \int \nolimits _{T} \beta (t)X_{i}(t)\, dt + \epsilon _{i},\) where \(\epsilon _{i} \sim {\mathcal {N}}(0,\,1).\) We repeat the estimation procedure \(M = 100\) times.
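The response-generation step can be sketched as follows, assuming the curves are available on a fine grid of T and the integral is approximated by the trapezoidal rule. The grid, the quadrature and the two toy curves in the usage lines are hypothetical choices. They are not the simulated \(X_{i}\) of the study.

```python
import numpy as np

def trapezoid(f, t):
    """Trapezoidal rule along the last axis of f, on the grid t."""
    return np.sum((f[..., 1:] + f[..., :-1]) * np.diff(t) / 2, axis=-1)

def simulate_responses(X, beta, t, rng, sigma=1.0):
    """Return (y, signal) with signal_i = int_T beta(t) X_i(t) dt and
    y_i = signal_i + eps_i, eps_i ~ N(0, sigma^2) (sigma = 1 in the text).

    X: (n, len(t)) array of curves sampled on t; beta: (len(t),) array.
    """
    signal = trapezoid(X * beta, t)
    return signal + rng.normal(0.0, sigma, size=X.shape[0]), signal

rng = np.random.default_rng(4)
t = np.linspace(-1.0, 1.0, 2001)
# Two toy even curves (hypothetical), evaluated on the grid.
X = np.vstack([np.cos(np.pi * t), np.full_like(t, 1 / np.sqrt(2))])
beta = t**4                                   # the beta(t) used for Fig. 3
y, signal = simulate_responses(X, beta, t, rng)
```

For these toy curves the noiseless integrals are known in closed form: \(\int _{-1}^{1}t^{4}\cos (\pi t)\,dt=-8/\pi ^{2}+48/\pi ^{4}\) and \(\int _{-1}^{1}t^{4}/\sqrt{2}\,dt=0.4/\sqrt{2}\). This makes the quadrature easy to check.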
In this setting, the space S in which the data are generated is the space of even functions of \(L^{2}(T),\) and its orthogonal complement \(S^{\perp }\) consists of the odd functions of \(L^{2}(T).\) By definition, E is the projection of a sub-space D onto S, and hence E consists of even functions. In particular, we define E as the space of even polynomials of degree at most 4, i.e., \(E=\mathrm{{Span}}\{1,\,t^{2},\,t^{4}\}.\) For computational convenience, we adopt an equivalent orthonormal basis given by the normalized Legendre polynomials, i.e., \(E=\text {Span}\{\phi _{0},\,\phi _{2},\,\phi _{4}\},\) where
\[ \phi _{0}(t)=\frac{1}{\sqrt{2}},\qquad \phi _{2}(t)=\sqrt{\frac{5}{2}}\,\frac{3t^{2}-1}{2},\qquad \phi _{4}(t)=\sqrt{\frac{9}{2}}\,\frac{35t^{4}-30t^{2}+3}{8}. \]
Figure 1 shows the behavior of the estimator \(\widehat{\beta }^{D}_{n}\) for different choices of D that share the same projected space E. To this end, we introduce a parameter \(\theta \in [0,\,2\pi )\) and define
where \(\phi _{1}(t)=\sqrt{3/2}\,t\) is the normalized Legendre polynomial of degree 1. Note that E is the projection of \(D_{\theta }\) onto S for any \(\theta \in [0,\,2\pi ).\) In Fig. 1, we set \(\beta (t)=t^{2}+2t+1/3,\) so that \(\beta (t)\in D_{\pi /3}.\)
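The normalized Legendre polynomials \(\phi _{n}=\sqrt{(2n+1)/2}\,P_{n}\) are consistent with \(\phi _{1}(t)=\sqrt{3/2}\,t\) above. They can be used to check two facts numerically:
- \(\{\phi _{0},\phi _{2},\phi _{4}\}\) is orthonormal on \([-1,1]\);
- \(\beta (t)=t^{2}+2t+1/3\) splits into an even part \(t^{2}+1/3\in E\) and an odd part \(2t\) proportional to \(\phi _{1}\).

The sketch below uses NumPy's `numpy.polynomial.Legendre` class and Gauss–Legendre quadrature (`leggauss`). It does not check membership in \(D_{\pi /3}\) itself. The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

def phi(n):
    """Normalized Legendre polynomial on [-1, 1]: sqrt((2n + 1) / 2) * P_n."""
    return Legendre.basis(n) * float(np.sqrt((2 * n + 1) / 2))

def beta(t):
    """The coefficient function used for Fig. 1."""
    return t**2 + 2 * t + 1 / 3

# A 10-node Gauss-Legendre rule is exact for polynomials of degree <= 19.
nodes, weights = leggauss(10)

def inner(f, g):
    """L^2(T) inner product on T = [-1, 1] (exact for low-degree polynomials)."""
    return np.sum(weights * f(nodes) * g(nodes))

# Gram matrix of {phi_0, phi_2, phi_4}: should be the 3 x 3 identity.
E_basis = [phi(n) for n in (0, 2, 4)]
G = np.array([[inner(f, g) for g in E_basis] for f in E_basis])

# Coefficients of beta on phi_0, ..., phi_4.
coef = np.array([inner(beta, phi(n)) for n in range(5)])

def beta_even(t):
    """Even part of beta, i.e. its projection onto S."""
    return (beta(t) + beta(-t)) / 2

def beta_odd(t):
    """Odd part of beta, i.e. its component in S-perp."""
    return (beta(t) - beta(-t)) / 2
```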
In Figs. 2 and 3 we are not interested in the bias on \(S^{\perp }\), and hence we take \(D\equiv E=\text {Span}\{1,\,t^{2},\,t^{4}\}.\) Figure 2 is dedicated to the study of the bias \(\gamma (t),\) and hence we choose a \(\beta (t)\) that does not lie in D, namely \(\beta (t)=\mathbf {1}_{[-0.5,0.5]}(t).\) Figure 3 illustrates the bias–variance trade-off between D and the sub-space generated by the FPCs. Hence, we choose a true \(\beta (t)\) that lies in D, namely \(\beta (t)=t^{4}.\)
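As a rough illustration of why \(\mathbf {1}_{[-0.5,0.5]}\) lies outside D while \(t^{4}\) lies in it, the sketch below computes each function's squared \(L^{2}(T)\) distance from \(D=E=\text {Span}\{\phi _{0},\phi _{2},\phi _{4}\}\) via Pythagoras. This is only the plain \(L^{2}\) approximation error. It is not the bias \(\gamma (t)\) of Fig. 2, which also depends on the distribution of the data. The helper names are hypothetical. Inner products with the indicator are computed exactly from antiderivatives, because quadrature is inaccurate at its jumps.

```python
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

def phi(n):
    """Normalized Legendre polynomial on [-1, 1]: sqrt((2n + 1) / 2) * P_n."""
    return Legendre.basis(n) * float(np.sqrt((2 * n + 1) / 2))

E_DEGREES = (0, 2, 4)   # E = D = Span{phi_0, phi_2, phi_4} = Span{1, t^2, t^4}

def coef_indicator(n, a=0.5):
    """<1_[-a, a], phi_n>, computed exactly from an antiderivative of phi_n."""
    F = phi(n).integ()
    return F(a) - F(-a)

def coef_smooth(f, n, n_nodes=10):
    """<f, phi_n> by Gauss-Legendre quadrature (exact for low-degree polynomial f)."""
    nodes, weights = leggauss(n_nodes)
    return np.sum(weights * f(nodes) * phi(n)(nodes))

def squared_distance_to_E(norm2, coefs):
    """||beta - Proj_E beta||^2 = ||beta||^2 - sum_k <beta, phi_k>^2 (Pythagoras)."""
    return norm2 - np.sum(np.asarray(coefs) ** 2)

# beta = indicator of [-0.5, 0.5], with ||beta||^2 = 1.
d2_indicator = squared_distance_to_E(1.0, [coef_indicator(n) for n in E_DEGREES])
# beta = t^4, with ||beta||^2 = int t^8 dt = 2/9.
d2_t4 = squared_distance_to_E(2 / 9, [coef_smooth(lambda t: t**4, n) for n in E_DEGREES])
```

The indicator leaves a strictly positive residual outside E, whereas \(t^{4}\) is reproduced up to rounding error.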
Cite this article
Ghiglietti, A., Ieva, F., Paganoni, A.M. et al. On linear regression models in infinite dimensional spaces with scalar response. Stat Papers 58, 527–548 (2017). https://doi.org/10.1007/s00362-015-0710-2
DOI: https://doi.org/10.1007/s00362-015-0710-2
Keywords
- Functional regression
- Functional principal component analysis
- Asymptotic properties of statistical inference