
The General Linear Model I


Abstract

Econometrics is the study of estimating the parameters of economic models and testing the predictions of theory. In this text, we develop and estimate the general linear regression model and test its forecasting efficiency. In many contexts of empirical research in economics we deal with situations in which one variable, say y, is determined by one or more other variables, say \( x_i,\ i = 1, 2, \ldots, n \), without the former determining the latter. For example, suppose we are interested in the household demand for food. Typically, we assume that the household’s operations in the market for food are too insignificant to affect the price in the market, so that our view of the household is that of an atomistic competitor. Thus, we take the price of food, as well as the other commodity prices, as given irrespective of the actions of the household. The household’s income is, typically, determined independently of its food consumption activity, even if the household is engaged in agricultural pursuits. Here, then, we have a situation in which a variable of interest, viz., the demand for food by a given household, is determined by its income and the prices of food and other commodities, while the latter group of variables is not influenced by the household’s demand for food.


Notes

1.

    The perceptive reader will perhaps have noted an element of the fallacy-of-composition argument here. If no household’s food-consumption activity has any influence on the price of food, then by aggregation the entire collection of households operating in a given market has no effect on price. It is not clear, then, how price is determined in this market. Of course, the standard competitive model would have price determined by the interaction of (market) supply and demand. What is meant by the atomistic assumption is that an individual economic agent’s activity has so infinitesimal an influence on price as to render it negligible for all practical purposes.


Appendix: A Geometric Interpretation of the GLM: The Multiple Correlation Coefficient


1.1 The Geometry of the GLM

For simplicity, consider the bivariate GLM

$$ {y}_t={\beta}_0+{\beta}_1{x}_t+{u}_t $$

and note that the conditional expectation of \( y_t \) given \( x_t \) is

$$ \mathrm{E}\left({y}_t|{x}_t\right)={\beta}_0+{\beta}_1{x}_t. $$

The model may be plotted in the \( (y, x) \) plane as density functions about the mean.

Specifically, in Fig. 1.1 we have plotted the conditional mean of y given x as the straight line. Given the abscissa x, however, the dependent variable is not constrained to lie on the line. Instead, it is thought of as a random variable defined over the vertical line rising above that abscissa. Thus, for given x we can, in principle, observe a y anywhere along that vertical line. This being the conceptual framework, we would not be surprised if, in plotting a given sample in \( (y, x) \) space, we obtained the disposition of Fig. 1.2. In particular, even if the pairs \( \{(y_t, x_t) : t = 1, 2, \ldots, T\} \) have been generated by the process pictured in Fig. 1.1, there is no reason why plotting the sample should not give rise to the configuration of Fig. 1.2. A plot of the sample is frequently referred to as a scatter diagram. The least squares procedure is simply a method for determining a line through the scatter diagram such that the sum of the squared vertical (y) distances between the sample points and the line is minimized.

Fig. 1.1 A fitted regression line with confidence intervals

Fig. 1.2 A fitted regression line

In Fig. 1.2 the sloping line is the hypothetical estimate induced by OLS; as such it represents an estimate of the unknown parameters of the conditional mean function. The vertical lines are the vertical (y) distances between two points: first, for an x in the sample there is the corresponding observed y; second, for that same x there is the y lying on the sloping line. It is the sum of the squares of these distances, taken over all sample observations, that the OLS procedure seeks to minimize. In terms of the general results of the preceding discussion this is accomplished by taking

$$ {\widehat{\beta}}_0=\overline{y}-{\widehat{\beta}}_1\overline{x},\kern1em {\widehat{\beta}}_1=\frac{s_{yx}}{s_{xx}}, $$

where

$$ {s}_{yx}=\frac{1}{T}\sum \limits_{t=1}^T\left({y}_t-\overline{y}\right)\left({x}_t-\overline{x}\right),\kern1em {s}_{xx}=\frac{1}{T}\sum \limits_{t=1}^T{\left({x}_t-\overline{x}\right)}^2, $$
$$ \overline{y}=\frac{1}{T}\sum {y}_t,\kern1em \overline{x}=\frac{1}{T}\sum {x}_t. $$

The y distance referred to above is

$$ {y}_t-{\widehat{\beta}}_0-{\widehat{\beta}}_1{x}_t=\left({y}_t-\overline{y}\right)-{\widehat{\beta}}_1\left({x}_t-\overline{x}\right), $$

the square of which is

$$ {\left({y}_t-\overline{y}\right)}^2-2{\widehat{\beta}}_1\left({y}_t-\overline{y}\right)\left({x}_t-\overline{x}\right)+{\widehat{\beta}}_1^2{\left({x}_t-\overline{x}\right)}^2. $$
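To make these mechanics concrete, the following sketch (simulated data and invented parameter values, using numpy; not part of the text) computes \( {\widehat{\beta}}_0 \) and \( {\widehat{\beta}}_1 \) from the sample moments defined above and checks them against a generic least-squares solve.

```python
import numpy as np

# A minimal sketch: simulate y_t = beta_0 + beta_1 x_t + u_t and recover the
# OLS estimates from sample moments, exactly as in the formulas above.
rng = np.random.default_rng(0)

T = 200
beta_0, beta_1 = 2.0, 0.7          # invented "true" parameters
x = rng.uniform(0.0, 10.0, size=T)
u = rng.normal(0.0, 1.5, size=T)
y = beta_0 + beta_1 * x + u

x_bar, y_bar = x.mean(), y.mean()
s_yx = np.mean((y - y_bar) * (x - x_bar))   # sample covariance of y and x
s_xx = np.mean((x - x_bar) ** 2)            # sample variance of x

b1_hat = s_yx / s_xx                        # slope: s_yx / s_xx
b0_hat = y_bar - b1_hat * x_bar             # intercept: y_bar - b1_hat * x_bar

# Cross-check against a generic least-squares solve of min ||y - Xb||^2.
X = np.column_stack([np.ones(T), x])
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b0_hat, b1_hat)   # moment-based estimates
print(b_ls)             # agrees to machine precision
```

Note that, as the text observes next, only means, sums of squares, and cross products of the observations are needed.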

Notice, incidentally, that to carry out an OLS estimation scheme we need only the sums and cross products of the observations. Notice also that the variance of the slope coefficient is

$$ \mathrm{Var}\left({\widehat{\beta}}_1\right)=\frac{\sigma^2}{\sum_{t=1}^T{\left({x}_t-\overline{x}\right)}^2}. $$

Consequently, if we could design the sample by choosing the x coordinates, we could further reduce the variance of the resulting estimator by choosing the x’s so as to make \( {\sum}_{t=1}^T{\left({x}_t-\overline{x}\right)}^2 \) as large as possible. In fact, it can be shown that if the phenomenon under study is such that the x’s are constrained to lie in the interval [a,  b] and we can choose the design of the sample, we should choose half the x’s at a and half the x’s at b. In this fashion we minimize the variance of \( {\widehat{\beta}}_1 \). Intuitively, and in terms of Figs. 1.1 and 1.2, the interpretation of this result is quite clear. By concentrating on two widely separated points in the x space we induce maximal discrimination between a straight line and a more complicated curve. If we focus on two x points that are very close together, our power to discriminate is very limited, since over a sufficiently small interval all curves “look like straight lines.” By taking half of the observations at one end point and half at the other, we maximize the “precision” with which we fix these two ordinates of the conditional mean function and thus estimate the slope coefficient by the operation

$$ {\widehat{\beta}}_1=\frac{{\overline{y}}^{(2)}-{\overline{y}}^{(1)}}{b-a}. $$

Above, \( {\overline{y}}^{(2)} \) is the mean of the y observations corresponding to x’s chosen at b and \( {\overline{y}}^{(1)} \) is the mean of the y observations corresponding to the x’s chosen at a.
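The design result can be checked numerically. The sketch below (an illustration with invented numbers, not from the text) compares \( \mathrm{Var}({\widehat{\beta}}_1)=\sigma^2/\sum_t(x_t-\overline{x})^2 \) for three designs on [a, b]: half the observations at each endpoint, an evenly spaced grid, and a tightly clustered set of x’s.

```python
import numpy as np

# With x constrained to [a, b], putting half the observations at a and half
# at b maximizes sum (x_t - x_bar)^2 and hence minimizes Var(beta_1_hat).
a, b, T, sigma2 = 0.0, 1.0, 100, 1.0

def slope_variance(x, sigma2=sigma2):
    return sigma2 / np.sum((x - x.mean()) ** 2)

x_endpoints = np.concatenate([np.full(T // 2, a), np.full(T // 2, b)])
x_even = np.linspace(a, b, T)
x_clustered = np.linspace(0.45, 0.55, T)   # two nearly coincident design points

print(slope_variance(x_endpoints))  # smallest: sigma2 / (T (b - a)^2 / 4) = 0.04
print(slope_variance(x_even))       # larger
print(slope_variance(x_clustered))  # much larger: little power to pin down the slope
```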

In the multivariate context a pictorial representation is difficult; nonetheless a geometric interpretation in terms of vector spaces is easily obtained. The columns of the matrix of explanatory variables, X, are by assumption linearly independent. Let us initially agree that we deal with observations that are centered about their respective sample means. Since we have, by construction, n such vectors, they span an n-dimensional subspace of the T-dimensional Euclidean space \( \mathbb{R}_T \). We observe that \( X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime } \) is the matrix representation of a projection of \( \mathbb{R}_T \) into itself. We recall that a projection is a linear idempotent transformation of a space into itself, i.e., if P represents a projection operator and \( y_1, y_2 \in \mathbb{R}_T \), c being a real constant, then

$$ P\left({cy}_1+{y}_2\right)= cP\left({y}_1\right)+P\left({y}_2\right),\kern1em P\left[P\left({y}_1\right)\right]=P\left({y}_1\right), $$

where P(y) is the image of \( y \in \mathbb{R}_T \) under P.

We also recall that a projection divides the space \( \mathbb{R}_T \) into two subspaces, say \( S_1 \) and \( S_2 \), where \( S_1 \) is the range of the projection, i.e.,

$$ {S}_1=\left\{z:z=P(y),\kern0.5em y\in {\mathrm{\mathbb{R}}}_T\right\}, $$

while \( S_2 \) is the null space of the projection, i.e.,

$$ {S}_2=\left\{y:P(y)=0,\kern0.5em y\in {\mathrm{\mathbb{R}}}_T\right\}. $$

We also recall that any element of \( \mathbb{R}_T \) can be written uniquely as the sum of two components, one from \( S_1 \) and one from \( S_2 \).

The subspace \( S_2 \) is also referred to as the orthogonal complement of \( S_1 \), i.e., if \( y_1 \in S_1 \) and \( y_2 \in S_2 \) their inner product vanishes. Thus, \( {y}_1^{\prime }{y}_2=0 \).

The application of these concepts to the regression problem makes the mechanics of estimation quite straightforward. What we do is to project the vector of observations y on the subspace of \( \mathbb{R}_T \) spanned by the (linearly independent) columns of the matrix of observations on the explanatory variables, X. The matrix of the projection is

$$ X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }, $$

which is an idempotent matrix of rank n. The projection onto the orthogonal complement of the range of this projection is another projection, the matrix of which is

$$ I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }. $$

It then follows immediately that we can write

$$ y=\widehat{y}+\widehat{u}, $$

where

$$ \widehat{y}=X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }y $$

is an element of the range of the projection defined by the matrix \( X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime } \), while

$$ \widehat{u}=\left[I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime}\right]y $$

is an element of its orthogonal complement. Thus, mechanically, we have decomposed y into \( \widehat{y} \), which lies in the space spanned by the columns of X, and \( \widehat{u} \), which lies in a subspace orthogonal to it.
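As a purely illustrative check (simulated data, numpy; not from the text), the following sketch forms the two projection matrices above and verifies idempotency, the rank, the decomposition \( y=\widehat{y}+\widehat{u} \), and the orthogonality of the two components.

```python
import numpy as np

# P = X (X'X)^{-1} X' is idempotent of rank n, M = I - P projects on the
# orthogonal complement, and y decomposes as y_hat + u_hat with y_hat'u_hat = 0.
rng = np.random.default_rng(1)
T, n = 50, 3

X = rng.normal(size=(T, n))                   # columns assumed linearly independent
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=T)

P = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto the column space of X
M = np.eye(T) - P                             # projection onto the orthogonal complement

y_hat = P @ y
u_hat = M @ y

print(np.allclose(P @ P, P))                  # idempotent: P P = P
print(np.linalg.matrix_rank(P))               # rank n = 3
print(np.allclose(y, y_hat + u_hat))          # y = y_hat + u_hat
print(np.allclose(y_hat @ u_hat, 0.0))        # the two components are orthogonal
print(np.allclose(X.T @ u_hat, 0.0))          # residuals orthogonal to the columns of X
```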

While the mechanics of regression become clearer in the vector space context above, it must be remarked that the context in which we studied the general linear model is by far the richer one in interpretation and implications.

1.2 A Measure of Correlation Between a Scalar and a Vector

In the discussion to follow we shall draw an interesting analogy between the GLM and certain aspects of multivariate distributions and, more particularly, of the multivariate normal distribution. To fix notation, let

$$ x\sim N\left(\mu, \Sigma \right) $$

and partition

$$ x=\left(\begin{array}{l}{x}^1\\ {}{x}^2\end{array}\right),\kern1em \mu =\left(\begin{array}{l}{\mu}^1\\ {}{\mu}^2\end{array}\right),\kern1em \Sigma =\left[\begin{array}{cc}{\Sigma}_{11}& {\Sigma}_{12}\\ {}{\Sigma}_{21}& {\Sigma}_{22}\end{array}\right] $$

such that \( x^1 \) has k elements and \( x^2 \) has n − k; μ has been partitioned conformably with x, \( \Sigma_{11} \) is k × k, \( \Sigma_{22} \) is (n − k) × (n − k), \( \Sigma_{12} \) is k × (n − k), etc.

We recall that the conditional mean of \( x^1 \) given \( x^2 \) is simply

$$ \mathrm{E}\left({x}^1|{x}^2\right)={\mu}^1+{\Sigma}_{12}{\Sigma}_{22}^{-1}\left({x}^2-{\mu}^2\right). $$

If k = 1, then \( x^1 = x_1 \) and

$$ \mathrm{E}\left({x}_1|{x}^2\right)={\mu}_1+{\sigma}_{1\cdot }{\Sigma}_{22}^{-1}\left({x}^2-{\mu}^2\right)={\mu}_1-{\sigma}_{1\cdot }{\Sigma}_{22}^{-1}{\mu}^2+{\sigma}_{1\cdot }{\Sigma}_{22}^{-1}{x}^2. $$

But, in the GLM we also have that

$$ \mathrm{E}\left(y|x\right)={\beta}_0+\sum \limits_{i=1}^n{\beta}_i{x}_i $$

so that if we look upon \( (y, x_1, x_2, \ldots, x_n) \) as having a jointly normal distribution we can think of the “systematic part” of the GLM above as the conditional mean (function) of y given the \( x_i,\ i = 1, 2, \ldots, n \).
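For illustration only (the mean vector and covariance matrix below are invented), the following sketch computes the coefficients of the conditional mean \( \mu_1+\sigma_{1\cdot}\Sigma_{22}^{-1}(x^2-\mu^2) \) for a trivariate normal and confirms that an OLS regression on a large simulated sample recovers them approximately.

```python
import numpy as np

# For jointly normal (y, x1, x2), the slope coefficients of E(y | x) are
# sigma_1. Sigma_22^{-1}; OLS on a large sample should recover them.
mu = np.array([1.0, 0.0, 2.0])                     # (mu_y, mu_x1, mu_x2), invented
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])                # invented, positive definite

sigma_1dot = Sigma[0, 1:]                          # Cov(y, x), the row sigma_1.
Sigma_22 = Sigma[1:, 1:]                           # Cov(x)

beta = sigma_1dot @ np.linalg.inv(Sigma_22)        # slope coefficients
beta_0 = mu[0] - beta @ mu[1:]                     # intercept

rng = np.random.default_rng(2)
z = rng.multivariate_normal(mu, Sigma, size=200_000)
y, X = z[:, 0], np.column_stack([np.ones(len(z)), z[:, 1:]])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_0, beta)     # population coefficients of the conditional mean
print(b_ols)            # OLS estimates, close to the population values
```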

In this context, we might wish to define what is to be meant by the correlation coefficient between a scalar and a vector. We have

Definition

Let x be an n-element random vector having (for simplicity) mean zero and covariance matrix Σ. Partition

$$ x=\left(\begin{array}{l}{x}^1\\ {}{x}^2\end{array}\right) $$

so that \( x^1 \) has k elements and \( x^2 \) has n − k elements. Let \( x_i \in x^1 \). The correlation coefficient between \( x_i \) and \( x^2 \) is defined by

$$ \max_{\alpha}\ \mathrm{Corr}\left({x}_i,{\alpha}^{\prime }{x}^2\right)=\max_{\alpha}\ \frac{\mathrm{Cov}\left({x}_i,{\alpha}^{\prime }{x}^2\right)}{{\left[{\sigma}_{ii}\,{\alpha}^{\prime}\mathrm{Cov}\left({x}^2\right)\alpha \right]}^{1/2}}. $$

This is termed the multiple correlation coefficient and it is denoted by

$$ {R}_{i\cdot k+1,k+2,\dots, n}. $$

We now proceed to derive an expression for the multiple correlation coefficient in terms of the elements of Σ. To do so we require two auxiliary results. Partition

$$ \Sigma =\left[\begin{array}{cc}{\Sigma}_{11}& {\Sigma}_{12}\\ {}{\Sigma}_{21}& {\Sigma}_{22}\end{array}\right] $$

conformably with x and let \( \sigma_{i\cdot} \) be the ith row of \( \Sigma_{12} \). We have:

Assertion A.1

For \( {\gamma}^{\prime }={\sigma}_{i\cdot }{\Sigma}_{22}^{-1} \), the quantity \( {x}_i-{\gamma}^{\prime }{x}^2 \) is uncorrelated with \( x^2 \).

Proof

We can write

$$ \left(\begin{array}{c}{x}_i-{\gamma}^{\prime }{x}^2\\ {}{x}^2\end{array}\right)=\left[\begin{array}{cc}{e}_{\cdot i}^{\prime }& -{\gamma}^{\prime}\\ {}0& I\end{array}\right]\left(\begin{array}{c}{x}^1\\ {}{x}^2\end{array}\right) $$

where \( e_{\cdot i} \) is a k-element (column) vector all of whose elements are zero except the ith, which is unity. The covariance matrix of the left member above is

$$ {\displaystyle \begin{array}{l}\left[\begin{array}{cc}{e}_{\cdot i}^{\prime }& -{\gamma}^{\prime}\\ {}0& I\end{array}\right]\left[\begin{array}{cc}{\Sigma}_{11}& {\Sigma}_{12}\\ {}{\Sigma}_{21}& {\Sigma}_{22}\end{array}\right]\left[\begin{array}{ll}{e}_{\cdot i}& 0\\ {}-\gamma & I\end{array}\right]\\ {}\kern1em =\left[\begin{array}{cc}{e}_{\cdot i}^{\prime }{\Sigma}_{11}{e}_{\cdot i}-2{e}_{\cdot i}^{\prime }{\Sigma}_{12}\gamma +{\gamma}^{\prime }{\Sigma}_{22}\gamma & {e}_{\cdot i}^{\prime }{\Sigma}_{12}-{\gamma}^{\prime }{\Sigma}_{22}\\ {}{\Sigma}_{21}{e}_{\cdot i}-{\Sigma}_{22}\gamma & {\Sigma}_{22}\end{array}\right].\end{array}} $$

But

$$ {e}_{\cdot i}^{\prime }{\Sigma}_{12}={\sigma}_{i\cdot },\kern1em {\gamma}^{\prime }{\Sigma}_{22}={\sigma}_{i\cdot }{\Sigma}_{22}^{-1}{\Sigma}_{22}={\sigma}_{i\cdot }, $$

and the conclusion follows immediately. q.e.d.
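A quick numerical illustration of Assertion A.1 (with an arbitrary, invented positive definite Σ, not taken from the text): form \( \gamma'=\sigma_{i\cdot}\Sigma_{22}^{-1} \) and check that \( \sigma_{i\cdot}-\gamma'\Sigma_{22} \) vanishes, so that \( x_i-\gamma'x^2 \) is uncorrelated with \( x^2 \).

```python
import numpy as np

# Verify the covariance block sigma_i. - gamma' Sigma_22 is identically zero.
rng = np.random.default_rng(3)
n, k, i = 5, 2, 0                                  # x^1 has k elements; x_i is its first element

A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)                    # an arbitrary positive definite covariance

Sigma_12, Sigma_22 = Sigma[:k, k:], Sigma[k:, k:]
sigma_i = Sigma_12[i]                              # ith row of Sigma_12
gamma = np.linalg.solve(Sigma_22, sigma_i)         # gamma = Sigma_22^{-1} sigma_i.'

print(np.allclose(sigma_i - gamma @ Sigma_22, 0.0))   # True: zero covariance with x^2
```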

Assertion A.2

The quantity

$$ \mathrm{Var}\left({x}_i-{\alpha}^{\prime }{x}^2\right) $$

is minimized for the choice α = γ , γ being as in Assertion A.1.

Proof

We may write, for any (n − k)-element vector α,

$$ {\displaystyle \begin{array}{c}\mathrm{Var}\left({x}_i-{\alpha}^{\prime }{x}^2\right)=\mathrm{Var}\left[\left({x}_i-{\gamma}^{\prime }{x}^2\right)+{\left(\gamma -\alpha \right)}^{\prime }{x}^2\right]\\ {}=\mathrm{Var}\left({x}_i-{\gamma}^{\prime }{x}^2\right)+\mathrm{Var}\left[{\left(\gamma -\alpha \right)}^{\prime }{x}^2\right].\end{array}} $$

The last equality follows since the covariance between \( {x}_i-{\gamma}^{\prime }{x}^2 \) and \( x^2 \) vanishes, by Assertion A.1.

Thus,

$$ \mathrm{Var}\left({x}_i-{\alpha}^{\prime }{x}^2\right)=\mathrm{Var}\left({x}_i-{\gamma}^{\prime }{x}^2\right)+{\left(\gamma -\alpha \right)}^{\prime }{\Sigma}_{22}\left(\gamma -\alpha \right), $$

which is (globally) minimized by the choice α = γ (why?). q.e.d.

It is now simple to prove:

Proposition A.1

Let x be as in Assertion A.1, and let \( x_i \in x^1 \). Then the (square of the) multiple correlation coefficient between \( x_i \) and \( x^2 \) is given by

$$ {R}_{i\cdot k+1,k+2,\dots, n}^2=\frac{\sigma_{i\cdot }{\Sigma}_{22}^{-1}{\sigma}_{i\cdot}^{\prime }}{\sigma_{ii}}. $$

Proof

For any (n − k)-element vector α and scalar c, we have by Assertion A.2

$$ \mathrm{Var}\left({x}_i-c{\alpha}^{\prime }{x}^2\right)\ge \mathrm{Var}\left({x}_i-{\gamma}^{\prime }{x}^2\right). $$

Developing both sides we have

$$ {\sigma}_{ii}-2c{\sigma}_{i\cdot}\alpha +{c}^2{\alpha}^{\prime }{\Sigma}_{22}\alpha \ge {\sigma}_{ii}-2{\sigma}_{i\cdot}\gamma +{\gamma}^{\prime }{\Sigma}_{22}\gamma . $$

This inequality holds, in particular, for

$$ {c}^2=\frac{\gamma^{\prime }{\Sigma}_{22}\gamma }{\alpha^{\prime }{\Sigma}_{22}\alpha }. $$

Substituting, we have

$$ {\sigma}_{ii}-2{\left(\frac{\gamma^{\prime }{\Sigma}_{22}\gamma }{\alpha^{\prime }{\Sigma}_{22}\alpha}\right)}^{1/2}{\sigma}_{i\cdot}\alpha +{\gamma}^{\prime }{\Sigma}_{22}\gamma \ge {\sigma}_{ii}-2{\sigma}_{i\cdot}\gamma +{\gamma}^{\prime }{\Sigma}_{22}\gamma . $$

Cancelling \( \sigma_{ii} \) and \( {\gamma}^{\prime }{\Sigma}_{22}\gamma \), rearranging, and multiplying both sides by \( {\left({\sigma}_{ii}\,{\gamma}^{\prime }{\Sigma}_{22}\gamma \right)}^{-1/2} \), we find

$$ \frac{\sigma_{i\cdot}\alpha }{{\left({\sigma}_{ii}{\alpha}^{\prime }{\Sigma}_{22}\alpha \right)}^{1/2}}\le \frac{\sigma_{i\cdot}\gamma }{{\left({\sigma}_{ii}{\gamma}^{\prime }{\Sigma}_{22}\gamma \right)}^{1/2}}. $$

But

$$ \frac{\sigma_{i\cdot}\alpha }{{\left({\sigma}_{ii}{\alpha}^{\prime }{\Sigma}_{22}\alpha \right)}^{1/2}}=\mathrm{Corr}\left({x}_i,\kern0.5em {\alpha}^{\prime }{x}^2\right). $$

Consequently, we have shown that for every α

$$ \mathrm{Corr}\left({x}_i,\kern0.5em {\alpha}^{\prime }{x}^2\right)\le \mathrm{Corr}\left({x}_i,{\gamma}^{\prime }{x}^2\right) $$

for \( {\gamma}^{\prime }={\sigma}_{i\cdot }{\Sigma}_{22}^{-1} \). Thus

$$ {R}_{i\cdot k+1,k+2,\dots, n}=\frac{\sigma_{i\cdot }{\Sigma}_{22}^{-1}{\sigma}_{i\cdot}^{\prime }}{{\left({\sigma}_{ii}{\sigma}_{i\cdot }{\Sigma}_{22}^{-1}{\sigma}_{i\cdot}^{\prime}\right)}^{1/2}}={\left(\frac{\sigma_i.{\Sigma}_{22}^{-1}{\sigma}_{i\cdot}^{\prime }}{\sigma_{ii}}\right)}^{1/2}\kern0.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$

Remark A.1

If, in addition, we assume that the elements of x are jointly normal, then the conditional distribution of x i given x 2 is

$$ N\left({\mu}_i+{\sigma}_{i\cdot }{\Sigma}_{22}^{-1}\left({x}^2-{\mu}^2\right),\kern1em {\sigma}_{ii}-{\sigma}_{i\cdot }{\Sigma}_{22}^{-1}{\sigma}_{i\cdot}^{\prime}\right). $$

The ratio of the conditional variance of \( x_i \) (given \( x^2 \)) to its unconditional variance is given by

$$ \frac{\sigma_{ii}-{\sigma}_{i\cdot }{\Sigma}_{22}^{-1}{\sigma}_i^{\prime }}{\sigma_{ii}}=1-{R}_{i\cdot k+1,k+2,\dots, n}^2. $$

Thus, \( {R}_{i\cdot k+1,k+2,\dots, n}^2 \) measures the relative reduction in the variance of \( x_i \) between its marginal and conditional distributions (given \( x_{k+1}, x_{k+2}, \ldots, x_n \)).
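As an illustrative check (again with an arbitrary, invented positive definite Σ), the sketch below computes the closed-form \( R_{i\cdot k+1,\dots ,n}^2 \) of Proposition A.1 and confirms that no randomly chosen α yields a larger squared correlation, with equality attained at α = γ.

```python
import numpy as np

# R^2 = sigma_i. Sigma_22^{-1} sigma_i.' / sigma_ii bounds Corr(x_i, alpha'x^2)^2
# for every alpha, and the bound is attained at alpha = gamma.
rng = np.random.default_rng(4)
n, k, i = 6, 2, 0

A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)

sigma_ii = Sigma[i, i]
sigma_i = Sigma[i, k:]                       # sigma_i., the ith row of Sigma_12
Sigma_22 = Sigma[k:, k:]

R2 = sigma_i @ np.linalg.solve(Sigma_22, sigma_i) / sigma_ii

def corr2(alpha):
    """Squared correlation between x_i and alpha'x^2 implied by Sigma."""
    return (sigma_i @ alpha) ** 2 / (sigma_ii * (alpha @ Sigma_22 @ alpha))

best_random = max(corr2(rng.normal(size=n - k)) for _ in range(10_000))
print(R2)                                         # closed-form squared multiple correlation
print(best_random)                                # best random alpha: never exceeds R2
print(corr2(np.linalg.solve(Sigma_22, sigma_i)))  # equals R2 at alpha = gamma
```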

The analogy between these results and those encountered in the chapter is now quite obvious. In that context, the role of \( x_i \) is played by the dependent variable, while the role of \( x^2 \) is played by the bona fide explanatory variables. If the data matrix is

$$ X=\left(e,\kern0.5em {X}_1\right), $$

where X 1 is the matrix of observations on the bona fide explanatory variables, then

$$ \frac{1}{T}{X}_1^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right)y $$

plays the role of \( \sigma_{i\cdot} \). In the above, y is the vector of observations on the dependent variable and, thus, the quantity above is the vector of sample covariances between the explanatory and dependent variables. Similarly,

$$ \frac{1}{T}{X}_1^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right){X}_1 $$

is the sample covariance matrix of the explanatory variables. The vector of residuals is analogous to the quantity \( {x}_i-{\gamma}^{\prime }{x}^2 \), and Assertion A.1 corresponds to the statement that the vector of residuals in the regression of y on X is orthogonal to X, a result given in Eq. (1.21). Assertion A.2 is analogous to the result in Proposition 1. Finally, the (square of the) multiple correlation coefficient is analogous to the (unadjusted) coefficient of determination of multiple regression. Thus, recall from Eq. (1.26) that

$$ {\displaystyle \begin{array}{l}{R}^2=1-\frac{{\widehat{u}}^{\prime}\widehat{u}}{y^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right)y}\\ {}=\frac{y^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right){X}_1{\left[{X}_1^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right){X}_1\right]}^{-1}{X}_1^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right)y}{y^{\prime}\left(I-\frac{e{e}^{\prime }}{T}\right)y},\end{array}} $$

which is the sample analog of the (square of the) multiple correlation coefficient between y and \( x_1, x_2, \ldots, x_n \),

$$ {R}_{y\cdot {x}_1,{x}_2,\dots, {x}_n}^2=\frac{\sigma_{y\cdot }{\Sigma}_{xx}^{-1}{\sigma}_{y\cdot}^{\prime }}{\sigma_{yy}}, $$

where

$$ \Sigma =\mathrm{Cov}(z)=\left[\begin{array}{ll}{\sigma}_{yy}& {\sigma}_{y\cdot}\\ {}{\sigma}_{y\cdot}^{\prime }& {\Sigma}_{xx}\end{array}\right],\kern1em z=\left(\begin{array}{l}y\\ {}x\end{array}\right),\kern1em x={\left({x}_1,{x}_2,\dots, {x}_n\right)}^{\prime }, $$

i.e., it is the “covariance matrix” of the “joint distribution” of the dependent and bona fide explanatory variables.
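Finally, a sketch with simulated data (invented parameter values, numpy; not part of the text) confirming that the residual-based expression for R² in Eq. (1.26) coincides with the sample-moment expression just given.

```python
import numpy as np

# R^2 computed as 1 - u_hat'u_hat / y'(I - ee'/T)y equals the expression based
# on the sample covariances of (y, x_1, ..., x_n).
rng = np.random.default_rng(5)
T, n = 120, 3

X1 = rng.normal(size=(T, n))                          # bona fide explanatory variables
y = 1.0 + X1 @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=T)

e = np.ones(T)
M0 = np.eye(T) - np.outer(e, e) / T                   # centering matrix I - ee'/T

X = np.column_stack([e, X1])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ b

R2_residual = 1.0 - (u_hat @ u_hat) / (y @ M0 @ y)

s_yx = X1.T @ M0 @ y / T                              # sample covariances, role of sigma_y.
S_xx = X1.T @ M0 @ X1 / T                             # sample covariance matrix of the x's
s_yy = y @ M0 @ y / T
R2_moment = s_yx @ np.linalg.solve(S_xx, s_yx) / s_yy

print(R2_residual, R2_moment)                         # identical up to rounding
```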

Cite this chapter

Dhrymes, P. (2017). The General Linear Model I. In: Introductory Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-319-65916-9_1
