Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 2d ed.

Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory variables are considered to generate an outcome variable, or dependent variable. We begin by considering the simple regression model, in which a single explanatory, or independent, variable is involved. We often speak of this as 'two-variable' regression, or 'Y on X regression'. Algebraically,

(1) $y_i = \beta_0 + \beta_1 x_i + u_i$

is the relationship presumed to hold in the population for each observation $i$. The values of $y$ are expected to lie on a straight line, depending on the corresponding values of $x$. Their values will differ from those predicted by that line by the amount of the error term, or disturbance, $u$, which expresses the net effect of all factors other than $x$ on the outcome $y$; that is, it reflects the assumption of ceteris paribus. We often speak of $x$ as the 'regressor' in this relationship; less commonly we speak of $y$ as the 'regressand.' The coefficients of the relationship, $\beta_0$ and $\beta_1$, are the regression parameters, to be estimated from a sample. They are presumed constant in the population, so that the effect of a one-unit change in $x$ on $y$ is assumed constant for all values of $x$. As long as we include an intercept in the relationship, we can always assume that $E(u) = 0$, since a nonzero mean for $u$ could be absorbed by the intercept term. The crucial assumption in this regression model involves

the relationship between $u$ and $x$. We consider $u$ a random variable, as $y$ is, and concern ourselves with the conditional distribution of $u$ given $x$. If that distribution is equivalent to the unconditional distribution of $u$, then we can conclude that there is no relationship between $u$ and $x$, which, as we will see, makes the estimation problem much more straightforward. To state this formally, we assume that

(2) $E(u \mid x) = E(u) = 0$

or that the process has a zero conditional mean. This assumption states that the unobserved factors involved in the regression function are not related in any systematic manner to the observed factors. For instance, consider a regression of individuals' hourly wage on the number of years of education they have completed. There are, of course, many factors influencing the hourly wage earned beyond the number of years of formal schooling. In working with this regression function, we are assuming that the unobserved factors–excluded from the regression we estimate, and thus relegated to the $u$ term–are not systematically related to years of formal schooling. This may not be a tenable assumption; we might consider "innate ability" as such a factor, and it is probably related to success in both the educational process and the workplace. Thus, innate ability–which we cannot measure without some proxies–may be positively correlated to the education variable, which would invalidate assumption (2). The population regression function, given the zero conditional mean assumption, is then

(3) $E(y \mid x) = \beta_0 + \beta_1 x$
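To make the stakes of assumption (2) concrete, here is a minimal simulation sketch (in Python with NumPy; all names and numbers are hypothetical, not from the text) in which an unobserved "ability" factor is relegated to $u$ but is correlated with schooling, so the OLS slope no longer recovers the true effect:

import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical data-generating process: unobserved "ability" raises both
# schooling and wages, so the error term (which absorbs ability) violates (2).
ability = rng.normal(0.0, 1.0, n)
educ = 12.0 + 2.0 * ability + rng.normal(0.0, 1.0, n)   # x is correlated with ability
wage = 5.0 + 1.0 * educ + 3.0 * ability + rng.normal(0.0, 1.0, n)  # true slope = 1.0

# OLS slope: b1 = Cov(x, y) / Var(x)
b1 = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)
print(f"OLS slope: {b1:.3f} versus a true schooling effect of 1.0")
# The estimate is biased upward because E(u | x) != 0 in this design.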

The OLS estimates $b_0$ and $b_1$ of the population parameters $\beta_0$ and $\beta_1$ are defined by the conditions

(4) $\sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0$

(5) $\sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0$

the so-called normal equations of least squares. Why is this method said to be "least squares"? Because, as we shall see, it is equivalent to minimizing the sum of squares of the regression residuals. How do we arrive at the solution? The first "normal equation" can be seen to yield

(6) $b_0 = \bar{y} - b_1 \bar{x}$

where $\bar{y}$ and $\bar{x}$ are the sample averages of those variables. This implies that the regression line passes through the point of means of the sample data. Substituting this solution into the second normal equation, we now have one equation in one unknown, $b_1$:

(7) $\sum_{i=1}^{n} x_i \left( y_i - (\bar{y} - b_1 \bar{x}) - b_1 x_i \right) = 0$

which may be solved as

(8) $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x, y)}{Var(x)}$

where the slope estimate is merely the ratio of the sample covariance of the two variables to the variance of $x$, which, of course, must be nonzero for the estimates to be computed.
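As a quick numerical check on (6) and (8), the following sketch (with hypothetical data-generating values) computes the estimates directly from the deviation-form formulas:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 50)  # hypothetical sample

# Equation (8): ratio of cross-deviations to squared deviations of x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Equation (6): the fitted line passes through the point of means
b0 = y.mean() - b1 * x.mean()
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# Equivalently, b1 is the sample covariance over the sample variance of x
assert np.isclose(b1, np.cov(x, y)[0, 1] / np.var(x, ddof=1))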

This merely implies that not all of the sample values of $x$ can take on the same value. There must be diversity in the observed values of $x$. These estimates–$b_0$ and $b_1$–are said to be the ordinary least squares (OLS) estimates of the regression parameters, since they can be derived by solving the least squares problem:

(9) $\min_{b_0, b_1} S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2$

Here we minimize the sum of squared residuals, or differences between the regression line and the values of $y$, by choosing $b_0$ and $b_1$. If we take the derivatives $\partial S / \partial b_0$ and $\partial S / \partial b_1$ and set the resulting first order conditions to zero, the two equations that result are exactly the OLS solutions for the estimated parameters shown above. The "least squares" estimates minimize the sum of squared residuals, in the sense that any other line drawn through the scatter of points would yield a larger sum of squared residuals. The OLS estimates provide the unique solution to this problem, and can always be computed if (i) $Var(x) > 0$ and (ii) $n > 1$. The estimated OLS regression line is then

(10) $\hat{y} = b_0 + b_1 x$

where the "hat" denotes the predicted value of $y$ corresponding to that value of $x$. This is the sample regression function (SRF), corresponding to the population regression function, or PRF (3). The population regression function is fixed, but unknown, in the population; the SRF is a function of the particular sample that we have used to derive it, and a different SRF will be forthcoming from a different sample. The primary interest in these estimates usually centers on $b_1$, the estimated effect of $x$ on $y$.
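The minimization claim in (9) is easy to verify numerically: in the sketch below (again with hypothetical parameter values), perturbing the OLS solution in any direction raises the sum of squared residuals:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5.0, 2.0, 100)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 100)  # hypothetical sample

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

def ssr(a0, a1):
    """The least squares criterion S from (9)."""
    return np.sum((y - a0 - a1 * x) ** 2)

# Any other line drawn through the scatter yields a larger S.
for d0, d1 in [(0.1, 0.0), (0.0, 0.05), (-0.2, 0.1)]:
    assert ssr(b0 + d0, b1 + d1) > ssr(b0, b1)
print(f"S is minimized at b0 = {b0:.3f}, b1 = {b1:.3f}: S = {ssr(b0, b1):.3f}")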

(2) The sample covariance between the residuals and the regressor is zero:

(13) $Cov(e, x) = \frac{1}{n} \sum_{i=1}^{n} x_i e_i = 0$

(using the fact, from the first normal equation, that the residuals have zero sample mean). This is not an assumption, but follows directly from the second normal equation. The estimated coefficients, which give rise to the residuals, are chosen to make it so. (3) Each value of the dependent variable may be written in terms of its prediction and its error, or regression residual:

$y_i = \hat{y}_i + e_i$

so that OLS decomposes each $y_i$ into two parts: a fitted value and a residual. Property (13) also implies that $Cov(e, \hat{y}) = 0$, since $\hat{y}$ is a linear transformation of $x$, and linear transformations have linear effects on covariances. Thus, the fitted values and residuals are uncorrelated in the sample. Taking this property and applying it to the entire sample, we define

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

$SSR = \sum_{i=1}^{n} e_i^2$

as the Total sum of squares, Explained sum of squares, and Residual sum of squares, respectively. Note that $SST$ expresses the total variation in $y$ around its mean (and we do not strive to "explain" its mean; only how it varies about its mean).
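These residual properties can be confirmed numerically; the following sketch (hypothetical data) checks the two normal-equation properties and the implied orthogonality of residuals and fitted values:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 200)
y = 3.0 - 1.5 * x + rng.normal(0.0, 2.0, 200)  # hypothetical sample

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

assert np.isclose(e.sum(), 0.0)           # residuals have zero mean (first normal equation)
assert np.isclose((x * e).sum(), 0.0)     # property (13): residuals orthogonal to x
assert np.isclose((yhat * e).sum(), 0.0)  # hence fitted values and residuals are uncorrelated
print("Residuals sum to zero and are uncorrelated with x and with the fitted values.")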

The second quantity, $SSE$, expresses the variation of the predicted values of $y$ around the mean value of $y$ (and it is trivial to show that $\hat{y}$ has the same mean as $y$). The third quantity, $SSR$, is the same as the least squares criterion $S$ from (9). (Note that some textbooks interchange the definitions of $SSE$ and $SSR$, since both "explained" and "error" start with E, and "regression" and "residual" start with R). Given these sums of squares, we can generalize the decomposition mentioned above into

(14) $SST = SSE + SSR$

or, the total variation in $y$ may be divided into that explained and that unexplained, i.e. left in the residual category. To prove the validity of (14), note that

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \left[ e_i + (\hat{y}_i - \bar{y}) \right]^2 = \sum_{i=1}^{n} e_i^2 + 2 \sum_{i=1}^{n} e_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = SSR + SSE$

given that the middle term in this expression is equal to zero. But this term is the sample covariance of $e$ and $\hat{y}$, given a zero mean for $e$, and by (13) we have established that this is zero. How good a job does this SRF do? Does the regression function explain a great deal of the variation of $y$, or not very much?
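The decomposition (14) can likewise be verified on simulated data; this sketch (hypothetical values) computes the three sums of squares and confirms that they add up:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-5.0, 5.0, 150)
y = 0.5 + 1.2 * x + rng.normal(0.0, 3.0, 150)  # hypothetical sample

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((yhat - y.mean()) ** 2)  # yhat has the same mean as y
SSR = np.sum(e ** 2)

assert np.isclose(SST, SSE + SSR)  # decomposition (14): the cross term vanishes
print(f"SST = {SST:.2f} = SSE + SSR = {SSE:.2f} + {SSR:.2f}")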

What happens to the regression estimates when the units of measurement are changed? In the estimated model of CEO salaries, since the salary variable was measured in thousands of dollars, the intercept and slope coefficient refer to those units as well. If we measured salaries in dollars, the intercept and slope would be multiplied by 1000, but nothing else would change. The correlation between $y$ and $x$ is not affected by linear transformations, so we would not alter the $R^2$ of this equation by changing its units of measurement. Likewise, if ROE was measured in decimals rather than percent, it would merely change the units of measurement of the slope coefficient. Dividing ROE by 100 would cause the slope to be multiplied by 100. In the original (11), with ROE in percent, the slope is 18.501 (thousands of dollars per one unit change in ROE). If we expressed ROE in decimal form, the slope would be 1850.1. A change in ROE from 0.10 to 0.11–a one percent increase in ROE–would be associated with a change in salary of (0.01)(1850.1) = 18.501 thousand dollars. Again, the correlation between salary and ROE would not be altered. This also applies for a transformation such as $F = 32 + \frac{9}{5} C$: it would not matter whether we viewed temperature in degrees $F$ or degrees $C$ as a causal factor in estimating the demand for heating oil, since the correlation between the dependent variable and temperature would be unchanged by switching from Fahrenheit to Celsius degrees.
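The invariance claims here are easy to demonstrate; in the sketch below the salary and ROE figures are hypothetical stand-ins for the CEO data, which is not reproduced in these notes:

import numpy as np

rng = np.random.default_rng(4)
roe = rng.uniform(5.0, 25.0, 100)                         # hypothetical ROE, in percent
salary = 900.0 + 18.5 * roe + rng.normal(0.0, 50.0, 100)  # hypothetical salary, $000s

def ols(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(roe, salary)

# Salary in dollars: both coefficients are scaled by 1000
b0_d, b1_d = ols(roe, salary * 1000.0)
assert np.isclose(b0_d, 1000.0 * b0) and np.isclose(b1_d, 1000.0 * b1)

# ROE in decimals: only the slope is scaled (by 100)
_, b1_dec = ols(roe / 100.0, salary)
assert np.isclose(b1_dec, 100.0 * b1)

# The correlation, and hence the R^2, survives any linear rescaling
assert np.isclose(np.corrcoef(roe, salary)[0, 1],
                  np.corrcoef(roe / 100.0, salary * 1000.0)[0, 1])
print(f"slope: {b1:.3f} ($000s per point of ROE) vs {b1_d:.1f} ($ per point)")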

Functional form

Simple linear regression would seem to be a workable tool if we have a presumed linear relationship between $y$ and $x$, but what if theory suggests that the relation should be nonlinear?



It turns out that the "linearity" of regression refers to $y$ being expressed as a linear function of $x$, but neither $y$ nor $x$ need be the "raw data" of our analysis. For instance, regressing $y$ on $t$ (a time trend) would allow us to analyse a linear trend, or constant growth, in the data. What if we expect the data to exhibit exponential growth, as would population, or sums earning compound interest? If the underlying model is

(15) $y = A \exp(rt)$

then taking logarithms of both sides yields

(16) $\log y = \log A + rt$

so that the "single-log" transformation may be used to express a constant-growth relationship, in which $r$ is the regression slope coefficient that directly estimates $\partial \log y / \partial t$. Likewise, the "double-log" transformation can be used to express a constant-elasticity relationship, such as that of a Cobb-Douglas production function:

(17) $y = A x^{\beta}$, so that $\log y = \log A + \beta \log x$

In this context, the slope coefficient $\beta$ is an estimate of the elasticity of $y$ with respect to $x$, given that $\beta = \partial \log y / \partial \log x = \frac{\partial y / y}{\partial x / x}$ by the definition of elasticity. The original equation is nonlinear, but the transformed equation is a linear function of the logs, which may be estimated by OLS regression. Likewise, a model in which $y$ is thought to depend on some other transformation of $x$ may be handled in the same way, so long as the relationship remains linear in the parameters.
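As an illustration of the single-log transformation (16), this sketch generates hypothetical constant-growth data and recovers the growth rate $r$ as the slope of a regression of $\log y$ on $t$:

import numpy as np

rng = np.random.default_rng(5)
t = np.arange(40.0)
A, r = 100.0, 0.03  # hypothetical level and growth rate

# Exponential growth as in (15), with a small multiplicative disturbance
y = A * np.exp(r * t) * np.exp(rng.normal(0.0, 0.02, t.size))

# Single-log transformation (16): regress log y on t; the slope estimates r
ly = np.log(y)
b1 = np.sum((t - t.mean()) * (ly - ly.mean())) / np.sum((t - t.mean()) ** 2)
b0 = ly.mean() - b1 * t.mean()
print(f"estimated growth rate: {b1:.4f} (true 0.03); estimated log A: {b0:.3f}")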

To demonstrate that the OLS estimators are unbiased, we can express the slope estimator in terms of the population coefficients and errors:

(21) $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \, y_i}{s_x^2}$

where we have defined $s_x^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$ as the total variation in $x$ (not the variance of $x$). Substituting $y_i = \beta_0 + \beta_1 x_i + u_i$, we can write the slope estimator as:

$b_1 = \frac{\beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) \, x_i + \sum_{i=1}^{n} (x_i - \bar{x}) \, u_i}{s_x^2}$

We can show that the first term in the numerator is algebraically zero, given that the deviations around the mean sum to zero. The second term can be written as $\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \beta_1 s_x^2$, so that the second term is merely $\beta_1$ when divided by $s_x^2$. Thus this expression can be rewritten as:

$b_1 = \beta_1 + \frac{1}{s_x^2} \sum_{i=1}^{n} (x_i - \bar{x}) \, u_i$

showing that any randomness in the estimates of $b_1$ is derived from the errors in the sample, weighted by the deviations of their respective $x$ values. Given the assumed independence of the distributions of $x$ and $u$ implied by (19), this expression implies that:

$E(b_1) = \beta_1$

or that $b_1$ is an unbiased estimate of $\beta_1$, given the propositions above.
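A small Monte Carlo sketch (hypothetical parameter values) illustrates unbiasedness: averaging $b_1$ over many samples drawn with errors independent of $x$ reproduces $\beta_1$:

import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, n, reps = 1.0, 0.75, 50, 5000  # hypothetical values
x = rng.uniform(0.0, 10.0, n)                # the x values are held fixed
sx2 = np.sum((x - x.mean()) ** 2)            # total variation in x

slopes = np.empty(reps)
for j in range(reps):
    u = rng.normal(0.0, 2.0, n)   # errors drawn independently of x
    y = beta0 + beta1 * x + u
    slopes[j] = np.sum((x - x.mean()) * (y - y.mean())) / sx2

# b1 = beta1 + sum((x - xbar) u)/s_x^2, so averaging over samples recovers beta1
print(f"mean of b1 over {reps} samples: {slopes.mean():.4f} (beta1 = {beta1})")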

The four propositions listed above are all crucial for this result, but the key assumption is the independence of $x$ and $u$.

We are also concerned about the precision of the OLS estimators. To derive an estimator of the precision, we must add an assumption on the distribution of the error $u$:

Proposition 5 (SLR5, homoskedasticity): $Var(u \mid x) = \sigma^2 = Var(u)$.

This assumption states that the variance of the error term is constant over the population, and thus within the sample. Given (19), the conditional variance is also the unconditional variance. The errors are considered drawn from a fixed distribution, with a mean of zero and a constant variance of $\sigma^2$. If this assumption is violated, we have the condition of heteroskedasticity, which will often involve the magnitude of the error variance relating to the magnitude of $x$, or to some other measurable factor. Given this additional assumption–but no further assumptions on the nature of the distribution of $u$–we may demonstrate that:

(22) $Var(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{s_x^2}$

so that the precision of our estimate of the slope is dependent upon the overall error variance, and is inversely related to the variation in the $x$ variable. The magnitude of $x$ does not matter, but its variability in the sample does matter. If we are conducting a controlled experiment (quite unlikely in economic analysis) we would want to choose widely spread values of $x$ to generate the most precise estimate of the slope.
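Formula (22) and the design advice that follows from it can both be checked by simulation; in this sketch (hypothetical values) the empirical variance of $b_1$ matches $\sigma^2 / s_x^2$, and spreading out the $x$ values shrinks it:

import numpy as np

rng = np.random.default_rng(7)
beta0, beta1, sigma, n, reps = 1.0, 0.75, 2.0, 50, 20000  # hypothetical values

def slope_variance(x):
    """Empirical sampling variance of b1 for a fixed design x, plus (22)."""
    sx2 = np.sum((x - x.mean()) ** 2)
    b1s = np.empty(reps)
    for j in range(reps):
        y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
        b1s[j] = np.sum((x - x.mean()) * (y - y.mean())) / sx2
    return b1s.var(), sigma ** 2 / sx2

emp, theory = slope_variance(rng.uniform(0.0, 10.0, n))
print(f"narrow design: empirical {emp:.5f} vs sigma^2/s_x^2 = {theory:.5f}")

# Spreading out the x values raises s_x^2 and makes the slope more precise
emp_w, theory_w = slope_variance(rng.uniform(0.0, 40.0, n))
print(f"wide design:   empirical {emp_w:.5f} vs sigma^2/s_x^2 = {theory_w:.5f}")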

From the estimated error variance, we may calculate, more usefully, the estimated standard error of the regression slope:

$s_{b_1} = \sqrt{\frac{s^2}{s_x^2}} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

where $s$ is the estimated standard deviation, or standard error, of the disturbance process (that is, $s^2 = SSR/(n-2)$) and $s_x^2$ is $\sum_{i=1}^{n} (x_i - \bar{x})^2$. It is this estimated standard error that will be displayed on the computer printout when you run a regression, and used to construct confidence intervals and hypothesis tests about the slope coefficient. We can calculate the estimated standard error of the intercept term by the same means.

Regression through the origin

We could also consider a special case of the model above where we impose a constraint that $\beta_0 = 0$, so that $y$ is taken to be strictly proportional to $x$. This will often be inappropriate; it is generally more sensible to let the data calculate the appropriate intercept term, and reestimate the model subject to that constraint only if that is a reasonable course of action. Otherwise, the resulting estimate of the slope coefficient will be biased. Unless theory suggests that a strictly proportional relationship is appropriate, the intercept should be included in the model.
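To tie these last two topics together, this closing sketch (hypothetical data) computes the estimated standard error of the slope using $s^2 = SSR/(n-2)$, and shows the bias that arises from suppressing a nonzero intercept:

import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(1.0, 10.0, n)
y = 4.0 + 0.5 * x + rng.normal(0.0, 1.0, n)  # true intercept well away from zero

sx2 = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sx2
b0 = y.mean() - b1 * x.mean()
e = y - b0 - b1 * x

# Estimated standard error of the slope, with s^2 = SSR / (n - 2)
s2 = np.sum(e ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / sx2)
print(f"b1 = {b1:.3f}, se(b1) = {se_b1:.3f}")

# Regression through the origin: slope = sum(x y) / sum(x^2)
b1_origin = np.sum(x * y) / np.sum(x ** 2)
print(f"through-origin slope: {b1_origin:.3f}, biased away from 0.5 "
      f"because the suppressed intercept is 4.0")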