OLS Assumptions BLUE
Sam
Basics: OLS is the method that minimizes \(R^2\)
Summary¶
-
Estimator vs. Estimate: Estimators are functions of the random sample; estimates are realized values from a specific sample.
-
Good Estimator Properties: Unbiasedness, consistency, efficiency.
-
Gauss-Markov Theorem: Under classical linear regression assumptions, OLS is the “Best Linear Unbiased Estimator,” achieving minimum variance among linear unbiased estimators.
-
OLS Steps:
-
Specify the linear model.
-
Minimize the SSR.
-
Solve for \(\hat{\beta}\).
-
Interpret coefficients, residuals, fitted values.
-
Check assumptions.
-
Conclude or adjust if assumptions are violated.
-
Assumptions: Linearity, exogeneity, i.i.d. sampling, no perfect collinearity, homoskedastic errors.
1. Estimators and Estimates¶
Key Concept 3.1¶
-
Estimator: a function of the sample data used to infer an unknown population parameter. (eg, sample mean)
-
Estimate: the realized numerical value
-
Are they RVs?
-
Estimator: Yes. Its value changes with the randomness of the sample.
-
Estimate: No. It is fixed once the sample is observed.
2. Choosing the “Best” Estimator¶
Key Concept 3.2¶
| Estimator Property | Definition / Criterion | Formal Statement | Why It Matters |
|---|---|---|---|
| Unbiasedness | The estimator’s expected value equals the true parameter. | \(E[\hat{\mu}] = \mu\) | Avoids systematic error. |
| Consistency | The estimator converges to the true parameter as the sample size \(n\) goes to infinity. | \(\hat{\mu}_n \xrightarrow{p} \mu\) | Guarantees that with enough data, the estimator is “close” to the true value. |
| Efficiency | Among all unbiased estimators, it has the smallest variance. | \(\mathrm{Var}(\hat{\mu}) \le \mathrm{Var}(\tilde{\mu})\) | Minimizes uncertainty (variance) among unbiased estimators. |
Under classical linear regression assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE). Specifically:
Gauss-Markov Theorem: Under certain assumptions, the OLS estimators of the coefficients in a linear regression model are the best (minimum-variance) linear unbiased estimators.
3. Gauss-Markov Theorem in Context¶
Understanding OLS and Gauss-Markov¶
-
Set Up the Linear Regression Model
-
Typically: \(Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i\), where \(u_i\) is the error term.
-
Objective of OLS
-
Ordinary Least Squares chooses \(\hat{\beta}\) to minimize the sum of squared residuals (SSR):
\(\text{SSR} = \sum_{i=1}^n \bigl(Y_i - \hat{Y}_i\bigr)^2\)where \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \dots + \hat{\beta}_k X_{ik}.\)
-
Derive the OLS Estimator
-
Take partial derivatives of SSR with respect to each \(\beta_j\), set them to zero, and solve for \(\hat{\beta}_j\).
-
Interpret the OLS Estimator
-
Once estimated, you have:
-
Fitted values: \(\hat{Y}_i\)
-
Residuals: \(\hat{u}_i = Y_i - \hat{Y}_i\)
-
-
Each \(\hat{\beta}_j\) measures the estimated effect of \(X_j\) on \(Y\).
-
Use the Gauss-Markov Theorem Assumptions
-
Check whether all required assumptions (listed in the next section) hold.
-
If they hold, OLS is the BLUE; if not, OLS may still be unbiased but no longer guaranteed to be the minimum-variance linear estimator.
-
Make Inferences / Predictions
-
Use the estimated model for hypothesis tests, confidence intervals, or predictions.
-
If assumptions fail (e.g., heteroskedasticity, autocorrelation, endogeneity), adopt corrective methods (e.g., robust standard errors, instrumental variables, etc.).
4. Gauss-Markov (Classical Linear Model) Assumptions¶
For OLS to be the Best Linear Unbiased Estimator, these assumptions are typically required:
-
Linearity of the Model in Parameters
-
\(Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i\).
-
Exogeneity
-
\(\mathbb{E}[u_i \mid X_i] = 0\) (or equivalently \(\operatorname{Cov}(u_i, X_i) = 0\)).
-
The regressors \(X_i\) must be uncorrelated (or independent) of the error term \(u_i\).
-
i.i.d. Sampling
-
Observations \(\{(X_i, Y_i)\}_{i=1}^n\) are independently & identically distributed.
-
No Perfect Multicollinearity
-
Regressors are not perfectly collinear; in matrix form, \(X'X\) is invertible (full column rank).
-
Homoskedastic Errors
-
\(\mathrm{Var}(u_i \mid X_i) = \sigma^2\), a constant.
-
No heteroskedasticity (errors do not depend on \(X\)).
If these assumptions hold:
-
The OLS estimator is unbiased (\(\mathbb{E}[\hat{\beta}_j] = \beta_j\)).
-
Within the class of linear estimators, OLS has the smallest variance (the “Gauss-Markov” result).
If any assumptions fail:
-
OLS might lose its efficiency or even its unbiasedness (e.g., with endogeneity).
-
Corrective measures (robust standard errors, additional regressors, transformations, instrumental variables, etc.) may be needed.