
Stats Glossary

  • Binomial distribution: the discrete probability distribution of the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success (a small numeric sketch follows this list).

    • The name "binomial" comes from the binomial theorem, which is used to expand powers of binomials.

      • The Binomial Theorem provides a formula for expanding powers of a binomial expression.

        • A binomial expression is an algebraic expression containing exactly two terms joined by addition or subtraction. The word "binomial" derives from the Latin roots "bi-" (two) and "-nomial" (term).

          • The term "nomial" comes from the Latin word "nomen", which means "name".
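
For concreteness, a minimal sketch of the binomial probability mass function, \(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\); the values below (n = 10, p = 0.5, k = 3) are purely illustrative:

```python
# A sketch of the binomial pmf: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k).
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(3, 10, 0.5))  # P(3 heads in 10 fair coin flips) ~ 0.117
```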

Distribution: how a variable's values are spread across their possible range.

Probability distribution: how probabilities are assigned to the possible outcomes of a random variable (RV).

Pooled standard error

Method: a structured approach, algorithm, or technique used to analyze, model, or interpret data.


Stock & Watson textbook:

  • ADF: See augmented Dickey–Fuller (ADF) statistic.

  • Adjusted R² (R̄²): A modified version of R² that does not necessarily increase when a new regressor is added to the regression.

  • ADL(p, q): See autoregressive distributed lag (ADL) model.

  • Akaike information criterion (AIC): See information criterion.

  • ARCH: See autoregressive conditional heteroskedasticity (ARCH).

  • AR(p): See autoregression.

  • Attrition: The loss of subjects from a study after assignment to the treatment or the control group.

  • Augmented Dickey–Fuller (ADF) statistic: A regression-based statistic used to test for a unit root in an AR(p) model.

  • Autocorrelation: The correlation between a time series variable and its lagged value. The jᵗʰ autocorrelation of Y is the correlation between Yₜ and Yₜ₋ⱼ.

  • Autocovariance: The covariance between a time series variable and its lagged value. The jᵗʰ autocovariance of Y is the covariance between Yₜ and Yₜ₋ⱼ.
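
A minimal numpy sketch of the sample versions of these two quantities (this uses one common convention, dividing by the full sample size; textbooks differ on such details):

```python
import numpy as np

def autocovariance(y, j: int) -> float:
    """j-th sample autocovariance of a time series: cov(Y_t, Y_{t-j})."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    if j == 0:
        return float(np.mean((y - ybar) ** 2))
    return float(np.mean((y[j:] - ybar) * (y[:-j] - ybar)))

def autocorrelation(y, j: int) -> float:
    """j-th sample autocorrelation: the j-th autocovariance divided by the variance."""
    return autocovariance(y, j) / autocovariance(y, 0)
```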

  • Autoregression: A linear regression model that relates a time series variable to its past (that is, lagged) values. An autoregression with p lagged values as regressors is denoted AR(p).
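
A sketch of estimating an AR(p) by ordinary least squares with numpy; the helper name fit_ar is illustrative, and plain OLS is only one way to estimate an autoregression:

```python
import numpy as np

def fit_ar(y, p: int):
    """Estimate an AR(p) by OLS: regress Y_t on an intercept and Y_{t-1}, ..., Y_{t-p}."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # Each column j holds the j-th lag, aligned with the target y[p:].
    X = np.column_stack([np.ones(T - p)] + [y[p - j : T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return beta  # [intercept, coefficient on lag 1, ..., coefficient on lag p]
```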

  • Autoregressive conditional heteroskedasticity (ARCH): A time series model of conditional heteroskedasticity.

  • Autoregressive distributed lag (ADL) model: A linear regression model in which the time series variable Yₜ is expressed as a function of lags of Yₜ and of another variable, Xₜ. The model is denoted ADL(p, q), where p denotes the number of lags of Yₜ and q denotes the number of lags of Xₜ.

  • Average causal effect: The population average of the individual causal effects in a heterogeneous population. Also called the average treatment effect.

  • Average treatment effect: See average causal effect.

  • Balanced panel: A panel data set with no missing observations; that is, the variables are observed for each entity and each time period.

  • Base specification: A baseline or benchmark regression specification that includes a set of regressors chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected.

  • Bayes information criterion (BIC): See information criterion.

  • BIC: See information criterion.

  • Binary variable: A variable that is either 0 or 1. A binary variable is used to indicate a binary outcome. For example, X is a binary (or indicator, or dummy) variable for a person’s sex with X = 1 if the person is female and X = 0 if the person is male.

  • Bonferroni test: A way to test a joint hypothesis by testing the component individual hypotheses one at a time, using an adjusted critical value that accounts for the multiple hypotheses being tested.
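
A small sketch of the adjusted critical value, assuming two-sided z tests: with m component hypotheses and overall level α, each one is tested at level α/m so the overall size is at most α:

```python
from scipy.stats import norm

# With m component hypotheses and overall level alpha, test each one at alpha / m.
alpha, m = 0.05, 3
adjusted = norm.ppf(1 - (alpha / m) / 2)   # two-sided z critical value per test
print(adjusted)                            # ~2.39, versus 1.96 without adjustment
```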

  • Break date: The date of a discrete change in population time series regression coefficient(s).

  • Causal inference: Tests, confidence intervals, and/or estimation of a causal effect.

  • Chow test: A test for a break in a time series regression at a known break date.

  • Classical measurement error model: The observed value of an RV equals its true, unobserved value plus independent measurement error.

  • Clustered standard errors: A method of computing standard errors that is appropriate for panel data.

  • Coefficient of determination: See R².

  • Cointegration: When two or more time series variables share a common stochastic trend.

  • Common component: In a dynamic factor model, the part of a time series variable that is explained by the common unobserved factors.

  • Common trend: A trend shared by two or more time series.

  • Conditional heteroskedasticity: The variance, usually of an error term, depends on other variables.

  • Constant regressor: The regressor associated with the regression intercept; this regressor is always equal to 1.

  • Constant term: The regression intercept.

  • Continuous mapping theorem: If an RV \(S_n\) converges in distribution to \(S\), then a continuous function of that RV, \(g(S_n)\), converges in distribution to \(g(S)\).

  • Control variable: A regressor that controls for an omitted factor that determines the dependent variable.

  • Convergence in distribution: When a sequence of distributions converges to a limit.

  • Covariance matrix: A matrix composed of the variances and covariances of a vector of RVs.

  • Cubic regression model: A nonlinear regression function that includes X, X², and X³ as regressors.

  • Cumulative dynamic multiplier: The cumulative effect of a unit change in the time series variable X on Y. The h-period cumulative dynamic multiplier is the effect of a unit change in Xₜ on Yₜ + Yₜ₊₁ + ... + Yₜ₊ₕ.

  • Dependent variable: The variable to be explained in a regression or other statistical model; the variable appearing on the left-hand side in a regression.

  • Deterministic trend: A persistent long-term movement of a variable over time that can be represented as a nonrandom function of time.

  • DFM: See dynamic factor model (DFM).

  • Dickey–Fuller statistic: A regression-based statistic used to test for a unit root in a first-order autoregression [AR(1)].

  • Differences estimator: An estimator of the causal effect constructed as the difference in the sample average outcomes between the treatment and control groups.

  • Differences-in-differences estimator: The average change in Y for those in the treatment group minus the average change in Y for those in the control group.
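
Worked arithmetic with illustrative numbers: the estimator is the before-to-after change for the treatment group minus the before-to-after change for the control group:

```python
# Four group means (numbers are illustrative, not from any data set):
treat_before, treat_after = 10.0, 14.0
control_before, control_after = 9.0, 11.0

# Change for the treated minus change for the controls:
did = (treat_after - treat_before) - (control_after - control_before)
print(did)  # (14 - 10) - (11 - 9) = 2.0
```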

  • Distributed lag model: A regression model in which the regressors are current and lagged values of X.

  • Dummy variable: See binary variable.

  • Dummy variable trap: A problem caused by including a full set of binary variables in a regression together with a constant regressor (intercept), leading to perfect multicollinearity.

  • Dynamic causal effect: The causal effect of one variable on current and future values of another variable.

  • Dynamic factor model (DFM): A representation of N time series variables, where each variable is expressed as the sum of a reduced number (r) of common unobserved factors plus an idiosyncratic disturbance that is uncorrelated with the factors and the idiosyncratic disturbances of the other variables.

  • Dynamic multiplier: The h-period dynamic multiplier is the effect of a unit change in the time series variable Xₜ on Yₜ₊ₕ.

  • Endogenous variable: A variable that is correlated with the error term.

  • Entity and time fixed effects regression model: A panel data regression that includes both entity fixed effects and time fixed effects.

  • Entity fixed effects: A set of variables that provide for each entity in a panel data regression to have its own intercept.

  • Errors-in-variables bias: The bias in an estimator of a regression coefficient that arises from measurement errors in the regressors.

  • Error term: The difference between Y and the population regression function, denoted u.

  • ESS: See explained sum of squares (ESS).

  • Exact identification: When the number of instrumental variables equals the number of endogenous regressors.

  • Exogenous variable: A variable that is uncorrelated with the regression error term.

  • Explained sum of squares (ESS): The sum of squared deviations of the predicted values of Yᵢ from their average.

  • Explanatory variable: See regressor.

  • External validity: Inferences and conclusions from a statistical study are externally valid if they can be generalized from the population and the setting studied to other populations and settings.

  • Fan chart: A time series plot that displays a forecast distribution (forecast uncertainty) as a function of the forecast horizon.

  • Feasible GLS estimator: A version of the generalized least squares (GLS) estimator that uses an estimator of the conditional variance of the regression errors and covariance between the regression errors at different observations.

  • Feasible WLS: A version of the weighted least squares (WLS) estimator that uses an estimator of the conditional variance of the regression errors.

  • Final prediction error (FPE): An estimator of the mean squared forecast error when the regression coefficients are estimated by ordinary least squares.

  • First difference: The first difference of a time series variable \(Y_t\) is \(Y_t - Y_{t-1}\), denoted \(\Delta Y_t\).
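
A one-line illustration with numpy (the values are arbitrary):

```python
import numpy as np

y = np.array([100.0, 103.0, 101.0, 106.0])
print(np.diff(y))  # first differences Y_t - Y_{t-1}: [ 3. -2.  5.]
```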

  • First-stage regression: The regression of an included endogenous variable on the included exogenous variables, if any, and the instrumental variable(s) in two-stage least squares.

  • Fitted value: See predicted value.

  • Fixed effects: Binary variables indicating the entity or time period in a panel data regression.

  • Fixed effects regression model: A panel data regression that includes entity fixed effects.

  • $F_{m,n}$ distribution: The distribution of the ratio of two independent RVs: a chi-squared RV with \(m\) degrees of freedom divided by \(m\), over a chi-squared RV with \(n\) degrees of freedom divided by \(n\).

  • $F_{m,\infty}$ distribution: The distribution of a chi-squared RV with \(m\) degrees of freedom, divided by \(m\).

  • Forecast error: The difference between the value of the variable that actually occurs and its forecasted value.

  • Forecast interval: An interval that contains the future value of a time series variable with a prespecified probability.

  • FPE: See final prediction error.

  • F-statistic: A statistic used to test a joint hypothesis concerning more than one of the regression coefficients.

  • Functional form misspecification: When the form of the estimated regression function does not match the form of the population regression function; for example, when a linear specification is used but the true population regression function is quadratic.

  • GARCH: See generalized autoregressive conditional heteroskedasticity (GARCH).

  • Gauss–Markov theorem: Under certain conditions, the ordinary least squares estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.

  • Generalized autoregressive conditional heteroskedasticity (GARCH): A time series model for conditional heteroskedasticity.

  • Generalized least squares (GLS): A generalization of ordinary least squares that is appropriate when the regression errors have a known form of heteroskedasticity (in which case GLS is also referred to as weighted least squares, or WLS) or a known form of serial correlation.

  • Generalized method of moments (GMM): A method for estimating parameters by fitting sample moments to population moments that are functions of the unknown parameters. Instrumental variables estimators are an important special case.

  • GLS: See generalized least squares (GLS).

  • GMM: See generalized method of moments (GMM).

  • Granger causality test: A procedure for testing whether current and lagged values of one time series help predict future values of another time series.

  • HAC standard errors: See heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

  • Hawthorne effect: The phenomenon that experimental subjects change their behavior because they know they are subjects in an experiment.

  • Heteroskedasticity: The variance of the regression error term \(u_i\), conditional on the regressors, is not constant.

  • Heteroskedasticity- and autocorrelation-consistent (HAC) standard errors: Standard errors for ordinary least squares estimators that are consistent whether or not the regression errors are heteroskedastic and/or autocorrelated.

  • Heteroskedasticity- and autocorrelation-robust (HAR) standard errors: Another term for HAC standard errors.

  • Heteroskedasticity-robust standard error: A standard error for the ordinary least squares estimator that is appropriate whether the error term is homoskedastic or heteroskedastic.

  • Heteroskedasticity-robust t-statistic: A t-statistic constructed using a heteroskedasticity-robust standard error.

  • Homoskedasticity: The variance of the regression error term \(u_i\), conditional on the regressors, is constant.

  • Homoskedasticity-only F-statistic: A form of the F-statistic that is valid only when the regression errors are homoskedastic.

  • Homoskedasticity-only standard errors: Standard errors for the ordinary least squares estimator that are appropriate only when the error term is homoskedastic.

  • I(0), I(1), and I(2): See order of integration.

  • Idiosyncratic component: In a dynamic factor model, the part of a time series variable that is not explained by the common unobserved factors.

  • Impact effect: The contemporaneous, or immediate, effect of a unit change in the time series variable \(X_t\) on \(Y_t\).

  • Imperfect multicollinearity: The condition in which two or more regressors are highly correlated.

  • Included endogenous variables: Regressors that are correlated with the error term (usually in the context of instrumental variable regression).

  • Included exogenous variables: Regressors that are uncorrelated with the error term (usually in the context of instrumental variable regression).

  • Indicator variable: See binary variable.

  • Information criterion: A statistic used to estimate the number of lagged variables to include in an autoregression or a distributed lag model. Leading examples are the Akaike information criterion (AIC) and the Bayes information criterion (BIC).
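
A sketch of the two criteria in the form typically used for lag-length selection in an AR(p) (this matches the Stock and Watson formulation; other texts scale the criteria differently). One computes the criterion for each candidate p and picks the minimizer:

```python
import numpy as np

def bic(ssr: float, T: int, p: int) -> float:
    """BIC(p) = ln(SSR(p)/T) + (p + 1) * ln(T) / T; choose the p that minimizes it."""
    return float(np.log(ssr / T) + (p + 1) * np.log(T) / T)

def aic(ssr: float, T: int, p: int) -> float:
    """AIC(p) replaces the ln(T) penalty factor in the BIC with 2."""
    return float(np.log(ssr / T) + (p + 1) * 2 / T)
```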

  • In-sample prediction: The predicted value of the dependent variable for an observation in the sample used to estimate the prediction model.

  • Instrument: See instrumental variable.

  • Instrument exogeneity condition: The requirement that an instrumental variable is uncorrelated with the error term in the instrumental variables regression equation.

  • Instrument relevance condition: The requirement that an instrumental variable is correlated with the included endogenous regressor.

  • Instrumental variable: A variable that is correlated with an endogenous regressor (instrument relevance) and is uncorrelated with the regression error (instrument exogeneity).

  • Instrumental variables (IV) regression: A way to obtain a consistent estimator of the unknown coefficients of the function relating \(Y\) to \(X\) when the regressor, \(X\), is correlated with the error term, \(u\).

  • Interaction term: A regressor that is formed as the product of two other regressors, such as \(X_{1i} \cdot X_{2i}\).

  • Intercept: The value of \(\beta_0\) in the linear regression model.

  • Internal validity: When inferences about causal effects in a statistical study are valid for the population being studied.

  • IV: See instrumental variables (IV) regression.

  • J-statistic: A statistic for testing overidentifying restrictions in instrumental variables regression.

  • Lag: The value of a time series variable in a previous time period. The \(j\)th lag of \(Y_t\) is \(Y_{t-j}\).

  • Lasso (least absolute shrinkage and selection operator): The regression estimator that minimizes a penalized sum of squared residuals, where the penalty term is proportional to the sum of the absolute values of the regression coefficients.
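
A minimal sketch using scikit-learn's Lasso class (the choice of library, the simulated data, and the penalty weight alpha = 0.1 are all illustrative; in practice the penalty is usually chosen by cross validation):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(size=100)   # only the first predictor matters

model = Lasso(alpha=0.1).fit(X, y)         # alpha is the penalty weight
print(model.coef_)                         # most coefficients are shrunk exactly to 0
```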

  • Least squares assumptions: The assumptions for the linear regression models listed in Key Concept 4.3 (single variable regression model) and Key Concept 6.4 (multiple regression model).

  • Likelihood function: The joint probability distribution of the data, treated as a function of the unknown coefficients.

  • Limited dependent variable: A dependent variable that can take on only a limited set of values. For example, the variable might be a 0–1 binary variable or arise from one of the models described in Appendix 11.3.

  • Linear-log model: A nonlinear regression function in which the dependent variable is \(Y\) and the independent variable is \(\ln(X)\).

  • Linear probability model: A regression model in which \(Y\) is a binary variable.

  • Linear regression function: A regression function with a constant slope.

  • Local average treatment effect: A weighted average treatment effect estimated, for example, by two-stage least squares.

  • Logarithm: See natural logarithm.

  • Logit regression: A nonlinear regression model for a binary dependent variable in which the population regression function is modeled using the cumulative logistic distribution function.

  • Log-linear model: A nonlinear regression function in which the dependent variable is \(\ln(Y)\) and the independent variable is \(X\).

  • Log-log model: A nonlinear regression function in which the dependent variable is \(\ln(Y)\) and the independent variable is \(\ln(X)\).

  • Long-run cumulative dynamic multiplier: The cumulative long-run effect on the time series variable \(Y\) of a change in \(X\).

  • Maximum likelihood estimator (MLE): An estimator of unknown parameters that is obtained by maximizing the likelihood function; see Appendix 11.2.

  • Mean squared forecast error (MSFE): The expected value of the square of the time series forecast error for an observation not in the data set used for estimating the forecasting model.

  • Mean squared prediction error (MSPE): The expected value of the square of the prediction error for an observation not in the data set used for estimating the prediction model.

  • m-fold cross validation: A method for estimating the mean squared prediction error by first dividing the in-sample data into \(m\) subsamples and then sequentially forming predictions for the observations in each subsample using the data not in that subsample.
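
A sketch of the procedure with an OLS prediction model (the helper mfold_mspe is hypothetical; for simplicity the folds here are contiguous, whereas in practice observations are often shuffled first):

```python
import numpy as np

def mfold_mspe(X, y, m: int = 5) -> float:
    """Estimate the MSPE of an OLS prediction model by m-fold cross validation."""
    n = len(y)
    folds = np.array_split(np.arange(n), m)   # contiguous folds, for simplicity
    errors = []
    for hold_out in folds:
        keep = np.setdiff1d(np.arange(n), hold_out)
        X_keep = np.column_stack([np.ones(len(keep)), X[keep]])
        beta, *_ = np.linalg.lstsq(X_keep, y[keep], rcond=None)
        X_hold = np.column_stack([np.ones(len(hold_out)), X[hold_out]])
        errors.append(y[hold_out] - X_hold @ beta)   # out-of-fold prediction errors
    return float(np.mean(np.concatenate(errors) ** 2))
```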

  • MLE: See maximum likelihood estimator (MLE).

  • MSFE: See mean squared forecast error (MSFE).

  • MSPE: See mean squared prediction error (MSPE).

  • Multicollinearity: See perfect multicollinearity and imperfect multicollinearity.

  • Multiple regression model: An extension of the single variable regression model that allows \(Y\) to depend on \(k\) regressors.

  • Multi-step ahead forecast: A forecast made for more than one period beyond the final observation used to make the forecast.

  • Natural experiment: See quasi-experiment.

  • Natural logarithm: A mathematical function defined for a positive argument; its slope is always positive but tends to zero. The natural logarithm is the inverse of the exponential function; that is, \(X = \ln(e^X)\).

  • 95% confidence set: A confidence set with a 95% confidence level. See confidence interval.

  • Nonlinear least squares: The analog of ordinary least squares that applies when the regression function is a nonlinear function of the unknown parameters.

  • Nonlinear least squares estimator: The estimator obtained by minimizing the sum of squared residuals when the regression function is nonlinear in the parameters.

  • Nonlinear regression function: A regression function with a slope that is not constant.

  • Nonstationary: When the joint distribution of one or more time series variables and their lagged values changes over time.

  • Nowcast: The forecast of the value of a time series variable for the current period—that is, the period in which the forecast is made.

  • OLS estimator: See ordinary least squares (OLS) estimator.

  • OLS regression line: The regression line with population coefficients replaced by the ordinary least squares estimators.

  • OLS residual: The difference between \(Y_i\) and the ordinary least squares regression line, denoted \(\hat{u}_i\) in this text.

  • Omitted variables bias: The bias in an estimator that arises because a variable that is a determinant of \(Y\) and is correlated with a regressor has been omitted from the regression.

  • One-step ahead forecast: A forecast made for the period immediately following the final observation used to make the forecast.

  • Oracle prediction: The infeasible best-possible prediction, which is made using the unknown conditional mean of the variable to be predicted given the predictors.

  • Order of integration: The number of times that a time series variable must be differenced to make it stationary. A time series variable that is integrated of order \(d\) must be differenced \(d\) times and is denoted \(I(d)\).

  • Ordinary least squares (OLS) estimators: The estimators of the regression intercept and slope(s) that minimize the sum of squared residuals.
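
A minimal numpy sketch; np.linalg.lstsq solves the least squares minimization directly (the simulated data and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + one regressor
y = X @ np.array([2.0, 0.5]) + rng.normal(size=50)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes the sum of squared residuals
print(beta_hat)                                   # close to the true values [2.0, 0.5]
```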

  • Out-of-sample prediction: The predicted value of the dependent variable for an observation not in the sample used to estimate the prediction model.

  • Overidentification: When the number of instrumental variables exceeds the number of included endogenous regressors.

  • Parameters: Constants that determine a characteristic of a probability distribution or population regression function.

  • Partial compliance: The failure of some participants to follow the treatment protocol in a randomized experiment.

  • Partial effect: The effect on \(Y\) of changing one of the regressors while holding the other regressors constant.

  • Penalized sum of squared residuals: The sum of the squared residuals and a penalty term that increases with the number and/or values of the regression coefficients.

  • Penalty term: A term that, when added to the sum of squared residuals, penalizes the estimator for choosing a large number of regressors and/or coefficients with large values.

  • Perfect multicollinearity: A situation in which one of the regressors is an exact linear function of the other regressors.

  • Polynomial regression model: A nonlinear regression function that includes \(X\), \(X^2\), ..., and \(X^r\) as regressors, where \(r\) is an integer.

  • Population coefficients: See population intercept and slope.

  • Population intercept and slope: The true, or population, values of \(\beta_0\) (the intercept) and \(\beta_1\) (the slope) in a single-variable regression. In a multiple regression, there are multiple slope coefficients (\(\beta_1\), \(\beta_2\), ..., \(\beta_k\)), one for each regressor.

  • Population multiple regression model: The multiple regression model in Key Concept 6.2.

  • Population regression line: In a single-variable regression, the population regression line is \(\beta_0 + \beta_1 X_i\). In a multiple regression, it is \(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki}\).

  • Potential outcomes: The set of outcomes that might occur to an individual (treatment unit) after receiving, or not receiving, an experimental treatment.

  • Predicted value: The value of \(Y_i\) that is predicted by the ordinary least squares regression line, denoted \(\hat{Y}_i\) in this text.

  • Price elasticity of demand: The percentage change in the quantity demanded resulting from a 1% increase in price.

  • Principal components: The linear combinations of a set of standardized variables for which the \(j\)th linear combination maximizes its variance, subject to being uncorrelated with the previous \(j − 1\) linear combinations.
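
A numpy sketch via the singular value decomposition; the returned variance shares are exactly the heights plotted in a scree plot (see that entry below):

```python
import numpy as np

def principal_components(X):
    """Principal components of standardized variables, via the SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Z @ Vt.T                           # column j is the j-th principal component
    variance_share = s**2 / np.sum(s**2)            # normalized variances (scree plot heights)
    return components, variance_share
```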

  • Probit regression: A nonlinear regression model for a binary dependent variable in which the population regression function is modeled using the cumulative standard normal distribution function.

  • Program evaluation: The field of study concerned with estimating the effect of a program, policy, or some other intervention or “treatment.”

  • Pseudo out-of-sample forecast: A forecast computed for observations in part of the sample, treating those observations as if they had not yet been realized.

  • Quadratic regression model: A nonlinear regression function that includes \(X\) and \(X^2\) as regressors.

  • Quandt likelihood ratio statistic: A statistic used with time series data to test for a break in the regression model at an unknown date.

  • Quasi-experiment: A circumstance in which randomness is introduced by variations in individual circumstances that make it appear as if the treatment is randomly assigned.

  • $R^2$: In a regression, the fraction of the sample variance of the dependent variable that is explained by the regressors.
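
A short sketch using the glossary's own ESS/SSR/TSS entries; the two expressions for \(R^2\) coincide when the regression includes an intercept:

```python
import numpy as np

def r_squared(y, y_hat) -> float:
    """R^2 = 1 - SSR/TSS (equivalently ESS/TSS when the regression has an intercept)."""
    tss = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    ssr = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    return float(1 - ssr / tss)
```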

  • $\bar{R}^2$: See adjusted $R^2$.

  • Random walk: A time series process in which the value of the variable equals its value in the previous period plus an unpredictable error term.

  • Random walk with drift: A generalization of the random walk in which the change in the variable has a nonzero mean but is otherwise unpredictable.
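
A simulation sketch of both processes (the drift value 0.1, the sample length, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(size=200)                 # unpredictable error terms

random_walk = np.cumsum(u)               # Y_t = Y_{t-1} + u_t
walk_with_drift = np.cumsum(0.1 + u)     # Y_t = 0.1 + Y_{t-1} + u_t (drift of 0.1)
```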

  • Realized volatility: The sample root mean square of a time series variable computed over consecutive time periods.

  • Regressand: See dependent variable.

  • Regression discontinuity: A regression involving a quasi-experiment in which treatment depends on whether an observable variable crosses a threshold.

  • Regression specification: A description of a regression that includes the set of regressors and any nonlinear transformation that has been applied.

  • Regressor: A variable appearing on the right-hand side of a regression; an independent variable in a regression.

  • Repeated cross-sectional data: A collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different time period.

  • Residual: The difference between the observed value of the dependent variable and its value predicted by an estimated regression, for an observation in the sample used to estimate the regression coefficients, denoted \(\hat{u}_i\) in the text.

  • Restricted regression: A regression in which the coefficients are restricted to satisfy some condition. For example, when computing the homoskedasticity-only F-statistic, it is the regression with coefficients restricted to satisfy the null hypothesis.

  • Ridge regression: The regression estimator that minimizes a penalized sum of squared residuals, where the penalty term is proportional to the sum of the squared regression coefficients.
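
A sketch of the ridge estimator's closed form, \(\hat{\beta} = (X'X + \lambda I)^{-1} X'y\) (this assumes the regressors are already standardized and the dependent variable demeaned, as in the standardized predictive regression model entry):

```python
import numpy as np

def ridge(X, y, lam: float):
    """Ridge estimator: minimizes SSR + lam * sum(beta_j**2), which has the
    closed form beta = (X'X + lam * I)^{-1} X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
```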

  • RMSFE: See root mean squared forecast error (RMSFE).

  • Root mean squared forecast error (RMSFE): The square root of the mean squared forecast error.

  • Sample selection bias: The bias in an estimator of a regression coefficient that arises when a selection process influences the availability of data and that process is related to the dependent variable. This bias induces correlation between one or more regressors and the regression error.

  • Scree plot: The normalized variance of the ordered principal components of a set of variables \(X\), plotted against the principal component number, where the variance is normalized by the sum of the variances of the \(X\)'s.

  • SER: See standard error of the regression (SER).

  • Serial correlation: See autocorrelation.

  • Serially uncorrelated: A time series variable with all autocorrelations equal to 0.

  • Shrinkage estimator: An estimator that introduces bias by shrinking the OLS estimator toward a specific point (usually 0) and thereby reducing the variance of the estimator.

  • Simultaneous causality: When, in addition to the causal link of interest from \(X\) to \(Y\), there is a causal link from \(Y\) to \(X\). Simultaneous causality makes \(X\) correlated with the error term in the function of interest that relates \(Y\) to \(X\).

  • Simultaneous equations bias: See simultaneous causality.

  • Sparse model: A regression model in which the coefficients are nonzero for only a small fraction of the predictors.

  • SSR: See sum of squared residuals (SSR).

  • Standard deviation: The square root of the variance. The standard deviation of the RV \(Y\), denoted \(\sigma_Y\), has the same units as \(Y\) and is a measure of the spread of the distribution of \(Y\) around its mean.

  • Standard error of the regression (SER): An estimator of the standard deviation of the regression error \(u\).

  • Standardized predictive regression model: A special case of the linear multiple regression model in which the regressors are standardized and the dependent variable is demeaned so that it has mean 0.

  • Stationarity: When the joint distribution of a time series variable and its lagged values does not change over time.

  • Statistically insignificant: The null hypothesis (typically, that a regression coefficient is 0) cannot be rejected at a given significance level.

  • Statistically significant: The null hypothesis (typically, that a regression coefficient is 0) is rejected at a given significance level.

  • Stochastic trend: A persistent but random long-term movement of a variable over time.

  • Strict exogeneity: The requirement that the regression error have a mean of 0 conditional on current, future, and past values of the regressor in a distributed lag model.

  • Sum of squared residuals (SSR): The sum of the squared ordinary least squares residuals.

  • Time effects: Binary variables indicating the time period in a panel data regression.

  • Time fixed effects: See time effects.

  • Total sum of squares (TSS): The sum of squared deviations of \(Y_i\) from its average.

  • t-ratio: See t-statistic.

  • Treatment effect: The causal effect in an experiment or a quasi-experiment. See causal effect.

  • TSLS: See two-stage least squares (TSLS).

  • TSS: See total sum of squares (TSS).

  • Two-stage least squares (TSLS): An instrumental variables estimator, described in Key Concept 12.2.
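
A numpy sketch of TSLS with one endogenous regressor and one instrument (the helper name tsls is illustrative). Note that standard errors from the naive second-stage regression are not valid; proper TSLS software corrects them:

```python
import numpy as np

def tsls(y, x, z):
    """TSLS with one endogenous regressor x and one instrument z.
    Stage 1: regress x on z (plus intercept); Stage 2: regress y on the fitted x.
    Caution: standard errors from this naive second stage are not valid."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    pi, *_ = np.linalg.lstsq(Z, x, rcond=None)     # first-stage regression
    x_hat = Z @ pi                                 # fitted endogenous regressor
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta  # [intercept, TSLS estimate of the coefficient on x]
```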

  • Unbalanced panel: A panel data set in which data for some entities are missing for some time periods.

  • Unbiased estimator: An estimator with a bias that is equal to 0.

  • Underidentification: When the number of instrumental variables is less than the number of endogenous regressors.

  • Unit root: An autoregression with a largest root equal to 1.

  • Unrestricted regression: A regression in which the coefficients are not restricted to satisfy some condition. When computing the homoskedasticity-only F-statistic, it is the regression that applies under the alternative hypothesis, so that the coefficients are not restricted to satisfy the null hypothesis.

  • VAR: See vector autoregression.

  • Vector autoregression (VAR): A model of \(k\) time series variables consisting of \(k\) equations, one for each variable, in which the regressors in all equations are lagged values of all the variables.

  • Volatility clustering: When a time series variable exhibits some clustered periods of high variance and other clustered periods of low variance.

  • Weak instruments: Instrumental variables that have a low correlation with the endogenous regressor(s).

  • Weighted least squares (WLS): An alternative to ordinary least squares that can be used when the regression error is heteroskedastic and the form of the heteroskedasticity is known or can be estimated.

  • WLS: See weighted least squares (WLS).