
ML 04 Regression

Sam

Fit Linear Regression via either:

  1. a closed-form solution (Normal Equation): a mathematical equation that gives the result directly

  2. iterative optimization (GD: batch, stochastic, mini-batch): initialize the model parameters randomly, then tweak them gradually to minimize the cost function
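A minimal NumPy sketch of the Normal Equation on made-up data (the true coefficients 4 and 3, the sample size, and the seed are all arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                # single feature in [0, 2)
y = 4 + 3 * X[:, 0] + rng.normal(size=100)  # linear target plus Gaussian noise

X_b = np.c_[np.ones((100, 1)), X]           # prepend a bias (intercept) column
# Normal Equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ (X_b.T @ y)
```

The recovered `theta` should land close to the generating coefficients (intercept near 4, slope near 3), up to the noise.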

Iterative optimization (gradient descent)

  • Batch: computes the gradients on the full training set at each step; converges steadily on the convex MSE “bowl”

  • Stochastic: computes the gradients on one randomly picked instance at a time; much faster per step, but the descent is noisy

  • Mini-batch: computes the gradients on small random sets of instances
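The batch variant can be sketched as a plain gradient-descent loop on the same kind of made-up data (the learning rate and iteration count are arbitrary choices, not tuned values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)
X_b = np.c_[np.ones((100, 1)), X]           # bias column + feature

eta, m = 0.1, len(X_b)                      # learning rate, number of instances
theta = rng.normal(size=2)                  # random initialization
for _ in range(1000):
    # gradient of the MSE cost over the full batch
    grad = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta -= eta * grad
```

Stochastic and mini-batch GD follow the same loop, but `grad` is computed on one instance or on a small random subset per step instead of the whole set.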

Model complexity / Regularization

Diagnose underfitting and overfitting using learning curves (training and validation error plotted against training-set size).

  • Overfitting: use early stopping or regularization.
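A minimal sketch of early stopping with a simple hold-out split and a hand-rolled gradient-descent loop (the 80/20 split, learning rate, and epoch count are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)
X_b = np.c_[np.ones((100, 1)), X]
X_tr, y_tr = X_b[:80], y[:80]               # hypothetical 80/20 train/validation split
X_val, y_val = X_b[80:], y[80:]

theta = rng.normal(size=2)
best_rmse, best_theta = np.inf, theta.copy()
for epoch in range(500):
    grad = (2 / len(X_tr)) * X_tr.T @ (X_tr @ theta - y_tr)
    theta -= 0.05 * grad
    rmse = np.sqrt(np.mean((X_val @ theta - y_val) ** 2))
    if rmse < best_rmse:                    # keep the weights with the lowest validation error
        best_rmse, best_theta = rmse, theta.copy()
```

The model returned is `best_theta`, the snapshot where validation error bottomed out, rather than the weights at the final epoch.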


Regularization

  • Ridge (\(\ell_2\)): shrinks weights smoothly; a good default; sensitive to feature scaling, so scale the data first.

    • The \(\ell_2\) penalty uses the Euclidean norm, as RMSE does
  • Lasso (\(\ell_1\)): drives some weights to zero (feature selection); can “bounce” around near the optimum, so reduce the learning rate over time.

    • The \(\ell_1\) penalty uses the Manhattan norm, as MAE does
  • Elastic Net: a mix of \(\ell_1\) and \(\ell_2\); often preferred over pure Lasso when the number of features exceeds the number of training instances (p > m) or when features are strongly correlated.

Start with Ridge, consider Lasso/Elastic Net if you expect sparsity.
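Ridge also has a closed-form solution: add the penalty term alpha·I to the Normal Equation. A NumPy sketch on made-up data (alpha = 1.0 is an arbitrary choice, and by convention the bias term is left unregularized):

```python
import numpy as np

rng = np.random.default_rng(2)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)
X_b = np.c_[np.ones((100, 1)), X]

alpha = 1.0                       # regularization strength (assumed value)
A = alpha * np.eye(2)
A[0, 0] = 0                       # do not regularize the bias/intercept term
# Ridge closed form: theta = (X^T X + alpha A)^{-1} X^T y
theta_ridge = np.linalg.inv(X_b.T @ X_b + A) @ (X_b.T @ y)
```

Compared with plain least squares, the `+ A` term pulls the weights toward zero, with the amount of shrinkage controlled by `alpha`.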

Evaluation


  • MAE = Mean absolute error

  • MAPE = Mean absolute percentage error

  • RMSE = Root mean squared error

  • SMAPE = Symmetric mean absolute percentage error.

    • Bounded between 0% and 200%. Useful for comparing the average error of different models; not meaningful for judging individual observations.
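All four metrics can be computed directly with NumPy; the true/predicted values below are made up for illustration:

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

mae  = np.mean(np.abs(y_true - y_pred))                       # mean absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))               # root mean squared error
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))      # mean absolute % error
# SMAPE: the symmetric denominator bounds the score to [0, 200]%
smape = 100 * np.mean(2 * np.abs(y_pred - y_true)
                      / (np.abs(y_true) + np.abs(y_pred)))
```

Note RMSE penalizes large errors more heavily than MAE (here the single 30-unit miss dominates), while MAPE and SMAPE express error relative to the magnitude of the values.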

Lift: instances are ranked by their predicted score, and the response rate among the top-ranked instances is compared to the overall average.
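A minimal sketch of lift at the top decile, on synthetic scores and outcomes (the 10% cutoff and the way outcomes are generated are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
scores = rng.random(1000)                            # hypothetical model scores
actual = (rng.random(1000) < scores).astype(float)   # outcomes correlated with score

order = np.argsort(scores)[::-1]                     # rank instances by predicted score
top_decile = actual[order[:100]]                     # top 10% by score
lift = top_decile.mean() / actual.mean()             # top-ranked response rate vs. average
```

A lift of, say, 2 means the model's top-ranked decile responds at twice the average rate; a lift near 1 means the ranking adds nothing over random selection.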