ML 04 Regression
Sam
Fit Linear Regression via either:
- A closed-form solution (the Normal Equation): a mathematical equation that gives the result directly.
- Iterative optimization (gradient descent: batch, stochastic, or mini-batch): initialize the model parameters randomly, then tweak them to minimize the cost function.
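A minimal NumPy sketch of the Normal Equation on a toy dataset (the data and seed are illustrative assumptions, not from the notes):

```python
import numpy as np

# Toy data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

# Add a bias column, then solve theta = (X^T X)^{-1} X^T y directly
X_b = np.c_[np.ones((100, 1)), X]
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
# theta lands close to the true parameters [4, 3]
```

In practice `np.linalg.lstsq` (or a pseudoinverse) is preferred over the explicit inverse, which fails when \(X^T X\) is singular.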
Iterative optimization (gradient descent)
- Batch: computes the gradients on the full dataset at each step; converges steadily on the convex MSE "bowl."
- Stochastic: computes the gradients one instance at a time.
- Mini-batch: computes the gradients on small random sets of instances.
Model complexity / Regularization
Diagnose underfitting/overfitting using a learning curve.
- Overfitting: use early stopping or regularization.
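One way to sketch early stopping, using scikit-learn's `SGDRegressor` with `partial_fit` on synthetic quadratic data (the model, data, and hyperparameters are all assumptions): run one epoch at a time and remember the epoch with the lowest validation error.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(0)
X = 6 * rng.random((200, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

sgd = SGDRegressor(learning_rate="invscaling", eta0=0.01, random_state=0)
best_err, best_epoch = float("inf"), -1
for epoch in range(500):
    sgd.partial_fit(X_train, y_train)  # one extra pass over the training set
    err = mean_squared_error(y_val, sgd.predict(X_val))
    if err < best_err:  # track the epoch with the lowest validation error
        best_err, best_epoch = err, epoch
```

A full implementation would also snapshot the model weights at `best_epoch` and restore them once validation error has stopped improving for a while.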
Regularization
- Ridge (\(\ell_2\)): shrinks weights smoothly; a good default; sensitive to feature scaling.
  - Corresponds to the RMSE and the Euclidean norm.
- Lasso (\(\ell_1\)): drives some weights to exactly zero (feature selection); can "bounce" around the optimum, so reduce the learning rate over time.
  - Corresponds to the MAE and the Manhattan norm.
- Elastic Net: a mix of \(\ell_1\)/\(\ell_2\); often preferred over pure Lasso when there are more features than instances or when features are correlated.

Start with Ridge; consider Lasso or Elastic Net if you expect only a few features to matter.
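A scikit-learn sketch contrasting the three penalties on synthetic data where only two of five features matter (the alphas and data are illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): only the first two features are relevant
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Scale first: all three penalties are sensitive to feature scaling
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.5)).fit(X, y)
enet = make_pipeline(StandardScaler(), ElasticNet(alpha=0.5, l1_ratio=0.5)).fit(X, y)

# Ridge shrinks every weight but keeps them all nonzero;
# Lasso drives the irrelevant weights to exactly zero.
```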
Evaluation
- MAE: mean absolute error.
- MAPE: mean absolute percentage error.
- RMSE: root mean squared error.
- SMAPE: symmetric mean absolute percentage error.
  - Bounded between 0 and 200%. Use it to compare the average error of different models; it is not meaningful for individual observations.
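The four metrics can be computed directly in NumPy (the toy arrays are illustrative):

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 180.0, 330.0])

mae = np.mean(np.abs(y_true - y_pred))                    # here: 20.0
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
mape = 100 * np.mean(np.abs((y_true - y_pred) / y_true))  # here: 10.0
# SMAPE divides by the average magnitude of truth and prediction,
# which is what bounds it at 200%
smape = 100 * np.mean(2 * np.abs(y_pred - y_true)
                      / (np.abs(y_true) + np.abs(y_pred)))
```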
Lift: rank observations by their predicted value, then compare the response rate in the top-ranked group to the overall average.
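A small sketch of lift, under the assumption that "predicted value" means a model score used to rank observations (the synthetic scores and outcomes are illustrative):

```python
import numpy as np

# Synthetic setup (an assumption): higher score -> more likely to respond
rng = np.random.default_rng(1)
score = rng.random(1000)
actual = (rng.random(1000) < score).astype(int)

# Rank by predicted score, take the top decile, and compare its
# response rate to the overall average
order = np.argsort(score)[::-1]
top_decile_rate = actual[order[:100]].mean()
lift = top_decile_rate / actual.mean()  # > 1 means the ranking adds value
```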