
EDA

Missing Values

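  1. Completely at random (MCAR): missingness is unrelated to any value in the data, e.g. a collection system issue

  2. At random (MAR): missingness depends on other observed fields, e.g. certain groups of people not entering their info

  3. Not at random (MNAR): missingness depends on the missing value itself, i.e. it is happening for a reason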
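A minimal sketch of how the three mechanisms differ, using only the Python standard library. The records, the 30% drop rate, and the age and income thresholds are all made-up assumptions for illustration; `mask_income` and `observed_mean` are hypothetical helper names.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records: (age, income). Income rises with age by construction.
records = [(age, 1000 * age + rng.randint(-5000, 5000)) for age in range(20, 70)]

def mask_income(records, is_missing):
    """Return a copy where income is None wherever is_missing(age, income) is True."""
    return [(age, None if is_missing(age, income) else income)
            for age, income in records]

# 1. MCAR: every income has the same 30% chance of being lost.
mcar = mask_income(records, lambda age, income: rng.random() < 0.3)

# 2. MAR: missingness depends only on an observed field (age).
mar = mask_income(records, lambda age, income: age >= 50)

# 3. MNAR: missingness depends on the missing value itself (high incomes withheld).
mnar = mask_income(records, lambda age, income: income > 50_000)

def observed_mean(masked):
    """Mean of the incomes that are still present."""
    return mean(income for _, income in masked if income is not None)

true_mean = mean(income for _, income in records)
for name, masked in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(observed_mean(masked)), "vs true mean", round(true_mean))
```

In this toy data the MAR and MNAR masks both pull the observed mean below the true mean. The practical difference is that the MAR bias can in principle be corrected from the observed age field, while the MNAR bias cannot be detected from the observed data alone.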

Noise Techniques

Smoothing

  • Binning: sort the values, separate into bins, then replace each value with its bin mean or with the nearest bin boundary

  • Regression line: fit a line to the data and replace each value with its fitted value

  • Distribution line
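The binning bullet can be sketched as follows. This assumes equal-frequency bins (the notes do not say which binning scheme is meant), and the function names and sample values are illustrative only.

```python
from statistics import mean

def equal_frequency_bins(values, n_bins):
    """Sort values and split them into n_bins bins of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread any remainder
        bins.append(ordered[start:end])
        start = end
    return bins

def smooth_by_mean(bins):
    """Replace every value with the mean of its bin."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_boundary(bins):
    """Replace every value with the closer of its bin's min or max.

    Ties go to the lower boundary (an arbitrary choice in this sketch).
    """
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

values = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # made-up sample data
bins = equal_frequency_bins(values, 3)

print(bins)                      # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_mean(bins))      # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_boundary(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```

Smoothing by mean removes all variation within a bin, while smoothing by boundary keeps the bin's extremes intact. The sketch assumes there are at least as many values as bins.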

Discretization

  • Helpful for concept hierarchies: think age 67 vs 68, which rarely differ in a meaningful way and are better treated as one age group
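A small sketch of discretizing age into groups. The cut points and labels below are hypothetical; in practice they would come from domain knowledge or from the data.

```python
from bisect import bisect_right

# Hypothetical cut points and labels (assumptions for illustration only).
AGE_EDGES = [18, 40, 65]
AGE_LABELS = ["minor", "young adult", "middle-aged", "senior"]

def age_group(age):
    """Map a numeric age to its group; each edge belongs to the upper group."""
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]

for age in (17, 18, 67, 68):
    print(age, "->", age_group(age))
```

Ages 67 and 68 land in the same group, so downstream steps see one category instead of two nearly identical numeric values.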

Normalization / Standardization
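
A minimal sketch of the two rescalings this heading usually refers to, assuming min-max normalization to [0, 1] and z-score standardization. The function names are illustrative, and the population standard deviation is an assumption (a sample standard deviation is also common).

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize constant data")
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sigma for v in values]

print(min_max_normalize([10, 20, 30]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Min-max normalization is sensitive to outliers because a single extreme value sets the range. Standardization does not bound the result but keeps relative distances in units of standard deviations.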