EDA
Missing Values
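- Missing completely at random (MCAR): missingness is unrelated to any data, observed or not (e.g. a collection system issue)
- Missing at random (MAR): missingness depends on other observed variables (e.g. certain groups of people not entering their info)
- Missing not at random (MNAR): missingness depends on the missing value itself (happening for a reason)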
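One rough way to probe these categories is to compare the missing rate of a column across groups of another, observed column. The sketch below assumes hypothetical survey records with `age_group` and `income` fields (neither is from these notes) and uses only the standard library. A large gap between groups hints that the data is not MCAR; it cannot, on its own, separate MAR from MNAR.

```python
from collections import defaultdict


def missing_rate_by_group(rows, group_key, value_key):
    """Return {group: fraction of rows in that group where value_key is None}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_key]
        total[group] += 1
        if row.get(value_key) is None:
            missing[group] += 1
    return {group: missing[group] / total[group] for group in total}


# Hypothetical records; None marks a missing income.
rows = [
    {"age_group": "18-30", "income": 42000},
    {"age_group": "18-30", "income": None},
    {"age_group": "18-30", "income": None},
    {"age_group": "18-30", "income": 38000},
    {"age_group": "31-50", "income": 61000},
    {"age_group": "31-50", "income": 58000},
    {"age_group": "31-50", "income": 75000},
    {"age_group": "31-50", "income": None},
]

rates = missing_rate_by_group(rows, "age_group", "income")
print(rates)
```

If the rates were roughly equal across every observed grouping, MCAR would be more plausible, though still not proven.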
Noise Techniques
Smoothing
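- Binning: sort the values, separate them into bins, then replace each value with its bin mean or with the nearest bin boundary
- Regression line: fit a line to the data and replace values with the fitted values
- Distribution line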
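The binning variant of smoothing can be sketched as follows. This is a minimal illustration that assumes equal-frequency bins (each bin holds the same number of sorted values); the function names and the sample data are made up for the example, and ties between the two boundaries are sent to the lower one by choice.

```python
def equal_frequency_bins(values, bin_size):
    """Sort the values and cut them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_mean(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundary(bins):
    """Replace every value with the closer of its bin's min or max.

    Ties go to the lower boundary (an arbitrary choice for this sketch).
    """
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are already sorted
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


data = [15, 4, 21, 8, 24, 21, 34, 25, 28]
bins = equal_frequency_bins(data, 3)
by_mean = smooth_by_mean(bins)
by_boundary = smooth_by_boundary(bins)
print(bins)
print(by_mean)
print(by_boundary)
```

Bin means flatten each bin to a single value, while bin boundaries keep the observed extremes and pull everything else toward them.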
Discretization
- Helpful for concept hierarchies: ages 67 and 68 rarely differ in a meaningful way, so both can be mapped to the same age band
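A small sketch of discretizing age into bands, so that 67 and 68 end up in the same category. The band edges and labels below are hypothetical and chosen only for illustration; any real cut points would come from the problem domain.

```python
from bisect import bisect_right

# Hypothetical cut points: a value equal to an edge falls in the upper band.
EDGES = [18, 40, 65]
LABELS = ["child", "young adult", "middle-aged", "senior"]


def age_band(age):
    """Map a numeric age to its band label using the edges above."""
    return LABELS[bisect_right(EDGES, age)]


ages = [17, 18, 39, 67, 68]
bands = [age_band(a) for a in ages]
print(bands)
```

The labels form the first level of a concept hierarchy: individual ages roll up into bands, and bands could roll up further if needed.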
Normalization / Standardization
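The two terms are often used loosely; the sketch below assumes the common reading where normalization means min-max scaling to [0, 1] and standardization means z-scores (zero mean, unit standard deviation). The function names are illustrative, and the population standard deviation is used here as an assumption; some tools default to the sample version.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Shift and scale values to zero mean and unit (population) std deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]
normalized = min_max_normalize(values)
standardized = z_score_standardize(values)
print(normalized)
print(standardized)
```

Min-max scaling is sensitive to outliers because the extremes define the range, while z-scores leave the output unbounded.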