Data Anomaly Detection
Median Absolute Deviation
The MAD (Median Absolute Deviation) rule is a statistical outlier detection method that uses the median and the median absolute deviation as robust estimators to identify outliers in a dataset.
It is particularly useful when dealing with data that may not follow a normal distribution or when there are potential outliers that can significantly impact the mean and standard deviation.
How to use MAD rule
- Calculate the Median: Compute the median of the dataset, which is the middle value when the data is sorted;
- Calculate the Median Absolute Deviation (MAD): For each data point, find the absolute difference between the data point and the median. The MAD is the median of these absolute differences;
- Define a Threshold: Choose a threshold, usually a constant multiple of the MAD (e.g., 2 or 3 times the MAD), that determines how far a data point can deviate from the median before being considered an outlier;
- Identify Outliers: Any data point that has an absolute difference from the median greater than the threshold is considered an outlier.
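As a worked illustration of the four steps, consider a small made-up dataset (the values and the multiplier 3 are assumptions chosen only for this example):

```latex
\begin{aligned}
x &= \{2, 4, 5, 7, 40\}, \quad \operatorname{median}(x) = 5 \\
|x_i - 5| &= \{3, 1, 0, 2, 35\} \\
\mathrm{MAD} &= \operatorname{median}\{0, 1, 2, 3, 35\} = 2 \\
\text{threshold} &= 3 \cdot \mathrm{MAD} = 6 \\
|40 - 5| &= 35 > 6 \;\Rightarrow\; 40 \text{ is an outlier}
\end{aligned}
```

Every other point deviates from the median by at most 3, so only 40 is flagged.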
Note
Mathematically, the absolute difference between two values, A and B, is denoted |A - B|, where the vertical bars "|" represent the absolute value function. This function returns the non-negative magnitude of the difference between A and B.
MAD rule implementation
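A minimal sketch of the rule, assuming plain Python and the standard-library `statistics` module. The function name `mad_outliers` and the default multiplier `k=3.0` are illustrative choices, not part of the course material.

```python
from statistics import median


def mad_outliers(values, k=3.0):
    """Return the values whose distance from the median exceeds k * MAD.

    `k` is a hypothetical default; the lesson suggests 2 or 3.
    """
    med = median(values)
    # Absolute difference of every point from the median
    abs_dev = [abs(v - med) for v in values]
    # The MAD is the median of those absolute differences
    mad = median(abs_dev)
    threshold = k * mad
    return [v for v, d in zip(values, abs_dev) if d > threshold]


data = [10, 12, 11, 13, 12, 11, 95]
print(mad_outliers(data))  # [95]
```

Here the median is 12 and the MAD is 1, so the threshold is 3 and only 95 lies farther than that from the median. Note that if more than half of the values are identical, the MAD is 0 and this sketch flags every value that differs from the median, so such data may need separate handling.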
MAD vs 1.5 IQR rule
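One way to compare the two rules is to run both on the same data. The sketch below is an assumption-laden illustration: the function names are hypothetical, the MAD multiplier is set to 3, and the quartiles come from `statistics.quantiles` with its default method (other quartile conventions can give slightly different fences).

```python
from statistics import median, quantiles


def mad_flags(values, k=3.0):
    """Values farther than k * MAD from the median."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    threshold = k * median(abs_dev)
    return [v for v, d in zip(values, abs_dev) if d > threshold]


def iqr_flags(values, factor=1.5):
    """Values outside [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 17, 95]
print(mad_flags(data))  # [17, 95]
print(iqr_flags(data))  # [95]
```

On this made-up dataset the MAD rule with multiplier 3 is the stricter of the two: it also flags 17, whereas the quartile-based fences are widened by the moderately high values and flag only 95. Which rule is stricter in general depends on the data and on the multipliers chosen.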