Removing Outliers Using the IQR Method
In real-world datasets, extreme values (outliers) can distort statistical results and visualizations. A common and effective way to detect and remove them is the Interquartile Range (IQR) method.
What Is IQR?
The Interquartile Range (IQR) is a measure of statistical dispersion and is calculated as:
$$\mathrm{IQR} = Q_3 - Q_1$$
Q1 = 25th percentile (first quartile)
Q3 = 75th percentile (third quartile)
Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR (often called Tukey's fences) are conventionally treated as outliers.
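To make the fence rule concrete, here is a small worked sketch in R on a made-up vector (the values are illustrative, not taken from the course dataset). It relies on R's default `quantile()` definition (`type = 7`); other software, or other `type` settings, can produce slightly different quartiles.

```r
# Illustrative values only; 30 is the obvious extreme
x <- c(2, 4, 4, 5, 6, 7, 8, 30)

# Quartiles with R's default quantile definition (type = 7)
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- q3 - q1

# Tukey's fences with the conventional 1.5 multiplier
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

c(Q1 = q1, Q3 = q3, IQR = iqr, lower = lower, upper = upper)

# Values falling outside the fences
x[x < lower | x > upper]
```

With these values, Q1 = 4 and Q3 = 7.25, so IQR = 3.25 and the fences are −0.875 and 12.125; only 30 lies outside them.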
Calculating IQR and Detecting Outliers
Step 1: Calculate Quartiles and IQR
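# First and third quartiles of the exam marks (R's default quantile type 7)
q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
# Spread of the middle 50% of the marks
iqr_placement <- q3_placement - q1_placement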
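As a sanity check, base R's `IQR()` should agree with the manual calculation, since both default to quantile type 7. The data frame below is a hypothetical stand-in for the lesson's `df`; only the column name `placement_exam_marks` comes from the text, and the marks are made up.

```r
# Hypothetical stand-in for the course's df (made-up marks)
df <- data.frame(placement_exam_marks = c(12, 18, 20, 22, 25, 27, 30, 95))

q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
iqr_placement <- q3_placement - q1_placement

# quantile() returns a named vector (e.g. "75%"); unname() drops the label
iqr_value <- unname(iqr_placement)

# Base R's IQR() uses the same default quantile type, so the values match
all.equal(iqr_value, IQR(df$placement_exam_marks))
```

If the column contains missing values, both `quantile()` and `IQR()` stop with an error unless you pass `na.rm = TRUE`.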
Step 2: Define Upper and Lower Boundaries
# Conventional multiplier for Tukey's fences
threshold <- 1.5
upper_boundary <- q3_placement + (threshold * iqr_placement)
lower_boundary <- q1_placement - (threshold * iqr_placement)
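If you need the boundaries for several columns, the two formulas above can be wrapped in a small function. This is a sketch; `iqr_bounds` and its `k` argument are hypothetical names introduced here, not part of the lesson.

```r
# Hypothetical helper: Tukey's fences for a numeric vector,
# with the multiplier k defaulting to the conventional 1.5
iqr_bounds <- function(x, k = 1.5, na.rm = TRUE) {
  q <- quantile(x, c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

marks <- c(12, 18, 20, 22, 25, 27, 30, 95)  # made-up marks
iqr_bounds(marks)         # conventional fences
iqr_bounds(marks, k = 3)  # wider fences for "extreme" outliers only
```

Setting `k = 3` gives the wider fences sometimes used to flag only extreme outliers; the default of 1.5 matches the multiplier used in Step 2.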
Step 3: Identify and Remove Outliers
# Display Outliers
df[df$placement_exam_marks > upper_boundary | df$placement_exam_marks < lower_boundary,]
# Create Cleaned Dataset
df2 <- df[df$placement_exam_marks <= upper_boundary & df$placement_exam_marks >= lower_boundary,]
View(df2)
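Steps 1 to 3 can be combined into one reusable function that also makes the handling of missing values explicit. This is a sketch under assumptions: `remove_outliers_iqr` and `keep_na` are hypothetical names, and keeping rows with `NA` by default is a design choice, not something the lesson specifies.

```r
# Hypothetical helper: drops rows whose value in `col` lies outside
# Tukey's fences; rows with NA in `col` are kept unless keep_na = FALSE
remove_outliers_iqr <- function(data, col, k = 1.5, keep_na = TRUE) {
  x <- data[[col]]
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  inside <- x >= q[1] - k * iqr & x <= q[2] + k * iqr
  # Comparisons with NA give NA, so decide explicitly what happens to those rows
  keep <- if (keep_na) is.na(x) | inside else (!is.na(x)) & inside
  data[keep, , drop = FALSE]
}

# Made-up example data with one extreme mark and one missing mark
df <- data.frame(
  student = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
  placement_exam_marks = c(12, 18, 20, 22, 25, 27, 30, 95, NA)
)
df2 <- remove_outliers_iqr(df, "placement_exam_marks")
nrow(df) - nrow(df2)  # number of rows removed
```

Making the `NA` handling explicit sidesteps a subtle base R behaviour: when a logical row index contains `NA`, `df[index, ]` returns a row filled with `NA` instead of dropping it.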
Summary
- The IQR method is useful when the data are not normally distributed;
- It is non-parametric: it does not assume any particular data distribution;
- It is best suited to small and medium-sized datasets with clear extremes.
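The robustness behind these points can be seen in a quick sketch on made-up data: making the largest value far more extreme leaves the fences unchanged, because Q1 and Q3 depend only on the ordering of the middle values. The standard deviation is included purely as a contrast; `fences`, `original`, and `inflated` are hypothetical names.

```r
# Made-up values; only the largest observation differs between the two vectors
original <- c(2, 4, 4, 5, 6, 7, 8, 30)
inflated <- c(2, 4, 4, 5, 6, 7, 8, 3000)

# Tukey's fences with multiplier k
fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c(lower = q[1] - k * diff(q), upper = q[2] + k * diff(q))
}

fences(original)  # same fences in both cases
fences(inflated)

# By contrast, the standard deviation grows with the extreme value
c(sd(original), sd(inflated))
```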