Learn Removing Outliers Using IQR Method | Basic Statistical Analysis
Data Analysis with R

Removing Outliers Using IQR Method

In real-world datasets, extreme values or outliers can distort statistical results and visualizations. One common and effective way to detect and remove these outliers is by using the Interquartile Range (IQR) Method.

What Is IQR?

The Interquartile Range (IQR) is a measure of statistical dispersion and is calculated as:

IQR = Q3 - Q1

Q1 = 25th percentile (first quartile)

Q3 = 75th percentile (third quartile)

Values lying below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are typically considered outliers.
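To make the rule concrete, here is a small worked sketch on a made-up vector (the values and the names `x`, `lower`, and `upper` are illustrative assumptions, not part of the course dataset). It relies on the default quantile definition used by R's `quantile()`.

```r
# Hypothetical toy data: eight ordinary values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

q1 <- quantile(x, 0.25, names = FALSE)  # first quartile: 3
q3 <- quantile(x, 0.75, names = FALSE)  # third quartile: 7
iqr <- q3 - q1                          # 7 - 3 = 4

lower <- q1 - 1.5 * iqr                 # 3 - 6 = -3
upper <- q3 + 1.5 * iqr                 # 7 + 6 = 13

# Values outside [lower, upper] are flagged as outliers
outliers <- x[x < lower | x > upper]
print(outliers)                         # 100
```

Only the value 100 falls outside the interval from -3 to 13, so it is the single point flagged by the rule.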

Calculating IQR and Detecting Outliers

Step 1: Calculate Quartiles and IQR

q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
iqr_placement <- q3_placement - q1_placement
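As a possible cross-check on Step 1, base R also provides `IQR()` in the `stats` package, which by default uses the same quantile definition as `quantile()`. The sketch below uses a hypothetical vector `marks` as a stand-in for `df$placement_exam_marks`, since the course data is not reproduced here.

```r
# Hypothetical stand-in for df$placement_exam_marks
marks <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# Manual calculation, as in Step 1 (names = FALSE drops the "25%"/"75%" labels)
q1 <- quantile(marks, 0.25, names = FALSE)
q3 <- quantile(marks, 0.75, names = FALSE)
manual_iqr <- q3 - q1

# Built-in shortcut from the stats package
builtin_iqr <- IQR(marks)

# The two approaches should agree
print(all.equal(manual_iqr, builtin_iqr))
```

Computing the quartiles explicitly is still needed for the boundaries in the next step, so `IQR()` mainly serves as a sanity check here.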

Step 2: Define Upper and Lower Boundaries

Thresh_hold <- 1.5

upper_boundary <- q3_placement + (Thresh_hold * iqr_placement)
lower_boundary <- q1_placement - (Thresh_hold * iqr_placement)

Step 3: Identify and Remove Outliers

# Display Outliers
df[df$placement_exam_marks > upper_boundary | df$placement_exam_marks < lower_boundary,]
# Create Cleaned Dataset
df2 <- df[df$placement_exam_marks <= upper_boundary & df$placement_exam_marks >= lower_boundary,]
View(df2)
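The three steps can also be wrapped into one reusable helper. The following is a minimal sketch, not the course's own code: the function name `remove_iqr_outliers`, its parameters, and the toy data frame are all assumptions made for illustration. It additionally drops rows where the column is `NA`, which is a design choice; plain logical subsetting would otherwise return all-`NA` rows for missing values.

```r
# Hypothetical helper: keep only rows whose `column` value lies within
# [Q1 - k * IQR, Q3 + k * IQR]. Rows with NA in `column` are dropped.
remove_iqr_outliers <- function(data, column, k = 1.5) {
  values <- data[[column]]
  q <- quantile(values, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  keep <- !is.na(values) & values >= lower & values <= upper
  data[keep, , drop = FALSE]
}

# Toy data frame standing in for df (values are made up)
toy <- data.frame(
  student = 1:9,
  placement_exam_marks = c(1, 2, 3, 4, 5, 6, 7, 8, 100)
)

cleaned <- remove_iqr_outliers(toy, "placement_exam_marks")
rows_dropped <- nrow(toy) - nrow(cleaned)
print(rows_dropped)
```

Comparing the row counts before and after filtering is a quick way to see how much data the rule removes before committing to the cleaned dataset.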

Summary

  • The IQR method is useful when the data is not normally distributed, because quartiles are far less sensitive to extreme values than the mean and standard deviation;
  • It is non-parametric, meaning it does not assume a specific data distribution;
  • It is best suited for small to medium-sized datasets with clear extremes.
Section 3. Chapter 4