Learn Removing Outliers Using IQR Method | Basic Statistical Analysis
Data Analysis with R

Removing Outliers Using IQR Method


Another effective way to detect and remove outliers is by using the interquartile range (IQR) method.

What Is IQR?

The interquartile range (IQR) is a measure of statistical dispersion and is calculated as:

$IQR = Q3 - Q1$

Where:

  • $Q1$: 25th percentile (first quartile);
  • $Q3$: 75th percentile (third quartile).

Values lying below $Q1 - 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$ are typically considered outliers.
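To make the rule concrete, here is a minimal sketch on a small made-up vector. The values in marks and the names lower_fence and upper_fence are assumptions chosen for illustration; they are not taken from the lesson's dataset.

```r
# Hypothetical exam marks; 95 is deliberately far from the rest
marks <- c(10, 12, 14, 15, 16, 18, 20, 95)

q1 <- unname(quantile(marks, 0.25))  # first quartile
q3 <- unname(quantile(marks, 0.75))  # third quartile
iqr_value <- q3 - q1

# Fences defined by the 1.5 * IQR rule
lower_fence <- q1 - 1.5 * iqr_value
upper_fence <- q3 + 1.5 * iqr_value

# Values outside the fences are flagged as outliers
flagged <- marks[marks < lower_fence | marks > upper_fence]
print(flagged)
```

With R's default quantile algorithm, only the value 95 should fall outside the fences here.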

Calculating IQR

To calculate the IQR and detect outliers, you first need the 25th and 75th percentile values, which you can obtain with the quantile() function. Then compute the IQR by subtracting the first quartile from the third, as in the formula.

q1_placement <- quantile(df$placement_exam_marks, 0.25)
q3_placement <- quantile(df$placement_exam_marks, 0.75)
iqr_placement <- q3_placement - q1_placement
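As a cross-check, base R also provides an IQR() function in the stats package, which with its default settings computes the same difference of quartiles. The sketch below uses a small hypothetical stand-in for df, since the lesson's actual data is not shown here. It also passes na.rm = TRUE, on the assumption that the column may contain missing values.

```r
# Hypothetical stand-in for the lesson's data frame (includes one NA)
df <- data.frame(placement_exam_marks = c(8, 15, 22, 28, 31, 36, 44, 97, NA))

# na.rm = TRUE drops missing values; without it quantile() stops with an error
q1_placement <- quantile(df$placement_exam_marks, 0.25, na.rm = TRUE)
q3_placement <- quantile(df$placement_exam_marks, 0.75, na.rm = TRUE)
iqr_placement <- q3_placement - q1_placement

# Built-in shortcut from the stats package
iqr_builtin <- IQR(df$placement_exam_marks, na.rm = TRUE)

# Both approaches should agree (unname() drops the percentile label)
print(all.equal(unname(iqr_placement), iqr_builtin))
```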

Identifying Outliers

Similar to the z-score method, you need to identify the lower and upper boundaries:

# Multiplier applied to the IQR; 1.5 is the conventional choice
threshold <- 1.5
upper_boundary <- q3_placement + (threshold * iqr_placement)
lower_boundary <- q1_placement - (threshold * iqr_placement)

Then you can either select all outliers to analyze them:

df[df$placement_exam_marks > upper_boundary | df$placement_exam_marks < lower_boundary,]

Or create an outlier-free dataset:

df2 <- df[df$placement_exam_marks <= upper_boundary & df$placement_exam_marks >= lower_boundary,]
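The steps above can be bundled into one reusable function. This is a sketch under stated assumptions, not part of the lesson: the helper name remove_outliers_iqr, its arguments, and the sample data are all hypothetical.

```r
# Hypothetical helper (not part of the lesson): keeps rows whose value in
# `column` lies within [Q1 - k * IQR, Q3 + k * IQR]
remove_outliers_iqr <- function(data, column, k = 1.5) {
  values <- data[[column]]
  q <- quantile(values, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr_value <- q[2] - q[1]
  lower <- q[1] - k * iqr_value
  upper <- q[2] + k * iqr_value
  # which() keeps only positions where the test is TRUE, so NA rows are dropped
  keep <- which(values >= lower & values <= upper)
  data[keep, , drop = FALSE]
}

# Hypothetical data: 97 sits far above the other marks
df <- data.frame(
  student = 1:8,
  placement_exam_marks = c(8, 15, 22, 28, 31, 36, 44, 97)
)
df2 <- remove_outliers_iqr(df, "placement_exam_marks")
print(nrow(df) - nrow(df2))  # number of rows removed
```

The which() call is a deliberate design choice. Indexing a data frame with a logical vector that contains NA returns rows filled with NA, whereas which() silently skips those positions.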
