
Removing Outliers Using Z-Score Method


One common method for detecting and removing outliers is the z-score method. It measures how far each data point lies from the mean in units of standard deviations. If a point's z-score falls outside a chosen threshold (commonly ±3), the point is treated as an outlier. The ±3 cutoff is most meaningful when the data are roughly normally distributed.
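Two quick calculations help calibrate the ±3 threshold. The sketch below uses base R's pnorm() to compute the share of a normal distribution within k standard deviations of the mean, and a small bound on the largest z-score a sample of size n can produce when sd() uses the n - 1 denominator. The helper names within_k_sd() and max_abs_z() are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: share of a normal distribution within k SDs of the mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)
round(within_k_sd(c(1, 2, 3)), 4)   # 0.6827 0.9545 0.9973

# Hypothetical helper: largest |z| attainable in a sample of size n
# when the standard deviation uses the n - 1 denominator (as sd() does)
max_abs_z <- function(n) (n - 1) / sqrt(n)
max_abs_z(c(5, 10, 11))   # below 3 for n <= 10, just above 3 for n = 11
```

The second result is a practical caveat: with ten or fewer observations, no point can ever exceed a z-score of 3, so the ±3 rule flags nothing in very small samples.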

What Is a Z-Score?

A z-score (also known as a standard score) is calculated using the formula:

$$Z = \frac{X - \mu}{\sigma}$$

Where:

  • $X$: the original data point;
  • $\mu$: the mean of the dataset;
  • $\sigma$: the standard deviation of the dataset.
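As a small worked example with hypothetical values, take the three data points 2, 4, and 6. R's sd() computes the sample standard deviation, dividing by $n - 1$, so:

```latex
\mu = \frac{2 + 4 + 6}{3} = 4, \qquad
\sigma = \sqrt{\frac{(2 - 4)^2 + (4 - 4)^2 + (6 - 4)^2}{3 - 1}} = \sqrt{\frac{8}{2}} = 2

Z_{2} = \frac{2 - 4}{2} = -1, \qquad Z_{4} = \frac{4 - 4}{2} = 0, \qquad Z_{6} = \frac{6 - 4}{2} = 1
```

Each z-score states how many standard deviations the value sits above (positive) or below (negative) the mean.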

Calculating Z-Scores

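You can compute z-scores manually by applying the formula directly:

# Mean and sample standard deviation of the cgpa column
mean_cgpa <- mean(df$cgpa)
sd_cgpa <- sd(df$cgpa)
# Standardize: distance from the mean in units of standard deviations
df$cgpa_zscore <- (df$cgpa - mean_cgpa) / sd_cgpa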
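One caveat with the manual version: mean() and sd() default to na.rm = FALSE, so a single missing value turns every z-score into NA. A minimal sketch of a hypothetical zscore() helper that ignores missing values when estimating the mean and standard deviation, while leaving NA inputs as NA in the output:

```r
# Hypothetical helper: z-scores that skip NA values when estimating
# the mean and standard deviation; NA inputs remain NA in the result
zscore <- function(x) {
  mu <- mean(x, na.rm = TRUE)
  sigma <- sd(x, na.rm = TRUE)
  (x - mu) / sigma
}

zscore(c(2, 4, 6, NA))   # returns c(-1, 0, 1, NA)
```

With this helper, the manual step would read df$cgpa_zscore <- zscore(df$cgpa).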

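Or you can use R's built-in scale() function. Because scale() returns a one-column matrix, wrap it in as.numeric() to store a plain numeric column:

df$cgpa_zscore <- as.numeric(scale(df$cgpa))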
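To see that the two approaches agree, the sketch below compares them on a small hypothetical vector. It also shows that scale() returns a matrix carrying the centering and scaling values as attributes, which is why converting with as.numeric() is useful before storing the result in a data frame column.

```r
# Hypothetical example vector
x <- c(2, 4, 6)

# Manual z-scores, exactly as in the formula
z_manual <- (x - mean(x)) / sd(x)

# scale() subtracts the mean and divides by the sample SD, but returns a 3 x 1 matrix
z_scaled <- scale(x)
is.matrix(z_scaled)               # TRUE
attr(z_scaled, "scaled:center")   # 4 (the mean)
attr(z_scaled, "scaled:scale")    # 2 (the sample standard deviation)

# Dropping the matrix structure gives the same values as the manual version
all.equal(z_manual, as.numeric(z_scaled))   # TRUE
```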

Identifying Outliers

After calculating the z-scores, choose a threshold (±3 in this case) and filter the rows whose z-score falls outside that range:

threshold <- 3
# Rows whose z-score lies outside [-3, 3]
outliers <- df[abs(df$cgpa_zscore) > threshold, ]

Or you can select all rows inside the range to create an outlier-free dataset:

# Rows whose z-score lies within [-3, 3]; using <= keeps points exactly at ±3,
# so every row ends up in exactly one of the two data frames
df2 <- df[abs(df$cgpa_zscore) <= threshold, ]
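Putting the steps together, here is a minimal end-to-end sketch on a hypothetical data frame. The 21 values are invented for illustration: 20 typical CGPA values plus one extreme value (9.9), so the sample is large enough for a z-score above 3 to be possible. Computing the logical condition once and negating it guarantees the two results are complementary.

```r
# Hypothetical data: 20 typical CGPA values plus one extreme value
df <- data.frame(cgpa = c(rep(c(2.8, 3.0, 3.2, 3.4, 3.6), times = 4), 9.9))

# Z-scores as a plain numeric column
df$cgpa_zscore <- as.numeric(scale(df$cgpa))

# Flag rows whose z-score lies outside [-threshold, threshold]
threshold <- 3
is_outlier <- abs(df$cgpa_zscore) > threshold

outliers <- df[is_outlier, ]   # rows flagged as outliers
df2 <- df[!is_outlier, ]       # outlier-free dataset

nrow(outliers)   # 1 (the 9.9 value, z of roughly 4.3)
nrow(df2)        # 20
```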