Removing Outliers Using Z-Score Method

Outliers can heavily influence statistical analyses and models. One common method for detecting and removing them is the Z-score method, which measures how far a data point lies from the mean in units of standard deviations. A data point whose Z-score falls beyond a chosen threshold (commonly ±3) is treated as an outlier.

What Is a Z-Score?

A Z-score (also known as a standard score) is calculated using the formula:

$$Z = \frac{X - \mu}{\sigma}$$

Where:

  • X: the original data point;
  • μ: the mean of the dataset;
  • σ: the standard deviation of the dataset.
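
As a small worked illustration of the formula, the sketch below applies it to a made-up vector `x` (hypothetical values, not taken from the lesson's dataset). One assumption worth flagging: R's `sd()` uses the sample (n - 1) denominator, so σ here is the sample standard deviation.

```r
# Hypothetical example values, not from the lesson's data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mu <- mean(x)      # mean of the values
sigma <- sd(x)     # sample standard deviation (n - 1 denominator)

# Z-score of every value: distance from the mean in standard deviations
z <- (x - mu) / sigma

print(round(z, 2))
```

By construction, the resulting Z-scores have mean 0 and standard deviation 1, which is what makes a single threshold comparable across variables measured on different scales.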

Calculating Z-Scores for CGPA

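# Step 1: Calculate the mean and standard deviation
mean_cgpa <- mean(df$cgpa)
sd_cgpa <- sd(df$cgpa)  # sd() uses the sample (n - 1) denominator
# Step 2: Calculate Z-scores manually
df$cgpa_zscore <- (df$cgpa - mean_cgpa) / sd_cgpa
# OR use the built-in scale() function; it returns a one-column matrix,
# so as.numeric() converts the result back to a plain vector
df$cgpa_zscore <- as.numeric(scale(df$cgpa))
head(df$cgpa_zscore)  # View the first few Z-scores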
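
To check that the manual calculation and `scale()` agree, a minimal sketch follows. The data frame here is a hypothetical stand-in with invented `cgpa` values, since the lesson's dataset is not shown.

```r
# Hypothetical stand-in for the lesson's data frame
df <- data.frame(cgpa = c(6.8, 7.2, 7.5, 8.1, 8.4, 9.0))

# Manual Z-scores
manual <- (df$cgpa - mean(df$cgpa)) / sd(df$cgpa)

# Built-in: centers on the mean and divides by the sample standard deviation
scaled <- scale(df$cgpa)

print(is.matrix(scaled))                       # scale() returns a matrix
print(all.equal(manual, as.numeric(scaled)))   # compare the two results
```

Because `scale()` returns a matrix with attributes, wrapping it in `as.numeric()` is a reasonable choice when the result is stored as an ordinary data frame column.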

Identifying Outliers

thresh_hold <- 3  # Common threshold for Z-score outliers

# Select the rows whose Z-score lies beyond the threshold
outliers <- df[df$cgpa_zscore > thresh_hold | df$cgpa_zscore < -thresh_hold, ]
print(outliers)  # View outlier rows
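
The threshold is a choice rather than a fixed rule, and lowering it flags more rows. The sketch below illustrates this on simulated data; the seed, the simulated `cgpa` values, and the two planted extreme values are all assumptions made for the example.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Simulated CGPA-like values plus two planted extreme values (hypothetical data)
df <- data.frame(cgpa = c(rnorm(200, mean = 7.5, sd = 0.5), 2.0, 12.5))
df$cgpa_zscore <- as.numeric(scale(df$cgpa))

# Count flagged rows at several candidate thresholds
thresholds <- c(2, 2.5, 3)
counts <- sapply(thresholds, function(t) sum(abs(df$cgpa_zscore) > t))

print(data.frame(threshold = thresholds, n_outliers = counts))
```

Using `abs()` is equivalent to the two-sided comparison above and keeps the condition in one place.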

Creating an Outlier-Free Dataset

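# Keep rows whose Z-score is within the threshold (inclusive, so that a row
# at exactly ±thresh_hold is not left out of both sets)
df2 <- df[df$cgpa_zscore <= thresh_hold & df$cgpa_zscore >= -thresh_hold, ]
View(df2)  # View cleaned data (opens a viewer in an interactive session)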
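
A quick sanity check after filtering is that every row lands in exactly one of the two sets. The sketch below shows one way to do that with a single logical mask; the simulated data and the name `is_outlier` are hypothetical and used only for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated CGPA-like values with one planted extreme value (hypothetical data)
df <- data.frame(cgpa = c(rnorm(100, mean = 7.5, sd = 0.5), 1.0))
df$cgpa_zscore <- as.numeric(scale(df$cgpa))

thresh_hold <- 3

# One logical mask drives both subsets, so they cannot overlap or leave gaps
is_outlier <- abs(df$cgpa_zscore) > thresh_hold
outliers <- df[is_outlier, ]
df2 <- df[!is_outlier, ]

# The two subsets together should account for every original row
print(nrow(outliers) + nrow(df2) == nrow(df))
```

Note that this sketch assumes the column has no missing values; with `NA` entries, `mean()` and `sd()` return `NA` unless `na.rm = TRUE` is set.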
