Summary  
This chapter demonstrates how to calculate z-scores for numeric data fields and filter rows that exceed a configurable threshold to identify and remove outliers.

General domain of usage  
Data preprocessing

One common method for detecting and removing outliers is the **z-score method**. This technique measures how far a data point lies from the mean, in units of standard deviations. If the absolute value of its z-score exceeds a chosen threshold (commonly 3, meaning the point lies outside ±3), the data point is treated as an outlier.
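The ±3 convention can be motivated with base R's `pnorm()`. This is only a small sketch, and it assumes the data are roughly normally distributed, which real datasets may not be:

```r
# Share of a standard normal distribution within 3 standard deviations of the mean
share_within <- pnorm(3) - pnorm(-3)
share_outside <- 1 - share_within

round(share_within, 4)   # about 0.9973
round(share_outside, 4)  # about 0.0027
```

Under that assumption, only about 0.27% of values would be expected to fall beyond the threshold, which is why such points are treated as unusual.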

## What Is a Z-Score?
A z-score (also known as a standard score) is calculated using the formula:

$$
Z = \frac{X - \mu}{\sigma}
$$

Where:
- $$X$$: the original data point;
- $$\mu$$: the mean of the dataset;
- $$\sigma$$: the standard deviation of the dataset.
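As a small worked example of the formula, the sketch below applies it to a toy vector (the values are made up for illustration and are not from the chapter's dataset). Note that R's `sd()` computes the sample standard deviation, using an n - 1 denominator:

```r
# Toy values, chosen only for illustration
x <- c(6, 7, 8, 9, 10)

mu <- mean(x)      # 8
sigma <- sd(x)     # sample standard deviation (n - 1 denominator), about 1.58

# Apply Z = (X - mu) / sigma to every element at once
z <- (x - mu) / sigma
round(z, 2)        # -1.26 -0.63  0.00  0.63  1.26
```

The value 10 sits 2 units above the mean, which is about 1.26 standard deviations, so its z-score is roughly 1.26.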

## Calculating Z-Scores
You can either compute z-scores manually by following the formula:
```
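# Mean and sample standard deviation of the column
mean_cgpa <- mean(df$cgpa)
sd_cgpa <- sd(df$cgpa)
# Z-score: distance from the mean, in standard deviations
df$cgpa_zscore <- (df$cgpa - mean_cgpa) / sd_cgpa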
```

Or you can use the built-in `scale()` function, which centers the values on their mean and divides by their standard deviation:
```
# scale() returns a one-column matrix; as.numeric() stores it as a plain vector
df$cgpa_zscore <- as.numeric(scale(df$cgpa))
```
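The two approaches should give the same numbers. The check below is a sketch on a hypothetical toy data frame standing in for `df`; it also shows that `scale()` returns a one-column matrix, which `as.numeric()` flattens into a plain vector:

```r
# Hypothetical toy data frame standing in for `df`
df <- data.frame(cgpa = c(6.8, 7.2, 7.9, 8.4, 9.1))

# Manual calculation, following the formula
manual_z <- (df$cgpa - mean(df$cgpa)) / sd(df$cgpa)

# Built-in calculation
scaled <- scale(df$cgpa)

is.matrix(scaled)                        # TRUE: scale() returns a matrix
all.equal(manual_z, as.numeric(scaled))  # TRUE: same values once flattened
```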

## Identifying Outliers
After calculating the z-scores, you can choose a threshold (±3 in this case) and apply a simple filtering operation to select all entries outside of the range:
```
# Flag rows whose z-score is more than 3 standard deviations from the mean
thresh_hold <- 3
outliers <- df[abs(df$cgpa_zscore) > thresh_hold, ]
```

Or you can select all entries inside the range to create an outlier-free dataset:
```
# Keep rows within the threshold (inclusive), the exact complement of `outliers`
df2 <- df[abs(df$cgpa_zscore) <= thresh_hold, ]
```
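The steps above can be combined into one reusable function with a configurable threshold. The sketch below is one possible arrangement, not a definitive implementation: the helper name `remove_outliers_z`, its arguments, and the toy data are all hypothetical. It also assumes that rows with a missing value in the column should be dropped, which may not suit every dataset.

```r
# Hypothetical helper: keep rows whose z-score in `col` is within `threshold`
remove_outliers_z <- function(data, col, threshold = 3) {
  x <- data[[col]]
  # na.rm = TRUE so missing values do not turn every z-score into NA
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  # which() ignores NA comparisons, so rows with a missing value are dropped
  data[which(abs(z) <= threshold), , drop = FALSE]
}

# Toy data (made up): 19 typical CGPA values and one extreme value
df <- data.frame(cgpa = c(rep(c(7.5, 8.0, 8.5), length.out = 19), 1.0))

df2 <- remove_outliers_z(df, "cgpa")                        # default threshold of 3
df_strict <- remove_outliers_z(df, "cgpa", threshold = 2)   # stricter threshold

nrow(df)   # 20
nrow(df2)  # 19: the extreme value 1.0 is removed
```

A very small sample can never produce a z-score beyond ±3, because the extreme value itself inflates the mean and standard deviation, so the toy data uses 20 rows.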
