Descriptive Statistics
Understanding your data begins with descriptive statistics, which summarize the distribution, central tendency, and spread of variables. This chapter guides you through basic statistical calculations and grouped summaries using base R and dplyr.
Basic Descriptive Statistics (Base R)
The most common statistical measures are:
- Mean: arithmetic average of the values;
- Median: middle value once the data are sorted, less sensitive to outliers than the mean;
- Min / Max: smallest and largest values.
mean(df$max_power, na.rm = TRUE) # Average max power
median(df$selling_price, na.rm = TRUE) # Median selling price
min(df$mileage, na.rm = TRUE) # Minimum mileage
max(df$mileage, na.rm = TRUE) # Maximum mileage
summary(df) # Quick summary of every column (min, quartiles, mean, max for numeric ones)
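To make the calls above concrete, here is a minimal sketch on a small made-up data frame. The values are invented for illustration; only the column names follow the examples above. It also shows why na.rm = TRUE matters: without it, a single missing value turns the result into NA.

```r
# Toy data (made-up values); column names mirror the examples above
df <- data.frame(
  max_power     = c(74, 103.5, NA, 90, 88.2),
  selling_price = c(450000, 370000, 158000, 225000, 130000),
  mileage       = c(23.4, 21.14, 17.7, 23.0, 16.1)
)

mean(df$max_power)                              # NA, because one value is missing
avg_power <- mean(df$max_power, na.rm = TRUE)   # missing value is dropped first
med_price <- median(df$selling_price)           # middle of the sorted prices
range_mileage <- range(df$mileage)              # min and max in one call

avg_power
med_price
range_mileage
```

range() is a convenient alternative when you need both min() and max() of the same column.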
Descriptive Statistics using dplyr
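With dplyr, summarise() computes several statistics in a single call and returns them together as a one-row data frame, which is often easier to read than a series of separate base R calls.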
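library(dplyr) # Provides %>% and summarise()

df %>%
  summarise(
    avg_power = mean(max_power, na.rm = TRUE), # Average max power
    sd_power = sd(max_power, na.rm = TRUE), # Standard deviation of max power
    median_power = median(max_power, na.rm = TRUE) # Median max power
  )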
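For the grouped summaries mentioned in the introduction, one possible sketch adds group_by() before summarise(). The data below are made up, and the fuel column is a hypothetical grouping variable chosen for illustration; substitute whichever categorical column your data actually has.

```r
library(dplyr)

# Toy data (made-up values); `fuel` is a hypothetical grouping column
df <- data.frame(
  fuel      = c("Petrol", "Diesel", "Petrol", "Diesel", "Petrol"),
  max_power = c(74, 103.5, NA, 90, 88.2)
)

by_fuel <- df %>%
  group_by(fuel) %>%                                  # one group per fuel type
  summarise(
    n_cars       = n(),                               # rows per group (NAs included)
    avg_power    = mean(max_power, na.rm = TRUE),     # mean within each group
    median_power = median(max_power, na.rm = TRUE),   # median within each group
    .groups = "drop"                                  # return an ungrouped result
  )

by_fuel
```

The result has one row per group instead of a single row, so the same statistics can be compared across categories.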