Descriptive Statistics
Understanding your data begins with descriptive statistics: summaries of the distribution, central tendency, and spread of your variables. This chapter guides you through basic statistical calculations and grouped summaries using base R and dplyr.
Basic Descriptive Statistics (Base R)
The most common statistical measures are:
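- Mean: the arithmetic average (the sum of the values divided by their count);
- Median: the middle value once the data are sorted, which is less sensitive to outliers than the mean;
- Min / Max: the smallest and largest values.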
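The difference between mean and median is easiest to see with an outlier. Below is a minimal sketch on a made-up vector; `prices` and its values are hypothetical and are not taken from `df`.

```r
# Hypothetical prices with one extreme outlier (1000)
prices <- c(100, 120, 130, 150, 1000)

mean(prices)    # 300: pulled upward by the outlier
median(prices)  # 130: the middle of the sorted values
min(prices)     # 100: smallest value
max(prices)     # 1000: largest value
```

A single large value moves the mean far from most of the data, while the median stays put.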
mean(df$max_power, na.rm = TRUE) # Average max power
median(df$selling_price, na.rm = TRUE) # Median selling price
min(df$mileage, na.rm = TRUE) # Minimum mileage
max(df$mileage, na.rm = TRUE) # Maximum mileage
summary(df) # Per-column summary; numeric columns get min, quartiles, mean, max and NA count
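Every call above passes `na.rm = TRUE`. A short sketch of why, using a hypothetical `mileage` vector (not the `df` column) that contains one missing value:

```r
# Hypothetical mileage values with one missing entry
mileage <- c(18.5, 21.0, NA, 23.5)

mean(mileage)                # NA: a single missing value makes the result NA
mean(mileage, na.rm = TRUE)  # 21: the NA is dropped before averaging
summary(mileage)             # quartiles and mean, plus a count of NA's
```

Without `na.rm = TRUE`, `mean()`, `median()`, `min()` and `max()` all return `NA` as soon as the input has one missing value.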
Descriptive Statistics using dplyr
With dplyr, `summarise()` computes several statistics in one readable pipeline and returns them as named columns of a one-row data frame.
library(dplyr)  # provides summarise() and the %>% pipe

df %>%
  summarise(
    avg_power    = mean(max_power, na.rm = TRUE),   # mean
    sd_power     = sd(max_power, na.rm = TRUE),     # standard deviation
    median_power = median(max_power, na.rm = TRUE)  # median
  )
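For grouped summaries, `summarise()` is usually preceded by `group_by()`, which makes it return one row per group. The sketch below uses a small hypothetical data frame; `cars_toy` and its `fuel` column are assumptions for illustration and may not match the columns of `df`.

```r
library(dplyr)

# Hypothetical toy data; `fuel` is an assumed grouping column
cars_toy <- data.frame(
  fuel      = c("Diesel", "Diesel", "Petrol", "Petrol", "Petrol"),
  max_power = c(90, 110, 70, NA, 80)
)

power_by_fuel <- cars_toy %>%
  group_by(fuel) %>%
  summarise(
    n            = n(),                              # rows per group, NAs included
    avg_power    = mean(max_power, na.rm = TRUE),    # mean per group
    median_power = median(max_power, na.rm = TRUE)   # median per group
  )

power_by_fuel
```

Here the Petrol group has three rows, but its mean and median (both 75) are computed from the two non-missing values. A rough base R counterpart is `tapply(cars_toy$max_power, cars_toy$fuel, mean, na.rm = TRUE)`.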