Descriptive Statistics
Understanding your data begins with descriptive statistics: these provide essential summaries of the distribution, central tendency, and spread of variables. This chapter guides you through basic statistical calculations and grouped summaries using base R and dplyr.
Basic Descriptive Statistics (Base R)
The most common statistical measures are:
- Mean: the arithmetic average (sum of the values divided by their count);
- Median: the middle value once the data are sorted;
- Min / Max: the smallest and largest values.
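The difference between the mean and the median is easiest to see on a small vector with one extreme value. This sketch uses a made-up `prices` vector (hypothetical data, not the dataset used in this chapter):

```r
# Hypothetical toy data: four ordinary prices and one extreme outlier
prices <- c(200, 250, 300, 350, 5000)

mean(prices)    # 1220: pulled far upward by the outlier
median(prices)  # 300: the middle value of the sorted vector
min(prices)     # 200
max(prices)     # 5000
range(prices)   # min and max together: 200 5000
```

Because the median depends only on the order of the values, it is usually the more robust summary for skewed variables such as prices.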
mean(df$max_power, na.rm = TRUE) # Average max power
median(df$selling_price, na.rm = TRUE) # Median selling price
min(df$mileage, na.rm = TRUE) # Minimum mileage
max(df$mileage, na.rm = TRUE) # Maximum mileage
summary(df) # Summary of every column (min, quartiles, mean, max for numeric columns)
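Every call above passes `na.rm = TRUE`. Without it, a single missing value makes the result `NA`. A small sketch on a hypothetical `mileage` vector (not the `df$mileage` column itself):

```r
# Hypothetical vector with one missing value
mileage <- c(18.5, 21.0, NA, 23.4)

mean(mileage)                 # NA: missing values propagate by default
mean(mileage, na.rm = TRUE)   # mean of the three non-missing values
sum(is.na(mileage))           # number of missing values: 1
```

Counting missing values first with `sum(is.na(...))` tells you how many observations `na.rm = TRUE` is silently dropping.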
Descriptive Statistics using dplyr
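With dplyr, summarise() computes several statistics in one readable pipeline. Besides the mean and median, this example uses sd() to measure spread (the standard deviation).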
library(dplyr)

df %>%
  summarise(
    avg_power = mean(max_power, na.rm = TRUE),     # Mean max power
    sd_power = sd(max_power, na.rm = TRUE),        # Standard deviation of max power
    median_power = median(max_power, na.rm = TRUE) # Median max power
  )
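The chapter also covers grouped summaries. A minimal sketch, assuming a small hypothetical `cars` data frame with a `fuel` column (the real dataset's column names may differ), combines `group_by()` with `summarise()` so each statistic is computed once per group:

```r
library(dplyr)

# Hypothetical data: two fuel types, one missing power value
cars <- data.frame(
  fuel      = c("Petrol", "Petrol", "Diesel", "Diesel", "Diesel"),
  max_power = c(80, 90, 100, NA, 120)
)

power_by_fuel <- cars %>%
  group_by(fuel) %>%
  summarise(
    n_cars       = n(),                              # rows in each group (NA rows included)
    avg_power    = mean(max_power, na.rm = TRUE),    # group mean, ignoring NA
    median_power = median(max_power, na.rm = TRUE)   # group median, ignoring NA
  )

power_by_fuel
```

Note that `n()` counts every row in the group, including the one whose `max_power` is missing, while the mean and median use only the non-missing values.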