Descriptive Statistics
Understanding your data begins with descriptive statistics, which summarize the distribution, central tendency, and spread of each variable.
Basic Descriptive Statistics
The most common statistical measures are:
- Mean: average value;
- Standard deviation: how far values typically lie from the mean;
- Median: middle value;
- Min / max: smallest and largest values.
These give a quick overview of how your variables are distributed.
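To make the standard deviation concrete, the sketch below computes it by hand in R for a small made-up vector and compares the result with the built-in sd(). The vector x and its values are illustrative assumptions, not part of the dataset used in this chapter.

```r
# Hypothetical values, chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Mean: sum of the values divided by how many there are
x_mean <- sum(x) / length(x)

# Sample standard deviation: square root of the summed squared
# deviations from the mean, divided by n - 1
x_sd <- sqrt(sum((x - x_mean)^2) / (length(x) - 1))

# Both comparisons should be TRUE: the manual results match the built-ins
all.equal(x_mean, mean(x))
all.equal(x_sd, sd(x))
```

Note that sd() uses n - 1 in the denominator (the sample standard deviation), which is why the manual version divides by length(x) - 1 rather than length(x).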
Base R
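Base R provides simple functions for calculating descriptive statistics, such as mean(), median(), min(), and max(). Passing na.rm = TRUE tells them to ignore missing values; without it, a single NA makes the result NA. The summary() function gives a quick overview of every column in a data frame: for numeric columns it reports the minimum, first quartile, median, mean, third quartile, and maximum.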
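mean(df$max_power, na.rm = TRUE)        # average engine power, ignoring NAs
sd(df$max_power, na.rm = TRUE)          # standard deviation of engine power
median(df$selling_price, na.rm = TRUE)  # middle selling price
min(df$mileage, na.rm = TRUE)           # smallest mileage
max(df$mileage, na.rm = TRUE)           # largest mileage
summary(df)                             # overview of every column in df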
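As a self-contained sketch of the calls above, the following builds a small data frame and runs the same functions on it. The data frame toy_cars and all of its values are made up for illustration; only the column names mirror those used in this chapter.

```r
# Hypothetical stand-in for df; all values are invented
toy_cars <- data.frame(
  max_power     = c(74, 103, 78, NA, 88),
  selling_price = c(450000, 370000, 158000, 225000, 130000),
  mileage       = c(23.4, 21.1, 17.7, 23.0, 16.1)
)

mean(toy_cars$max_power)                # NA, because one value is missing
mean(toy_cars$max_power, na.rm = TRUE)  # mean of the four observed values
sd(toy_cars$max_power, na.rm = TRUE)    # sample standard deviation
range(toy_cars$mileage)                 # min and max in a single call

# summary() on one numeric column reports quartiles, the mean,
# and the number of missing values
summary(toy_cars$max_power)
```

The first two calls show why na.rm = TRUE matters: the same column yields NA without it and a usable mean with it.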
dplyr
With dplyr, you can use summarise() to calculate several statistics in a single call. The result is a one-row data frame with one named column per statistic.
library(dplyr)

df %>%
  summarise(
    avg_power    = mean(max_power, na.rm = TRUE),
    sd_power     = sd(max_power, na.rm = TRUE),
    median_power = median(max_power, na.rm = TRUE)
  )
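A runnable variant of this pattern on toy data might look like the sketch below. The toy_cars data frame, its values, and the extra summary columns (lowest_power, highest_power, n_missing) are illustrative assumptions rather than part of the chapter's dataset.

```r
library(dplyr)

# Hypothetical data; only the column name max_power follows the chapter
toy_cars <- data.frame(max_power = c(74, 103, 78, NA, 88))

power_stats <- toy_cars %>%
  summarise(
    avg_power     = mean(max_power, na.rm = TRUE),
    sd_power      = sd(max_power, na.rm = TRUE),
    median_power  = median(max_power, na.rm = TRUE),
    lowest_power  = min(max_power, na.rm = TRUE),
    highest_power = max(max_power, na.rm = TRUE),
    n_missing     = sum(is.na(max_power))  # how many values na.rm skipped
  )

# One row, one column per statistic
power_stats
```

The summary columns are deliberately given names different from max_power: inside summarise(), a newly created column can be referenced by later expressions, so reusing the source column's name would shadow it.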