Learn Summarizing Data | Data Manipulation and Cleaning
Data Analysis with R

Summarizing Data

Summarizing data is essential for getting a quick understanding of its structure and key patterns. In this chapter, you'll learn how to compute statistics such as mean, median, and standard deviation, as well as group-wise summaries using both base R and dplyr.

Quick summary of the dataset

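To start with, use summary() to get a general overview of every column. Numeric columns are reported with their minimum, quartiles, median, mean, and maximum; character columns show only their length, class, and mode: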

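# tidyverse attaches dplyr and readr, so a separate library(dplyr) call is not needed
library(tidyverse)
# Read the CSV into a tibble
df <- read_csv("car_details.csv")
# Open the data in a viewer (useful in interactive sessions)
view(df)
# Per-column overview
summary(df)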
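
Because read_csv() imports text columns as character vectors, summary() will not count categories for them directly. The sketch below shows one way around this, converting a column to a factor first. The toy data frame and its values are made up for illustration and only borrow the fuel and selling_price column names from the lesson.

```r
# Hypothetical toy data standing in for car_details.csv
toy <- data.frame(
  fuel = c("Petrol", "Diesel", "Petrol", "CNG"),
  selling_price = c(100, 300, 200, 150),
  stringsAsFactors = FALSE
)

# A character column is summarised only by length, class, and mode
summary(toy$fuel)

# After conversion to a factor, summary() returns a count per category
fuel_counts <- summary(factor(toy$fuel))
fuel_counts
```

The same conversion could be applied to df$fuel before calling summary(df), assuming the fuel column holds a small set of repeated labels.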

Summary statistics for a single column

Let’s compute the mean, median, and standard deviation for the selling_price column:

# Base R
mean(df$selling_price, na.rm = TRUE)
median(df$selling_price, na.rm = TRUE)
sd(df$selling_price, na.rm = TRUE)
# dplyr
df %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    median_price = median(selling_price, na.rm = TRUE),
    sd_price = sd(selling_price, na.rm = TRUE)
  )
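
The na.rm = TRUE argument used above matters whenever a column contains missing values. A minimal base R sketch, using a made-up price vector rather than the real selling_price column, shows the difference:

```r
# Hypothetical prices with one missing value
prices <- c(100, 200, NA, 400)

# Without na.rm, a single NA makes the result NA
mean(prices)

# With na.rm = TRUE, missing values are dropped before computing
mean_price   <- mean(prices, na.rm = TRUE)
median_price <- median(prices, na.rm = TRUE)
sd_price     <- sd(prices, na.rm = TRUE)

# Counting the NAs shows how many values were ignored
n_missing <- sum(is.na(prices))
```

Checking n_missing alongside the statistics is a cheap way to see how much data each summary is actually based on.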

Summarizing multiple columns by group

Let’s say you want the average selling price and average mileage for each fuel type. First, ensure mileage is numeric:

df$mileage <- as.numeric(gsub(" km.*", "", df$mileage))
str(df$mileage)
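
To see what the conversion does, here is a small sketch on a made-up character vector. The sample strings are assumptions about what the raw mileage values might look like (a number followed by a unit starting with " km"); the real file may differ.

```r
# Hypothetical raw values: a number, a space, then a unit beginning with "km"
raw_mileage <- c("23.4 kmpl", "17.3 km/kg", NA)

# Remove everything from " km" onward, then convert the remainder to numeric
clean_mileage <- as.numeric(gsub(" km.*", "", raw_mileage))
clean_mileage

# Missing inputs stay missing after the conversion
sum(is.na(clean_mileage))
```

If a value does not match this pattern, as.numeric() returns NA with a warning, so comparing the NA count before and after the conversion is a reasonable sanity check.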

Then summarize:

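# Base R
# Note: the formula method uses na.action = na.omit by default, so any row with an NA
# in selling_price or mileage is dropped before averaging; results can differ from dplyr.
aggregate(cbind(selling_price, mileage) ~ fuel, data = df, FUN = mean, na.rm = TRUE)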
# dplyr
df %>%
  group_by(fuel) %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    mean_mileage = mean(mileage, na.rm = TRUE)
  )
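
The two approaches above can disagree when a row is missing only one of the summarised columns: the formula method of aggregate() omits the whole row by default, while the dplyr version drops NAs column by column. The base R sketch below illustrates this on a made-up data frame; passing na.action = na.pass is one way to get column-wise behaviour from aggregate().

```r
# Hypothetical data: the second Petrol row has a price but no mileage
toy <- data.frame(
  fuel = c("Petrol", "Petrol", "Diesel", "Diesel"),
  selling_price = c(100, 200, 300, 500),
  mileage = c(20, NA, 25, 15)
)

# Default na.action = na.omit: the incomplete row is removed entirely,
# so its selling_price does not contribute to the Petrol mean
default_result <- aggregate(cbind(selling_price, mileage) ~ fuel,
                            data = toy, FUN = mean, na.rm = TRUE)

# na.action = na.pass keeps the row; na.rm = TRUE then drops NAs per column
pass_result <- aggregate(cbind(selling_price, mileage) ~ fuel,
                         data = toy, FUN = mean, na.rm = TRUE,
                         na.action = na.pass)

default_result
pass_result
```

Which behaviour is preferable depends on the analysis; the point is only that the choice should be deliberate.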
