Learn Summarizing Data | Data Manipulation and Cleaning
Data Analysis with R

Summarizing Data

Summarizing data is essential for getting a quick understanding of its structure and key patterns. In this chapter, you'll learn how to compute statistics such as mean, median, and standard deviation, as well as group-wise summaries using both base R and dplyr.

Quick summary of the dataset

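To start, call summary() for an overview of every column. Numeric columns get the minimum, quartiles, median, mean, and maximum, while character columns only report their length and class (convert them to factors to see category counts):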

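library(tidyverse)                  # attaches dplyr, readr, and tibble, among others
df <- read_csv("car_details.csv")   # readr reader; returns a tibble
view(df)                            # opens the data viewer (interactive sessions only)
summary(df)                         # per-column overview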
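
How summary() reports a column depends on the column's type. The sketch below illustrates this with a small made-up data frame; the name cars and all of its values are assumptions for illustration and are not taken from car_details.csv.

```r
# Toy data; the values are invented for illustration
cars <- data.frame(
  selling_price = c(450000, 370000, 158000, NA),
  fuel = c("Diesel", "Petrol", "Diesel", "Petrol"),
  stringsAsFactors = FALSE
)

# Numeric column: minimum, quartiles, median, mean, maximum, and a count of NAs
price_summary <- summary(cars$selling_price)
print(price_summary)

# Character column: only length, class, and mode are reported
print(summary(cars$fuel))

# After conversion to a factor, summary() counts the rows in each level
fuel_counts <- summary(factor(cars$fuel))
print(fuel_counts)
```

If the category counts matter, converting text columns to factors before calling summary() is one way to get them.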

Summary statistics for a single column

Let’s compute the mean, median, and standard deviation for the selling_price column:

# Base R
mean(df$selling_price, na.rm = TRUE)
median(df$selling_price, na.rm = TRUE)
sd(df$selling_price, na.rm = TRUE)
# dplyr
df %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    median_price = median(selling_price, na.rm = TRUE),
    sd_price = sd(selling_price, na.rm = TRUE)
  )
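
To see why na.rm = TRUE appears in every call, here is a minimal sketch on a hypothetical vector of prices with one missing value. The vector prices and its values are assumptions made up for this example.

```r
# Hypothetical prices with one missing value
prices <- c(450000, 370000, 158000, NA)

# Without na.rm, a single NA makes the whole result NA
mean_with_na <- mean(prices)
print(mean_with_na)

# With na.rm = TRUE, the statistic is computed from the three known values
mean_price <- mean(prices, na.rm = TRUE)
median_price <- median(prices, na.rm = TRUE)
sd_price <- sd(prices, na.rm = TRUE)
print(c(mean = mean_price, median = median_price, sd = sd_price))

# It is worth checking how many values were skipped
n_missing <- sum(is.na(prices))
print(n_missing)
```

Reporting the number of missing values next to the statistics helps show how much data each figure is actually based on.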

Summarizing multiple columns by group

Let’s say you want the average selling price and average mileage for each fuel type. First, make mileage numeric by stripping the unit suffix (everything from " km" onward) and converting the remaining text:

df$mileage <- as.numeric(gsub(" km.*", "", df$mileage))
str(df$mileage)
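
The conversion step can be checked on a few sample strings before it is applied to the whole column. In the sketch below, raw_mileage and its entries are hypothetical examples of what such a column might contain; the actual formats in car_details.csv may differ.

```r
# Hypothetical raw values: two with units, one non-numeric entry, one missing
raw_mileage <- c("23.4 kmpl", "17.3 km/kg", "unknown", NA)

# Remove " km" and everything after it
cleaned <- gsub(" km.*", "", raw_mileage)
print(cleaned)

# Text that is not a number becomes NA; suppressWarnings() hides the coercion warning
mileage_num <- suppressWarnings(as.numeric(cleaned))
print(mileage_num)

# Count how many entries could not be converted or were already missing
n_na <- sum(is.na(mileage_num))
print(n_na)
```

Comparing the number of NAs before and after the conversion is a simple way to spot values that the pattern did not handle.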

Then summarize:

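# Base R (na.action = na.pass keeps rows with NAs, so na.rm = TRUE works per column as in dplyr)
aggregate(cbind(selling_price, mileage) ~ fuel, data = df,
          FUN = mean, na.rm = TRUE, na.action = na.pass)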
# dplyr
df %>%
  group_by(fuel) %>%
  summarise(
    mean_price = mean(selling_price, na.rm = TRUE),
    mean_mileage = mean(mileage, na.rm = TRUE)
  )
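
One detail worth knowing when comparing the two approaches: the formula interface of aggregate() uses na.action = na.omit by default, which drops any row with a missing value in any of the listed columns before the means are computed. The sketch below shows the effect on a made-up data frame; cars and its values are assumptions for illustration only.

```r
# Toy data: one Diesel row has a missing mileage
cars <- data.frame(
  fuel = c("Diesel", "Diesel", "Petrol", "Petrol"),
  selling_price = c(400000, 600000, 200000, 300000),
  mileage = c(20, NA, 18, 16)
)

# Default na.action = na.omit: the incomplete Diesel row is dropped entirely,
# so its selling_price does not contribute to the Diesel mean
by_default <- aggregate(cbind(selling_price, mileage) ~ fuel,
                        data = cars, FUN = mean)
print(by_default)

# na.pass keeps every row; na.rm = TRUE then skips NAs column by column
per_column <- aggregate(cbind(selling_price, mileage) ~ fuel,
                        data = cars, FUN = mean,
                        na.rm = TRUE, na.action = na.pass)
print(per_column)
```

In this toy data the Diesel mean price is 400000 with the default and 500000 with na.pass, and the second result is the one that matches a dplyr summarise() using na.rm = TRUE in each mean().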

Section 1. Chapter 11
