Learn Data Selection - Advanced Techniques | Data Manipulation and Cleaning
Data Analysis with R

Data Selection - Advanced Techniques

In the previous chapter, we explored how to select single rows and columns using basic indexing. Now, we’ll go a step further and learn how to select multiple rows and columns using both base R and the dplyr package. These techniques are essential when you want to focus on specific parts of a dataset or prepare your data for further analysis.

Selecting multiple columns (Base R)

  • Use the c() function to combine multiple column positions or names;

  • This allows you to select several columns at once;

  • The result is a smaller data frame with only the specified columns.

Example using column positions:

selected_data_base <- df[, c(1, 2, 3)]
head(selected_data_base)

Example using column names:

selected_data_base <- df[, c("name", "selling_price", "transmission")]
head(selected_data_base)
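The two forms above can be compared on a small data frame. This is a minimal sketch: the `df` built here is a hypothetical stand-in for the lesson's dataset, with made-up values, and only the column names `name`, `selling_price`, and `transmission` come from the lesson. It also shows a base R detail worth knowing: selecting a single column returns a vector unless you pass `drop = FALSE`.

```r
# Hypothetical stand-in for the lesson's `df`; all values are made up.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C", "Car D"),
  year          = c(2014, 2017, 2019, 2012),
  selling_price = c(450000, 520000, 780000, 310000),
  transmission  = c("Manual", "Manual", "Automatic", "Manual"),
  stringsAsFactors = FALSE
)

# Same three columns, selected by position and by name
by_position <- df[, c(1, 3, 4)]
by_name     <- df[, c("name", "selling_price", "transmission")]
identical(by_position, by_name)  # TRUE: both give the same data frame

# A single column collapses to a vector by default
is.data.frame(df[, "name"])                # FALSE
is.data.frame(df[, "name", drop = FALSE])  # TRUE
```

Selecting by name is usually the safer choice, because it keeps working if the column order of the dataset changes.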

Indexing single values

  • You can access a specific value using row and column numbers;

  • This is helpful for checking or debugging individual data points.

df[1, 2]  # accesses the value in row 1, column 2
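As a small illustration, the same cell can be reached in several equivalent ways. The data frame below is a hypothetical stand-in with made-up values, arranged so that column 2 is `selling_price`; the real dataset's column order may differ.

```r
# Hypothetical stand-in for the lesson's `df`; all values are made up.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 520000, 780000),
  transmission  = c("Manual", "Manual", "Automatic"),
  stringsAsFactors = FALSE
)

value_by_position <- df[1, 2]                # row 1, column 2
value_by_name     <- df[1, "selling_price"]  # row 1, column chosen by name
value_by_dollar   <- df$selling_price[1]     # take the column, then its first element

# All three return the same value (450000 in this toy data)
value_by_position == value_by_name
value_by_name == value_by_dollar
```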

Slicing rows

Sometimes you only want to work with the first few rows, or specific rows by position.

Select the first 5 rows

Using base R:

first_5_rows_base <- df[1:5, ]
head(first_5_rows_base)

Using dplyr:

library(dplyr)  # provides %>% and slice()

first_5_rows_dplyr <- df %>%
  slice(1:5)
head(first_5_rows_dplyr)
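Selecting specific rows by position follows the same pattern: pass a vector of row numbers instead of a range. The sketch below uses a hypothetical six-row data frame with made-up values. The dplyr part is wrapped in a `requireNamespace()` check so the code still runs if dplyr is not installed.

```r
# Hypothetical stand-in for the lesson's `df`; all values are made up.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C", "Car D", "Car E", "Car F"),
  selling_price = c(450000, 520000, 780000, 310000, 690000, 1250000),
  stringsAsFactors = FALSE
)

# Base R: rows 1, 3 and 5, all columns
rows_base <- df[c(1, 3, 5), ]
rownames(rows_base)  # "1" "3" "5": base R keeps the original row labels

# Base R: negative positions drop rows instead of keeping them
without_first_two <- df[-(1:2), ]

# dplyr equivalent of the first selection, if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  rows_dplyr <- dplyr::slice(df, c(1, 3, 5))
}
```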

What does df[1:5, ] do?
