Data Selection - Advanced Techniques
In the previous chapter, we explored how to select single rows and columns using basic indexing. Now we'll go a step further and learn how to select multiple rows and columns using both base R and the dplyr package. These techniques are essential when you want to focus on specific parts of a dataset or prepare your data for further analysis.
Selecting multiple columns (Base R)
- Use the c() function to combine multiple column positions or names;
- This allows you to select several columns at once;
- The result is a smaller data frame with only the specified columns.
Example using column positions:
selected_data_base <- df[, c(1, 2, 3)]
head(selected_data_base)
Example using column names:
selected_data_base <- df[, c("name", "selling_price", "transmission")]
head(selected_data_base)
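The chapter introduction also mentions dplyr, so for comparison here is a minimal sketch of the same name-based selection written with dplyr's select(). The toy data frame is hypothetical: only its column names are borrowed from the examples above, and the values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data; values are invented for illustration only
df <- data.frame(
  name = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 370000, 158000),
  transmission = c("Manual", "Manual", "Automatic"),
  year = c(2014, 2014, 2006)
)

# Base R: column names are passed as a character vector
selected_data_base <- df[, c("name", "selling_price", "transmission")]

# dplyr: select() takes the column names unquoted
selected_data_dplyr <- df %>%
  select(name, selling_price, transmission)

# Both results contain the same three columns, in the order requested
names(selected_data_base)
names(selected_data_dplyr)
```

Selecting by name is usually safer than selecting by position, because the code keeps working if the column order of the dataset changes.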
Indexing single values
- You can access a specific value using row and column numbers;
- This is helpful for checking or debugging individual data points.
df[1, 2] # accesses the value in row 1, column 2
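A cell can also be addressed by column name instead of position. The sketch below assumes a plain data.frame with made-up values. If df is a tibble rather than a data.frame, df[1, 2] returns a 1x1 tibble instead of a bare value, while the [[ ]] and $ forms shown here still return the value itself.

```r
# Hypothetical toy data; values are invented for illustration only
df <- data.frame(
  name = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 370000, 158000)
)

# Row 1, column 2, addressed by position
by_position <- df[1, 2]

# The same cell, with the column addressed by name
by_name <- df[1, "selling_price"]

# Extract the column as a vector first, then take its first element
by_column <- df[["selling_price"]][1]
by_dollar <- df$selling_price[1]

# All four expressions refer to the same cell
by_position == by_name
by_column == by_dollar
```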
Slicing rows
Sometimes you only want to work with the first few rows, or specific rows by position.
Select first 5 rows:
Using base R:
first_5_rows_base <- df[1:5, ]
head(first_5_rows_base)
Using dplyr:
library(dplyr) # provides %>% and slice()
first_5_rows_dplyr <- df %>%
  slice(1:5)
head(first_5_rows_dplyr)
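Non-contiguous rows work the same way: pass a vector of positions instead of a range. The sketch below uses a hypothetical eight-row data frame (not the chapter's dataset) and picks rows 2, 4, and 6 with both approaches.

```r
library(dplyr)

# Hypothetical toy data; not the dataset used in this chapter
df <- data.frame(
  id = 1:8,
  value = letters[1:8]
)

# Base R: a vector of row positions before the comma, nothing after it
rows_base <- df[c(2, 4, 6), ]

# dplyr: slice() accepts the same positions
rows_dplyr <- df %>%
  slice(2, 4, 6)

# Both results hold the rows whose id is 2, 4, and 6
rows_base$id
rows_dplyr$id

# Base R keeps the original row labels ("2", "4", "6")
rownames(rows_base)
```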