Data Selection - Advanced Techniques
In the previous chapter, we explored how to select single rows and columns using basic indexing. Now, we’ll go a step further and learn how to select multiple rows and columns using both base R and the dplyr package. These techniques are essential when you want to focus on specific parts of a dataset or prepare your data for further analysis.
Selecting multiple columns (Base R)
- Use the c() function to combine multiple column positions or names;
- This allows you to select several columns at once;
- The result is a smaller data frame with only the specified columns.
Example using column positions:
selected_data_base <- df[, c(1, 2, 3)]
head(selected_data_base)
Example using column names:
selected_data_base <- df[, c("name", "selling_price", "transmission")]
head(selected_data_base)
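To try both forms without the course dataset, the sketch below builds a small hypothetical data frame and checks that position-based and name-based selection return the same result. The column names follow the examples above; the values and the extra `owner` column are invented purely for illustration.

```r
# Hypothetical stand-in for the course dataset; values are invented for illustration.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 370000, 158000),
  transmission  = c("Manual", "Manual", "Automatic"),
  owner         = c("First", "Second", "Third")
)

# Select the first three columns by position...
by_position <- df[, c(1, 2, 3)]

# ...and the same three columns by name.
by_name <- df[, c("name", "selling_price", "transmission")]

# Both results are data frames with 3 rows and 3 columns.
dim(by_position)
identical(by_position, by_name)
```

Selecting by name is usually the more robust choice, because it keeps working if the columns are later reordered.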
Indexing single values
- You can access a specific value using row and column numbers;
- This is helpful for checking or debugging individual data points.
df[1, 2] # accesses the value in row 1, column 2
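A single value can also be addressed by column name, which is often easier to read than a position. A minimal sketch, assuming a small invented data frame rather than the course dataset:

```r
# Hypothetical data; values are invented for illustration.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C"),
  selling_price = c(450000, 370000, 158000)
)

value_by_position <- df[1, 2]                # row 1, column 2
value_by_name     <- df[1, "selling_price"]  # row 1, column "selling_price"
value_by_dollar   <- df$selling_price[1]     # first element of the column vector

value_by_position
```

If df is a tibble rather than a base data frame, df[1, 2] keeps the result as a one-row, one-column tibble; df[[2]][1] extracts the bare value in both cases.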
Slicing rows
Sometimes you only want to work with the first few rows, or specific rows by position.
Select first 5 rows:
Using base R:
first_5_rows_base <- df[1:5, ]
head(first_5_rows_base)
Using dplyr:
library(dplyr) # provides %>% and slice()
first_5_rows_dplyr <- df %>%
  slice(1:5)
head(first_5_rows_dplyr)
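The same approach extends to specific, non-consecutive rows: pass a vector of positions built with c(). The sketch below uses an invented data frame as an assumption, and the dplyr part only runs when the package is installed.

```r
# Hypothetical data; values are invented for illustration.
df <- data.frame(
  name          = c("Car A", "Car B", "Car C", "Car D", "Car E", "Car F"),
  selling_price = c(450000, 370000, 158000, 225000, 130000, 440000)
)

# Base R: rows 1, 3 and 5, all columns (note the trailing comma).
picked_rows_base <- df[c(1, 3, 5), ]

# head() with n = 5 is a shortcut for the first five rows.
first_5_rows_head <- head(df, 5)

# dplyr equivalent, guarded so the sketch still runs without dplyr installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  picked_rows_dplyr <- dplyr::slice(df, c(1, 3, 5))
}

picked_rows_base$name
```

Leaving out the trailing comma in df[c(1, 3, 5)] would select columns 1, 3 and 5 instead of rows, so it is worth checking the result with dim() or head().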