Data Wrangling with Tidyverse in R

Joining and Combining Data Frames

Combining data from multiple sources is a common and essential task in data wrangling. The dplyr package provides several join functions that allow you to merge data frames (or tibbles) based on shared columns, known as keys. The most frequently used join types in dplyr are inner_join, left_join, right_join, and full_join:

  • inner_join: returns only the rows that have matching keys in both data frames;
  • left_join: returns all rows from the left data frame and the matching rows from the right data frame; if there is no match, the result will contain NA for columns from the right;
  • right_join: returns all rows from the right data frame and the matching rows from the left data frame; if there is no match, the result will contain NA for columns from the left;
  • full_join: returns all rows from both data frames; if there is no match, the missing side will have NA values.

These join operations are foundational for integrating datasets in a tidy workflow.
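
To make the differences between the four join types concrete, the following minimal sketch applies each one to the same pair of small tibbles and compares the row counts. The tibbles left_tbl and right_tbl and their columns are hypothetical names made up for this illustration; only the dplyr join functions themselves come from the text above.

```r
library(dplyr)
library(tibble)

# Hypothetical example data: ids 2 and 3 appear in both tibbles,
# id 1 only on the left, id 4 only on the right
left_tbl  <- tibble(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- tibble(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

inner_res <- inner_join(left_tbl, right_tbl, by = "id")  # ids 2, 3
left_res  <- left_join(left_tbl, right_tbl, by = "id")   # ids 1, 2, 3
right_res <- right_join(left_tbl, right_tbl, by = "id")  # ids 2, 3, 4
full_res  <- full_join(left_tbl, right_tbl, by = "id")   # ids 1, 2, 3, 4

# Compare how many rows each join keeps
row_counts <- c(
  inner = nrow(inner_res),
  left  = nrow(left_res),
  right = nrow(right_res),
  full  = nrow(full_res)
)
print(row_counts)
```

With this data the inner join keeps 2 rows, the left and right joins keep 3 rows each, and the full join keeps 4. In full_res, the row for id 1 has NA in y and the row for id 4 has NA in x.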

    library(dplyr)
    library(tibble)
    options(crayon.enabled = FALSE)

    # Create two example tibbles
    students <- tibble(
      student_id = c(1, 2, 3),
      name = c("Alice", "Bob", "Carol")
    )

    scores <- tibble(
      student_id = c(1, 2, 4),
      score = c(88, 92, 75)
    )

    # Perform a left join to add scores to students
    result <- left_join(students, scores, by = "student_id")
    print(result)

When performing a join, specify the key columns with the by argument. These columns are used to match rows between the data frames; in the previous example, student_id is the key. If you omit by, dplyr joins on every column name the two data frames share and prints a message saying which columns it used. If a row in the left data frame has no matching key in the right data frame, the columns added from the right contain NA for that row, so no rows from the left data frame are lost in a left_join. In the example, Carol (student_id 3) has no entry in scores and gets NA for score, while the score for student_id 4 is dropped because no student has that key. Understanding how keys work and how unmatched rows are treated helps you control the outcome of merges and maintain data integrity.
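
Key columns do not always share a name across data frames. The sketch below shows one way to handle that case and to inspect unmatched rows. The scores_alt tibble, with its key column named id, is a hypothetical variant of the scores table invented for this illustration, and the use of anti_join to list unmatched keys is an addition that the text above does not cover.

```r
library(dplyr)
library(tibble)

students <- tibble(
  student_id = c(1, 2, 3),
  name = c("Alice", "Bob", "Carol")
)

# Hypothetical variant of the scores table: the key column is called `id`
scores_alt <- tibble(
  id = c(1, 2, 4),
  score = c(88, 92, 75)
)

# A named vector maps the left key name to the right key name
joined <- left_join(students, scores_alt, by = c("student_id" = "id"))
print(joined)

# Rows from the left table that found no match have NA in `score`
unmatched <- filter(joined, is.na(score))
print(unmatched)

# anti_join returns the left rows whose key has no match on the right
missing_scores <- anti_join(students, scores_alt, by = c("student_id" = "id"))
print(missing_scores)
```

Filtering on is.na(score) only identifies unmatched rows reliably when the right table has no genuine missing scores of its own; anti_join checks the keys directly and so avoids that ambiguity.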


Which of the following best describes the difference between inner_join and left_join in dplyr?
