Data Wrangling with Tidyverse in R

Handling Missing Data


Handling missing data is a common challenge in data wrangling with the tidyverse. In R, missing values are represented by the special value NA. These NA values can arise from incomplete data collection, data entry errors, or merging datasets with non-overlapping entries. If not addressed, NA values can disrupt calculations and lead to misleading analysis results. For instance, operations like calculating the mean or sum of a vector containing NA will themselves return NA unless you explicitly handle the missing values. Recognizing and managing these missing values is essential for ensuring the accuracy and reliability of your data analysis.
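To make the point about NA propagation concrete, here is a minimal base R sketch. The vector `scores` is an illustrative example, not data from a real study; `na.rm = TRUE` is the standard argument that tells `mean()` and `sum()` to skip missing values.

```r
# Illustrative vector with two missing entries
scores <- c(95, NA, 88, NA)

# Without handling, NA propagates through the calculation
mean_default <- mean(scores)               # NA
sum_default  <- sum(scores)                # NA

# na.rm = TRUE drops the missing values before computing
mean_clean <- mean(scores, na.rm = TRUE)   # 91.5
sum_clean  <- sum(scores, na.rm = TRUE)    # 183

# Count how many values are missing
n_missing <- sum(is.na(scores))            # 2

print(mean_default)
print(mean_clean)
print(n_missing)
```

Counting the missing values before dropping them is a cheap way to see how much of the data a summary actually rests on.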

options(crayon.enabled = FALSE)
library(tidyverse)

# Create a tibble with missing values
data <- tibble(
  name = c("Alice", "Bob", "Charlie", "Dana"),
  score = c(95, NA, 88, NA)
)

# Detect missing values using is.na
missing_scores <- is.na(data$score)

# Replace missing values with a specific value (e.g., 0) using replace_na
data_filled <- data %>%
  mutate(score = replace_na(score, 0))

print(missing_scores)
print(data_filled)

When handling missing data in your workflow, you should consider both the source of the missingness and the impact of your chosen strategy. Common approaches include:

  • Removing rows with missing values;
  • Replacing them with a default or imputed value;
  • Leaving them as NA and using functions that can handle missing values appropriately.
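The three approaches listed above can be sketched side by side as follows. This is one possible illustration, assuming the `dplyr` and `tidyr` packages are installed; the tibble mirrors the lesson's example, and using the mean of the observed scores as the imputed value is an arbitrary choice for demonstration, not a recommendation.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  name = c("Alice", "Bob", "Charlie", "Dana"),
  score = c(95, NA, 88, NA)
)

# 1. Remove rows where score is missing
data_complete <- data %>%
  drop_na(score)

# 2. Replace missing scores with an imputed value
#    (here the mean of the observed scores, chosen only for illustration)
data_imputed <- data %>%
  mutate(score = replace_na(score, mean(score, na.rm = TRUE)))

# 3. Keep the NA values and skip them at calculation time
score_summary <- data %>%
  summarise(
    mean_score = mean(score, na.rm = TRUE),
    n_missing  = sum(is.na(score))
  )

print(data_complete)
print(data_imputed)
print(score_summary)
```

Note that `drop_na(score)` only considers the named column; calling `drop_na()` with no arguments would drop a row if any column is missing.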

The best practice is to investigate why data is missing and to document the approach you use to address it. In some cases, removing missing values may bias your results, especially if the missingness is not random. Replacing missing values with a constant, such as zero or the mean, may also introduce bias or distort the distribution of your data. Always choose a method that aligns with your analysis goals and the nature of your dataset.
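A small base R sketch can show how constant replacement distorts results. The vector is illustrative only; the comparison assumes nothing about why the values are missing.

```r
# Illustrative vector with two missing entries
scores <- c(95, NA, 88, NA)

# Mean and spread of the observed values only
mean_observed <- mean(scores, na.rm = TRUE)   # 91.5
sd_observed   <- sd(scores, na.rm = TRUE)

# Filling with zero pulls the mean sharply downward
zero_filled    <- ifelse(is.na(scores), 0, scores)
mean_zero_fill <- mean(zero_filled)           # 45.75

# Filling with the observed mean keeps the mean but shrinks the spread
mean_filled  <- ifelse(is.na(scores), mean_observed, scores)
sd_mean_fill <- sd(mean_filled)

print(c(observed = mean_observed, zero_fill = mean_zero_fill))
print(c(observed = sd_observed, mean_fill = sd_mean_fill))
```

Zero filling halves the mean here, while mean filling leaves the mean unchanged but reports a smaller standard deviation than the observed data supports.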

