Learn Filtering Data - Advanced Conditions | Data Manipulation and Cleaning
Data Analysis with R

Filtering Data - Advanced Conditions

In the previous chapter, we learned how to filter data using simple comparisons and logical operators. In this chapter, we’ll build on that by using the %in% operator to match multiple values at once, and learn how to exclude specific rows from a dataset. These techniques are especially useful when dealing with categories that have many possible values.

Filtering with %in%

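  • The %in% operator checks, for each element of the vector on its left, whether that element appears in the vector on its right, and returns a logical vector of TRUE/FALSE values;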

  • It is helpful when you want to match against multiple possible values;

  • This makes filtering cleaner and more readable than chaining multiple == or != conditions.
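To see what %in% returns before applying it to a data frame, here is a minimal sketch on a plain character vector. The names fuel and wanted are made up for illustration and are not part of the chapter's dataset.

```r
# Hypothetical example vectors (not from the chapter's dataset)
fuel <- c("Diesel", "Petrol", "CNG", "Petrol", "LPG")
wanted <- c("Diesel", "Petrol")

# One TRUE/FALSE per element of the left-hand vector
matches <- fuel %in% wanted
print(matches)

# The same test written as chained == comparisons
chained <- fuel == "Diesel" | fuel == "Petrol"
print(identical(matches, chained))
```

The result always has the same length as the left-hand vector, which is what makes it usable as a row filter.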

Example: select cars where fuel type is Diesel or Petrol using base R

selected_fuel_cars <- df[df$fuel %in% c("Diesel", "Petrol"), ]
head(selected_fuel_cars)
nrow(selected_fuel_cars)  # number of matching rows; count() would require dplyr
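The snippet above assumes a data frame df with a fuel column already exists. As a self-contained sketch, the same filter can be tried on a small made-up data frame (cars_demo and its values are hypothetical, used only for illustration):

```r
# Hypothetical toy data; the chapter's real df is assumed to have more columns
cars_demo <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  fuel = c("Diesel", "Petrol", "CNG", "Petrol", "LPG")
)

# Logical vector before the comma selects rows;
# leaving the part after the comma empty keeps every column
selected <- cars_demo[cars_demo$fuel %in% c("Diesel", "Petrol"), ]

print(selected)
print(nrow(selected))  # number of rows that matched
```

Note the trailing comma inside the brackets: without it, R would treat the logical vector as a column selector rather than a row selector.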

Excluding specific values

Example: exclude cars where fuel is Diesel

Use != to keep only the rows whose value differs from a single given value.

non_diesel_cars <- df[df$fuel != "Diesel", ]
head(non_diesel_cars)
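One caveat worth knowing: if the fuel column contains missing values (NA), a != comparison returns NA for those rows, and base R indexing then yields rows filled with NA. A minimal sketch on a hypothetical toy data frame (cars_na is not part of the chapter's dataset) shows the effect and two possible ways to handle it:

```r
# Hypothetical toy data with one missing fuel value
cars_na <- data.frame(
  name = c("A", "B", "C"),
  fuel = c("Diesel", NA, "Petrol")
)

# != yields FALSE, NA, TRUE; the NA produces an all-NA row in the result
with_ne <- cars_na[cars_na$fuel != "Diesel", ]
print(with_ne)

# Option 1: drop rows with a missing fuel value explicitly
drop_na <- cars_na[!is.na(cars_na$fuel) & cars_na$fuel != "Diesel", ]
print(drop_na)

# Option 2: %in% never returns NA, so the row with missing fuel is kept intact
keep_na <- cars_na[!(cars_na$fuel %in% "Diesel"), ]
print(keep_na)
```

Which option is appropriate depends on whether rows with an unknown fuel type should count as "not Diesel" in your analysis.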

Example: exclude cars where fuel is Diesel or Petrol

  • Use %in% along with the logical NOT operator ! for cleaner exclusion;

  • This is easier to manage than writing multiple != conditions.

non_diesel_petrol_cars <- df[!df$fuel %in% c("Diesel", "Petrol"), ]
head(non_diesel_petrol_cars)
nrow(non_diesel_petrol_cars)  # number of remaining rows; count() would require dplyr

You can also write the exclusion the long way, joining multiple != checks with &, but this becomes harder to read and maintain once you need to exclude more than two values.

non_diesel_petrol_cars <- df[df$fuel != "Diesel" & df$fuel != "Petrol", ]
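To compare the two exclusion styles side by side, the sketch below runs both on a made-up data frame (cars_demo and the excluded vector are hypothetical names). Storing the excluded values in a variable means adding another value requires changing only one line. The parentheses around the %in% call are optional, since %in% binds more tightly than !, but they make the intent explicit.

```r
# Hypothetical toy data without missing values
cars_demo <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  fuel = c("Diesel", "Petrol", "CNG", "Petrol", "LPG")
)

# Values to exclude, kept in one place
excluded <- c("Diesel", "Petrol", "LPG")

# Short form: negate the %in% test
short_form <- cars_demo[!(cars_demo$fuel %in% excluded), ]

# Long form: one != check per excluded value, joined with &
long_form <- cars_demo[
  cars_demo$fuel != "Diesel" &
    cars_demo$fuel != "Petrol" &
    cars_demo$fuel != "LPG",
]

print(short_form)
print(identical(short_form, long_form))
```

The two forms agree here because the toy data has no NA values in fuel; with missing values, the != version would produce NA rows while the %in% version would not.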

Section 1. Chapter 7
