Working with Text, Dates, and Files in R

Regular Expressions Basics

Regular expressions, often called regex, are a powerful tool for searching, matching, and manipulating patterns in text data. They allow you to define complex search criteria using a concise syntax, making them especially useful when you need to clean or analyze large amounts of text. With regex, you can quickly find email addresses, phone numbers, or any other patterns in strings, which is essential for data cleaning and preparation.

    emails <- c("alice@gmail.com", "bob@yahoo.com", "carol@gmail.com", "dave@hotmail.com")
    has_gmail <- grepl("gmail.com", emails)
    print(has_gmail)  # Output: TRUE FALSE TRUE FALSE

In the code above, grepl() searches for the pattern "gmail.com" within each element of the emails vector and returns a logical vector indicating whether each address contains a match. This is useful for filtering or subsetting data based on text content. Note that an unescaped "." in a regular expression matches any single character; to match a literal dot, write "gmail\\.com" or pass fixed = TRUE.
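As a small sketch of the filtering use case, the logical vector returned by grepl() can index the original vector directly. The names gmail_only and tricky, and the address "eve@gmailXcom", are hypothetical and only serve the illustration. The second half compares an unescaped dot, an escaped dot, and fixed = TRUE on the same input.

```r
emails <- c("alice@gmail.com", "bob@yahoo.com", "carol@gmail.com", "dave@hotmail.com")

# Keep only the elements for which grepl() returns TRUE
gmail_only <- emails[grepl("gmail.com", emails, fixed = TRUE)]
print(gmail_only)

# Hypothetical input: the first address only looks like a Gmail address
tricky <- c("eve@gmailXcom", "frank@gmail.com")

loose   <- grepl("gmail.com", tricky)                # "." matches any character
strict  <- grepl("gmail\\.com", tricky)              # "\\." matches a literal dot
literal <- grepl("gmail.com", tricky, fixed = TRUE)  # pattern treated as plain text

print(loose)
print(strict)
print(literal)
```

With fixed = TRUE the pattern is not interpreted as a regular expression at all, which is often the simplest choice when the search term is a plain substring.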

    text <- "Order 1234 was placed by user 5678."
    anonymized <- gsub("[0-9]", "#", text)
    print(anonymized)  # Output: "Order #### was placed by user ####."

Here, gsub() replaces every digit ([0-9]) in the string with the "#" character. The first argument is the pattern to search for, the second is the replacement, and the third is the string to process. This technique is helpful for anonymizing sensitive numeric data, such as IDs or phone numbers, in text.
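To make the "every digit" behavior concrete, the sketch below compares gsub() with its sibling sub(), which replaces only the first match, and with a pattern that treats a whole run of digits as one match. The variable names and the "<id>" placeholder are illustrative assumptions.

```r
text <- "Order 1234 was placed by user 5678."

# sub() stops after the first match; gsub() replaces all matches
first_only  <- sub("[0-9]", "#", text)
every_digit <- gsub("[0-9]", "#", text)

# "[0-9]+" matches a whole run of digits, so each number becomes one token
each_run <- gsub("[0-9]+", "<id>", text)

print(first_only)   # "Order #234 was placed by user 5678."
print(every_digit)  # "Order #### was placed by user ####."
print(each_run)     # "Order <id> was placed by user <id>."
```

Replacing each run with a fixed token hides the length of the original number as well as its digits, which may matter when anonymizing.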

Definition: A regular expression is a sequence of characters that defines a search pattern, often used for string searching and manipulation.

Some common regex patterns you might use include:

  • Email addresses: [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
  • Phone numbers: \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
  • Whitespace: \s+

These patterns help you identify and work with specific types of text data efficiently. When you write them as R string literals, double each backslash, so \s+ becomes "\\s+".
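The patterns listed above can be tried out in R roughly as follows. This is a sketch under stated assumptions: the sample string notes is made up, and perl = TRUE is passed for the phone pattern as a cautious choice so that \s inside square brackets is read as a whitespace class.

```r
# In R string literals each backslash is doubled: \d becomes "\\d"
email_pattern <- "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
phone_pattern <- "\\(?\\d{3}\\)?[-.\\s]?\\d{3}[-.\\s]?\\d{4}"

# Hypothetical sample text with an email, a phone number, and extra spaces
notes <- "Contact alice@example.com or call (555) 123-4567.   Thanks!"

# Does the text contain something shaped like an email address?
has_email <- grepl(email_pattern, notes)

# Extract every phone-like match; gregexpr() finds all match positions
phones <- regmatches(notes, gregexpr(phone_pattern, notes, perl = TRUE))[[1]]

# Collapse each run of whitespace into a single space
tidy <- gsub("\\s+", " ", notes)

print(has_email)
print(phones)
print(tidy)
```

These patterns are deliberately loose: they accept many strings that are not valid addresses or numbers, so treat them as a first-pass filter rather than a validator.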

1. What is the purpose of grepl() in R?

2. How does gsub() differ from grepl()?

3. Fill in the blank: To replace all spaces in a string with underscores, use gsub("___", "_", my_string), where ___ marks the blank to fill in.

Section 1. Chapter 3