Challenge: Clean Messy Reviews | Advanced Text Cleaning
Data Cleaning Techniques in Python

Challenge: Clean Messy Reviews

Task

You are given a list of customer review texts in the variable reviews. The reviews may contain emojis, hashtags, repeated characters, noise words, punctuation, and informal expressions.

Your goal is to create a normalized version of each review using several NLP cleaning steps.

Follow these steps:

  1. Convert each review to lowercase.
  2. Remove emojis, hashtags, and mentions using a regular expression.
  3. Normalize repeated characters: any character repeated 3 or more times should be reduced to a single instance (for example, coooool → col).
  4. Tokenize each review using nltk.word_tokenize().
  5. Remove stopwords using the provided stopwords list.
  6. Apply stemming to the remaining tokens using PorterStemmer.
  7. Store each cleaned review (joined back with spaces) in a list named cleaned_reviews.
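Steps 1 to 3 need only Python's standard re module. The following is a minimal sketch under stated assumptions, not the course's reference solution. The emoji character class covers the common emoji blocks (U+1F300 to U+1FAFF and U+2600 to U+27BF) rather than every emoji Unicode defines. A hashtag or mention is assumed to be # or @ followed by word characters. The names preprocess, EMOJI_PATTERN, TAG_PATTERN and REPEAT_PATTERN are hypothetical.

```python
import re

# Assumption: these ranges cover the common emoji blocks (pictographs,
# emoticons, transport, supplemental symbols) plus miscellaneous symbols and
# dingbats. They are not a complete definition of "emoji".
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector that often follows an emoji
    "]"
)
# Assumption: a hashtag or mention is '#' or '@' followed by word characters.
TAG_PATTERN = re.compile(r"[#@]\w+")
# A character followed by two or more copies of itself (3+ in a row).
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")


def preprocess(text: str) -> str:
    """Steps 1-3: lowercase, strip emojis/hashtags/mentions, squeeze repeats."""
    text = text.lower()                     # step 1: lowercase
    text = EMOJI_PATTERN.sub(" ", text)     # step 2a: remove emojis
    text = TAG_PATTERN.sub(" ", text)       # step 2b: remove hashtags, mentions
    text = REPEAT_PATTERN.sub(r"\1", text)  # step 3: 3+ repeats -> one char
    return " ".join(text.split())           # tidy spaces left by the removals


print(preprocess("Sooo coooool!!! 😍 #BestBuy @shop"))  # prints: so col!
```

As written, the repeat rule reduces coooool to col. It also squeezes digit runs, so 1000 becomes 10. If doubled letters should survive, so that coooool becomes cool, use r"\1\1" as the replacement instead of r"\1".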

Make sure the variable cleaned_reviews is declared and contains all normalized reviews in the correct order.
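Steps 4 to 7 can be sketched as one function applied to each review in turn. Applying it in sequence keeps cleaned_reviews in the same order as reviews.

So that the sketch runs without NLTK installed, the tokenizer and stemmer are passed in as arguments. The demo uses str.split and a hypothetical toy_stem that only strips a trailing s. toy_stem is a stand-in and does not implement the Porter algorithm. The sample reviews, the stopword list and the name finish_review are made up for illustration.

The second half shows why stopword removal (step 5) comes before stemming (step 6). A stemmer can turn was into wa and this into thi, and NLTK's PorterStemmer does both. Those stems no longer match the stopword list.

```python
from typing import Callable, Iterable, List


def finish_review(
    text: str,
    stopwords: Iterable[str],
    tokenize: Callable[[str], List[str]],
    stem: Callable[[str], str],
) -> str:
    """Steps 4-7 for one review that steps 1-3 have already cleaned."""
    stop = set(stopwords)                        # fast membership test per token
    tokens = tokenize(text)                      # step 4: tokenize
    kept = [t for t in tokens if t not in stop]  # step 5: drop stopwords first
    stems = [stem(t) for t in kept]              # step 6: stem what is left
    return " ".join(stems)                       # step 7: join with spaces


def toy_stem(token: str) -> str:
    """Hypothetical stand-in stemmer: strips one trailing 's' (not Porter)."""
    return token[:-1] if token.endswith("s") and len(token) > 2 else token


# Made-up sample input, assumed already lowercased and cleaned by steps 1-3.
preprocessed = ["the screen was bright", "this phone has great cameras"]
stopwords = ["the", "was", "this", "has"]

# A list comprehension keeps cleaned_reviews in the same order as the input.
cleaned_reviews = [
    finish_review(review, stopwords, str.split, toy_stem)
    for review in preprocessed
]
print(cleaned_reviews)  # prints: ['screen bright', 'phone great camera']

# Stemming *before* filtering: 'was' -> 'wa', 'this' -> 'thi', 'has' -> 'ha',
# and none of those stems are in the stopword list any more.
stop = set(stopwords)
wrong_order = [
    " ".join(t for t in map(toy_stem, review.split()) if t not in stop)
    for review in preprocessed
]
print(wrong_order)  # prints: ['screen wa bright', 'thi phone ha great camera']
```

In the challenge you would pass nltk.word_tokenize as tokenize. For stem, you would pass PorterStemmer().stem, imported with from nltk.stem import PorterStemmer. Note that word_tokenize depends on NLTK's Punkt tokenizer data being downloaded. Unlike str.split, it also emits punctuation such as ! as separate tokens.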

Section 4, Chapter 3