Learn Challenge: Clean Messy Reviews | Advanced Text Cleaning
Data Cleaning Techniques in Python

Challenge: Clean Messy Reviews

Task

You are given a list of customer review texts in the variable reviews. The reviews may contain emojis, hashtags, repeated characters, noise words, punctuation, and informal expressions.

Your goal is to create a normalized version of each review using several NLP cleaning steps.

Follow these steps:

  1. Convert each review to lowercase.
  2. Remove emojis, hashtags, and mentions using a regular expression.
  3. Normalize repeated characters: any character repeated 3 or more times in a row should be reduced to a single instance (coooool → col).
  4. Tokenize each review using nltk.word_tokenize().
  5. Remove stopwords using the provided stopwords list.
  6. Apply stemming to the remaining tokens using PorterStemmer.
  7. Store each cleaned review (joined back with spaces) in a list named cleaned_reviews.

Make sure the variable cleaned_reviews is declared and contains all normalized reviews in the correct order.

Solution
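
The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference solution. The helper names pre_clean and clean_reviews are hypothetical, the emoji character ranges are only an approximation, and a hashtag or mention is assumed to be removed together with the word attached to it. The NLTK part also assumes the tokenizer data has already been downloaded (punkt or punkt_tab, depending on the NLTK version).

```python
import re

# Hashtags/mentions (marker plus the word after it) and a rough set of emoji
# code-point ranges. The ranges are an approximation, not a full emoji spec.
NOISE_RE = re.compile(
    r"[#@]\w+"
    r"|[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]"
)
# Any character followed by two or more copies of itself (3+ in a row).
REPEAT_RE = re.compile(r"(.)\1{2,}")


def pre_clean(text):
    """Steps 1-3: lowercase, strip noise, collapse repeated characters."""
    text = text.lower()
    text = NOISE_RE.sub("", text)
    return REPEAT_RE.sub(r"\1", text)


def clean_reviews(reviews, stop_words):
    """Steps 4-7: tokenize, drop stopwords, stem, and rejoin each review."""
    # Imported lazily so pre_clean() stays usable without NLTK installed.
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    stemmer = PorterStemmer()
    stop_set = set(stop_words)  # set membership is faster than a list scan
    cleaned = []
    for review in reviews:
        tokens = word_tokenize(pre_clean(review))
        kept = [stemmer.stem(tok) for tok in tokens if tok not in stop_set]
        cleaned.append(" ".join(kept))
    return cleaned


# Usage inside the challenge, where `reviews` and `stopwords` are provided:
# cleaned_reviews = clean_reviews(reviews, stopwords)
```

Punctuation tokens are kept here because none of the listed steps asks to drop them. Building the result with a loop over reviews preserves the original order, which the task requires.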

Section 4. Chapter 3