Learn Challenge: Clean Messy Reviews | Advanced Text Cleaning
Data Cleaning Techniques in Python

Challenge: Clean Messy Reviews

Task


You are given a list of customer review texts in the variable reviews. The reviews may contain emojis, hashtags, repeated characters, noise words, punctuation, and informal expressions.

Your goal is to create a normalized version of each review using several NLP cleaning steps.

Follow these steps:

  1. Convert each review to lowercase.
  2. Remove emojis, hashtags, and mentions using a regular expression.
  3. Normalize repeated characters: any character repeated 3 or more times should be reduced to a single instance (coooool → cool).
  4. Tokenize each review using nltk.word_tokenize().
  5. Remove stopwords using the provided stopwords list.
  6. Apply stemming to the remaining tokens using PorterStemmer.
  7. Store each cleaned review (joined back with spaces) in a list named cleaned_reviews.
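
Steps 1 to 3 need only the standard re module, so they can be sketched on their own. The sample reviews, the normalize helper, and all three patterns below are assumptions made for illustration, not the exercise's reference solution; in particular, the emoji pattern covers only some common Unicode blocks. Note that reducing a run to a single instance, as step 3 is worded, turns "coooool" into "col"; to keep a double letter, the replacement would have to be r"\1\1" instead.

```python
import re

# Hypothetical sample data; the exercise supplies its own `reviews` list.
reviews = [
    "LOVED it!!! 😍 #bestbuy @shop_support",
    "Sooooo coooool, thanks @anna",
]

# Assumed patterns (not taken from the exercise).
TAG_PATTERN = re.compile(r"[#@]\w+")  # hashtags and mentions
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]"  # some common emoji blocks
)
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")  # same character 3+ times in a row


def normalize(text: str) -> str:
    """Apply steps 1-3: lowercase, strip tags and emojis, collapse repeats."""
    text = text.lower()
    text = TAG_PATTERN.sub("", text)
    text = EMOJI_PATTERN.sub("", text)
    text = REPEAT_PATTERN.sub(r"\1", text)
    # Tidy up whitespace left behind by the removals.
    return " ".join(text.split())


normalized_reviews = [normalize(r) for r in reviews]
```

Lowercasing first means the later patterns never have to account for uppercase letters, and a two-letter run such as the "oo" in "good" is left alone because the pattern requires at least three repeats.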

Make sure the variable cleaned_reviews is declared and contains all normalized reviews in the correct order.


Section 4. Chapter 3
