Challenge: Clean Messy Reviews
You are given a list of customer review texts in the variable reviews.
The reviews may contain emojis, hashtags, repeated characters, noise words, punctuation, and informal expressions.
Your goal is to create a normalized version of each review using several NLP cleaning steps.
Follow these steps:
- Convert each review to lowercase.
- Remove emojis, hashtags, and mentions using a regular expression.
- Normalize repeated characters: any character repeated 3 or more times should be reduced to a single instance (coooool→cool).
- Tokenize each review using nltk.word_tokenize().
- Remove stopwords using the provided stopwords list.
- Apply stemming to the remaining tokens using PorterStemmer.
- Store each cleaned review (joined back with spaces) in a list named cleaned_reviews.
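The regex-based steps (lowercasing, stripping emojis, hashtags, and mentions, and collapsing repeated characters) can be sketched with the standard library alone. This is only one possible reading of the task: the function name normalize_text, the keep parameter, and all three patterns are assumptions, and the emoji ranges are a rough approximation rather than full coverage. Read literally, the rule reduces a run of three or more to one character, so coooool becomes col; keeping two characters instead gives cool.

```python
import re

# Assumed patterns; the challenge text does not specify them.
# Drops the whole hashtag or mention, including the word after # or @.
HASHTAG_OR_MENTION = re.compile(r"[#@]\w+")
# Rough emoji coverage: common pictograph blocks, regional indicators,
# the variation selector and the zero-width joiner.
EMOJI = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]"
)
# Any single character followed by two or more copies of itself.
REPEATS = re.compile(r"(.)\1{2,}")


def normalize_text(review: str, keep: int = 1) -> str:
    """Lowercase, strip hashtags/mentions/emojis, collapse long character runs.

    keep=1 follows the rule as written (one instance survives);
    keep=2 is a hypothetical variant that leaves two characters.
    """
    text = review.lower()
    text = HASHTAG_OR_MENTION.sub(" ", text)
    text = EMOJI.sub(" ", text)
    text = REPEATS.sub(lambda m: m.group(1) * keep, text)
    # Tidy the whitespace left behind by the removals.
    return " ".join(text.split())
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together when an emoji or hashtag sits between them.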
Make sure the variable cleaned_reviews is declared and contains all normalized reviews in the correct order.
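The remaining steps (tokenizing, removing stopwords, stemming, and joining) might look like the sketch below, which takes text that has already been through the regex steps. The names finish_cleaning and nltk_components are hypothetical. The tokenizer and stemmer are passed in as callables so the core loop can be tried without NLTK installed; the helper imports NLTK lazily and assumes the tokenizer data has been downloaded.

```python
from typing import Callable, Iterable, List


def finish_cleaning(
    normalized: Iterable[str],
    stopwords: Iterable[str],
    tokenize: Callable[[str], List[str]],
    stem: Callable[[str], str],
) -> List[str]:
    """Tokenize, drop stopwords, stem, and re-join each review, in input order."""
    stop = set(stopwords)  # set membership is faster than scanning a list
    cleaned = []
    for text in normalized:
        tokens = [tok for tok in tokenize(text) if tok not in stop]
        cleaned.append(" ".join(stem(tok) for tok in tokens))
    return cleaned


def nltk_components():
    """Return (tokenize, stem) backed by NLTK.

    Imported here rather than at module level so the function above stays
    usable without NLTK. word_tokenize needs NLTK's Punkt tokenizer data.
    """
    from nltk.stem import PorterStemmer
    from nltk.tokenize import word_tokenize

    return word_tokenize, PorterStemmer().stem


# Possible use in the challenge, assuming `normalized` holds the
# regex-cleaned reviews and `stopwords` is the provided list:
#   tokenize, stem = nltk_components()
#   cleaned_reviews = finish_cleaning(normalized, stopwords, tokenize, stem)
```

Because the loop appends one result per input review, cleaned_reviews has the same length and order as reviews.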