Challenge: Tokenization with Regex
Task
You are given a message stored in the variable message. Tokenize it into words using a regular expression. To do this:
- Import the necessary class.
- Convert message to lowercase and save the result in message_lower.
- Create a Regexp Tokenizer with the correct pattern and save it in word_tokenizer.
- Tokenize message_lower into words using word_tokenizer.
A word is a sequence of alphanumeric characters and underscores. For example, '#NLPConference_20!' contains one word: NLPConference_20.
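The definition of a word above corresponds closely to the regex pattern \w+ (one or more word characters). The short check below uses only Python's standard re module to illustrate this on the example string; the names WORD_PATTERN and sample are illustrative and not part of the task. Note that in Python 3, \w also matches non-ASCII letters and digits by default.

```python
import re

# \w matches a letter, a digit, or an underscore; + means "one or more".
WORD_PATTERN = r"\w+"  # illustrative name, not part of the task

sample = "#NLPConference_20!"

# findall returns every non-overlapping match of the pattern.
# '#' and '!' are not word characters, so they are left out.
words = re.findall(WORD_PATTERN, sample)
print(words)  # ['NLPConference_20']
```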
Solution
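One possible way to carry out the four steps is sketched below. It assumes the intended class is NLTK's RegexpTokenizer, and it uses a made-up sample message, since the actual contents of message are not shown in the task.

```python
# Step 1: import the tokenizer class (assumed here to be NLTK's RegexpTokenizer).
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text; in the challenge, message is already provided.
message = "Join us at #NLPConference_20! Talks start at 9am."

# Step 2: convert the message to lowercase.
message_lower = message.lower()

# Step 3: create a tokenizer whose pattern matches runs of
# alphanumeric characters and underscores.
word_tokenizer = RegexpTokenizer(r"\w+")

# Step 4: tokenize the lowercased message into words.
words = word_tokenizer.tokenize(message_lower)
print(words)
# ['join', 'us', 'at', 'nlpconference_20', 'talks', 'start', 'at', '9am']
```

Lowercasing before tokenizing means that words differing only in case, such as "Talks" and "talks", end up as the same token.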