Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Tokenize Using Regex | Text Preprocessing Fundamentals
Introduction to NLP
course content

Course Content

Introduction to NLP

Introduction to NLP

1. Text Preprocessing Fundamentals
2. Stemming and Lemmatization
3. Basic Text Models
4. Word Embeddings

Tokenize Using Regex

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Switch to desktop for real-world practiceContinue from where you are using one of the options below

Everything was clear?

Section 1. Chapter 6
toggle bottom row

Tokenize Using Regex

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Switch to desktop for real-world practiceContinue from where you are using one of the options below

Everything was clear?

Section 1. Chapter 6
toggle bottom row

Tokenize Using Regex

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Switch to desktop for real-world practiceContinue from where you are using one of the options below

Everything was clear?

Task

Given a string named message, convert it lowercase, then tokenize it into words using regular expression tokenization and the corresponding nltk class. A word is a sequence of only alphanumeric characters (letters and numbers). '#Conference2023!', for example, contains one word: Conference2023.

Switch to desktop for real-world practiceContinue from where you are using one of the options below
Section 1. Chapter 6
Switch to desktop for real-world practiceContinue from where you are using one of the options below
We're sorry to hear that something went wrong. What happened?
some-alt