Challenge: Bag of Words
Task
You have a text corpus stored in the corpus variable. Your task is to display the vector for the 'graphic design' bigram in a BoW model. To do this:
- Import the CountVectorizer class to create a BoW model.
- Instantiate the CountVectorizer class as count_vectorizer, configuring it for a frequency-based model that includes both unigrams and bigrams.
- Use the appropriate method of count_vectorizer to generate a BoW matrix from the 'Document' column of corpus and store the result in bow_matrix.
- Convert bow_matrix to a dense array and create a DataFrame from it, setting the unique features (unigrams and bigrams) as its columns. Store the result in the bow_df variable.
- Display the vector for the 'graphic design' bigram as an array.
Solution