Challenge: TF-IDF
Task
You have a text corpus stored in the corpus variable. Your task is to display the vector for the 'medical' unigram in a TF-IDF model that uses unigrams, bigrams, and trigrams. To do this:
- Import the TfidfVectorizer class to create a TF-IDF model.
- Instantiate the TfidfVectorizer class as tfidf_vectorizer and configure it to include unigrams, bigrams, and trigrams.
- Use the appropriate method of tfidf_vectorizer to generate a TF-IDF matrix from the 'Document' column in corpus and store the result in tfidf_matrix.
- Convert tfidf_matrix to a dense array and create a DataFrame from it, setting the unique features (terms) as its columns. Store the result in the tfidf_matrix_df variable.
- Display the vector for 'medical' as an array.