Extracting Text Meaning using TF-IDF
TF Score
Term Frequency (TF) measures how often a word appears in a specific sentence or document, relative to the length of that sentence or document. Normalizing by length keeps scores comparable across texts of different sizes, so a word is not rated as more important simply because it occurs in a longer text.
TF is calculated using a logarithmic scale to dampen the effect of very high frequencies, which helps maintain a balanced importance across all words. The formula used here is log(1 + (frequency of the word in the sentence) / (total number of words in the sentence)). This adjustment accounts for the intuition that the significance of a word to a sentence does not increase linearly with its frequency.
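To make the damping effect concrete, here is a minimal sketch of the formula applied to a single word. The helper name `tf_score` is hypothetical, and the natural logarithm is an assumption, since the lesson does not state which base is used.

```python
import math

def tf_score(word_count: int, sentence_length: int) -> float:
    """Hypothetical helper: log(1 + count / length).

    Uses the natural logarithm, which is an assumption; the lesson
    does not specify the base.
    """
    return math.log(1 + word_count / sentence_length)

# A word appearing 1, 2, and 4 times in a 10-word sentence.
for count in (1, 2, 4):
    print(count, round(tf_score(count, 10), 4))
# prints:
# 1 0.0953
# 2 0.1823
# 4 0.3365
```

Doubling the count from 2 to 4 raises the score from about 0.18 to about 0.34, which is less than double. That is the sub-linear growth the formula is meant to produce.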
For each sentence in our list of tokenized sentences (tokenized_sentences), we calculate the TF score for every unique word. This is achieved by iterating through each word in a sentence, calculating its frequency relative to the sentence length, and applying the logarithmic formula. The result is a dictionary for each sentence, mapping words to their respective TF scores.
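The procedure described above could be sketched as follows. This is one possible approach, not necessarily the course's reference solution: the function name `compute_tf` and the sample sentences are made up for illustration, the natural logarithm is assumed, and empty sentences are mapped to an empty dictionary to avoid dividing by zero.

```python
import math
from collections import Counter

def compute_tf(tokenized_sentences):
    """Return one {word: TF score} dictionary per sentence.

    Hypothetical helper; TF is log(1 + count / sentence length),
    with the natural logarithm assumed.
    """
    tf_scores = []
    for sentence in tokenized_sentences:
        if not sentence:
            # Assumption: an empty sentence has no scores.
            tf_scores.append({})
            continue
        counts = Counter(sentence)   # occurrences of each unique word
        length = len(sentence)       # total number of words in the sentence
        tf_scores.append(
            {word: math.log(1 + count / length) for word, count in counts.items()}
        )
    return tf_scores

# Illustrative input; in the lesson this list comes from an earlier tokenization step.
tokenized_sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "bark"],
]
tf = compute_tf(tokenized_sentences)
print(tf[0]["the"])  # log(1 + 2/6), since "the" occurs twice among six words
```

Using `Counter` counts each unique word once per sentence, so the scores are computed in a single pass rather than recounting the word at every occurrence.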
Calculate the term frequency (TF) of each word in each sentence.