I liked the post and read it line by line, but I didn't get the formula. It would be nice if you could explain it in depth.
There are 3 steps.
Step 1: Calculate the term frequency (tf) of a word, which is the number of times that word appears. In this example it is counted across the whole dataset, so tf("like") = 34 means that all the documents together contain the word "like" 34 times.
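A minimal sketch of Step 1, assuming naive tokenization (lowercase, then split on whitespace) and a small hypothetical corpus rather than the dataset from the post. The helper name `term_frequency` is illustrative only.

```python
from collections import Counter


def term_frequency(documents, word):
    """Count how often `word` occurs across all documents combined.

    Assumes naive tokenization: lowercase + split on whitespace,
    so "like" and "liked" are treated as different words.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts[word]


# Hypothetical toy corpus, not the dataset from the post.
docs = [
    "I like tea",
    "I like coffee and I like tea",
    "nothing here",
]
print(term_frequency(docs, "like"))  # 3
```

Real pipelines usually strip punctuation and may stem words; the whitespace split keeps the counting logic easy to follow.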
Step 2: Calculate the document frequency (df) of a word, which is the number of documents that contain that word. So df("like") = 5 means that 5 documents in the dataset contain the word "like".
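The difference from tf is that a document counts once no matter how many times the word appears in it. A minimal sketch, using the same naive tokenization assumption and a hypothetical toy corpus; `document_frequency` is an illustrative name.

```python
def document_frequency(documents, word):
    """Count how many documents contain `word` at least once.

    Assumes naive tokenization: lowercase + split on whitespace.
    """
    return sum(1 for doc in documents if word in doc.lower().split())


# Hypothetical toy corpus: "like" appears 3 times but in only 2 documents.
docs = [
    "I like tea",
    "I like coffee and I like tea",
    "nothing here",
]
print(document_frequency(docs, "like"))  # 2
```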
Step 3: Put everything into the tf-idf formula.
tf-idf(word) = tf(word) * log(N / df(word)), where N is the total number of documents in the dataset.
So if there are N = 100 documents in total, then for the above example,
tf-idf("like") = 34 * log(100 / 5) = 34 * log(20) ≈ 44.24 (using a base-10 logarithm; with the natural logarithm the result would be about 101.85).
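Step 3 in code, as a minimal sketch: the function below plugs tf, df and N into the formula. The `base` parameter is an assumption added for illustration. Base 10 matches the worked example above (34 × log10(20) ≈ 44.235), while many libraries use the natural log, which gives a different scale.

```python
import math


def tf_idf(tf, df, n_docs, base=10):
    """tf-idf(word) = tf(word) * log(N / df(word)).

    base=10 reproduces the worked example; pass base=math.e for the
    natural log. df must be positive: a word that appears in no
    document would make N / df undefined.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    return tf * math.log(n_docs / df, base)


# Numbers from the example: tf = 34, df = 5, N = 100.
print(round(tf_idf(tf=34, df=5, n_docs=100), 2))  # 44.24
```

Note that a word appearing in every document (df = N) gets log(1) = 0, so it scores 0 however frequent it is. That is exactly the point of the idf factor.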