What new year resolutions are more trendy now…

Jan 12

Well, another year has passed, and it is time to reflect on what we set out to do and what we achieved over the last year.

2 Comments

Yograj thakur

May 29 (edited)

I liked the post and read it line by line, but I didn't get the formula. It would be nice if you could explain it in more depth.

Subhrajyoty Roy

Jun 1

There are 3 steps.

Step 1: Calculate the term frequency (tf) for a word. The term frequency of a word is the number of times that term (word) appears, counted here across all the documents in the dataset. So, tf("like") = 34 means that all the documents together contain the word "like" 34 times.

Step 2: Calculate the document frequency (df) for a word. The document frequency of a word is the number of documents that contain that word. So, df("like") = 5 means that 5 documents in the dataset contain the word "like".

Step 3: Put everything into the tf-idf formula.

tf-idf(word) = tf(word) * log(N / df(word)).

So if there are N = 100 documents in total, then for the above example,

tf-idf("like") = 34 * log(100 / 5) = 34 * log(20) ≈ 44.24 (taking the logarithm to base 10)
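The three steps above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions that are not in the original post: the function names `tf_idf` and `tf_idf_from_counts` are made up for this example, words are split on whitespace and lowercased, tf is counted across the whole dataset as in Step 1, and the logarithm is base 10 (which is what reproduces the value in the worked example).

```python
import math


def tf_idf_from_counts(tf, df, n_docs):
    """Step 3: plug already-computed counts into tf * log10(N / df)."""
    return tf * math.log10(n_docs / df)


def tf_idf(documents, word):
    """Compute tf-idf for one word over a list of document strings.

    Assumption: naive tokenization (lowercase + whitespace split).
    """
    tokenized = [doc.lower().split() for doc in documents]

    # Step 1: term frequency, counted over all documents together.
    tf = sum(tokens.count(word) for tokens in tokenized)

    # Step 2: document frequency, the number of documents containing the word.
    df = sum(1 for tokens in tokenized if word in tokens)

    if df == 0:
        # The word never appears, so the score is taken to be 0 here.
        return 0.0

    return tf_idf_from_counts(tf, df, len(documents))


# The example from the comment: tf = 34, df = 5, N = 100.
print(round(tf_idf_from_counts(34, 5, 100), 2))

# A tiny made-up dataset: "like" appears 3 times in 2 of 3 documents.
docs = ["I like tea", "You like coffee and I like it", "No opinion"]
print(tf_idf(docs, "like"))
```

Note that a word appearing in every document gets a score of 0, since log(N / N) = 0, which is the intended effect of the idf part: very common words carry little weight.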
