This problem happened because the product (p of a test word “j” in class c) was zero for both categories, and this in turn was zero because a few words in the given test example (highlighted in orange) NEVER EVER appeared in our training dataset, and hence their probability was zero! Clearly, they have caused all the destruction!

So does this imply that whenever a word appears in the test example but never occurred in the training dataset, it will always cause such destruction? And in such a case, will our trained model never be able to predict the correct sentiment? Will it just randomly pick the positive or negative category, since both have the same zero probability, and predict wrongly? The answer is NO! This is where the second method (numbered 2) comes into play. Method 2 is actually used to deduce p(i belonging to class c). But before we move on to method number 2, we should first get familiar with its mathematical brainy stuff!

Method 2 adds a pseudo-count of 1 for every word in the vocabulary. After adding these pseudo-counts of 1s, the probability p of a test word that never appeared in the training dataset will not default to zero. Therefore the numerical value of the term product (p of a test word “j” in class c) will also not end up as zero, which in turn means p(i belonging to class c) will not be zero. So all is well and there is no more destruction by zero probabilities!

The numerator term of method number 2 will have an added 1, as we have added a 1 for every word in the vocabulary, and so it becomes:

Total count of word “j” in class c = count of word “j” in class c + 1

Similarly, the denominator becomes (|V| is the number of words in the vocabulary):

Total counts of words in class c = count of words in class c + |V| + 1

And so the complete formula becomes:

p of word “j” in class c = (count of word “j” in class c + 1) / (count of words in class c + |V| + 1)

So any unknown word in the test set will have the following probability:

Total count of word “j” in class c = 0 + 1

p of unknown word “j” in class c = 1 / (count of words in class c + |V| + 1)

Why add 1 to the denominator?
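To make the smoothed formula concrete before digging into the denominator, here is a minimal Python sketch. It is only an illustration: the function names `smoothed_word_prob` and `class_score` and the tiny training sentences are hypothetical and are not taken from the article's dataset. The denominator follows the formula given in the text, including its extra + 1.

```python
import math
from collections import Counter


def smoothed_word_prob(word, class_word_counts, vocab_size):
    """p(word | class) with a pseudo-count of 1, using the formula from the text."""
    # Numerator: count of the word in this class + 1 (0 + 1 for an unseen word).
    numerator = class_word_counts.get(word, 0) + 1
    # Denominator: count of words in this class + |V| + 1.
    denominator = sum(class_word_counts.values()) + vocab_size + 1
    return numerator / denominator


def class_score(words, class_word_counts, vocab_size):
    """Product of the smoothed word probabilities for one class."""
    return math.prod(
        smoothed_word_prob(w, class_word_counts, vocab_size) for w in words
    )


# Hypothetical toy training data, one short "document" per class.
positive_counts = Counter("great movie great acting".split())
negative_counts = Counter("boring movie bad acting".split())
vocab = set(positive_counts) | set(negative_counts)  # |V| = 5

# A word seen twice in the positive class: (2 + 1) / (4 + 5 + 1)
print(smoothed_word_prob("great", positive_counts, len(vocab)))   # 0.3
# A word never seen in training: (0 + 1) / (4 + 5 + 1)
print(smoothed_word_prob("unseen", positive_counts, len(vocab)))  # 0.1

# The product for a test example containing an unseen word is no longer zero.
test_words = ["great", "unseen"]
print(class_score(test_words, positive_counts, len(vocab)) > 0)   # True
print(class_score(test_words, negative_counts, len(vocab)) > 0)   # True
```

Because every word, seen or unseen, now gets a non-zero probability, the two class products can be compared against each other instead of both collapsing to zero.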