Here is my understanding of the difference between a Bernoulli and a Multinomial Naive Bayes:

Bernoulli explicitly models the presence/absence of a feature, whereas Multinomial doesn't.

  • Is there something that I am missing?
  • Can someone explain intuitively why this difference matters, perhaps with an example where you would obviously use one instead of the other?

The Wikipedia page says:

Note that a naive Bayes classifier with a Bernoulli event model is not the same as a multinomial NB classifier with frequency counts truncated to one.

Why is that?

Valentin Calomme

1 Answer

Bernoulli models the presence/absence of a feature. Multinomial models the number of times a feature occurs. Here's a concise explanation.

Wikipedia warns that

Note that a naive Bayes classifier with a Bernoulli event model is not the same as a multinomial NB classifier with frequency counts truncated to one.

To understand why, we should note that, as this page nicely explains,

Whereas the binomial distribution generalises the Bernoulli distribution across the number of trials, the multinoulli distribution generalises it across the number of outcomes, that is, rolling a dice instead of tossing a coin.

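What does this imply for us? Multinomial NB treats a document as a series of draws from the set of features, so only the features that do occur contribute to the likelihood, each weighted by its count. Bernoulli NB treats every feature as its own binary variable, so it accounts for the features that do occur and, explicitly, for the features that do not occur.

This means that, for example, Multinomial NB will classify a document based on the counts it finds of multiple keywords, and keywords that never appear leave the score unchanged; whereas Bernoulli NB only checks whether each keyword is present, but also counts every keyword that does not occur in the document as evidence. Truncating the counts to one removes the frequency information from Multinomial NB but does not add this absence term, which is why the two classifiers are not the same.

So they do model slightly different things. If how often a feature occurs matters, Multinomial NB is the natural choice. But if you only care about whether a feature occurs, and the absence of a feature should itself count as evidence, then Bernoulli NB fits better, and you can make a modelling choice based on the above.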
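To make the Wikipedia remark concrete, here is a minimal sketch that computes both class-conditional log-likelihoods by hand for a one-word document. The vocabulary, the class names, and every probability below are invented for illustration, equal class priors are assumed, and the helper names (`bernoulli_loglik`, `multinomial_loglik`, `classify`) are hypothetical rather than taken from any library.

```python
import math

# Toy vocabulary and per-class parameters, invented for illustration only.
vocab = ["free", "meeting", "prize"]

# Bernoulli NB: P(word is present | class), one independent coin per word.
bernoulli_p = {
    "spam": {"free": 0.6, "meeting": 0.1, "prize": 0.9},
    "ham": {"free": 0.3, "meeting": 0.5, "prize": 0.05},
}

# Multinomial NB: P(next token is word | class), sums to 1 over the vocabulary.
multinomial_p = {
    "spam": {"free": 0.4, "meeting": 0.05, "prize": 0.55},
    "ham": {"free": 0.3, "meeting": 0.65, "prize": 0.05},
}


def bernoulli_loglik(doc, p):
    """Every vocabulary word contributes: log p if present, log(1 - p) if absent."""
    present = set(doc)
    return sum(
        math.log(p[w]) if w in present else math.log(1.0 - p[w])
        for w in vocab
    )


def multinomial_loglik(doc, p, truncate=False):
    """Only words with a non-zero count contribute: count * log p.

    The multinomial coefficient is omitted because it is the same for every class.
    """
    counts = {w: doc.count(w) for w in vocab}
    if truncate:
        # "Frequency counts truncated to one"
        counts = {w: min(c, 1) for w, c in counts.items()}
    return sum(c * math.log(p[w]) for w, c in counts.items())


def classify(doc, loglik, params):
    """Pick the class with the highest log-likelihood (equal priors assumed)."""
    return max(params, key=lambda c: loglik(doc, params[c]))


doc = ["free"]
print("Bernoulli:", classify(doc, bernoulli_loglik, bernoulli_p))
print(
    "Truncated multinomial:",
    classify(doc, lambda d, p: multinomial_loglik(d, p, truncate=True), multinomial_p),
)
```

With these made-up numbers the two models disagree on the document `["free"]`: the truncated multinomial only compares P(free | class) and prefers spam, while Bernoulli also notices that "prize", which spam strongly expects, is missing, and prefers ham.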

A. G.