Naive Bayes

By Marcelo Fernandes Jul 15, 2017

Let's implement a Naive Bayes algorithm from scratch. To do that, let's first dive into Bayes' theorem.

Bayes' theorem implementation from scratch:

In plain terms, Bayes' theorem calculates the probability of an event occurring based on certain other probabilities that are related to the event in question. It is composed of priors (the probabilities that we are aware of, or that are given to us) and the posterior (the probability we are looking to compute using the priors).
Let's implement a simple example: say we are trying to find the odds of an individual being an expert poker player. In poker, these probabilities play an important role, since the game usually deals with good amounts of money.
Let's assume that:

P(Exp) is the probability of a player being a poker expert. Its value is 0.01; in other words, 1% of poker players are experts. (This is just an example, not necessarily true.)

P(Win) is the probability of winning a hand.

P(Los) is the probability of losing a hand.

P(Win|Exp) is the probability of winning a hand given that the player is an expert poker player. Let's say this value is 0.6.

P(Los|~Exp) is the probability of losing a hand given that the player is not an expert poker player. Let's say this value is 0.9.

The Bayes formula is as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Where:

  • P(A) is the prior probability of A occurring independently. In our example this is P(Exp). This value is given to us.
  • P(B) is the prior probability of B occurring independently. In our example this is P(Win).
  • P(A|B) is the posterior probability that A occurs given B. In our example this is P(Exp|Win), i.e. the probability of an individual being an expert given that he/she won the hand. This is the value we are looking to calculate.
  • P(B|A) is the likelihood: the probability of B occurring given A. In our example this is P(Win|Exp). This value is given to us.

Using the Bayes formula, we have:
P(Exp|Win) = P(Win|Exp) * P(Exp) / P(Win)

The probability of winning, P(Win), can be calculated using the sensitivity and specificity as follows:

P(Win) = [P(Exp) * Sensitivity] + [P(~Exp) * (1 - Specificity)]

Where:

Sensitivity = P(Win|Exp)
Specificity = P(Los|~Exp)

""" Solution """

# P(Exp)
p_expert = 0.01

# P(~Exp)
p_no_expert = 0.99

# Sensitivity or P(Win|Exp)
p_win_exp = 0.6

# Specificity or P(Los|~Exp)
p_los_no_exp = 0.9

# P(Win)
p_win = (p_expert * p_win_exp) + (p_no_expert * (1 - p_los_no_exp))

# p_win ≈ 0.105

# P(Exp|Win) = P(Win|Exp) * P(Exp) / P(Win)
p_exp_win = p_win_exp * p_expert / p_win

# p_exp_win ≈ 0.05714
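
As a quick sanity check, the posteriors of the two complementary hypotheses, P(Exp|Win) and P(~Exp|Win), should sum to 1. Continuing with the variables from the solution above:

# P(Win|~Exp) = 1 - Specificity
p_win_no_exp = 1 - p_los_no_exp

# P(~Exp|Win) = P(Win|~Exp) * P(~Exp) / P(Win)
p_no_exp_win = p_win_no_exp * p_no_expert / p_win

# p_no_exp_win ≈ 0.94286, and p_exp_win + p_no_exp_win ≈ 1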

Naive Bayes

After that, it's pretty easy to extend this to Naive Bayes: we can have more than one feature, and the features are treated as conditionally independent given the class (this is the "naive" assumption). See the sketch below.
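
For instance, suppose we add a second feature to the poker example: whether the player raises pre-flop. This feature and its probability values are made up purely for illustration. Under the naive assumption, the joint likelihood is just the product of the per-feature likelihoods, and we normalise over both hypotheses:

""" Naive Bayes with two features (illustrative sketch) """

# Priors
p_expert = 0.01       # P(Exp)
p_no_expert = 0.99    # P(~Exp)

# Likelihoods for feature 1: winning a hand
p_win_exp = 0.6       # P(Win|Exp)
p_win_no_exp = 0.1    # P(Win|~Exp) = 1 - Specificity

# Likelihoods for feature 2 (hypothetical): raising pre-flop
p_raise_exp = 0.7     # P(Raise|Exp), assumed value
p_raise_no_exp = 0.2  # P(Raise|~Exp), assumed value

# Naive assumption: the features are conditionally independent given the
# class, so the joint likelihood is the product of the single likelihoods.
joint_exp = p_win_exp * p_raise_exp * p_expert
joint_no_exp = p_win_no_exp * p_raise_no_exp * p_no_expert

# Normalise so that the two posteriors sum to 1.
p_exp_given_both = joint_exp / (joint_exp + joint_no_exp)

# p_exp_given_both = 0.175

Note how a second expert-like observation lifts the posterior from roughly 5.7% to 17.5%: the evidence accumulates multiplicatively, but the low prior still dominates.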
