Recall and Precision

By Marcelo Fernandes Dec 8, 2017

Both Recall and Precision are measurements that help you decide how to proceed with further training of your model. Depending on the problem, your model might require a higher precision or a higher recall.

A quick definition of recall and precision, in a non-mathematical way:

  • Precision: high precision means that an algorithm returned substantially more relevant results than irrelevant ones
  • Recall: high recall means that an algorithm returned most of the relevant results

But what do we mean by relevance? Let's begin with a quick example:

Suppose you are creating a model to predict whether or not a person is sick. In this scenario, there are four possible outcomes:

  1. A person is sick, and our model predicted that the person was sick. (True Positive)
  2. A person is sick, and our model predicted that the person was not sick. (False Negative)
  3. A person is not sick, and our model predicted that the person was not sick. (True Negative)
  4. A person is not sick, and our model predicted that the person was sick. (False Positive)
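
These four outcomes are easy to count directly. Below is a minimal Python sketch that tallies them from a pair of label lists; the y_true and y_pred values are made-up examples, not data from this article:

    # Hypothetical ground-truth labels and model predictions (1 = sick, 0 = healthy).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

    # Count each of the four outcomes by comparing labels pairwise.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # True Positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # False Negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # True Negatives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # False Positives

    print(tp, fn, tn, fp)  # 3 1 3 1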

In this scenario, we want the most complete result possible. That means we want to find every single sick person, and it does not matter if we misclassify some healthy people as sick, as long as we catch all the sick people.

This is a scenario where Recall is very important, since we want the most complete result. The recall, in this case, is calculated as:

Identified Sick Patients
--------------------------------
Total Sick Patients


But given the labels we defined above, this formula can also be translated to:


True Positives
-------------------------------------------
(True Positives + False Negatives)


Note how we are more interested in the relevant scope (sick patients).
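
As a quick sanity check, here is a minimal Python sketch of that formula, reusing the made-up counts from the sketch above (tp = 3, fn = 1):

    def recall(tp, fn):
        # Recall = True Positives / (True Positives + False Negatives)
        return tp / (tp + fn)

    # With the hypothetical counts above: we found 3 of the 4 sick patients.
    print(recall(3, 1))  # 0.75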

Now let's look at an example in which we would be more interested in Precision:

Imagine that we are creating a thief detector for an outlet store. We want to detect whether or not a person leaving the store has stolen anything. In this case, it is not acceptable to accuse someone who did not steal anything; we are interested in flagging only the real thieves. In exchange, we can afford to miss a thief or two.

Coming back to our list of positives and negatives:

  1. We have a thief, and our model predicted that the person was a thief. (True Positive)
  2. We have a thief, and our model predicted that the person was not a thief. (False Negative)
  3. A person is not a thief, and our model predicted that the person was not a thief. (True Negative)
  4. A person is not a thief, and our model predicted that the person was a thief. (False Positive)

In this case, to calculate the precision, we are interested in answering the question: out of everyone we classified as a thief, how many did we classify correctly?

This can be expressed as:

Correctly Identified Thieves
------------------------------------------------------
Correctly Identified Thieves + Wrongly Identified Thieves


But given the labels we defined above, this formula can also be translated to:


True Positives
-------------------------------------------
(True Positives + False Positives)
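
The same kind of minimal Python sketch applies to Precision, again with made-up counts rather than real data:

    def precision(tp, fp):
        # Precision = True Positives / (True Positives + False Positives)
        return tp / (tp + fp)

    # Hypothetical counts: 3 real thieves flagged, 1 innocent person flagged.
    print(precision(3, 1))  # 0.75 -- 3 out of 4 alarms were real thieves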

Here is a good illustrative picture of what we have been discussing about Precision and Recall:

