#026 CNN Intersection over Union

Intersection over Union

In this post, we will learn about a function called intersection over union (\(IoU \)). We will use it to make our object detection algorithm work even better.

How do we tell if our object detection algorithm is working well?

*The intersection is the yellow region, and the union is the whole blue region (including the intersection)*

In the object detection task, our goal is to localize the object in the best possible way. Take a look at the picture above. There are two bounding boxes: the red one is the ground truth bounding box (where the car is in the image), and the purple one is the output of our algorithm. They don't overlap perfectly, so we need to measure how good (or how bad) the actual outcome is. To do that, we compute the intersection over union, which tells us whether we have a good or a bad outcome.

*Intersection over union (\(IoU \)) measures the overlap between these two bounding boxes*

The union of these two bounding boxes is the blue area, that is, the area contained in at least one of the two boxes, whereas the intersection is the smaller yellow region contained in both of them. The intersection over union computes the size of the intersection and divides it by the size of the union. If the bounding box we got and the ground truth bounding box overlapped perfectly, the \(IoU \) would be \(1 \), because the intersection would be equal to the union. If they did not overlap at all, the \(IoU \) would be \(0 \). By convention, \(0.5 \) is used as the threshold: the predicted bounding box is judged correct if the \(IoU \) is greater than or equal to \(0.5 \), and in that case the obtained answer is rather decent.
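The computation described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `iou`, the `(x1, y1, x2, y2)` corner format for a box, and the example coordinates are all assumptions made for this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is assumed to be (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part that was counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: a ground truth box and a slightly shifted prediction.
ground_truth = (50, 50, 150, 150)
prediction = (70, 60, 170, 160)

score = iou(ground_truth, prediction)
print(f"IoU = {score:.4f}, correct at threshold 0.5: {score >= 0.5}")
```

For these made-up boxes the overlap is an 80 by 90 rectangle, so the \(IoU \) is 7200 / 12800, a bit above the conventional \(0.5 \) threshold.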

This is just a convention used in practice. If we want to be stricter, we can judge an answer as correct only if the \(IoU \) is greater than or equal to \(0.6 \) or some other number. In general, the higher the \(IoU \), the more accurate the bounding box. We defined \(IoU \) as a way to evaluate whether our object localization algorithm is accurate, but more generally \(IoU \) is a measure of the overlap between any two bounding boxes: given two boxes, we compute their intersection and their union and take the ratio of these two areas.
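To see how the choice of threshold changes the verdict, here is a small worked example with made-up numbers (the boxes are illustrative and are not taken from the picture). Suppose the ground truth box is a \(2 \times 2 \) square and the predicted box is the same square shifted by \(0.5 \) along one axis. The overlap is then a \(1.5 \times 2 \) rectangle, and the union is the sum of the two areas minus the overlap:

\[ IoU = \frac{\text{intersection}}{\text{union}} = \frac{1.5 \cdot 2}{4 + 4 - 1.5 \cdot 2} = \frac{3}{5} = 0.6 \]

This prediction counts as correct with a threshold of \(0.5 \) or \(0.6 \), but it would be rejected under a stricter threshold such as \(0.7 \).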

Summary

In summary, we defined \(IoU \) as a measure of the overlap between two bounding boxes: given the two boxes, we compute their intersection and union and take the ratio of these two areas. This is also a way of measuring how similar two boxes are to each other. We will see this used in the next post when we talk about Non-max suppression.
