#027 CNN Non-Max Suppression algorithm
Non-Max Suppression
In this post, we will learn how the non-max suppression algorithm allows us to overcome multiple detections of the same object in an image. Let’s go through an example!

Let’s say we want to detect pedestrians, cars, and motorcycles in this image.

If we look at the picture above we can see that there are two cars. Each of these two cars has one midpoint so it should be assigned to just one grid cell which then actually predicts that there is a car in the picture. In practice, we’re running an object classification and localization algorithm for every one of these grid cells, so it’s quite possible that different cells might think that the center of a car is in many different cells.
Non-max \enspace supperesion cleans up these multiple bounding boxes
Let’s see an example of how Non-Max\enspace suppression works. Since we are running the image classification and localization algorithm on every grid cell, it is possible that many of them will be with a large probability p_c, that there is an object in that cell. When we run the algorithm we would end up with multiple detections of each object. What the Non-Max\enspace suppression does is it bends together these detections so that we end up with just one detection per car.

More specifically, this algorithm will first search for the probability associated with each of these detections, so it looks at p_c values first, and then it takes the largest one. In the picture above, there is a rectangle associated with 0.9 (the car on the right) and this means that an algorithm has detected a car there. After this, the Non-Max\enspace suppression looks at other rectangles that are close to the first one and the ones with the highest overlap with this one (highest IoU) will be suppressed. As an example, in the picture above we can see those two rectangles with the IoU 0.6 and the 0.7 are going to be suppressed.
More specifically, it first searches for the probability associated with each of these detections (it looks at p_c values) and first, it takes the largest one. In the picture above, there is a rectangle associated with 0.9 (the car on the right). This means that an algorithm detected a car there. After this, the Non-Max\enspace suppression looks at other rectangles that are close and the ones with a high overlap with this one (high IoU) will be suppressed. As an example, we can see those two rectangles with the IoU 0.6 and the 0.7 are going to be suppressed.
Similarly, the same procedure can be applied for the car on the left, we go through the remaining rectangles and find the one with the highest probability, the highest p_c value. So for the car on the left, it will be the box with the value p_{c} =0.8. Next, a task of Non-Max\enspace suppression is to discard the remaining detections having high IoU with this cell. In this case, we have two bounding boxes with lower p_c that recognized the car on the left (two red rectangles in the left in the above picture) and because they both have big IoU both of them will be discarded.
Similarly, for the car on the left, we go through the remaining rectangles and find the one with the highest probability, the highest p_c value. Here, it is the box with p_c=0.8. Next, a task of Non-Max\enspace suppression is to discard the remaining detections having high IoU with this cell. Here, we have two bounding boxes with lower p_c that recognized the car on the left (two red rectangles in the left in the above picture) and because they both have big IoU both of them will be discarded.
Applying the Non-Max\enspace suppression means that we’re going to output our maximal classifications probabilities, and suppress the other ones (the other detections of the same object) with lower values. That’s why we use the name Non-Max\enspace suppression .
First, within this 19\times19 grid we’re going to get a 19\times19\times8 output volume. For each of these 361 cells we output the vector:
\begin{bmatrix} p_{c}\\b_{x}\\b_{y}\\b_{h}\\b_{w}\\c_{1}\\c_{2}\\c_{3} \end{bmatrix}
For this example, we will be detecting only cars. So, for every grid cell, we will obtain the following vector:
\begin{bmatrix} p_{c}\\b_{x}\\b_{y}\\b_{h}\\b_{w} \end{bmatrix}
Let’s see how Non-Max suppression works for this example. It will first discard all the boxes with p_c values less than some predefined threshold. We can use for instance a value of 0.6 . For every grid cell we output a bounding box together with a probability of detecting a car within bounding box.
Next, while there are any remaining bounding boxes that we have not yet discarded or processed, we are going to repeatedly select the box with the highest p_c . That will be selected as our output prediction. Next, we will discard any remaining box with high overlap ( IoU ) with the box that we have just outputted in the previous step. We keep doing this while there are still any remaining boxes that we’ve not yet process until we have taken each of the boxes and either output it as a prediction, or discarded it.
In case that we want to detect more objects, for instance: pedestrians, cars, and motorcycles, the output vector will have three p_c components. It has been shown in practice that we can run Non-Max\enspace suppression three times independently, one for every output class.
This completes the Non-Max suppression. In practice, it obtains rather decent results. However, to complete the idea of the YOLO algorithm we need to introduce one additional idea. It is the idea of Anchor boxes, and we will explain it in the next post.