#015 Template matching using OpenCV in Python

#015 Template matching using OpenCV in Python

Highlights: In this post, we’re going to talk about template matching which is probably the simplest form of object detection. Template matching is a technique in digital image processing that identifies the parts of an image that match a predefined template. It has various applications and is used in such fields as face and speech recognition, automation, and motion estimation. So, let’s begin with our post and see how template matching works in Python.

Tutorial Overview:

  1. How does template matching work?
  2. Template matching in OpenCV with Python

1. How does template matching work?

Let’s have a look at the following example. In this GIF animation, we can see a photo of Lionel Messi. This is our input image. We also have the template image which is a cropped part of the input image. Now, we are simply going to scan a larger image with this template by sliding it across all possible positions. Then, we will compare a template against overlapped image regions in the larger image, until we find a match. 

Template matching

So, how are we going to make that comparison? Are we going to compare pixel by pixel or do we need to use another method? OpenCV has provided several different template matching methods. Here we can see the formulas that OpenCV has calculated for each available method. Note that \(I \) denotes an image, \(T \) template image, and \(R \) result.

SqDiff – Squared difference

$$ R(x, y)=\sum_{x^{\prime}, y^{\prime}}\left(T\left(x^{\prime}, y^{\prime}\right)-I\left(x+x^{\prime}, y+y^{\prime}\right)\right)^{2} $$

SqDiffNormed – Normalized squared difference

$$ R(x, y)=\frac{\sum_{x^{\prime}, y^{\prime}}\left(T\left(x^{\prime}, y^{\prime}\right)-I\left(x+x^{\prime}, y+y^{\prime}\right)\right)^{2}}{\sqrt{\sum_{x^{\prime}, y^{\prime}} T\left(x^{\prime}, y^{\prime}\right)^{2} \cdot \sum_{x^{\prime}, y^{\prime}} I\left(x+x^{\prime}, y+y^{\prime}\right)^{2}}} $$

CCorr – Cross correlation

$$ R(x, y)=\sum_{x^{\prime}, y^{\prime}}\left(T\left(x^{\prime}, y^{\prime}\right) \cdot I\left(x+x^{\prime}, y+y^{\prime}\right)\right) $$

CCorrNormed – Normalized cross correlation

$$ R(x, y)=\frac{\sum_{x^{\prime}, y^{\prime}}\left(T\left(x^{\prime}, y^{\prime}\right)-I\left(x+x^{\prime}, y+y^{\prime}\right)\right)^{2}}{\sqrt{\sum_{x^{\prime}, y^{\prime}} T\left(x^{\prime}, y^{\prime}\right)^{2} \cdot \sum_{x^{\prime}, y^{\prime}} I\left(x+x^{\prime}, y+y^{\prime}\right)^{2}}} $$

CCoeff – Cosine coefficient

$$ R(x, y)=\sum_{x^{\prime}, y^{\prime}}\left(T^{\prime}\left(x^{\prime}, y^{\prime}\right) \cdot I^{\prime}\left(x+x^{\prime}, y+y^{\prime}\right)\right) $$

Where:

$$ T^{\prime}\left(x^{\prime}, y^{\prime}\right)=T\left(x^{\prime}, y^{\prime}\right)-1 /(w \cdot h) \cdot \sum_{x^{\prime \prime}, y^{\prime \prime}} T\left(x^{\prime \prime}, y^{\prime \prime}\right) $$

$$ I^{\prime}\left(x+x^{\prime}, y+y^{\prime}\right)=I\left(x+x^{\prime}, y+y^{\prime}\right)-1 /(w \cdot h) \cdot \sum_{x^{\prime \prime}, y^{\prime \prime}} I\left(x+x^{\prime \prime}, y+y^{\prime \prime}\right) $$

CCoeffNormed – Normalized cosine coefficient

$$ R(x, y)=\frac{\sum_{x^{\prime}, y^{\prime}}\left(T^{\prime}\left(x^{\prime}, y^{\prime}\right) \cdot I^{\prime}\left(x+x^{\prime}, y+y^{\prime}\right)\right)}{\sqrt{\sum_{x^{\prime}, y^{\prime}} T^{\prime}\left(x^{\prime}, y^{\prime}\right)^{2} \cdot \sum_{x^{\prime} y^{\prime}} I^{\prime}\left(x+x^{\prime}, y+y^{\prime}\right)^{2}}} $$

When we slide the template image across the input image, a metric is calculated at every pixel location. This metric represents how similar the template is to that particular area of the input image. For each location of \(T \) in the image \(I \) we store the metric in the result matrix \(R \). Each location \((x,y) \) in \(R \) contains the match metric. Note that this process is very similar to the convolution where the output of our image will shrink.

In the following image, we can see the map of the comparison results. The brightest locations indicate the highest matches. As you can see, the location marked by the yellow circle is probably the one with the highest value, so that location will be considered as the best match candidate for our template. This yellow circle will represent the top left corner of the rectangle, where we assume the optimal match candidate will be located. The width and height of the rectangle are equal to the template image.

Template matching OpenCV

Now, let’s see how each of these methods works in Python.

2. Template matching in OpenCV with Python

First, we are going to import the necessary libraries and load the input image and the template image. We will also correct the color order because we will plot these images with matplotlib.

import cv2
import numpy as np
from matplotlib import pyplot as plt

from google.colab.patches import cv2_imshow
template = cv2.imread("Picture9.jpg")
template = cv2.cvtColor(template, cv2.COLOR_BGR2RGB)
img = cv2.imread("Picture10.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
Template matching
plt.imshow(template)
Template matching

So, as you can see the input image is Leo Messi and the template image is just Messi’s face. Now, if we check out dimensions of a template image we can see that they are (269, 191, 3). 

template.shape
(269, 191, 3)

Note that if draw a rectangle of size (269, 191) around Messi’s face in the input image, that area will have the exact size and shape of a template image.  

Now, one more note before we proceed. In our code we’re going to use the list of strings which are the names of different template matching methods. 

methods =["cv2.TM_CCOEFF" , "cv2.TM_CCOEFF_NORMED", "cv2.TM_CCORR" , "cv2.TM_CCORR_NORMED" , "cv2.TM_SQDIFF" , "cv2.TM_SQDIFF_NORMED"]

However, we need to evaluate each of these strings as if it is an OpenCV function. To do that we are going to use the function eval(). In that way we can directly transform a string that matches each of these built-in template matching functions. It will be more convenient for us to write our code in that way, instead of calling each function manually. Now, let’s move on. 

The next step is to create a for loop that goes through each of these methods. We will also create a copy of the input image. Then, using the eval() function, we will loop through all string methods, and transform that string we are going to use to the actual OpenCV function.

Now we are ready to perform the template matching. For this, we will use the function cv2.matchTemplate() that consists of the following parameters.

  • Input image – An image where the search is running. Note that it must be an 8-bit integer or 32-bit floating-point.
  • Template – Searched template image. It must be smaller or equal to the input image and it must have the same data type.
  • Output – The resulting map of comparison results. It is a single-channel 32-bit floating-point. If the input image dimensions are \(x\times y \) pixels and template dimensions are \(m\times n \) pixels, then the result is \((x-m+1)\times (y-n+1) \).
  • Method –  Parameter that specifies the comparison method
  • Mask – Mask of the searched template. It must have the same datatype and size as the template. It is not set by default.
for m in methods:
  img_copy = img.copy()
  method = eval(m)
  res = cv2.matchTemplate(img_copy,template,method)

Now, we need to find the maximum and minimum values of the resulting map, as well as the minimum and maximum value locations. For that, we will use the function cv2.minMaxLoc() and as a parameter, we will just pass our image that is the output of our template matching method. This function takes in the map, finds the minimum and the maximum locations, and returns them back as a tuple. Then, we can unpack that tuple, to find the minimum value, the maximum value, as well as the minimum and maximum locations. 

min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

One important thing to note is that two methods that use squared differences, SqDiff and SqDiffNormed, are going to be slightly different because the location with the minimum value will be considered the match. On the other hand, for other methods, the match will be located where the function finds the maximum value.

To fix this problem, we need to create a loop that will check if a given method considers location with the minimum value as a match, or location with the maximum value. So we will say that if we use a SqDiff or SqDiffNormed method the top left corner of the rectangle is equal to the minimum location. For all other methods will say that the top left corner of the rectangle is equal to the maximum location. 

if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
    top_left = min_loc
  else:
    top_left = max_loc
  print(top_left) 

So, now we know the location of the top left corner of our rectangle. Next, we need to find the bottom right corner of the rectangle. We already know that width and height of the rectangle are equal to the template image. Therefore, we just need to get the shape of the rectangle. Then, we are going to define the bottom right corner of the rectangle that is equal to the top left corner indexed at zero plus the width, and then the top left corner indexed at one plus the height.

height, width, channels = template.shape  
bottom_right = (top_left[0]+width, top_left[1]+height)

Now, we just need to draw our rectangle with the cv2.rectangle() function.

cv2.rectangle(img_copy, top_left, bottom_right, (0,255,0),6)

The final step is to plot all results that we obtained. Let’s see how they look. 

plt.subplot(121)
  plt.imshow(res)
  plt.title("TEMPLATE MATCHING MAP")
  plt.subplot(122)
  plt.imshow(img_copy)
  plt.title("TEMPLATE DETECTION")
  plt.suptitle(m)

  plt.show()
  print("\n")
  print("\n")

(301, 28)

Template matching ccoeff

(301, 28)

Template matching ccoeff normed

(0, 224)

Template matching ccorr

(301, 28)

Template matching ccorr normed

(301, 28)

Template matching sqdiff

(301, 28)

Template matching sqdiff normed

So, as you can see, five of the methods did a pretty good job. Five methods find while one of them made a mistake. We can see that CCoeff, CCoeffNormed, SqDiff, SqDiffNormed and CCorrNormed methods matched the template correctly. Moreover, all five methods identified top left corner of the rectangle at the same location (301,28). On the other hand, we can see that the Corr method found a lot of different comparisons but unfortunately did not detect the correct template. So, even when we have an exact copy of the part of an input image, we still may not end up with correct results. That is why you’re gonna have to make sure you’re using the right method.

Here you can download code from our GitHub repo.

Summary

In this post, we learn how to detect objects in images using a template matching method. We examined several different template matching methods and compared their results in Python. In the next post, we will learn how to extract feature points from images and match them together.