#005 Advanced Computer Vision – Basketball Player Tracking with OpenCV

Highlights: Hello and welcome. In this post, we are going to talk about the concept of object tracking. We will clarify the idea of object tracking and how this concept differs from object detection. After that, we will learn about the most popular object-tracking algorithms from the OpenCV library and we will explore their applications in real-world scenarios. In particular, we are going to learn how to use object tracking to track basketball players on the court. So, let’s begin.

What is object tracking?

First, let’s explain the idea behind object tracking. Object tracking is the process of locating an object across consecutive video frames. The process starts with object detection. In the previous post, we learned that when an object is detected we also get a bounding box for it. Detection is applied to every object that we want to track in the first frame of a video. Then, an ID is assigned to each object, and the algorithm tries to carry this ID across consecutive frames while identifying the new position of the same object.

Difference between object detection and object tracking

Now you’re probably wondering what the difference between object detection and object tracking is. Well, the most important difference is that object detection detects an object in every frame. In other words, it is a series of repeated detections. On the other hand, object tracking only requires object detection in the first frame. After that detection, the algorithm has enough information about the object. It learns the appearance of the object, its location in the previous frame, and the direction and velocity of its motion. The algorithm can then use all this information to predict the location of the object in the next frame. That is why object-tracking algorithms are generally much faster than object-detection algorithms.
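To make the prediction idea concrete, here is a minimal sketch (not part of any OpenCV tracker, just an illustration) of extrapolating the next position from the last two bounding-box centers under a constant-velocity assumption:

```python
import numpy as np

# Hypothetical sketch: predict the next position of a tracked object
# from its two previous bounding-box centers (constant-velocity assumption).
def predict_next_center(prev_center, curr_center):
    """Extrapolate the next center, assuming constant velocity."""
    prev = np.asarray(prev_center, dtype=float)
    curr = np.asarray(curr_center, dtype=float)
    velocity = curr - prev          # displacement per frame
    return curr + velocity          # predicted center in the next frame

# Example: the player moved 5 px right and 2 px down between frames,
# so we expect the same displacement in the next frame.
predicted = predict_next_center((100, 50), (105, 52))
```

Real trackers combine such motion cues with an appearance model, but this captures why the search in the next frame can be restricted to a small region.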

Also, object-tracking algorithms are more robust to occlusion than detection algorithms. For example, the object that we are tracking can often be occluded by another object in the scene. In this case, the object detector will probably fail. On the other hand, a good tracking algorithm will be able to handle these kinds of obstacles as long as they last only a short period of time.

Now that we have learned the basic concepts of object tracking, let’s dive deeper into some practical examples [2]. Our goal is to explore several algorithms provided by OpenCV and compare their performance in tracking basketball players.

Object tracking in Python

First, let’s import the necessary libraries. For our task, we just need to import the OpenCV library.

import cv2     

Now, we will capture our video. Note that we are going to use the same video for all the tracking algorithms that we explore. That way, we will be able to compare their results.

BOOSTING Tracker

Now, let’s define our tracker object by calling the function cv2.legacy.TrackerBoosting_create(). This function creates one of the classic tracking algorithms, the BOOSTING tracker.

Next, we will read our video and manually define our initial bounding box. Normally, this step would use an object-detection algorithm such as YOLO. However, to keep the code simple and to focus on tracking rather than detection, we will manually define the bounding box around the player that we are going to track.

cap = cv2.VideoCapture("Aaron Gordon.mp4") 

tracker = cv2.legacy.TrackerBoosting_create()

ret, img = cap.read()

bbox = (480, 240, 100, 270)
tracker.init(img, bbox)

The next step is to define the function drawBox(). This function will draw a bounding box around the tracking object in each frame. Also, it will write “Tracking” in the upper left corner, in case player tracking was successful in that particular frame.

def drawBox(img, bbox):
  x, y, w, h = int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])
  cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 3, 1)
  cv2.putText(img, "Tracking", (120, 75), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

Next, we will define our FOURCC code and our output video.

fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
# the output frame size must match the frames we write, so read it from the video
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter("output.mp4", fourcc, cap.get(cv2.CAP_PROP_FPS), frame_size)

Now, we will read our video and loop over all of its frames. Then, all we have to do is update our tracker using the function tracker.update(). Also, to measure the speed of the tracking algorithm, we will calculate the fps and display that number in the upper left corner. Moreover, if the tracking is successful we are going to write “Tracking” in the upper left corner. Otherwise, we will write “Tracking lost”.

while True:
  success, img = cap.read()
  if not success:
    break

  # start the timer before the update so the measured fps includes the tracker's work
  timer = cv2.getTickCount()
  success, bbox = tracker.update(img)

  if success:
    drawBox(img, bbox)
  else:
    cv2.putText(img, "Tracking lost", (120, 75), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)

  fps = cv2.getTickFrequency() / (cv2.getTickCount() - timer)
  cv2.putText(img, str(int(fps)), (120, 100), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)

  out.write(img)

cap.release()
out.release()
cv2.destroyAllWindows()

Now let’s see the results of the BOOSTING Tracker algorithm.

As you can see, the tracking was correct during the first quarter of the video, but then the algorithm loses the player.

This tracker is based on an online version of AdaBoost, the algorithm used by the Haar cascade face detector. To better understand this algorithm, let’s have a look at the following image [1].

Given an initial position of the object (a) at time \(t \), the classifier is evaluated at many possible positions in a surrounding search region in frame \(t + 1 \) (b). The tracking step is based on the classical approach of template tracking. Note that the classifier runs on every pixel in the neighborhood of the previous location. The initial bounding box is taken as the positive example for the object, and image patches outside the bounding box are treated as the background. Next, the resulting confidence map (c) is analyzed in order to estimate the most probable position. Finally, for each position a confidence value is obtained, and the tracker (classifier) is updated (d). The new location of the object is the one where the maximum confidence value is recorded.

Pros: We can say that this algorithm performs O.K. However, the algorithm is more than a decade old, and when we compare it with more advanced trackers based on similar principles, its performance is not that good. So, there are not many pros to using this algorithm.

Cons: Tracking performance is mediocre and the algorithm does not reliably know when tracking has failed.

Multiple Instance Learning (MIL)

To run this algorithm we will use the same code, replacing only the previous tracker function with the following one.

tracker = cv2.legacy.TrackerMIL_create()

We can see that this algorithm does a much better job. For the first half of the video the tracking is good, but in the second half it fails.

This tracking algorithm utilizes a similar concept as the BOOSTING tracker. However, it differs in its approach to identifying positive examples. Rather than solely relying on the current location of the object, it examines a small neighborhood around the current location to generate multiple potential positive examples. This methodology allows for a more comprehensive identification of the object in question.
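As an illustration of this sampling idea (a simplified sketch, not OpenCV's internal MIL implementation; the offsets and step size below are arbitrary choices), the following snippet crops a "bag" of candidate positive patches from a small neighborhood around the current location:

```python
import numpy as np

# Illustrative sketch of MIL-style sampling: patches whose top-left corner
# lies within `radius` pixels of (x, y) together form one positive "bag".
def sample_positive_bag(frame, x, y, w, h, radius=4):
    bag = []
    H, W = frame.shape[:2]
    for dy in range(-radius, radius + 1, 2):
        for dx in range(-radius, radius + 1, 2):
            nx, ny = x + dx, y + dy
            if 0 <= nx and 0 <= ny and nx + w <= W and ny + h <= H:
                bag.append(frame[ny:ny + h, nx:nx + w])
    return bag

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # dummy frame stand-in
bag = sample_positive_bag(frame, 480, 240, 100, 270)
print(len(bag))   # number of patches in the positive bag
```

The key point is that the classifier is trained on the whole bag at once, so a slightly misaligned bounding box in one frame does not corrupt the appearance model.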

Pros: The algorithm performance is pretty good. This is probably one of the best trackers available in OpenCV.

Cons: Similarly to the BOOSTING tracker, tracking failure is not reported. Also, this algorithm does not recover when full occlusion occurs.

Kernelized Correlation Filters (KCF)

To apply this algorithm we will use the function cv2.legacy.TrackerKCF_create().

tracker = cv2.legacy.TrackerKCF_create()

This tracker is based on the same principles as the previous two trackers. The main difference is that this algorithm utilizes the mathematical properties arising from the large overlapping regions of positive samples used in the MIL tracker to improve the accuracy and speed of tracking. The algorithm estimates an optimal image filter that produces a desired response, such as a Gaussian shape centered at the target location, when applied to the input image. The filter is trained on translated instances of the target patch and is used to evaluate the response of the filter during testing, with the maximum response indicating the new position of the target. The result is a highly efficient and accurate algorithm for object tracking.
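The core correlation-filter idea can be illustrated with a toy example (a simplified sketch, not the actual KCF implementation, which additionally exploits kernels and circulant-matrix structure): correlate a template with a search window in the Fourier domain and take the peak of the response map as the new target location.

```python
import numpy as np

# Cross-correlate a template with a search window via the FFT; both
# inputs must have the same shape. The conjugate turns convolution
# into correlation.
def correlation_response(search_window, template):
    F = np.fft.fft2(search_window)
    H = np.conj(np.fft.fft2(template))
    return np.real(np.fft.ifft2(F * H))

# Toy example: a bright "target" blob, and a template matching it.
window = np.zeros((64, 64))
window[40:44, 20:24] = 1.0                 # target near (row=40, col=20)
template = np.zeros((64, 64))
template[0:4, 0:4] = 1.0                   # same pattern anchored at the origin
resp = correlation_response(window, template)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)   # the peak's row/col gives the target's displacement
```

Because the correlation is computed with two FFTs and a pointwise product instead of an exhaustive sliding-window search, this family of trackers is very fast.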

Pros: The improved speed and accuracy compared to the MIL tracker. Also, this algorithm reports tracking failure.

Cons: Does not recover from full occlusion.

Tracking, learning, and detection (TLD)

To apply this algorithm we will use the function cv2.legacy.TrackerTLD_create().

tracker = cv2.legacy.TrackerTLD_create()

As the name suggests, this tracker decomposes tracking tasks into three components: tracking, learning, and detection. Here is a brief overview of this tracker.

  • Tracking is initialized by a single bounding box. Given this bounding box, TLD immediately learns the initial target model.
  • The object is then tracked using a frame-to-frame tracker as long as it is visible. The tracker reports the object state using a bounding box that defines the location and scale in the image coordinates. During tracking the model is continuously updated with new information about the target.
  • In the event that the target becomes fully occluded or disappears from the field of view, TLD uses the online learned model and re-detects the target once it becomes visible again.
  • In case the tracker drifts or the object reappears with a significantly different appearance after occlusion, TLD allows trajectory correction. The operator provides an additional bounding box and this information is smoothly integrated into the target model. This feature provides improved tracking stability.

Pros: Works best under occlusion over multiple frames.

Cons: Lots of false positives. The tracker sometimes tends to temporarily track a different object from the one that we intended to track.

Minimum Output Sum of Squared Error (MOSSE)

To apply this algorithm we will use the function cv2.legacy.TrackerMOSSE_create().

tracker = cv2.legacy.TrackerMOSSE_create()

The MOSSE algorithm uses adaptive correlation filters for object tracking, producing stable filters even when initialized from a single frame. Furthermore, this algorithm is robust to variations in lighting, scale, pose, and non-rigid deformations.

Pros: Detects occlusion based on the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears. Performs much faster compared to the other trackers.

Cons: The algorithm sometimes fails to track the object even when only a small movement occurs.
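The peak-to-sidelobe ratio itself is easy to sketch (a simplified illustration, not OpenCV's internal code; the 11x11 exclusion window around the peak and the failure threshold of roughly 7 come from the original MOSSE paper):

```python
import numpy as np

# PSR: compare the correlation peak against the statistics of the
# surrounding "sidelobe" region; a low PSR signals occlusion/failure.
def peak_to_sidelobe_ratio(response, exclude=5):
    r, c = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r, c]
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, r - exclude):r + exclude + 1,
         max(0, c - exclude):c + exclude + 1] = False   # drop 11x11 around peak
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)

# A sharp, isolated peak yields a high PSR; pure noise yields a low one.
resp = np.random.default_rng(0).normal(0, 0.1, (64, 64))
resp[30, 30] = 5.0
print(peak_to_sidelobe_ratio(resp))   # large value -> confident tracking
```

When the PSR drops, the tracker can simply stop updating its filter and wait, which is exactly the pause-and-resume behavior described above.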

Tracking algorithms comparison

The following summary recaps the overall performance of these 5 algorithms:

  • BOOSTING – mediocre accuracy; does not report tracking failure
  • MIL – good accuracy; does not report failure and does not recover from full occlusion
  • KCF – faster and more accurate than MIL; reports failure, but does not recover from full occlusion
  • TLD – works best under occlusion over multiple frames; many false positives
  • MOSSE – the fastest; detects occlusion via the peak-to-sidelobe ratio; may fail even on small movements

So, each of these 5 algorithms has its pros and cons. But which algorithm performs the best when tracking basketball players on the court? To answer this question we need to analyze our code a little bit further and try to figure out how to improve the tracking.

One thing that we can do is manually change the size of the bounding box. Let’s use other basketball videos and apply this trick.
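A small helper for this trick might look as follows (a hypothetical utility, not an OpenCV function), shrinking the initial bounding box around its center while keeping the aspect ratio; the 0.5 factor is just an example:

```python
# Shrink an (x, y, w, h) bounding box around its center.
def shrink_bbox(bbox, factor=0.5):
    x, y, w, h = bbox
    new_w, new_h = int(w * factor), int(h * factor)
    new_x = x + (w - new_w) // 2
    new_y = y + (h - new_h) // 2
    return (new_x, new_y, new_w, new_h)

print(shrink_bbox((480, 240, 100, 270)))  # → (505, 307, 50, 135)
```

The smaller box contains less background, so the tracker's appearance model is less likely to latch onto the court or nearby players.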

Have a look at the two videos above. In the first video, we can see the bounding box at its original size, and in the second video, we can see a much smaller customized bounding box. The tracker used in these two examples was the MIL tracker. As you can see in the example on the left side, player tracking is very good for most of the video, except for the last second, when the tracker starts to track another player. To avoid this problem we changed the size of the bounding box. However, this trick did not help us much, because the tracker again started to track another player, as you can see in the video on the right side.

Now, let’s test the performance of these 5 tracking algorithms on different videos. In each of these videos, we will track just one player and plot the \(x \) and \(y \) coordinate of the center of the bounding box. In that way, we will be able to display and compare the performance of all 5 algorithms. Let’s have a look.

We can clearly see that the MOSSE tracking algorithm performed best in all four examples, while TLD had the worst results because of its many false-positive detections.

Summary

That is it for this post. We talked about the basic idea behind object tracking and explained the main difference between object tracking and object detection. Also, we learned the easiest way to track objects using the OpenCV library. Furthermore, we explored some of the best tracking algorithms that OpenCV has to offer and compared their speed and accuracy in tracking basketball players on the court.

References:

[1] Grabner, Helmut, Michael Grabner, and Horst Bischof. “Real-time tracking via on-line boosting.” Proceedings of the British Machine Vision Conference (BMVC), 2006.

[2] Object Tracking using OpenCV (C++/Python)