#003 CNN More On Edge Detection
More on Edge Detection
We’ve seen in the previous post how the convolution operation allows us to implement a vertical edge detector. In this post we will learn:
- the difference between positive and negative edges, that is, between light to dark and dark to light edge transitions
- types of filters (detectors)
- how an algorithm can learn a detector's parameters (coefficients)
A vertical edge detector
Let’s have a look at this \(6 \times 6 \) image. It is light on the left and dark on the right. If we convolve it with the vertical edge detection filter, the vertical edge is detected. It shows up in the middle of the output image, as we can see in the picture below.
An example of a vertical edge detection. Here we have light to dark transition.
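The example above can be reproduced with a short NumPy sketch (an illustration, not from the original post). Here `conv2d` implements the "valid" convolution used in deep learning, i.e. element-wise products summed over each window, without kernel flipping:

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" convolution as used in deep learning (no kernel flipping)
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# 6x6 image: light (10s) on the left, dark (0s) on the right
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# 3x3 vertical edge detection filter
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

print(conv2d(image, vertical))
# each row of the 4x4 output is [0, 30, 30, 0]:
# the 30s mark the vertical edge in the middle of the image
```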
Dark to light versus light to dark transition
Next, let’s see what happens if we flip the colors of the original input image, so that it becomes dark on the left and bright on the right. That is, the tens are now in the right half of the image and the zeros are in the left half. If we convolve it with the same edge detection filter, we obtain \(-30s \) instead of \(30s \) in the middle. The output image is depicted in two shades of gray. Because the shade of the transition is reversed, the sign of the output is reversed as well: the \(-30s \) tell us that this is a dark to light, rather than a light to dark, transition. If we don’t care which of these two transitions we detected, we can take the absolute value of the output matrix. However, this particular filter does distinguish the light to dark from the dark to light edges.
An example of vertical edge detection of dark to light transition
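The flipped case can also be sketched in code (my illustration, not from the original post). The same filter now produces \(-30s \), and taking the absolute value discards the edge polarity:

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" convolution as used in deep learning (no kernel flipping)
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# flipped 6x6 image: dark (0s) on the left, light (10s) on the right
flipped = np.array([[0, 0, 0, 10, 10, 10]] * 6, dtype=float)

vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

out = conv2d(flipped, vertical)
print(out)          # each row is [0, -30, -30, 0]: a dark to light edge
print(np.abs(out))  # absolute values, if we don't care about edge polarity
```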
Let’s see some more examples of edge detection. In the picture below we can see two filters for edge detection: on the left a filter for vertical edge detection, and on the right a filter for horizontal edge detection.
Filters for vertical (left) and horizontal (right) edge detection
The \(3 \times 3 \) filter we’ve seen allows us to detect vertical edges, so it should not surprise us that this filter, rotated by \(90° \), allows us to detect horizontal edges. As a reminder, a vertical edge, according to this filter, is a \(3 \times 3 \) region where the pixels are relatively bright in the left column and relatively dark in the right column. Similarly, a horizontal edge is a \(3 \times 3 \) region where the pixels are relatively bright in the top row and relatively dark in the bottom row.
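As a quick check of this rotation claim, here is a small NumPy sketch (an illustration, not from the original post):

```python
import numpy as np

vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]])

# rotating the vertical edge filter by 90 degrees clockwise
# yields the horizontal edge filter
horizontal = np.rot90(vertical, k=-1)
print(horizontal)
# [[ 1  1  1]
#  [ 0  0  0]
#  [-1 -1 -1]]
```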
Let’s see a more complicated example. In this one we have tens in the upper left and lower right corners of the image. We also draw (sketch) this as an image.
A horizontal edge detection
Now we will see detection of edges in the picture below.
An image with a chessboard pattern
This image is darker where the zeros are located, so we shade these regions. The image is lighter in the upper left and lower right corners. If we convolve this image with a horizontal edge detector, we obtain the following output matrix.
An example of a horizontal edge detection
The \(30 \) (green) here corresponds to the \(3 \times 3 \) green region where there are indeed bright pixels on top and darker pixels on the bottom, so the filter detects a strong positive edge. The \(-30 \) in the picture above corresponds to a region which is brighter on the bottom and darker on top, so that is a negative edge in this example. The \(\pm 10 \) values are an artifact of the fact that we’re working with a relatively small image. This is just a \(6 \times 6 \) image, and intermediate values like \(\pm 10 \) simply reflect that the filter there captures part of the positive edge on the left and part of the negative edge on the right. If this were a \(1000 \times 1000 \) image with this type of chessboard pattern, we wouldn’t see these transition regions of \(10s \); the intermediate values would be quite small relative to the size of the image. So, in summary, different filters allow us to find both vertical and horizontal edges.
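The whole worked example above, including the \(\pm 30 \) edges and the \(\pm 10 \) artifacts, can be reproduced with a short NumPy sketch (an illustration, not from the original post):

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" convolution as used in deep learning (no kernel flipping)
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# 6x6 image: 10s in the upper left and lower right corners
image = np.array([[10, 10, 10, 0, 0, 0]] * 3 +
                 [[0, 0, 0, 10, 10, 10]] * 3, dtype=float)

# 3x3 horizontal edge detection filter
horizontal = np.array([[ 1,  1,  1],
                       [ 0,  0,  0],
                       [-1, -1, -1]], dtype=float)

print(conv2d(image, horizontal))
# [[  0.   0.   0.   0.]
#  [ 30.  10. -10. -30.]
#  [ 30.  10. -10. -30.]
#  [  0.   0.   0.   0.]]
```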
Example of some filters
It turns out that the \(3\times 3 \) vertical edge detection filter we’ve used is just one possible choice, and historically in the computer vision literature there was a fair amount of debate about the best set of numbers (coefficients) to use. Here is something else we could use, called a Sobel filter. Its advantage is that it puts a little more weight on the central row (around the central pixel), which may make it a little more robust. Computer vision researchers have also used other sets of numbers: for instance, instead of \(1, 2, 1 \) it can be \(3, 10, 3 \), and then \(-3, -10, -3 \). This is called a Scharr filter and it has slightly different properties. These are filters for vertical edge detection; if we rotate them by \(90° \), we get horizontal edge detectors.
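For reference, the Sobel and Scharr filters described above look like this in code. Applying them to the light-to-dark image from the earlier example shows that they detect the same edge, only with larger responses (a sketch, not from the original post):

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" convolution as used in deep learning (no kernel flipping)
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# Sobel filter: extra weight on the central row
sobel = np.array([[1, 0, -1],
                  [2, 0, -2],
                  [1, 0, -1]], dtype=float)

# Scharr filter: 3, 10, 3 instead of 1, 2, 1
scharr = np.array([[ 3, 0,  -3],
                   [10, 0, -10],
                   [ 3, 0,  -3]], dtype=float)

# light-to-dark 6x6 image from the earlier example
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

print(conv2d(image, sobel))   # each row: [0, 40, 40, 0]
print(conv2d(image, scharr))  # each row: [0, 160, 160, 0]
```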
How to choose weights in the filter?
We can make our algorithm learn the parameters of the filter
With the rise of deep learning, one of the things we have learned is that when we want to detect edges in some complicated image, maybe we don’t need computer vision researchers to handpick these \(9 \) numbers. Instead, we can treat the \(9 \) numbers of this matrix as parameters and learn them using backpropagation. The goal is to learn \(9 \) parameters so that when we take the \(6 \times 6 \) image and convolve it with our \(3 \times 3 \) filter, we obtain a good edge detector. We will see in later posts that by treating these \(9 \) numbers as parameters, backprop can choose to learn \(1, 1, 1 \) and \(-1, -1, -1 \), or learn the Sobel or Scharr filter. Moreover, it can learn parameters that capture the statistics of our data even better than any of these hand-coded filters. Also, rather than just vertical and horizontal edges, it can learn to detect edges at \(45° \), \(70° \), \(73° \), or any other orientation it chooses. By letting all of these numbers be parameters and learning them automatically from data, we find that neural networks can actually learn low-level features such as edges even more robustly than computer vision researchers are generally able to code them up by hand. Underlying all these computations is still the convolution operation, which allows backpropagation to learn whatever \(3 \times 3 \) filter it wants and then apply it throughout the entire image, in order to output whatever feature it is trying to detect: vertical edges, horizontal edges, edges at some angle, or even some other feature that we might not even have a name for in English.
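As a toy illustration of this idea (my sketch, not part of the original post), we can recover the hand-coded vertical edge filter by gradient descent: generate random images, label each with the edge map the "true" filter produces, and train a randomly initialized \(3 \times 3 \) filter to match. For valid convolution, the gradient of the loss with respect to the filter is itself a convolution of the input with the output error:

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" convolution as used in deep learning (no kernel flipping)
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# the "unknown" filter that learning is supposed to discover
target = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal((3, 3))  # random initial filter
lr = 0.01

for step in range(2000):
    x = rng.standard_normal((6, 6))   # random training image
    t = conv2d(x, target)             # "ground truth" edge map
    o = conv2d(x, w)                  # current prediction
    grad_o = 2.0 * (o - t) / o.size   # gradient of the MSE loss w.r.t. output
    grad_w = conv2d(x, grad_o)        # dL/dw[a,b] = sum_ij grad_o[i,j] * x[i+a, j+b]
    w -= lr * grad_w                  # gradient descent step

print(np.round(w, 3))  # converges to the vertical edge filter
```

Since the targets were themselves generated by a convolution, the problem is realizable and the learned filter matches the hand-coded one; on real data, backprop instead settles on whatever coefficients best fit the task.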
So, the idea that we can treat these numbers as parameters to be learned has been one of the most powerful ideas in computer vision. In later posts we will explain in detail how to use backpropagation to learn these \(9 \) numbers.
In the next post we will learn about Padding.