#007 CNN One Layer of A ConvNet

One layer of a Convolutional Neural Network

We will now present how to build one convolutional layer within our network. Let’s go through an example. We’ve seen in the previous post how to take a 3D volume and convolve it with two different filters in order to get two different \(4 \times 4 \) outputs.

An example of a convolution with two different filters

Convolving with the first filter gives us one \(4 \times 4 \) output, and convolving with the second filter gives a different \(4 \times 4 \) output. To turn this into a convolutional neural network layer we need to add a bias, which is a scalar. Python broadcasting ensures that the bias is added to every element in that \(4 \times 4 \) output, that is, to all sixteen elements. Then we apply an activation function, for example the \(ReLU \) activation function. We do the same with the output we got by applying the second \(3 \times 3 \times 3 \) filter (kernel). So, once again we add a different bias and then apply a \(ReLU \) activation function. After adding a bias and applying a \(ReLU \) activation function, the dimensions of the outputs remain the same, so we have two \(4 \times 4 \) matrices.
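To make this concrete, here is a minimal NumPy sketch of just the bias and activation step. The \(4 \times 4 \) values and the bias are arbitrary placeholders; the point is that broadcasting adds the scalar bias to all sixteen elements and \(ReLU \) leaves the \(4 \times 4 \) shape unchanged.

```python
import numpy as np

# a placeholder 4 x 4 result of convolving the 6 x 6 x 3 input with one 3 x 3 x 3 filter
conv_output = np.random.randn(4, 4)
b = 0.5  # scalar bias for this filter (arbitrary value, for illustration only)

# broadcasting adds the scalar bias to every one of the sixteen elements
z = conv_output + b

# ReLU activation; the shape is still 4 x 4
a = np.maximum(z, 0)
print(a.shape)  # (4, 4)
```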

The result of convolution with two filters

Next, we stack these two outputs and end up with a \(4 \times 4 \times 2 \) output. This computation has gone from a \(6 \times 6 \times 3 \) volume to a \(4 \times 4 \times 2 \) volume, and it represents one layer of a convolutional neural network.

The result of a convolution of \(6\times 6\times 3\) with two \(3\times 3\times 3 \) filters is a volume of dimension \(4\times 4\times 2\)

In neural networks one step of forward propagation was \(Z^{\left [ 1 \right ]}=W^{\left [ 1 \right ]}\times a^{\left [ 0 \right ]}+b^{\left [ 1 \right ]} \), where \(a^{\left [ 0 \right ]}=x \). Then we applied the non-linearity to get \(a^{\left [ 1 \right ]}=g\left ( Z^{\left [ 1 \right ]} \right ) \). We will apply the same idea in a layer of the Convolutional Neural Network.

A convolutional layer

Now we will compare the terms used in Neural Networks with the ones that we use in Convolutional Neural Networks.

Using the analogy from neural networks, our input here is this \(6\times 6 \times 3\) volume and these convolutional filters are like \(W^{[1]} \). 

During the convolution operation we’re taking these \(27 \) numbers, or really \(27 \times 2 \) because we have two filters, multiplying them by the corresponding numbers in the input volume and summing them up, so we’re really computing a linear function to get this \(4 \times 4 \) output.

In addition, we add a bias before applying the activation function. This plays a role similar to \(Z \), and then finally, by applying the non-linearity, this output becomes our activation of the next layer. This is how we go from \(a^{\left [ 0 \right ]} \) to \(a^{\left [ 1 \right ]} \). So, the convolution is really:

  1. apply the linear operation 
  2. add the biases and
  3. apply \(ReLU \)

We’ve gone from a \(6 \times 6 \times 3 \) dimensional \(a^{\left [ 0 \right ]} \), through one layer of a neural network, to a \(4 \times 4 \times 2 \) dimensional \(a^{\left [ 1 \right ]} \). So, \(6 \times 6 \times 3 \) has gone to \(4\times4\times2 \), and that’s one layer of a convolutional net. In this example we had two filters involved, which is why we end up with a \(4 \times 4 \times 2 \) output. If we had \(10 \) filters instead of \(2 \), then we would obtain a \(4\times 4\times 10 \) dimensional output volume. That is, we’d be taking \(10 \) of these maps instead of \(2 \) of them, and stacking them up to form a \(4\times 4\times 10 \) output volume, and that’s how \(a^{\left [ 1 \right ]} \) would be obtained.
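To illustrate the whole layer, here is a minimal loop-based NumPy sketch of one convolutional layer: a valid convolution with stride \(1 \), followed by a per-filter bias and \(ReLU \). It is not an efficient implementation, and the random input, filters and biases are placeholders; with ten filters it produces the \(4\times 4\times 10 \) volume described above.

```python
import numpy as np

def conv_layer(a_prev, filters, biases):
    """One convolutional layer: valid convolution, stride 1, bias and ReLU.

    a_prev  : (n, n, 3) input volume, here 6 x 6 x 3
    filters : (num_filters, 3, 3, 3) stack of kernels
    biases  : (num_filters,) one scalar bias per filter
    """
    n, f = a_prev.shape[0], filters.shape[1]
    n_out = n - f + 1                                  # 6 - 3 + 1 = 4
    a = np.zeros((n_out, n_out, filters.shape[0]))
    for k in range(filters.shape[0]):                  # loop over the filters
        for i in range(n_out):
            for j in range(n_out):
                patch = a_prev[i:i + f, j:j + f, :]    # 3 x 3 x 3 = 27 numbers
                z = np.sum(patch * filters[k]) + biases[k]
                a[i, j, k] = max(z, 0.0)               # ReLU
    return a

a0 = np.random.randn(6, 6, 3)
W1 = np.random.randn(10, 3, 3, 3)   # ten 3 x 3 x 3 filters
b1 = np.random.randn(10)
a1 = conv_layer(a0, W1, b1)
print(a1.shape)                     # (4, 4, 10)
```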

Number of parameters in one layer

Let’s go through the following exercise. Suppose we have \(10 \) filters that are \(3 \times 3 \times 3 \) in one layer of a neural network. How many parameters does this layer have?

Number of parameters in one convolutional layer 

Each filter is a \(3 \times 3 \times 3 \) volume, so each filter has \(27 \) parameters to be learned. Then we add the bias parameter \(b \), which gives us \(28 \) parameters per filter. Previously we had two filters, but if we now imagine that we actually have ten of these filters, then we have \(28 \times 10 \), that is \(280 \) parameters. A nice point about this is that no matter how big the input image is, the number of parameters remains fixed. The input image could be \(1000 \times 1000 \) or \(5000 \times 5000 \), but the number of parameters we have remains \(280 \). We can use these ten filters to detect features, such as vertical edges, horizontal edges, and maybe other features, anywhere, even in a very large image, with just a very small number of parameters. This is one property of convolutional neural networks that makes them less prone to overfitting. So, once we learn ten feature detectors that work, we can apply them even to very large images, and the number of parameters remains fixed and relatively small, \(280 \) in this example.
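Written out as a small calculation, this is just the arithmetic from the paragraph above:

```python
f = 3            # filter height and width
n_c_prev = 3     # number of input channels
num_filters = 10

params_per_filter = f * f * n_c_prev + 1        # 27 weights + 1 bias = 28
total_params = params_per_filter * num_filters  # 28 * 10 = 280
print(total_params)  # 280, regardless of the input image size
```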

Let’s just summarize the notation we’re going to use for one convolutional layer in a convolutional neural network. If layer \(l \) is a convolutional layer, we’re going to denote the filter size with \(f^{\left [ l \right ]} \). Previously we’ve said the filters are \(f \times f \), and the superscript \([l] \) signifies that this is an \(f \times f \) filter in layer \(l \). As usual, the superscript square bracket \([l] \) is the notation we’re using to refer to a particular layer \(l \). Then, we use \(p^{\left [ l \right ]} \) to denote the amount of padding. The amount of padding can also be specified just by saying that we want a valid convolution, which means no padding, or a same convolution, which means we choose the padding so that the output has the same height and width as the input. Finally, we’re going to use \(s^{\left [ l \right ]} \) to denote the stride.
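With this notation, the standard output-size formula for a convolution is \(\left \lfloor \frac{n+2p-f}{s} \right \rfloor +1 \) along each spatial dimension. Here is a small helper (the function name is just for illustration) that reproduces the valid and same cases from our example:

```python
import math

def conv_output_size(n, f, p, s):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# valid convolution of the 6 x 6 input with a 3 x 3 filter, stride 1: no padding
print(conv_output_size(n=6, f=3, p=0, s=1))   # 4

# same convolution: p = (f - 1) / 2 = 1 keeps the size at 6
print(conv_output_size(n=6, f=3, p=1, s=1))   # 6
```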

In the next post we will take a look at a simple convolutional neural network example.
