
#016 CNN Network in Network – 1×1 Convolutions

Network in Network – 1×1 Convolutions

In terms of designing ConvNet architectures, one of the ideas that really helps is the 1\times 1 convolution. You might be wondering: what does a 1\times 1 convolution do? Isn’t that just multiplying by a number? It seems like a funny thing to do. However, it turns out that it’s not quite like that. Let’s take a look!

What does a 1\times 1 convolution do?

An example of a 1\times 1 convolution

We can see a 1\times 1 filter which consists of just one number, the number 2 . If we take this 6\times 6\times 1 image and convolve it with a 1\times 1\times 1 filter, we obtain the same image with every value multiplied by 2 .
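To make this concrete, here is a minimal NumPy sketch (the image values are arbitrary placeholders) showing that a 1\times 1\times 1 convolution on a single-channel image reduces to plain scalar multiplication:

```python
import numpy as np

# A minimal sketch: convolving a single-channel image with a 1x1x1
# filter is just element-wise multiplication by the filter's one weight.
image = np.arange(36, dtype=np.float32).reshape(6, 6, 1)  # 6x6x1 input
filt = np.array([[[2.0]]], dtype=np.float32)              # 1x1x1 filter

output = image * filt[0, 0, 0]  # same result as the 1x1 convolution

print(output.shape)                      # (6, 6, 1)
print(np.allclose(output, image * 2.0))  # True
```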

A convolution with a 1\times 1 filter doesn’t seem terribly useful: we just multiply the input by some number. But that is only the case for 6\times 6\times 1 single-channel images. If we have a 6\times 6\times 32 volume instead of a 6\times 6\times 1 one, then a convolution with a 1\times 1 filter can do something that makes much more sense. Let’s look at the following picture.

An example of a 1\times 1 convolution on a 3D image

In particular, a 1\times 1 convolution will look at each of the 36 different positions (6\times 6 ), and at each one it will take the element-wise product between the 32 numbers on the left and the 32 numbers in the filter and sum them up. Then, a ReLU non-linearity is applied. Looking at one of the 36 positions, one slice through this volume, we take these 32 numbers, multiply them by the 1\times 1\times 32 filter, and we get a single number.

In fact, one way to think about the 32 numbers we have in this 1\times 1\times 32 filter is as if we had one neuron: it takes 32 numbers as input, multiplies them by 32 weights, applies a ReLU non-linearity, and outputs the corresponding result.
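The following sketch illustrates this “one neuron” view at a single spatial position (the volume and weights are random values, purely for illustration):

```python
import numpy as np

# A sketch of the "one neuron" view: at a single spatial position,
# a 1x1x32 filter takes the 32 channel values, computes a dot product
# with its 32 weights, and applies a ReLU.
np.random.seed(0)                    # arbitrary values for illustration
volume = np.random.randn(6, 6, 32)   # 6x6x32 input volume
weights = np.random.randn(32)        # the 1x1x32 filter, flattened

i, j = 2, 3                          # any one of the 36 positions
z = np.dot(volume[i, j, :], weights) # 32 multiplications, summed up
a = np.maximum(z, 0.0)               # ReLU non-linearity -> one number
```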

More generally, if we have not just one filter but multiple filters, then it’s as if we had not just one unit but multiple units, each taking as input all the numbers in one slice and building them up into an output of size 6\times 6\times number \enspace of \enspace filters . One way to think about the 1\times 1 convolution is that it is basically a fully connected neural network applied to each of the 36 different positions. This fully connected network has a 32-dimensional input, and its number of outputs equals the number of 1\times 1 filters applied. Doing this at every one of the 36 positions, we end up with an output of size 6\times 6 \times number \enspace of \enspace filters . This can carry out a pretty non-trivial computation on our input volume. This idea is often called a 1\times 1 convolution, but sometimes it’s also called a Network\enspace in\enspace Network . It has been very influential and has shaped many other neural network architectures, including the Inception\enspace network which we’ll see in the next posts.
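Here is a sketch of the “fully connected network at every position” view. Applying several 1\times 1 filters is the same as multiplying each 32-dimensional pixel vector by a weight matrix; the number of filters below is an arbitrary choice for illustration:

```python
import numpy as np

# Applying n_filters 1x1 filters is equivalent to multiplying every
# 32-dimensional channel vector by a 32 x n_filters weight matrix.
np.random.seed(0)
volume = np.random.randn(6, 6, 32)   # 6x6x32 input volume
n_filters = 16                       # number of 1x1 filters (arbitrary)
W = np.random.randn(32, n_filters)   # all filters stacked as a matrix

# One matrix multiply per spatial position, done for all 36 at once:
output = np.maximum(volume @ W, 0.0) # ReLU after the linear step

print(output.shape)                  # (6, 6, 16) = 6 x 6 x n_filters
```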

Let’s see an example where a 1\times 1 convolution is useful.

Using 1\times 1 convolutions

An example of how we can reduce the number of channels with a 1\times 1 convolution

Suppose we have a 28\times 28\times 192 input volume. If we want to shrink the height and width, we can use a pooling layer, and we know how to do that. But what if the number of channels has gotten too big and we want to shrink that? How do we shrink it into a 28\times 28\times 32 dimensional volume?

What we can do is use 32 filters that are 1\times 1 . Technically, each filter is of dimension 1\times 1\times 192 , because the number of channels in the filter has to match the number of channels in the input volume. Since we use 32 filters, the output of this process will be a 28\times 28\times 32 volume. This is a way to shrink n_{c} (the number of channels). We’ll see later how this idea of 1\times 1 convolutions allows us to shrink the number of channels and therefore save on computation in some networks. But of course, if we want to keep the number of channels at 192, that’s fine too. The effect of a 1\times 1 convolution is that it applies a non-linearity, which allows us to learn a more complex function; adding another layer also helps us learn more complex functions. So, we can have an input which is 28\times 28\times 192 dimensional and an output which is also 28\times 28\times 192 dimensional.
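A minimal sketch of this channel reduction in Keras, assuming TensorFlow is available (the layer names and shapes mirror the example above):

```python
import tensorflow as tf

# A sketch in Keras: 32 filters of size 1x1 shrink a 28x28x192
# volume down to 28x28x32.
inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = tf.keras.layers.Conv2D(
    filters=32,          # number of 1x1 filters -> output channels
    kernel_size=(1, 1),  # each filter is 1x1x192 under the hood
    activation='relu',   # the non-linearity discussed above
)(inputs)

print(outputs.shape)     # (None, 28, 28, 32)
```

Setting `filters=192` instead would keep the number of channels the same while still adding a non-linearity, as discussed above.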

That’s how a 1\times 1 convolutional layer does something pretty non-trivial: it adds a non-linearity to our network and allows us to shrink the number of channels in our volumes, keep it the same, or even increase it if we want. In the next post we’ll see that this is very useful for building the Inception\enspace network .

