
# #013 B CNN AlexNet

## $$AlexNet$$

In the previous posts we talked about $$LeNet-5$$. Let’s now look at one more example of a convolutional neural network: $$AlexNet$$. The input to this network has dimensions $$227\times227\times3$$. It is a color image, which is why there are $$3$$ channels.

$$AlexNet$$ architecture

Let’s explore the architecture of this convolutional neural network.

$$Conv\enspace1$$

First, we apply a convolutional layer with filter size $$f=11$$ and $$96$$ filters, using a stride of $$4$$. The stride of $$4$$ shrinks the height and width by roughly a factor of $$4$$, from $$227$$ to $$55$$, so after this first convolutional layer we get a $$55\times55\times96$$ volume.
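The value $$55$$ can be checked with the usual output-size rule $$\lfloor (n + 2p - f)/s \rfloor + 1$$. The snippet below is a minimal sketch: the helper name `conv_output_size` is ours, and it assumes no padding in $$Conv\enspace1$$, which the text does not state explicitly.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer.

    n is the input height/width and f is the filter size.
    Implements floor((n + 2p - f) / s) + 1.
    """
    return (n + 2 * padding - f) // stride + 1


# Conv 1: 227x227 input, 11x11 filters, stride 4, padding 0 (assumed)
side = conv_output_size(227, f=11, stride=4)
print(side)  # 55
```

With $$96$$ filters, the output volume is therefore $$55\times55\times96$$.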

$$Max\enspace pool \enspace1$$

The next layer is a $$Max\enspace pooling$$ layer with a $$3\times3$$ filter and a stride of $$2$$. This reduces the $$55\times 55\times 96$$ volume to $$27\times 27 \times 96$$: the stride of $$2$$ roughly halves the height and width, while pooling leaves the number of channels unchanged.

$$Conv\enspace 2$$

The next layer is a convolutional layer with filter size $$f=5$$ and $$256$$ filters. It is a $$same$$ convolution, so the height and width stay the same and the output is a $$27\times27\times256$$ volume.
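For a $$same$$ convolution with stride $$1$$ and an odd filter size, the padding that preserves height and width is $$p=(f-1)/2$$. Here is a small sketch of that rule. The helper names `same_padding` and `output_size` are illustrative, not taken from any library.

```python
def same_padding(f):
    """Padding that keeps height and width unchanged for stride 1 and odd f."""
    if f % 2 == 0:
        raise ValueError("this sketch only handles odd filter sizes")
    return (f - 1) // 2


def output_size(n, f, stride=1, padding=0):
    """floor((n + 2p - f) / s) + 1"""
    return (n + 2 * padding - f) // stride + 1


p = same_padding(5)
print(p, output_size(27, f=5, stride=1, padding=p))  # 2 27
```

The same rule gives a padding of $$1$$ for a $$3\times3$$ filter.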

$$Max\enspace pool \enspace2$$

After this $$same$$ convolution, we apply $$Max\enspace pooling$$ with a $$3\times3$$ filter and a stride of $$2$$. This reduces the height and width to $$13$$, giving a $$13\times13\times256$$ volume.

$$Conv\enspace 3,\enspace Conv\enspace 4, Conv\enspace 5$$

Next, we apply three $$3\times3$$ $$same$$ convolution layers with padding $$=1$$ and stride $$=1$$. The first two use $$384$$ filters each, and the third ($$Conv \enspace 5$$) uses $$256$$ filters, so the height and width stay at $$13\times13$$.

$$Max\enspace pool \enspace3$$

Next, we apply the third $$Max\enspace pool$$ layer, again with a $$3\times3$$ filter and a stride of $$2$$, which gives a volume with dimensions $$6 \times 6 \times 256$$. Multiplying these out gives $$6 \times 6 \times 256=9216$$, so we unroll this volume into $$9216$$ nodes.
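To see how the volume changes from layer to layer, we can trace the shapes through the convolutional part of the network. This is a sketch under a few assumptions: no padding in $$Conv\enspace1$$ and in the pooling layers, a $$3\times3$$ filter in every $$Max\enspace pool$$ layer, and a hypothetical `LAYERS` table that we define ourselves. Note that pooling changes only the height and width, never the number of channels.

```python
def output_size(n, f, stride, padding=0):
    """floor((n + 2p - f) / s) + 1"""
    return (n + 2 * padding - f) // stride + 1


# (name, kind, filter size, stride, padding, number of filters)
LAYERS = [
    ("Conv 1", "conv", 11, 4, 0, 96),
    ("Max pool 1", "pool", 3, 2, 0, None),
    ("Conv 2", "conv", 5, 1, 2, 256),
    ("Max pool 2", "pool", 3, 2, 0, None),
    ("Conv 3", "conv", 3, 1, 1, 384),
    ("Conv 4", "conv", 3, 1, 1, 384),
    ("Conv 5", "conv", 3, 1, 1, 256),
    ("Max pool 3", "pool", 3, 2, 0, None),
]


def trace_shapes(side=227, channels=3):
    """Return (name, height, width, channels) after every layer."""
    shapes = []
    for name, kind, f, s, p, n_filters in LAYERS:
        side = output_size(side, f, s, p)
        if kind == "conv":
            # A convolution outputs one channel per filter
            channels = n_filters
        shapes.append((name, side, side, channels))
    return shapes


for name, h, w, c in trace_shapes():
    print(f"{name:<11} {h} x {w} x {c}")
```

The last line printed is the $$6 \times 6 \times 256$$ volume that gets unrolled into $$9216$$ nodes.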

$$FC\enspace 6, \enspace FC\enspace 7,\enspace FC\enspace 8$$

Finally, $$AlexNet$$ has three $$Fully\enspace connected$$ layers. The first two have $$4096$$ nodes each, whereas the third has $$1000$$ units. A $$softmax$$ on top of the last layer outputs a probability for each of the $$1000$$ classes the object could belong to.
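As a rough illustration of the last step, the sketch below turns $$1000$$ raw scores into probabilities with a $$softmax$$ and picks the most likely class. The scores are made up for the example, and the plain-Python `softmax` helper is ours. A real network would compute this on the output of $$FC\enspace 8$$.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for 1000 classes; class 42 gets the highest score
scores = [0.0] * 1000
scores[42] = 5.0

probs = softmax(scores)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 42
```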

AlexNet architecture

$$AlexNet$$ is similar to $$LeNet$$, but much larger. We have stated that $$LeNet-5$$ has about $$60000$$ parameters. $$AlexNet$$, on the other hand, has about $$60$$ million parameters, which is a large number of parameters to learn. Splitting the layers across two (or more) GPUs can help speed up training. Notice also that there are a lot of hyperparameters that the authors of $$AlexNet$$ had to come up with. Another aspect of this architecture that made it much better than $$LeNet$$ was the use of the ReLU activation function.
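The figure of about $$60$$ million can be roughly checked by counting weights and biases layer by layer. The sketch below assumes that every filter sees all input channels, so it ignores the split across two GPUs. With that split, the exact count can come out somewhat different. The helper names are ours.

```python
def conv_params(f, in_channels, n_filters):
    """Weights plus one bias per filter."""
    return f * f * in_channels * n_filters + n_filters


def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


counts = {
    "Conv 1": conv_params(11, 3, 96),
    "Conv 2": conv_params(5, 96, 256),
    "Conv 3": conv_params(3, 256, 384),
    "Conv 4": conv_params(3, 384, 384),
    "Conv 5": conv_params(3, 384, 256),
    "FC 6": fc_params(6 * 6 * 256, 4096),
    "FC 7": fc_params(4096, 4096),
    "FC 8": fc_params(4096, 1000),
}

for name, n in counts.items():
    print(f"{name:<7} {n:>12,}")

total = sum(counts.values())
print(f"total   {total:>12,}")  # 62,378,344
```

Most of the parameters sit in the $$Fully\enspace connected$$ layers, with $$FC\enspace 6$$ alone accounting for more than half of the total.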

In the next post, we will talk about VGG 16 and VGG 19.