#013 B CNN AlexNet
In the previous posts we talked about \(LeNet-5\). Let’s now look at one more example of a convolutional neural network: the \(AlexNet\) neural network. The input to this network is a \(227\times227\times3\) volume. It is a color image, which is why we have \(3\) channels.
\(AlexNet \) architecture
Let’s explore the architecture of this convolutional neural network.
First, we apply a convolutional layer with a filter size of \(f=11\) and \(96\) filters. In this layer we also use a stride of \(4\) and no padding. The stride of \(4\) shrinks the height and width of the input volume by roughly a factor of \(4\): \((227-11)/4+1=55\), so after this first convolutional layer we get a \(55\times55\times96\) volume.
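To make the arithmetic of this first layer concrete, here is a minimal sketch of the usual output-size rule for a convolution. The helper name `conv_output_size` and its parameter names are our own choices for illustration, not part of any library.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer.

    n is the input height/width and f is the filter size. Floor division
    means a partial window at the border is dropped.
    """
    return (n + 2 * padding - f) // stride + 1


# First AlexNet convolution: 227x227 input, 11x11 filter, stride 4, no padding.
side = conv_output_size(227, f=11, stride=4)
print(side, "x", side, "x", 96)  # prints: 55 x 55 x 96
```

The number of channels in the output is simply the number of filters, which is why it is passed to `print` directly rather than computed.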
\(Max\enspace pool \enspace1\)
The next layer is a \(Max\enspace pooling\) layer. In this layer we use a \(3\times3\) filter and a stride of \(2\). This reduces the \(55\times55\times96\) volume to \(27\times27\times96\), since \((55-3)/2+1=27\). Pooling is applied to each channel separately, so the number of channels stays at \(96\).
The next layer is a convolutional layer with \(256\) filters of size \(f=5\). Here we use a \(same\) convolution, so the height and width are preserved and we get a \(27\times27\times256\) volume.
\(Max\enspace pool \enspace2\)
After this \(same\) convolution, we apply \(Max\enspace pooling\) with a \(3\times3\) filter and a stride of \(2\). This reduces the height and width to \(13\), giving a \(13\times13\times256\) volume.
\(Conv\enspace 3,\enspace Conv\enspace 4, Conv\enspace 5 \)
Next, we apply three \(3\times3\) \(same\) convolution layers, each with padding = 1 and stride = 1, so the height and width stay at \(13\). In the first two of these layers we use \(384\) filters, and in the third (the \(Conv\enspace 5\) layer) we use \(256\) filters.
\(Max\enspace pool \enspace3 \)
Next, we apply the third \(Max\enspace pool\) layer, again with a \(3\times3\) filter and a stride of \(2\), so we get a volume with dimensions \(6\times6\times256\). Multiplying out these numbers gives \(6\times6\times256=9216\), so we unroll this volume into \(9216\) nodes.
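The whole chain of volumes described so far can be checked with a short script. This is only a sketch: the layer table is our own encoding of the description above, the labels "Conv 1" and "Conv 2" are names we chose for the first two convolutions, and the padding of 2 for the \(5\times5\) layer is an assumption that follows from it being a \(same\) convolution.

```python
def output_size(n, f, stride, padding):
    """Spatial output size for a conv or pooling layer (floor division)."""
    return (n + 2 * padding - f) // stride + 1


# (name, filter size, stride, padding, output channels or None for pooling)
LAYERS = [
    ("Conv 1", 11, 4, 0, 96),
    ("Max pool 1", 3, 2, 0, None),
    ("Conv 2", 5, 1, 2, 256),      # padding 2 assumed for a 5x5 same convolution
    ("Max pool 2", 3, 2, 0, None),
    ("Conv 3", 3, 1, 1, 384),
    ("Conv 4", 3, 1, 1, 384),
    ("Conv 5", 3, 1, 1, 256),
    ("Max pool 3", 3, 2, 0, None),
]


def trace_shapes(side=227, channels=3):
    """Return a list of (layer name, (height, width, channels)) pairs."""
    shapes = []
    for name, f, stride, padding, out_channels in LAYERS:
        side = output_size(side, f, stride, padding)
        if out_channels is not None:  # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, (side, side, channels)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:<11} -> {shape}")

final = shapes[-1][1]
flattened = final[0] * final[1] * final[2]
print("Flattened:", flattened)  # prints: Flattened: 9216
```

Note that the pooling rows carry `None` for the channel count, which reflects the fact that pooling changes only the height and width.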
\(FC\enspace 6, \enspace FC\enspace 7,\enspace FC\enspace 8 \)
Finally, \(AlexNet\) has three \(Fully\enspace connected\) layers. The first two have \(4096\) nodes each, whereas the third \(Fully\enspace connected\) layer has \(1000\) units. At the end, a \(softmax\) outputs which one of the \(1000\) classes the object could belong to.
\(AlexNet\) is similar to \(LeNet\), but much larger. We have stated that \(LeNet-5\) has about \(60000\) parameters. \(AlexNet\), on the other hand, has about \(60\) million parameters, which is a large number of parameters to learn. Splitting the layers across two (or more) GPUs may help to speed up training. Notice also that there are a lot of hyperparameters that the authors of \(AlexNet\) had to come up with. Another aspect of this architecture that made it much better than \(LeNet\) was the use of the ReLU activation function.
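The figure of roughly \(60\) million parameters can be reproduced with a small counting script. This is a sketch under one assumption: every filter is connected to all input channels, as in the single-stream description in this post. The layer tables and the helper names `conv_params`, `fc_params` and `count_parameters` are our own.

```python
# (name, filter size, input channels, output channels)
CONV_LAYERS = [
    ("Conv 1", 11, 3, 96),
    ("Conv 2", 5, 96, 256),
    ("Conv 3", 3, 256, 384),
    ("Conv 4", 3, 384, 384),
    ("Conv 5", 3, 384, 256),
]

# (name, number of inputs, number of outputs)
FC_LAYERS = [
    ("FC 6", 6 * 6 * 256, 4096),
    ("FC 7", 4096, 4096),
    ("FC 8", 4096, 1000),
]


def conv_params(f, c_in, c_out):
    """Weights of c_out filters of size f x f x c_in, plus one bias each."""
    return f * f * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weight matrix of a fully connected layer, plus one bias per output."""
    return n_in * n_out + n_out


def count_parameters():
    """Return a dict mapping each layer name to its parameter count."""
    counts = {}
    for name, f, c_in, c_out in CONV_LAYERS:
        counts[name] = conv_params(f, c_in, c_out)
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = fc_params(n_in, n_out)
    return counts


counts = count_parameters()
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:<7}{n:>12,}")
print(f"{'Total':<7}{total:>12,}")
```

Under this assumption the total comes out at about \(62\) million, with well over half of it in the first \(Fully\enspace connected\) layer. The original two-GPU implementation connects some convolutional layers only to the channels that live on the same GPU, which lowers the count somewhat and is consistent with the figure of about \(60\) million.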
In the next post, we will talk about VGG 16 and VGG 19.