#013 B CNN AlexNet
AlexNet
In the previous posts we talked about LeNet-5. Let's now see one more example of a convolutional neural network: AlexNet. The input to this network is a 227\times227\times3 volume. We have a color image as an input, and that is why we have 3 channels.
AlexNet architecture
Let’s explore the architecture of this convolutional neural network.
Conv\enspace1
First, we will apply a convolutional layer with a filter size of f=11 and 96 filters. In this convolutional layer we will also use a stride of 4, which shrinks the spatial dimensions of the input roughly by a factor of 4, so after this first convolutional layer we get a 55\times55\times96 volume.
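To see where the 55 comes from, we can plug the numbers into the standard output-size formula \lfloor (n+2p-f)/s \rfloor + 1. Here is a minimal Python check (the helper name output_size is ours, just for illustration):

```python
def output_size(n, f, s, p=0):
    # Spatial output size of a convolution or pooling layer:
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

# Conv 1: 227x227 input, 11x11 filter, stride 4, no padding
print(output_size(227, f=11, s=4))  # 55
```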
Max\enspace pool \enspace1
The next layer is a Max\enspace pooling layer. In this layer we will use a 3\times3 filter and a stride of 2. This reduces the 55\times 55\times 96 volume to 27\times 27 \times 96, because we are using a stride of 2.
Conv\enspace 2
The next layer is a convolutional layer with a filter size of f=5 and 256 filters. Here we use a same\enspace convolution, so the spatial dimensions are preserved and we get a 27\times27\times256 volume.
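As a quick sanity check: a same convolution keeps the spatial size because the padding is chosen as p=\frac{f-1}{2}. With f=5 this gives p=2, and the output width is \frac{27+2\cdot2-5}{1}+1=27, as expected.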
Max\enspace pool \enspace2
After this same convolution, we will apply Max\enspace pooling with a 3\times3 filter and a stride of 2. This reduces the height and width to 13, giving a 13\times13\times256 volume.
Conv\enspace 3,\enspace Conv\enspace 4,\enspace Conv\enspace 5
Next, we will apply three 3\times3 same convolution layers with a padding of 1 and a stride of 1. The first two of these convolutional layers (Conv\enspace 3 and Conv\enspace 4) use 384 filters, and the third (Conv\enspace 5) uses 256 filters.
Max\enspace pool \enspace3
Next, we will apply the third Max\enspace pool layer, again with a 3\times3 filter and a stride of 2, so we get a volume with dimensions 6 \times 6 \times 256. If we multiply out these numbers, 6 \times 6 \times 256=9216, so we are going to unroll this volume into 9216 nodes.
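The whole sequence of spatial dimensions can be traced with the same output-size formula from above; a minimal sketch (again, the output_size helper is ours):

```python
def output_size(n, f, s, p=0):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

n = 227
n = output_size(n, f=11, s=4)       # Conv 1            -> 55
n = output_size(n, f=3,  s=2)       # Max pool 1        -> 27
n = output_size(n, f=5,  s=1, p=2)  # Conv 2 (same)     -> 27
n = output_size(n, f=3,  s=2)       # Max pool 2        -> 13
n = output_size(n, f=3,  s=1, p=1)  # Conv 3 (same)     -> 13
n = output_size(n, f=3,  s=1, p=1)  # Conv 4 (same)     -> 13
n = output_size(n, f=3,  s=1, p=1)  # Conv 5 (same)     -> 13
n = output_size(n, f=3,  s=2)       # Max pool 3        -> 6
print(n * n * 256)                  # 9216 nodes after unrolling
```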
FC\enspace 6, \enspace FC\enspace 7,\enspace FC\enspace 8
Finally, AlexNet has three Fully\enspace connected layers. The first two have 4096 nodes each, whereas the third Fully\enspace connected layer has 1000 units. A softmax then outputs which one of 1000 classes the object could be.
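Putting the layers together, here is a minimal PyTorch sketch of the stack described above. It is a sketch, not the original implementation: details of the 2012 paper such as local response normalization and the two-GPU split are omitted, and in practice the final softmax is usually folded into the loss function rather than kept in the model.

```python
import torch
import torch.nn as nn

# Layer stack as described in this post; dimensions noted per layer.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),     # Conv 1        -> 55x55x96
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),          # Max pool 1    -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2),   # Conv 2 (same) -> 27x27x256
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),          # Max pool 2    -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1),  # Conv 3 (same) -> 13x13x384
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),  # Conv 4 (same) -> 13x13x384
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),  # Conv 5 (same) -> 13x13x256
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),          # Max pool 3    -> 6x6x256
    nn.Flatten(),                                   # 9216 nodes
    nn.Linear(9216, 4096), nn.ReLU(),               # FC 6
    nn.Linear(4096, 4096), nn.ReLU(),               # FC 7
    nn.Linear(4096, 1000),                          # FC 8
    nn.Softmax(dim=1),                              # 1000-class output
)

x = torch.randn(1, 3, 227, 227)  # one 227x227 RGB image
print(alexnet(x).shape)          # torch.Size([1, 1000])
```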
Figure: AlexNet architecture
AlexNet is similar to LeNet-5, but much larger. We have stated that LeNet-5 has about 60,000 parameters; AlexNet, on the other hand, has about 60 million parameters, which is a large number of parameters to learn. Splitting the layers across two (or more) GPUs, as the authors did, can help speed up training. Notice also that there are a lot of hyperparameters that the authors of AlexNet had to come up with. Another aspect of this architecture that made it much better than LeNet was the use of the ReLU activation function.
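As a rough check of that 60-million figure, we can sum the weights and biases of the layers listed above by hand:

```python
# Parameters per layer: (f*f*channels_in + 1) * channels_out for a
# convolution, (n_in + 1) * n_out for a fully connected layer
# (the +1 accounts for the bias of each output unit).
conv = lambda f, c_in, c_out: (f * f * c_in + 1) * c_out
fc = lambda n_in, n_out: (n_in + 1) * n_out

total = (conv(11, 3, 96)     # Conv 1
       + conv(5, 96, 256)    # Conv 2
       + conv(3, 256, 384)   # Conv 3
       + conv(3, 384, 384)   # Conv 4
       + conv(3, 384, 256)   # Conv 5
       + fc(9216, 4096)      # FC 6
       + fc(4096, 4096)      # FC 7
       + fc(4096, 1000))     # FC 8
print(total)  # 62,378,344 -> usually rounded to "about 60 million"
```

Note that the two last fully connected layers alone account for over 50 million of these parameters, which is why most of AlexNet's memory footprint sits in its fully connected part.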
In the next post, we will talk about VGG 16 and VGG 19.