#015 TF Implementing AlexNet in TensorFlow 2.0

Highlights: In this post we will show how to implement a fundamental Convolutional Neural Network, \(AlexNet\), in TensorFlow 2.0. The AlexNet architecture was designed by Alex Krizhevsky and published together with Ilya Sutskever and Geoffrey Hinton. It competed in the ImageNet Large Scale Visual Recognition Challenge in 2012.

Tutorial Overview:

  1. Review of the Theory
  2. Implementation in TensorFlow 2.0

1. Review of the Theory

Real-life Computer Vision problems require large amounts of quality training data. In the past, the CIFAR and NORB datasets were used as benchmarks for Computer Vision problems. However, the ImageNet competition changed that. This dataset required a more complex network than before in order to achieve good results.

The network architecture that achieved the best result in 2012 was AlexNet, with a Top-5 error rate of 15.3%. The next best result trailed far behind (26.2%).

This architecture has around 60 million parameters and consists of the following layers.

| Layer Type | Feature Maps | Size | Kernel Size | Stride | Activation |
|---|---|---|---|---|---|
| Image | 1 | 227×227 | – | – | – |
| Convolution | 96 | 55×55 | 11×11 | 4 | ReLU |
| Max Pooling | 96 | 27×27 | 3×3 | 2 | – |
| Convolution | 256 | 27×27 | 5×5 | 1 | ReLU |
| Max Pooling | 256 | 13×13 | 3×3 | 2 | – |
| Convolution | 384 | 13×13 | 3×3 | 1 | ReLU |
| Convolution | 384 | 13×13 | 3×3 | 1 | ReLU |
| Convolution | 256 | 13×13 | 3×3 | 1 | ReLU |
| Max Pooling | 256 | 6×6 | 3×3 | 2 | – |
| Fully Connected | – | 4096 | – | – | ReLU |
| Fully Connected | – | 4096 | – | – | ReLU |
| Fully Connected | – | 1000 | – | – | Softmax |

In our case, we will train the model on only two classes from the ImageNet dataset, so our last Fully Connected layer will have only two neurons with a Softmax activation function.

There are a few design choices that set AlexNet apart from other networks at the time. Let’s see what made history!

Overlapping pooling layers

Standard pooling layers summarize the outputs of neighboring groups of neurons in the same kernel map. Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap. Overlapping pooling layers work the same way, except that the adjacent windows over which the max is computed overlap each other.

Overlapping pooling vs non-overlapping pooling
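
As a rough illustration (not part of the original post's code), the difference comes down to choosing a pooling window that is larger than the stride:

```python
import tensorflow as tf

x = tf.random.uniform((1, 8, 8, 1))  # toy input: batch of 1, 8x8, 1 channel

# Non-overlapping pooling: the window size equals the stride.
non_overlapping = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)(x)

# Overlapping pooling as in AlexNet: a 3x3 window moved with stride 2,
# so neighboring windows share one row/column of inputs.
overlapping = tf.keras.layers.MaxPool2D(pool_size=3, strides=2)(x)

print(non_overlapping.shape, overlapping.shape)  # (1, 4, 4, 1) (1, 3, 3, 1)
```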

ReLU nonlinearity

The traditional way to compute a neuron's output is to use the \(sigmoid\) or \(tanh\) activation function. These two functions are bounded between a minimum and a maximum value, so they are saturating nonlinearities. AlexNet instead uses the Rectified Linear Unit, or \(ReLU\) for short. This function has a threshold at \(0\) and is a non-saturating activation function.

The \(ReLU\) function requires less computation and allows faster learning, which has a great influence on the performance of large models trained on large datasets.
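
As a quick illustration (again, not from the original code), the saturating and non-saturating behaviour is easy to see:

```python
import tensorflow as tf

x = tf.constant([-1000.0, -1.0, 0.0, 1.0, 1000.0])

# tanh saturates: large inputs are squashed towards -1 and 1,
# so their gradients vanish.
print(tf.math.tanh(x).numpy())  # approximately [-1, -0.76, 0, 0.76, 1]

# ReLU does not saturate for positive inputs: it is 0 below the
# threshold at 0 and grows linearly above it.
print(tf.nn.relu(x).numpy())    # [0, 0, 0, 1, 1000]
```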

Local Response Normalization

Local Response Normalization (LRN) was first introduced in the AlexNet architecture, where the activation function of choice was \(ReLU\). The reason for using LRN was to encourage lateral inhibition. This refers to the capacity of a neuron to reduce the activity of its neighbors. It is useful when we are dealing with neurons that use the \(ReLU\) activation function: their activations are unbounded, and LRN is used to normalize them.
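
TensorFlow exposes this operation directly. Here is a minimal sketch, using the hyper-parameter values reported in the AlexNet paper (\(n=5\), \(k=2\), \(\alpha=10^{-4}\), \(\beta=0.75\)):

```python
import tensorflow as tf

# A toy feature map: batch of 1, 5x5 spatial size, 8 channels.
feature_map = tf.random.uniform((1, 5, 5, 8))

# Each activation is divided by a term that grows with the squared
# activations of neighboring channels at the same spatial position.
normalized = tf.nn.local_response_normalization(
    feature_map, depth_radius=5, bias=2.0, alpha=1e-4, beta=0.75)

print(normalized.shape)  # (1, 5, 5, 8)
```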

2. Implementation in TensorFlow 2.0

The interactive Colab notebook can be found at the following link   


Let’s start with importing all necessary libraries.
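
The exact import cell is in the notebook; a typical set of imports for this tutorial could look like this (the precise list may differ):

```python
import os
import datetime

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

print(tf.__version__)  # expecting 2.x
```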

After imports, we need to prepare our data. Here, we will use only a small part of the ImageNet dataset. With the following code you can download all images and store them in folders.
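
The download script itself lives in the notebook. As a rough sketch, assuming we have two hypothetical lists of image URLs (one per class), the images can be stored in one sub-folder per class, which is the layout that Keras' `ImageDataGenerator` expects later on:

```python
import os
import urllib.request

# Hypothetical URL lists; in the notebook these come from ImageNet URL files.
class_a_urls = []   # image URLs for the first class
class_b_urls = []   # image URLs for the second class

for class_name, urls in [('class_a', class_a_urls), ('class_b', class_b_urls)]:
    target_dir = os.path.join('data', 'train', class_name)
    os.makedirs(target_dir, exist_ok=True)
    for i, url in enumerate(urls):
        try:
            urllib.request.urlretrieve(url, os.path.join(target_dir, f'{i}.jpg'))
        except Exception:
            # Many old ImageNet URLs are dead; simply skip the broken ones.
            continue
```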


Now we can create the network. The last layer in the original \(AlexNet\) has 1000 neurons, but here we will use only two, because we are working with images from only two classes. To build our convolutional neural network we will use the Sequential API.

AlexNet architecture
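
A Sequential definition that follows the table above might look like the sketch below, using the layers imported earlier. The layer hyper-parameters follow the original architecture; only the final layer is reduced to two units for our two classes:

```python
model = Sequential([
    # Layer 1: 96 kernels of size 11x11 with stride 4, then overlapping max pooling.
    Conv2D(96, kernel_size=11, strides=4, activation='relu',
           input_shape=(227, 227, 3)),
    MaxPool2D(pool_size=3, strides=2),

    # Layer 2: 256 kernels of size 5x5; 'same' padding keeps the 27x27 size.
    Conv2D(256, kernel_size=5, padding='same', activation='relu'),
    MaxPool2D(pool_size=3, strides=2),

    # Layers 3-5: three 3x3 convolutions.
    Conv2D(384, kernel_size=3, padding='same', activation='relu'),
    Conv2D(384, kernel_size=3, padding='same', activation='relu'),
    Conv2D(256, kernel_size=3, padding='same', activation='relu'),
    MaxPool2D(pool_size=3, strides=2),

    # Classifier: two fully connected layers of 4096 units and
    # a final layer with 2 units instead of the original 1000.
    Flatten(),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(2, activation='softmax'),
])

model.summary()
```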

After creating the model, let's define some important parameters for later use and create Image Data Generators. \(AlexNet\) has a lot of parameters, around 60 million, which is a huge number. This makes overfitting very likely if there is not sufficient data. So here we will make use of a data augmentation technique, about which you can find more here.
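
Here is a sketch of this step, with illustrative values for the batch size, image size and number of epochs (the exact values in the notebook may differ), assuming the folder layout created above:

```python
# Training hyper-parameters (illustrative values).
batch_size = 32
img_size = (227, 227)
epochs = 10

# Augment the training images; the validation images are only rescaled.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    validation_split=0.2)
valid_datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=img_size, batch_size=batch_size,
    class_mode='categorical', subset='training')

validation_generator = valid_datagen.flow_from_directory(
    'data/train', target_size=img_size, batch_size=batch_size,
    class_mode='categorical', subset='validation')
```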

For the same reason, dropout layers are used in \(AlexNet\). This technique consists of “turning off” neurons with a predetermined probability, which forces each neuron to learn more robust features that are useful in combination with other neurons. We won’t use dropout layers here because we won’t use the whole dataset.

Now we can set up TensorBoard and start training our model. This way we can track the model’s performance in real time.
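
A minimal sketch of the compilation, TensorBoard callback and training call, assuming the model and generators defined above:

```python
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Write logs to a time-stamped directory so runs do not overwrite each other.
log_dir = os.path.join('logs', datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

history = model.fit(train_generator,
                    epochs=epochs,
                    validation_data=validation_generator,
                    callbacks=[tensorboard_callback])

# In a Colab/Jupyter notebook, TensorBoard can then be launched with:
#   %load_ext tensorboard
#   %tensorboard --logdir logs
```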

Learning visualization

Let’s use our model to make some predictions and visualize them.
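
One way to do this with Matplotlib is sketched below, taking a single batch from the validation generator (the class names are read from the generator’s `class_indices`):

```python
class_names = list(validation_generator.class_indices.keys())

# Take one batch of validation images and run it through the model.
images, labels = next(validation_generator)
predictions = model.predict(images)

plt.figure(figsize=(12, 6))
for i in range(min(8, len(images))):
    plt.subplot(2, 4, i + 1)
    plt.imshow(images[i])
    predicted = class_names[np.argmax(predictions[i])]
    actual = class_names[np.argmax(labels[i])]
    plt.title(f'pred: {predicted}\ntrue: {actual}')
    plt.axis('off')
plt.show()
```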

Data visualization

Summary

In this post we showed how to implement \(AlexNet\) in TensorFlow 2.0. We used only a small part of the ImageNet dataset, which is why we did not get the best results. For better accuracy, more data and a longer training time are required.