# Implementing a VGG-19 network in TensorFlow 2.0
Highlights: In this post we will show how to implement a fundamental Convolutional Neural Network, \(VGG-19\), in TensorFlow. The VGG-19 architecture was designed by the Visual Geometry Group, Department of Engineering Science, University of Oxford. It competed in the ImageNet Large Scale Visual Recognition Challenge in 2014.
Tutorial Overview:
1. Theory recapitulation
2. Implementation in TensorFlow

1. Theory recapitulation
With ConvNets becoming increasingly popular in the computer vision field, a number of attempts have been made to improve the original AlexNet architecture. One important aspect of ConvNet architecture design is its depth.
A remarkable thing about \(VGG-19\) is that, instead of having a large number of hyperparameters, it is a much simpler network. Its \(conv\) layers all use \(3\times3\) filters with a stride of \(1\) and the same padding. All \(Max\enspace pooling\) layers use \(2 \times 2\) filters with a stride of \(2\).
For classification, three fully connected layers are used: two with \(4096\) neurons, and the last one with \(1000\) neurons.
In all layers except the last one, the \(ReLU\) activation function is used, while in the last one \(Softmax\) is used to produce a probability distribution over the classes.
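As a quick illustration of these settings, one VGG-style block in Keras could be sketched like this (the filter count of \(64\) corresponds to the first block; this is just an illustration, not the full network yet):

```python
from tensorflow.keras.layers import Conv2D, MaxPool2D

# Two 3x3 convolutions with stride 1 and 'same' padding,
# followed by 2x2 max pooling with stride 2.
vgg_block = [
    Conv2D(64, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(64, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D(pool_size=(2, 2), strides=2),
]
```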
\(VGG-19\) is trained on more than a million images from the ImageNet database. The network is 19 layers deep and can classify images into 1000 object categories.
Let’s see in detail what this architecture looks like.
Layer Type | Feature Map | Size | Kernel Size | Stride | Activation |
---|---|---|---|---|---|
Image | 1 | 224×224 | – | – | – |
Convolution | 64 | 224×224 | 3×3 | 1 | ReLU |
Convolution | 64 | 224×224 | 3×3 | 1 | ReLU |
Max Pooling | 64 | 112×112 | 2×2 | 2 | – |
Convolution | 128 | 112×112 | 3×3 | 1 | ReLU |
Convolution | 128 | 112×112 | 3×3 | 1 | ReLU |
Max Pooling | 128 | 56×56 | 2×2 | 2 | – |
Convolution | 256 | 56×56 | 3×3 | 1 | ReLU |
Convolution | 256 | 56×56 | 3×3 | 1 | ReLU |
Convolution | 256 | 56×56 | 3×3 | 1 | ReLU |
Convolution | 256 | 56×56 | 3×3 | 1 | ReLU |
Max Pooling | 256 | 28×28 | 2×2 | 2 | – |
Convolution | 512 | 28×28 | 3×3 | 1 | ReLU |
Convolution | 512 | 28×28 | 3×3 | 1 | ReLU |
Convolution | 512 | 28×28 | 3×3 | 1 | ReLU |
Convolution | 512 | 28×28 | 3×3 | 1 | ReLU |
Max Pooling | 512 | 14×14 | 2×2 | 2 | – |
Convolution | 512 | 14×14 | 3×3 | 1 | ReLU |
Convolution | 512 | 14×14 | 3×3 | 1 | ReLU |
Convolution | 512 | 14×14 | 3×3 | 1 | ReLU |
Convolution | 512 | 14×14 | 3×3 | 1 | ReLU |
Max Pooling | 512 | 7×7 | 2×2 | 2 | – |
Fully Connected | – | 4096 | – | – | ReLU |
Fully Connected | – | 4096 | – | – | ReLU |
Fully Connected | – | 1000 | – | – | Softmax |
2. Implementation in TensorFlow
The interactive Colab notebook can be found at the following link.
Let’s start by importing all the necessary libraries.
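A minimal set of imports for what follows might look like this (assuming TensorFlow 2.x and NumPy are installed):

```python
import numpy as np
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.preprocessing import image
```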
After the imports, we can create the network. We will build it using the Sequential API.
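Below is a sketch of the VGG-19 architecture from the table above, assembled with the Sequential API and the layers imported earlier (the variable name `model` is our own choice):

```python
model = Sequential([
    # Block 1: two 64-filter convolutions followed by max pooling
    Conv2D(64, (3, 3), strides=1, padding='same', activation='relu',
           input_shape=(224, 224, 3)),
    Conv2D(64, (3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D((2, 2), strides=2),
    # Block 2: two 128-filter convolutions
    Conv2D(128, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(128, (3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D((2, 2), strides=2),
    # Block 3: four 256-filter convolutions
    Conv2D(256, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(256, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(256, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(256, (3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D((2, 2), strides=2),
    # Block 4: four 512-filter convolutions
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D((2, 2), strides=2),
    # Block 5: four more 512-filter convolutions
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    Conv2D(512, (3, 3), strides=1, padding='same', activation='relu'),
    MaxPool2D((2, 2), strides=2),
    # Classifier: two 4096-unit layers and a 1000-way softmax
    Flatten(),
    Dense(4096, activation='relu'),
    Dense(4096, activation='relu'),
    Dense(1000, activation='softmax'),
])

# Print the layer-by-layer structure and the parameter count.
model.summary()
```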
Training a network with around \(140\) million parameters would take too long, so here we will just load the weights from a pre-trained model. This is done with the load_weights() function. The weights can be found here.
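A sketch of loading the pre-trained weights into the model defined above; the file name below is a placeholder for whichever weights file you download, and loading assumes the stored weights match this layer topology:

```python
# Placeholder path to the downloaded pre-trained VGG-19 weights file.
weights_path = 'vgg19_weights.h5'

# load_weights() restores the parameters into the layers defined above,
# assuming the weights file matches this architecture.
model.load_weights(weights_path)
```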
Our model will predict numerical class indices, so to make the results human-readable, we need to create a dictionary that maps each index to its class name.
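One possible way to build such a dictionary, assuming the standard ImageNet class-index JSON file has been downloaded locally as imagenet_class_index.json (the file name and location are assumptions here):

```python
import json

# The ImageNet class-index file maps "0"..."999" to [wordnet_id, readable_name].
with open('imagenet_class_index.json') as f:
    class_index = json.load(f)

# Dictionary: numerical class index -> human-readable class name.
classes = {int(idx): name for idx, (_, name) in class_index.items()}

print(classes[0])  # e.g. 'tench'
```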
Now we can test our model. Let’s first write a function for this, and then use it to make predictions. We can call the Keras predict() method and then use np.argmax() to find the predicted class, or use predict_classes() as a shortcut.
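A sketch of such a prediction helper, reusing the imports and the classes dictionary from the previous steps; the image path 'cat.jpg' is only a placeholder, and preprocess_input applies the standard VGG preprocessing:

```python
from tensorflow.keras.applications.vgg19 import preprocess_input

def predict_image(img_path, model, classes):
    # Load the image and resize it to the 224x224 input the network expects.
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)   # add the batch dimension
    x = preprocess_input(x)         # VGG-style preprocessing (mean subtraction, BGR)

    # predict() returns softmax probabilities over all 1000 classes;
    # np.argmax() picks the most probable one.
    probs = model.predict(x)[0]
    predicted_index = int(np.argmax(probs))
    return classes[predicted_index], float(probs[predicted_index])

# Example usage (the image path is a placeholder):
print(predict_image('cat.jpg', model, classes))
```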
Summary
In this post we talked about the VGG-19 network and how to implement it in TensorFlow. We made use of the available pre-trained weights so that we did not have to train the network ourselves. In the next post we will show how to implement the YOLO v3 object detection algorithm.