#013 CNN VGG 16 and VGG 19
\(VGG \) neural network
In the previous posts we talked about \(LeNet-5 \) and AlexNet. Let’s now look at two more examples of convolutional neural networks: the \(VGG-16 \) and \(VGG-19 \) networks.
In these networks smaller filters are used, but the networks were built to be deeper than the convolutional neural networks we have seen in the previous posts.
Architecture of \(VGG-16 \)
A remarkable thing about \(VGG-16 \) is that instead of having so many hyperparameters, it uses a much simpler design. We focus on \(conv \) layers that are just \(3\times3\) filters with a stride of \(1 \) and \(same \) padding. In all \(Max\enspace pooling \) layers we use \(2 \times 2\) filters with a stride of \(2 \).
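A minimal sketch of these two building blocks might look like this (PyTorch is assumed here purely for illustration; it is not the original implementation):

```python
import torch
import torch.nn as nn

# The two building blocks used throughout VGG:
# a 3x3 convolution with stride 1 and "same" padding, and a 2x2 max pool with stride 2.
conv3x3 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 224, 224)
print(conv3x3(x).shape)  # torch.Size([1, 64, 224, 224]) - spatial size preserved by "same" padding
print(maxpool(x).shape)  # torch.Size([1, 64, 112, 112]) - spatial size halved by pooling
```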
Let’s go through the architecture.
- The first two layers are convolutional layers with \(3 \times 3 \) filters, and in these first two layers we use \(64\) filters, so we end up with a \(224 \times 224 \times 64 \) volume because we’re using \(same \) convolutions (height and width are preserved). So, \((CONV\enspace 64) \times 2 \) means that we have \(2\enspace conv\) layers with \(64\) filters. The filters are always \(3 \times 3\) with a stride of \(1 \), and they’re always implemented as \(same \) convolutions.
- Then, we use a \(pooling \) layer which reduces the height and width of the volume: it goes from \(224 \times 224 \times 64\) down to \(112 \times 112 \times 64\).
- Then we have a couple more \(conv \) layers. Here we use \(128 \) filters, and because we use \(same \) convolutions, the new dimension is \(112 \times 112 \times 128\).
- Then, a \(pooling \) layer is added, so the new dimension is \( 56 \times 56 \times 128 \).
- \(3 \enspace conv\) layers with \(256 \) filters, giving \(56 \times 56 \times 256\)
- A \(pooling \) layer, reducing the volume to \(28 \times 28 \times 256\)
- \(3 \enspace conv\) layers with \(512 \) filters, giving \(28 \times 28 \times 512\)
- A \(pooling \) layer, reducing the volume to \(14 \times 14 \times 512\)
- \(3 \enspace conv\) layers with \(512 \) filters, giving \(14 \times 14 \times 512\)
- A \(pooling \) layer, reducing the volume to \(7 \times 7 \times 512\)
- At the end, the final \(7 \times 7 \times 512\) volume is flattened and fed into two \(Fully\enspace connected\) \((FC) \) layers with \(4096 \) units each, followed by a \(softmax \) output over \(1000 \) classes, as sketched in the code below.
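To make the walkthrough above concrete, here is a compact sketch of the \(VGG-16 \) stack in PyTorch (an illustration under the assumptions stated in the list, not the reference implementation; dropout used during training is omitted for brevity):

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 'same' convolutions followed by a 2x2 max pool."""
    layers = []
    for i in range(n_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, stride=1, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class VGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            vgg_block(3,   64,  2),   # 224x224x64  -> pool -> 112x112x64
            vgg_block(64,  128, 2),   # 112x112x128 -> pool -> 56x56x128
            vgg_block(128, 256, 3),   # 56x56x256   -> pool -> 28x28x256
            vgg_block(256, 512, 3),   # 28x28x512   -> pool -> 14x14x512
            vgg_block(512, 512, 3),   # 14x14x512   -> pool -> 7x7x512
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # softmax applied in the loss (e.g. CrossEntropyLoss)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = VGG16()
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])
```

Counting the layers with weights gives \(13\) \(conv\) layers plus \(3\) \(FC\) layers, which is where the \(16\) in the name comes from.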
Layers of \(VGG-16 \) and \(VGG-19 \)
The number \(16 \) in the name \(VGG-16\) refers to the fact that this network has \(16\) layers with weights. It is a pretty large network, with a total of about \(138\) million parameters. That’s pretty large even by modern standards. However, the simplicity of the \(VGG-16 \) architecture made it quite appealing. We can tell that this architecture is really quite uniform: there are a few \(conv \) layers followed by a \(pooling \) layer which reduces the height and width of the volume. If we look at the number of filters, we see that we start with \(64\) filters, double it to \(128 \), then to \(256\), and in the last layers we use \(512\) filters. The number of filters roughly doubles with every stack of \(conv \) layers, and that is another simple principle used to design the architecture of this network. The main downside is that it is a pretty large network in terms of the number of parameters to be trained. The \(VGG-19\) neural network is even bigger than \(VGG-16\), but because \(VGG-16\) does almost as well as \(VGG-19\), a lot of people just use \(VGG-16\).
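As a quick sanity check on the \(138\) million figure, we can count the trainable parameters of the sketch above (the same check works with a pretrained model such as torchvision.models.vgg16, assuming torchvision is installed):

```python
# Counts all trainable parameters of the VGG16 sketch defined earlier.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # roughly 138M, most of them in the first FC layer
```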
In the next post, we will talk more about Residual Network architecture.