# #013 C CNN VGG 16 and VGG 19

## $$VGG$$ neural network

In the previous posts we talked about $$LeNet-5$$ and $$AlexNet$$. Let’s now look at one more family of convolutional neural networks: the $$VGG-16$$ and $$VGG-19$$ networks.

This network uses smaller filters, but it was built to be deeper than the convolutional neural networks we have seen in the previous posts.

Architecture of $$VGG-16$$

A remarkable thing about $$VGG-16$$ is that, instead of having so many hyperparameters, it uses a much simpler design. We will focus on having only $$conv$$ layers with $$3\times3$$ filters, a stride of $$1$$, and $$same$$ padding. In all $$Max\enspace pooling$$ layers we will use $$2 \times 2$$ filters with a stride of $$2$$.
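To make this uniformity concrete, here is a minimal sketch of the two building blocks that repeat throughout the network. It assumes TensorFlow/Keras, which the post itself does not prescribe:

```python
# A minimal sketch of the two repeating VGG-16 building blocks,
# assuming TensorFlow/Keras (the post does not prescribe a framework).
from tensorflow.keras import layers

# Every conv layer: 3x3 filters, stride 1, "same" padding (output
# height and width equal the input's), followed by ReLU.
conv = layers.Conv2D(filters=64, kernel_size=3, strides=1,
                     padding="same", activation="relu")

# Every max-pooling layer: 2x2 window, stride 2, which halves the
# height and width of the volume.
pool = layers.MaxPooling2D(pool_size=2, strides=2)
```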

Let’s go through the architecture.

• The first two layers are convolutional layers with $$3 \times 3$$ filters, and in these two layers we use $$64$$ filters, so we end up with a $$224 \times 224 \times 64$$ volume because we’re using $$same$$ convolutions (the output height and width match the input). So, $$(CONV\enspace 64) \times 2$$ means that we have $$2\enspace conv$$ layers with $$64$$ filters each. The filters are always $$3 \times 3$$ with a stride of $$1$$, and they are always implemented as $$same$$ convolutions.
• Then, we use a $$pooling$$ layer which reduces the height and width of the volume: it goes from $$224 \times 224 \times 64$$ down to $$112 \times 112 \times 64$$.
• Then we have two more $$conv$$ layers. Here we use $$128$$ filters, and because we use $$same$$ convolutions, the new dimension will be $$112 \times 112 \times 128$$.
• Then a $$pooling$$ layer is added, so the new dimension will be $$56 \times 56 \times 128$$.
• $$3 \enspace conv$$ layers with $$256$$ filters
• A $$pooling$$ layer
• $$3 \enspace conv$$ layers with $$512$$ filters
• A $$pooling$$ layer
• $$3 \enspace conv$$ layers with $$512$$ filters
• A $$pooling$$ layer
• At the end, the final $$7 \times 7 \times 512$$ volume is flattened and fed into two $$Fully\enspace connected$$ $$(FC)$$ layers with $$4096$$ units each, followed by a $$softmax$$ output over $$1000$$ classes (see the sketch after this list).
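Putting the steps above together, a compact sketch of the full $$VGG-16$$ stack could look as follows in Keras. The `vgg16` function name and the `num_classes` parameter are illustrative choices, not part of the original post:

```python
# A sketch of the full VGG-16 stack described above, assuming
# TensorFlow/Keras and 224x224 RGB inputs.
from tensorflow.keras import layers, models

def vgg16(num_classes=1000):
    inputs = layers.Input(shape=(224, 224, 3))
    x = inputs
    # (number of filters, number of conv layers) per stage;
    # every stage ends with a 2x2 / stride-2 max-pooling layer.
    for filters, reps in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
        for _ in range(reps):
            # 3x3 filters, stride 1, "same" padding throughout
            x = layers.Conv2D(filters, 3, strides=1, padding="same",
                              activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    # the final 7x7x512 volume is flattened into two FC layers
    # with 4096 units each, then a softmax over the classes
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dense(4096, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = vgg16()
model.summary()  # total parameters: about 138 million
```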

Layers of $$VGG-16$$ and $$VGG-19$$

The number $$16$$ in the name $$VGG-16$$ refers to the fact that the network has $$16$$ layers with weights ($$13$$ convolutional layers and $$3$$ fully connected layers). This is a pretty large network, with a total of about $$138$$ million parameters. That is large even by modern standards. However, the simplicity of the $$VGG-16$$ architecture makes it quite appealing: the architecture is really quite uniform. There are a few $$conv$$ layers followed by a $$pooling$$ layer which reduces the height and width of the volume. If we look at the number of filters, we can see that we start with $$64$$ filters, then double to $$128$$, then to $$256$$, and in the last layers we use $$512$$ filters. The number of filters roughly doubles with every stack of $$conv$$ layers, and that is another simple principle used to design the architecture of this network. The main downside is that it is a pretty large network in terms of the number of parameters to be trained. The $$VGG-19$$ neural network is bigger than $$VGG-16$$, but because $$VGG-16$$ does almost as well as $$VGG-19$$, a lot of people simply use $$VGG-16$$.
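As a quick sanity check on the $$138$$ million figure, we can add up the weights and biases layer by layer; this is a back-of-the-envelope calculation, not a table from the original paper:

```python
# Parameter count for VGG-16: 3x3 conv weights plus biases, then the
# three fully connected layers (back-of-the-envelope arithmetic).
conv_stages = [(3, 64), (64, 64),                    # stage 1
               (64, 128), (128, 128),                # stage 2
               (128, 256), (256, 256), (256, 256),   # stage 3
               (256, 512), (512, 512), (512, 512),   # stage 4
               (512, 512), (512, 512), (512, 512)]   # stage 5

conv_params = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in conv_stages)
fc_params = (7 * 7 * 512 * 4096 + 4096)   # flatten -> FC 4096
fc_params += 4096 * 4096 + 4096           # FC 4096 -> FC 4096
fc_params += 4096 * 1000 + 1000           # FC 4096 -> softmax 1000

print(conv_params + fc_params)  # 138,357,544 -> about 138 million
```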

In the next post, we will talk more about the Residual Network (ResNet) architecture.