#020 CNN Data Augmentation
Most computer vision tasks could use more data, and data augmentation is one of the techniques often used to improve the performance of computer vision systems. Computer vision is a pretty complicated task: for an input image we have to figure out what is in that picture, and we need to learn a fairly complicated function to do that. In practice, having more data helps for almost all computer vision tasks. Today, the state of computer vision is that more data helps for the majority of problems. In general this is not true for all applications of machine learning, but it does seem to be true for computer vision. When we're training a computer vision model, data augmentation will often help. This is true whether we're using transfer learning or training our model from scratch. Let's take a look at the common data augmentation methods in computer vision.
Data augmentation methods in computer vision
Perhaps the simplest data augmentation method is Mirroring along the vertical axis. If we have an example in our training set, we can flip it horizontally to get the image on the right. For most computer vision tasks, if the left picture is a cat then the mirror image is still a cat. Hence, if the mirroring operation preserves whatever we're trying to recognize in the picture, this is a good data augmentation technique to use.
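As a minimal sketch, mirroring along the vertical axis amounts to reversing the width axis of the image array. Here we assume images are NumPy arrays with shape height × width × channels:

```python
import numpy as np

# Hypothetical tiny 3-channel image: height x width x channels.
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Mirroring along the vertical axis = reversing the column (width) axis.
mirrored = image[:, ::-1, :]
```

Mirroring twice recovers the original image, so the operation loses no information about the content.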
Data augmentation – Mirroring
Another commonly used technique is Random cropping. Given a dataset, we pick a few random crops. Random cropping isn't a perfect data augmentation method: we might randomly end up taking a crop that doesn't look much like the cat. In practice it works well as long as our random crops are reasonably large subsets of the original image.
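A simple random-crop sketch with NumPy (the function name and sizes here are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(image, crop_h, crop_w):
    """Take one random crop of size (crop_h, crop_w) from an H x W x C image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)    # random top-left corner
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

image = np.zeros((224, 224, 3))
# A reasonably large subset of the original image, as suggested above.
crop = random_crop(image, 200, 200)
```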
Data augmentation – Random cropping
Mirroring and Random cropping are frequently used, and we could also use things like rotation or shearing of the image. There's really no harm in trying these as well, although in practice they seem to be used a bit less, due to their complexity.
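Rotation needs resampling, which is one reason it is a bit more complex than flips and crops. A bare-bones sketch with nearest-neighbor sampling in plain NumPy (real pipelines would typically use a library routine with proper interpolation):

```python
import numpy as np

def rotate_nearest(image, degrees):
    """Rotate an H x W x C image about its center, nearest-neighbor sampling."""
    theta = np.deg2rad(degrees)
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    # Inverse-map each output pixel back into the source image.
    src_y = cy + (ys - cy) * np.cos(theta) - (xs - cx) * np.sin(theta)
    src_x = cx + (ys - cy) * np.sin(theta) + (xs - cx) * np.cos(theta)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    return image[src_y, src_x]
```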
A second type of data augmentation that is commonly used is Color shifting. For the picture below, let's say we add different distortions to the red, green and blue channels. In this example we are adding to the red and blue channels and subtracting from the green channel.
Data augmentation – Color shifting
Red and blue make purple, so this makes the first image in the picture above a bit more purple. That creates a distorted image for our training set. For illustration purposes we're making somewhat dramatic changes to the colors; in practice, the R, G and B offsets are sampled from some probability distribution and can be quite small. That is, we take different values of R, G and B and use them to distort the color channels. In the second example we are making the image less red, and more green and more blue, so that turns the whole image a bit more yellowish. In the third example we are making the picture more blue. The motivation for this is that if the sunlight was a bit yellower, or the illumination was a bit more yellow, that could easily change the colors in an image, but the identity of the cat, the content label \(y \), should remain the same. By introducing these color distortions we make our learning algorithm more robust to changes in the colors of our images.
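A minimal color-shifting sketch, assuming 8-bit RGB images and a uniform distribution for the per-channel offsets (the `max_shift` parameter is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def color_shift(image, max_shift=20):
    """Add a small random offset to each of the R, G and B channels."""
    shift = rng.integers(-max_shift, max_shift + 1, size=3)  # one offset per channel
    shifted = image.astype(np.int32) + shift                 # broadcasts over channels
    return np.clip(shifted, 0, 255).astype(np.uint8)         # stay in valid pixel range

image = np.full((4, 4, 3), 128, dtype=np.uint8)  # hypothetical mid-gray image
out = color_shift(image)
```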
There are different ways to sample the R, G and B shifts. One way to implement color distortion uses an algorithm called PCA (Principal Component Analysis). The rough idea of PCA color augmentation is that if our image is mainly purple, i.e. it mainly has red and blue tints and very little green, then PCA color augmentation will add or subtract a lot of red and blue but relatively little green, so it preserves the overall color. The details can be found in the AlexNet paper, or you can find an open-source implementation of PCA color augmentation and just use that.
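A sketch of PCA color augmentation in the spirit of the AlexNet paper: find the principal components of the image's RGB pixel values and add multiples of them, with magnitudes proportional to the eigenvalues times a small Gaussian random variable. The function name and `sigma` value are illustrative assumptions, not the paper's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_color_augment(image, sigma=0.1):
    """AlexNet-style PCA color augmentation (sketch) for an 8-bit RGB image."""
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)           # 3 x 3 RGB covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # principal components of the colors
    alphas = rng.normal(0.0, sigma, size=3)      # random magnitude per component
    shift = eigvecs @ (alphas * eigvals)         # sum_i alpha_i * lambda_i * v_i
    out = image.astype(np.float64) / 255.0 + shift
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)
```

Because the shift follows the dominant color directions of the image, a mostly red-and-blue image gets mostly red-and-blue perturbations, as described above.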
Implementing distortions during training
We might have our training data stored on a hard disk (we'll use the round bucket symbol to represent the hard disk). If we have a small training set we could do almost anything and it would be okay, but if we have a very large training set, this is how people often implement it: we have a CPU thread that is constantly loading images from the hard disk, so there is a stream of images coming in from the disk.
Implementing distortions during training
Next, we can use a CPU thread to implement the distortions (Random cropping, Color shifting or Mirroring), so for each image we end up with some distorted version of it. The image in the picture above, for example, we mirror, and we can also apply Random cropping (or any other data augmentation technique). A CPU thread is constantly loading data as well as implementing whatever distortions are needed to form a batch, or really mini-batches of data, and that data is then constantly passed to some other thread or some other process for training. The training could run on the CPU, or on the GPU if we have a large neural network to train. A pretty common way of implementing data augmentation is to have one thread (or multiple threads) responsible for loading the data and implementing the distortions, and then passing that to some other thread or process that does the training. These two tasks can run in parallel.
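The producer/consumer pattern above can be sketched with a thread and a queue. This is a toy illustration: the random arrays stand in for images read from disk, the mirroring stands in for the full set of distortions, and the counter stands in for a real training step:

```python
import queue
import threading
import numpy as np

rng = np.random.default_rng(0)
batch_queue = queue.Queue(maxsize=4)   # buffer of prepared mini-batches
NUM_BATCHES, BATCH_SIZE = 3, 8

def load_and_distort():
    """Loader thread: 'read' images, apply distortions, queue a mini-batch."""
    for _ in range(NUM_BATCHES):
        batch = rng.random((BATCH_SIZE, 32, 32, 3))   # stand-in for disk reads
        batch = batch[:, :, ::-1, :]                  # e.g. mirroring
        batch_queue.put(batch)
    batch_queue.put(None)                             # signal end of data

producer = threading.Thread(target=load_and_distort)
producer.start()

# Training side: consume batches as they become ready, in parallel with loading.
trained = 0
while (batch := batch_queue.get()) is not None:
    trained += 1        # a real loop would run a training step here
producer.join()
```

The bounded queue keeps the loader from running arbitrarily far ahead of training while still letting the two sides overlap.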
From the AlexNet paper, “ImageNet Classification with Deep Convolutional Neural Networks”:
We employ two distinct forms of data augmentation, both of which allow transformed images to be produced from the original images with very little computation, so the transformed images do not need to be stored on disk.
In our implementation, the transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images. So these data augmentation schemes are, in effect, computationally free.
That’s it for data augmentation! Like other parts of training a deep neural network, the data augmentation process has a few hyperparameters, such as how much color shifting to apply and exactly what parameters to use for random cropping. A good place to get started with these techniques is to use someone else’s open-source implementation of data augmentation.
In the next post, we will talk more about Object Localization and Detection.