
# #020 CNN Data Augmentation

## Data Augmentation

Most computer vision tasks benefit from more data, and data augmentation is one of the techniques often used to improve the performance of computer vision systems. Computer vision is a fairly complicated task: for an input image we have to figure out what is in that picture, and we need to learn a reasonably complicated function to do that. In practice, having more data helps for almost all computer vision tasks. Today, the state of computer vision is such that most computer vision problems would benefit from more data. This is not true for all applications of machine learning, but it does seem to be true for computer vision. When we are training a computer vision model, data augmentation will often help, whether we are using transfer learning or training our model from scratch. Let's take a look at the common data augmentation methods in computer vision.

## Data augmentation methods in computer vision

### Mirroring

Perhaps the simplest data augmentation method is mirroring along the vertical axis. If we have an example in our training set, we can flip it horizontally to get the image on the right. For most computer vision tasks, if the left picture is a cat, then the mirror image is still a cat. Hence, if the mirroring operation preserves whatever we're trying to recognize in the picture, this is a good data augmentation technique to use.

Data augmentation – Mirroring
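As a sketch, mirroring along the vertical axis is a single array flip. A minimal NumPy example, where the toy image stands in for a real training example:

```python
import numpy as np

# A toy (H, W, C) image standing in for a real training example.
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)

# Horizontal mirroring: flip across the vertical axis, i.e. the width axis.
mirrored = np.flip(image, axis=1)

# Flipping twice recovers the original image; the label y stays the same.
restored = np.flip(mirrored, axis=1)
```

Because the operation is its own inverse and preserves the label, it can be applied on the fly during training with, say, probability 0.5 per image.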

### Random cropping

Another commonly used technique is random cropping: from a given image we pick a few random crops. Random cropping isn't a perfect data augmentation method; we might randomly end up with a crop that doesn't look much like a cat. In practice it works well as long as our random crops are reasonably large subsets of the original image.

Data augmentation – Random cropping
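A minimal sketch of random cropping, assuming (H, W, C) NumPy images; the 256 → 224 sizes are illustrative choices, not prescribed values:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(image, crop_h, crop_w, rng):
    """Return a random (crop_h, crop_w) window from an (H, W, C) image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)    # random top-left corner
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
crop = random_crop(image, 224, 224, rng)     # a large subset of the image
```

Keeping the crop size close to the full image size is what makes it likely that the object of interest survives the crop.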

Mirroring and random cropping are frequently used, and we could also use techniques such as rotation and shearing of the image. There's really no harm in trying these as well, although in practice they seem to be used a bit less, perhaps due to their added complexity.

### Color shifting

A second type of data augmentation that is commonly used is color shifting. For the picture below, let's say we add different distortions to the red, green, and blue channels. In this example we are adding to the red and blue channels and subtracting from the green channel.

Data augmentation – Color shifting

Red and blue make purple, so this makes the first image in the picture above a bit more purple, which creates a distorted image for our training set. For illustration purposes we're making somewhat dramatic changes to the colors; in practice, the R, G, and B offsets are sampled from some probability distribution and can be quite small. That is, we draw different values for R, G, and B and use them to distort the color channels. In the second example we are making the image less red, more green, and more blue, which turns the whole image a bit more yellowish. In the third example we are making the picture more blue. The motivation is that if the sunlight had been a bit yellower, or the illumination a bit more yellow, that could easily change the colors of an image, but the identity of the cat, and therefore the content label $$y$$, should remain the same. By introducing these color distortions we make our learning algorithm more robust to changes in the colors of our images.
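A simple color shift can be sketched by sampling one small offset per channel; the [-20, 20] range below is an illustrative assumption, not a prescribed value:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy float image in [0, 255] with shape (H, W, C).
image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)

# Sample one small random offset per channel (R, G, B).
shift = rng.uniform(-20, 20, size=3)

# Add the per-channel offsets and clip back to the valid pixel range.
shifted = np.clip(image + shift, 0, 255)
```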

There are different ways to sample the R, G, and B shifts. One way to implement color distortion uses an algorithm called PCA (Principal Component Analysis). The rough idea of PCA color augmentation is that if our image is mainly purple, that is, it mainly has red and blue tints and very little green, then PCA color augmentation will add or subtract a lot of red and blue but relatively little green, so it preserves the overall color balance. The details can be found in the AlexNet paper, or you can find an open-source implementation of PCA color augmentation and just use that.
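A rough sketch of AlexNet-style PCA color augmentation: the alpha standard deviation of 0.1 follows the paper, while the toy image and the final clipping step are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image scaled to [0, 1]; real code would use the actual training image.
image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)
pixels = image.reshape(-1, 3) / 255.0

# Covariance of the RGB channels over all pixels, then its eigenvectors:
# the principal directions of color variation in this image.
cov = np.cov(pixels, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# One Gaussian coefficient per principal component, alpha ~ N(0, 0.1).
alphas = rng.normal(0.0, 0.1, size=3)

# Shift every pixel along the principal color directions, scaled by the
# eigenvalues, so dominant colors (e.g. red/blue) move the most.
delta = eigvecs @ (alphas * eigvals)
augmented = np.clip(pixels + delta, 0.0, 1.0).reshape(image.shape)
```

Because the shift is scaled by the eigenvalues, channels that carry most of the image's color variance are perturbed the most, which is exactly the "preserves the overall color" behavior described above.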

### Implementing distortions during training

We might have our training data stored on a hard disk (we'll use a round bucket symbol to represent the hard disk). If we have a small training set, almost any approach will work fine, but with a very large training set a common implementation is to have a CPU thread that constantly loads images from the hard disk, giving us a stream of images coming in from disk.

Implementing distortions during training
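The loader idea above can be sketched with Python's `threading` and `queue` modules; the file names and the synthetic "decode" step below are placeholders for real disk reads, and the distortions applied are the mirroring and color shift described earlier:

```python
import queue
import threading
import numpy as np

rng = np.random.default_rng(0)

def load_and_distort(paths, out_queue):
    """CPU loader thread: read images from disk, distort them, enqueue."""
    for path in paths:
        # Stand-in for reading and decoding `path` from disk.
        image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float32)
        if rng.random() < 0.5:                       # random horizontal mirror
            image = np.flip(image, axis=1)
        shift = rng.uniform(-10, 10, size=3)         # small color shift
        image = np.clip(image + shift, 0, 255)
        out_queue.put(image)
    out_queue.put(None)                              # signal end of stream

q = queue.Queue(maxsize=8)                           # bounded buffer
paths = [f"img_{i}.jpg" for i in range(4)]           # hypothetical file names
threading.Thread(target=load_and_distort, args=(paths, q), daemon=True).start()

batch = []
while (item := q.get()) is not None:
    batch.append(item)                               # a training step would go here
```

The bounded queue lets the loader thread run ahead of training while capping memory use, so disk I/O and distortion overlap with the GPU's work.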