GANs #001 Introduction to Generative Models
Highlights: In the last few years, Generative Adversarial Networks (GANs) have gained a lot of attention in both research and engineering circles. Generative models today are what conv nets were 4-8 years ago: the hot topic of the moment. Here, you will learn why that is and how these models have evolved over the last three decades.
1. Applications of Generative Adversarial Networks – GANs
Generative models are deep learning-based approaches that belong to unsupervised learning. One of the most famous GAN examples that astonished the public was a so-called DeepFake video showing former president Obama addressing the nation. The video was so realistic that it was practically impossible for a human to tell that it was fake. Similar videos went viral, and people soon realized how far this field of AI had come.
Many other GAN applications gained the attention not only of AI researchers, but of a broader community. The fake face generator developed by Nvidia proved to be both fascinating…and scary. For LinkedIn's CTO, this news was quite painful, as it opened a new front in the war against fake accounts.
In addition, super-resolution and video-restoration demos went viral, as did "oldify" video editing methods.
Companies that develop photo editing software welcomed GANs and their development. With their help, the painstaking development of photo editing tools could now be automated.
And you know what else is possible?
Have you heard of the first AI painting which sold for 432,500 USD?
Furthermore, this AI self-portrait (if we can call it that) posed a new question: whose work is it? Does it belong to the researchers who developed the AI model and trained the GAN? Or perhaps to the user who generated the painting using this technology and sold it?
Big players have joined this race as well. For instance, Nvidia is dominating with its self-generated scenery images. These models are so sophisticated that they can generate photo-realistic images from simple inputs, such as paintbrush sketches.
And that is not the end of it. The state of California has expressed its worries and passed a law that prohibits the creation of politically related fake news videos within 60 days of an election, unless it is explicitly stated that an image, audio clip or video has been manipulated.
In summary, we are witnessing a new technology that is maturing fast, growing rapidly and producing ever more astonishing results. GANs have literally shaken our everyday reality!
Hence, it is an exciting time to study this technology. It was introduced by AI researchers (Ian Goodfellow, 2014), but its presentation is usually burdened with heavy math. GANs were then adopted by AI startups and companies. Now is the right time to present this technology to the practitioners, coders and researchers who are not yet active in the GANs area.
The goal of this post series is to present Generative models so that the theoretical concepts are easily understood, whereas the code is easy to implement, experiment with and build upon.
2. History of GANs
The state-of-the-art results in GANs are really spectacular. Wait, but why now? Well, methods for detecting the underlying principles that generate data have existed before. For instance, the well-known PCA (Principal Component Analysis) is more than a century old (1901) and is used for dimensionality reduction.
This method seeks to transform the original coordinates/basis of the data while preserving the relations between data samples, with the goal of reducing the number of components needed to analyze them.
For instance, imagine that you're trying to make a shadow puppet. Think of the 3D shape made by your hands and try to recognize the gesture being demonstrated. This happens in 3D, so we need three dimensions. And yet, with the help of a light, we can project a shadow of the hands onto a wall. The wall is a 2D space, so the image becomes two-dimensional. Moreover, if we position the hands carefully enough, we can still tell from this image what the gesture is.
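The shadow-puppet idea can be sketched in a few lines of scikit-learn. Here is a minimal, purely illustrative example (the toy data is invented for this sketch): 3D points that really live on a 2D plane are projected down to two coordinates, with almost no information lost.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 200 samples in 3D that actually vary along a 2D plane,
# analogous to the 3D hand whose shadow lives on a 2D wall.
rng = np.random.default_rng(0)
plane = rng.normal(size=(200, 2))                 # the 2 "true" coordinates
basis = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.5]])               # embed the plane in 3D
X = plane @ basis + 0.01 * rng.normal(size=(200, 3))  # plus tiny noise

# Project onto the 2 directions of largest variance ("the wall").
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)                  # (200, 3) -> (200, 2)
print(pca.explained_variance_ratio_.sum())        # close to 1.0
```

The two retained components explain almost all of the variance, which is exactly the sense in which the 2D "shadow" preserves the gesture.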
In addition, there are methods more sophisticated than PCA, such as Independent Component Analysis (ICA). This field of research was very active from the 1990s into the 2000s. Several algorithms were developed and are still quite popular: InfoMax, SOBI, JADE, RobustICA, FastICA, etc. These methods can be seen as tools for solving the so-called cocktail party problem.
This problem assumes that we place several microphones in a room (say, three, at different locations). There is a lot of noise in the room from different sources: people are talking over the piano playing, while someone is listening to breaking news on TV. Each microphone records a different mixture of these sources, according to the proximity and loudness of each sound source. For instance, a microphone next to the piano will pick up the piano tune as the dominant sound, but it will also capture the chatter of the people talking. The fascinating result of ICA is that we can take the microphone recordings, process them and very accurately extract the original sound sources.
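A minimal cocktail-party sketch with scikit-learn's FastICA might look like this (the three synthetic waveforms and the mixing matrix are invented for illustration; in a real room the mixing matrix would be unknown):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Three "sound" sources sampled over time.
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # sinusoid (the "piano")
s2 = np.sign(np.sin(3 * t))              # square wave (the "TV")
s3 = 2 * (t % 1) - 1                     # sawtooth (the "chatter")
S = np.c_[s1, s2, s3]

# Each microphone records a different linear mixture of the sources.
A = np.array([[1.0, 0.5, 1.5],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])          # mixing matrix (unknown in practice)
X = S @ A.T                              # the three microphone recordings

# FastICA recovers the sources, up to permutation and scaling.
ica = FastICA(n_components=3, random_state=0)
S_hat = ica.fit_transform(X)
print(S_hat.shape)                       # (2000, 3)
```

Each recovered column correlates almost perfectly with one of the original sources, even though the algorithm never saw the mixing matrix.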
A video demonstration can give you more intuition about the cocktail party problem and the applications of ICA. Moreover, PCA and ICA have also traditionally been applied to images of faces.
Check out the comparison of different pre-2010 methods for face image decomposition and generation in the official scikit-learn tutorial. The results were interesting, but far from perfect or realistic.
And what was the problem? Well, all these models were shallow! They were mostly linear (or only mildly nonlinear), and in the first decade of the 2000s neural networks were almost a taboo topic. Support Vector Machines dominated the scene at the time.
You know the story of the Deep Learning revolution that started in 2012 (the AlexNet paper).
It allowed more complex visual features to be captured and expressed by stacking a larger number of convolutional layers in conv nets. In a similar way, it allowed autoencoders to extract significant features from an image and encode them in so-called latent variables. For instance, images from MNIST or a face dataset can be compressed into latent variables. To some extent, you may think of the benefit of deep convolutional architectures as the ability to compress the information in images better than shallow models (e.g. PCA or ICA) can.
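To make the idea of latent variables concrete, here is a minimal autoencoder written from scratch in NumPy: one nonlinear encoder layer that compresses 64 "pixels" into 8 latent variables, and one decoder layer that reconstructs the input. This is only a mechanical sketch on random toy data, not a real image model; the layer sizes and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 64 flattened 8x8 patches (64 pixels each).
X = rng.normal(size=(64, 64))

# Tiny one-hidden-layer autoencoder: 64 pixels -> 8 latents -> 64 pixels.
n_in, n_latent = X.shape[1], 8
W_enc = rng.normal(scale=0.1, size=(n_in, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_in))

lr = 0.01
for _ in range(500):
    Z = np.tanh(X @ W_enc)            # encoder: compress to latent code
    X_hat = Z @ W_dec                 # decoder: reconstruct the input
    err = X_hat - X                   # reconstruction error
    # Gradient descent on the squared reconstruction error,
    # backpropagated through the two layers.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ ((err @ W_dec.T) * (1 - Z ** 2)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

Z = np.tanh(X @ W_enc)
print(X.shape, "->", Z.shape)         # (64, 64) -> (64, 8) latent codes
```

The tanh nonlinearity is the essential difference from PCA: a purely linear autoencoder can learn nothing beyond what PCA already gives, while stacking nonlinear layers is what lets deep models compress images so much better.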
So, now that we’re equipped with a bit of the latest research results and a historical overview, we can proceed. In the GANs post series we will present the latest results related to:
- Generative modeling
- Variational Autoencoders
- Generative Adversarial Networks – GANs
- And their latest applications
These topics are regarded as intermediate- or advanced-level Computer Vision/Artificial Intelligence concepts. You need some basic Data Science programming (or, as we call it, data hacking) skills in Python. Any prior experience with images and deep learning models is also useful for taking on these problems. In case you need to master these skills or want a refresher, we kindly recommend the following literature:
- Python Data Science Handbook – Jake VanderPlas
- Deep Learning Coursera course – Andrew Ng
- Machine Learning: An Algorithmic Perspective – Stephen Marsland
In addition, dataHacker.rs has similar tutorials to help you easily grasp these concepts.
In the next post, we will start exploring some fascinating Generative Models – NEXT LINK.