#003 RNN – Architectural Types of Recurrent Neural Networks
Highlights: Recurrent Neural Networks (RNNs) are sequence models that extend traditional Neural Networks to sequential data. From Speech Recognition to Natural Language Processing to Music Generation, RNNs continue to play a transformative role in handling sequential datasets.
In this blog post, we will explore the various architectural categories of Recurrent Neural Networks. Through real-life cases, we will learn how these different RNN categories solve day-to-day problems and help us build simple RNN models for a variety of applications.
Tutorial Overview:
- Basic Classification of RNN Models
- Applications of RNN in real life scenarios
- Many-to-Many Architecture
- Many-to-One Architecture
- One-to-One Architecture
- One-to-Many Architecture
1. Classification of RNN Models
By now, I’m sure you have a good grasp of the fundamentals of Recurrent Neural Networks, their basic architecture, and the computational representation of an RNN’s forward propagation and backpropagation.
The one major point we have been carrying over from our previous post is that, in our basic RNN models, we have so far considered the input and output sequences to be of equal lengths. While we started with equal lengths for ease of understanding, we must now venture into the other possibilities that arise in real-life scenarios and problems.
To classify in the most simplistic manner, we can categorize RNN architectures broadly into these four buckets:
- Many Inputs → Many Outputs
- Many Inputs → One Output
- One Input → One Output
- One Input → Many Outputs
Going ahead in this blog post, we will learn about the above four categories in detail, with some explanatory examples. So, let’s get going.
2. Applications of RNN in Real-Life Scenarios
Up until now, we have come across RNN architectures where the number of inputs \(x\) equals the number of outputs \(y\). Let’s revisit the list of practical examples we saw in an earlier post and understand how the RNN architecture differs in each case.
It is not necessary that we will always find ourselves in the ideal scenario where the input \(x\) and output \(y\) are of the same type and \(T_{x}\) and \(T_{y}\) are equal.
Take, for example, Music Generation. Here, the input length \(T_{x}\) can be 1, or the input may even be an empty set. In the case of Sentiment Classification, the output \(y\) takes integer values from 1 to 5. Named Entity Recognition is our original scenario, where the lengths of the input and the output are the same; having said that, there are Named Entity Recognition problems where the input and output lengths can differ as well. In Machine Translation, both the input and output sequences can have varying lengths, since the input is, say, a French sentence and the output is the translated English sentence. The meaning, no doubt, is the same, but the lengths of the sequences are different.
These are the possibilities that can arise with RNN architectures; however, there are established ways of tackling each case by modifying the basic RNN architecture.
Note: Some of the figures and facts in this post are inspired by Andrej Karpathy’s blog post “The Unreasonable Effectiveness of Recurrent Neural Networks”.
Let us work through the four major categories of RNN architectures one by one.
3. Many-to-Many Architecture
Well, this one is easy because we have seen it before. It is exactly what we learned in our previous posts: a Recurrent Neural Network where the input and output lengths are equal, \(T_{x} = T_{y}\).
The input sequence starts from \(x^{\left \langle 1 \right \rangle} \), \(x^{\left \langle 2 \right \rangle} \) up to \(x^{\left \langle T_{x} \right \rangle} \) and the output sequence computed is \(\hat{y}^{\left \langle 1 \right \rangle} \), \(\hat{y}^{\left \langle 2 \right \rangle} \) and so on up to \(\hat{y}^{\left \langle T_{y} \right \rangle}\). Since there are multiple inputs which result in multiple outputs, we can call it a Many-to-Many architecture.
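To make this unrolling concrete, here is a minimal NumPy sketch of a many-to-many forward pass with \(T_{x} = T_{y}\). The dimensions are toy values, and the weight names (Waa, Wax, Wya) follow our earlier notation; treat it as an illustration rather than a production implementation.

```python
# Minimal many-to-many RNN forward pass (T_x = T_y) -- illustrative sizes.
import numpy as np

T_x = 4                      # sequence length (inputs = outputs here)
n_x, n_a, n_y = 3, 5, 2      # input, activation, and output sizes (assumed)

rng = np.random.default_rng(0)
Waa = rng.standard_normal((n_a, n_a)) * 0.1   # activation-to-activation
Wax = rng.standard_normal((n_a, n_x)) * 0.1   # input-to-activation
Wya = rng.standard_normal((n_y, n_a)) * 0.1   # activation-to-output
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a = np.zeros((n_a, 1))                        # a^<0>, the initial activation
xs = [rng.standard_normal((n_x, 1)) for _ in range(T_x)]

y_hats = []
for x_t in xs:                                # one y_hat per time step
    a = np.tanh(Waa @ a + Wax @ x_t + ba)     # a^<t>
    y_hats.append(Wya @ a + by)               # y_hat^<t> (raw scores)

print(len(y_hats))                            # T_y == T_x == 4
```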
In the above example, the input and output sequence lengths are equal. However, within Many-to-Many architectures, there are cases where the input and output lengths are different. Machine Translation is one such real-life case: the input sentence is, say, in French, and the output sentence is a translated English one, which may have a different number of words than the original, although of course the meaning remains the same.
Such neural networks have two distinct portions – the Encoder and the Decoder. The Encoder reads the input sentence in French, and the Decoder generates the translated sentence in the other language.
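Here is a minimal NumPy sketch of that encoder-decoder split, again with assumed toy sizes: the encoder folds the whole input sequence into one final activation, and the decoder unrolls from that summary for a different number of steps. Real translation systems add word embeddings, training, and a stop-token mechanism; this only shows the shape of the computation.

```python
# Minimal encoder-decoder sketch for T_x != T_y -- illustrative sizes.
import numpy as np

rng = np.random.default_rng(1)
n_x, n_a, n_y = 3, 5, 4
T_x, T_y = 6, 9                  # e.g. 6 French words -> 9 English words

Wax = rng.standard_normal((n_a, n_x)) * 0.1
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wya = rng.standard_normal((n_y, n_a)) * 0.1

# Encoder: read the whole input sequence, keep only the final activation.
a = np.zeros((n_a, 1))
for t in range(T_x):
    x_t = rng.standard_normal((n_x, 1))   # stand-in for one input word vector
    a = np.tanh(Waa @ a + Wax @ x_t)

# Decoder: unroll from the encoder's summary, emitting T_y outputs.
outputs = []
for t in range(T_y):
    a = np.tanh(Waa @ a)                  # no new external input at this step
    outputs.append(Wya @ a)               # one output word's scores per step

print(len(outputs))                       # 9 output steps from 6 input steps
```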
Now, let us look at another example where the inputs are many but the output is singular.
4. Many-to-One Architecture
We will take the case of Sentiment Classification to explain this category of RNN models where there are multiple inputs but only one output value.
In our example for Sentiment Classification, we learned how movie reviews can be turned into a star rating. Here, the input \(x\) is a piece of movie review text that says, “Decent effort. The plot could have been better.” Hence, the input is a sequence of multiple words. Now, we could predict the output \(y\) in two ways – first, using only 0 and 1 as output values, categorizing the movie review as either Negative or Positive; and second, using values from 1 to 5, in which case our example would qualify as neither a bad nor an excellent review, but a mixed one. The output value would then be something like 3 stars. This is what we call a Many-to-One architecture.
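A minimal NumPy sketch of this many-to-one pattern follows: the network reads every word vector of the review, but only the final activation is mapped to a single star rating. All sizes here are illustrative assumptions.

```python
# Minimal many-to-one sketch: many word inputs, one star-rating output.
import numpy as np

rng = np.random.default_rng(2)
n_x, n_a, n_stars = 3, 5, 5          # toy sizes; 5 classes = 1..5 stars

Wax = rng.standard_normal((n_a, n_x)) * 0.1
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wya = rng.standard_normal((n_stars, n_a)) * 0.1

review = [rng.standard_normal((n_x, 1)) for _ in range(8)]  # 8 word vectors

a = np.zeros((n_a, 1))
for x_t in review:                   # many inputs ...
    a = np.tanh(Waa @ a + Wax @ x_t)

scores = Wya @ a                     # ... one output, at the last step only
stars = int(np.argmax(scores)) + 1   # predicted rating in {1, ..., 5}
print(stars)
```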
5. One-to-One Architecture
This one is as basic as it gets. A single input that predicts a single output forms what we call a One-to-One architecture. It is the most standard Neural Network there can be and is quite self-explanatory. An important thing to note about One-to-One architectures is that you don’t really need a recurrent activation value \(a\), incoming or outgoing, as this is the very simple scenario of one input IN and one output OUT.
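For completeness, here is a tiny sketch (toy sizes assumed) showing that the one-to-one case reduces to a single dense layer with no activation carried in or out:

```python
# Minimal one-to-one sketch: a single dense layer, no recurrence at all.
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((2, 3)) * 0.1   # one weight matrix, toy sizes
b = np.zeros((2, 1))

x = rng.standard_normal((3, 1))         # one input ...
y_hat = W @ x + b                       # ... one output, in a single step
print(y_hat.shape)                      # (2, 1)
```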
Let’s move on to the last category of RNN architectures, where a single input predicts a sequence of outputs.
6. One-to-Many Architecture
A single input that results in multiple output values, or an output sequence, is called a One-to-Many architecture. This particular architecture can be found in Music Generation problems.
Say you are given an integer input \(x\), which tells the network what genre of music you want, or the first note of the music that you like. It can even be a null input, where you don’t feed anything and want the network to randomly generate some music; in that case, the input \(x\) is just a vector of zeros. Either way, once the input \(x\) is fed into the neural network, no other external input is given for the rest of the propagation process. Only the activation values travel from step to step, and an output is predicted at each time step until the last note of the musical piece is synthesized. In many designs, the output predicted at one step is also fed back as the input to the next step.
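Below is a minimal NumPy sketch of this one-to-many unrolling, assuming toy dimensions and an all-zero “null” input; feeding each prediction back in as the next input is one common design choice, not the only one.

```python
# Minimal one-to-many sketch: one (null) input, a whole output sequence.
import numpy as np

rng = np.random.default_rng(4)
n_x, n_a = 3, 5                      # outputs share the input's size here
T_y = 10                             # length of the generated sequence

Wax = rng.standard_normal((n_a, n_x)) * 0.1
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wya = rng.standard_normal((n_x, n_a)) * 0.1

x = np.zeros((n_x, 1))               # null input: just a vector of zeros
a = np.zeros((n_a, 1))

notes = []
for t in range(T_y):
    a = np.tanh(Waa @ a + Wax @ x)   # activation carries the "memory"
    y_hat = Wya @ a                  # predicted note at step t
    notes.append(y_hat)
    x = y_hat                        # feed the output back as the next input

print(len(notes))                    # 10 outputs from a single input
```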
Let us summarise all the categories of RNN architectures we learned so far in a compiled graphical format. Have a look.
Notice how each category of RNN architecture differs in execution from the others. These are the basic building blocks of all Recurrent Neural Networks, apart from some subtle variations in sequence generation, which we will learn about in due course.
We can create a wide range of RNN models using the above categorization. I hope you will research more real-life examples of these four types of RNN architectures. Do write your comments below citing more cases, especially ones where the input and output sequences are of varying lengths. In the upcoming posts, we will learn more about sequence generation, implementing a Language Model, and sampling novel sequences, among other interesting topics. So, stay tuned!
Architectural Classification of Recurrent Neural Networks
- Basic categorization based on input and output quantities
- Four main types of RNNs – Many-to-Many, Many-to-One, One-to-One, and One-to-Many
- Not all types of RNNs have input and output sequences with equal lengths
- Machine Translation is a Many-to-Many architecture
- Sentiment Classification is a Many-to-One architecture
- Music Generation is a One-to-Many architecture
- One-to-One architecture is the most standard form of neural network
Summary
Once you dive deep into sequence generation and deeper model creation, you will be equipped to handle cool problems like Music Generation and Machine Translation by yourself. Understanding what differentiates these architectures lays the foundation for decision making when it comes to choosing an RNN model and a sequence-generation approach. I hope my posts are keeping you well-informed as well as well-entertained (through my examples). If not, you can always leave your suggestions, and I promise to work on them. Catch you in my next blog post. See you!