#005 Machine Learning – Introduction to Neural Networks

Highlights: Welcome back to yet another post in our popular new tutorial series on Machine Learning. In the previous post, we studied the motivation, the fundamentals, the inner workings, and the implementation of a Logistic Regression Model.

In today’s post, we’ll take you through everything you need to know about Neural Networks. You will finally understand all the terms related to Neural Networks and Deep Learning that you keep hearing about these days. We will begin with a brief history of AI and then, dive deeper into the architecture of Neural Networks along with learning about the various types of Neural Networks. So, what are we waiting for? Let’s begin right away.

Tutorial Overview:

  1. A Brief History of AI
    1. Chess Program Beats World Champion for the First Time
  2. Neurons & The Brain
  3. What is a Neural Network?
  4. Types of Neural Networks

1. A Brief History of AI

Research in the field of Artificial Intelligence began in the 1950s, building on the work done by the British mathematician Alan Turing during World War II.

With the development of the first computers in the 1950s, many mathematicians, psychologists, engineers, and other scientists were encouraged to actively start exploring the possibility of creating an artificial brain.

Research has shown that the brain is a network of interconnected cells called neurons. Neurons transmit information to each other using electrical impulses. In 1951, Marvin Minsky, together with Dean Edmonds, constructed the first Artificial Neural Network.

The field of study got its name when a group of experienced researchers in the field of AI organized a conference at Dartmouth in 1956, where previous achievements and future directions of development were discussed. At the suggestion of the American scientist, John McCarthy (the winner of the Turing Award and the creator of the LISP programming language), the field of study was named Artificial Intelligence.

The 1950s also brought progress in Artificial Neural Networks. In 1957, the American scientist Frank Rosenblatt introduced the first such algorithm, called the “Perceptron”, which enabled the application of Neural Networks to the problem of Classification. Based on a set of inputs from the outside world (images, sounds, numbers, etc.), the Neural Network generates an output that represents the class to which the inputs belong.
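To make the idea concrete, here is a minimal sketch of Rosenblatt's perceptron learning rule in Python. The toy dataset (the logical AND function), the learning rate, and the number of epochs are illustrative choices, not part of the original text:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Train a perceptron: adjust weights only when a prediction is wrong."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0
            # Perceptron update rule: move weights toward the correct label
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

# Toy linearly separable data: the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = [1 if np.dot(w, xi) + b > 0 else 0 for xi in X]
print(preds)  # [0, 0, 1] pattern of AND: [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron is guaranteed to converge on this data; for non-separable problems (such as XOR), a single perceptron cannot find a solution, which is part of what motivated multi-layer networks.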

Chess Program Beats World Champion for the First Time

In the 1960s, Artificial Intelligence found one of its applications in board games, more precisely in chess. Unfortunately, in those early years, machines could not match humans. However, things changed soon.

The rapid growth in computer processing power and memory capacity during the 1980s and 1990s increasingly narrowed the gap between top chess players and the best systems.

In 1997, an IBM supercomputer known as Deep Blue stunned the world by becoming the first machine to beat a reigning world chess champion in a six-game match. Deep Blue defeated the then-current world chess champion, Garry Kasparov, by 2 games to 1, with 3 draws.

Today, it is impossible for a human to match a computer in chess. The best chess programs can easily beat the world’s best human chess players.

On December 5, 2017, the DeepMind team introduced AlphaZero, a computer program that achieved a superhuman level of play within just 24 hours of training. AlphaZero is widely considered the best chess program in the world today.

Rapid progress has been made in the past 10 years, thanks to a combination of three key factors – ubiquitous cloud computing, huge amounts of data, and major advances in Machine Learning. Some of the most significant achievements of AI have come in the fields of autonomous cars and robotics.

Now that you know a bit about how AI evolved, let us understand the motivation behind the use of Neural Networks in Artificial Intelligence.

2. Neurons & The Brain

The original motivation behind Neural Networks was to create software that could mimic how the human brain thinks and learns. Some of that biological motivation still shapes how we view artificial (computer) Neural Networks today. Let’s start by taking a look at how the brain works and how we can relate it to Neural Networks.

Let’s take a look at the following diagram that illustrates what neurons in a brain look like.

We can split a human neuron into three parts:

  • Dendrites receive information or signals from other neurons
  • Cell Body processes information coming from the different Dendrites
  • Axon sends the output signal to another neuron for the flow of information

So, we saw what a single neuron in the human brain looks like. As we already mentioned, the idea behind an Artificial Neural Network is to mimic the human brain. It uses a very simplified mathematical model of what a biological neuron does. Let’s take a look at the following image of a single neuron in the Artificial Neural Network.

In the image above, we can see that the neuron takes a number as input. Then, it performs some computation and outputs a number, which serves as an input to the next neuron.
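The computation inside one artificial neuron can be sketched in a few lines of Python: a weighted sum of the inputs plus a bias, passed through an activation function. The weights, bias, and inputs below are made-up illustrative values, and the sigmoid is one common choice of activation:

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum + bias, then sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Illustrative values only: two inputs, two weights, one bias
output = neuron(inputs=[0.5, 1.2], weights=[0.8, -0.4], bias=0.1)
print(round(output, 3))  # a number in (0, 1), fed to the next neuron
```

The output is always between 0 and 1, which is what allows it to be interpreted as an activation level (or, at the output layer, as a probability).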

However, when we’re building an Artificial Neural Network or Deep Learning algorithm, rather than building one neuron at a time, we often want to simulate many such neurons at the same time.

These neurons will collectively input a few numbers, carry out some computation, and output some other numbers. The image shown below depicts an Artificial Neural Network.

Now, let’s dive more deeply into the details of how Neural Networks actually work. 

3. What is a Neural Network?

To illustrate how Neural Networks work, let’s start with an example from demand prediction, wherein we’ll look at a product and try to answer this question – “Will this product be a top seller or not?” Let’s take a look.

Here, we’re selling t-shirts and would like to know whether a particular t-shirt will be a top seller. We have collected data on different t-shirts that were sold at different prices, as well as on which of them became top sellers.

This type of application is used by retailers today to plan better inventory levels as well as marketing campaigns. If you know what’s likely to be a top seller, you can, for example, purchase more of that stock in advance.

In our example, the input feature \(x \) is the price of the t-shirt. This is the input for the learning algorithm. If you apply Logistic Regression to fit a Sigmoid Function to the data, then, the output of your prediction can be represented with this formula for \(a \).
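Although the original figure with the formula is not reproduced here, the standard Logistic Regression output it refers to, assuming the usual parameterization with a weight \(w \) and a bias \(b \), is:

\[a = f(x) = \frac{1}{1 + e^{-(wx + b)}}\]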

As you can see, here, we use the term \(a \) to denote the output of this Logistic Regression algorithm. The term \(a \) stands for Activation, which is actually a term from Neuroscience. It refers to how strong an output one neuron is sending to other neurons. This Logistic Regression algorithm can be thought of as a very simplified model of a single neuron in the brain. Have a look at the following image.

In the above representation, the neuron takes the price \(x \) as input and outputs the value of \(a \), which is the probability of the t-shirt being a top seller.

This is a Neural Network with a single neuron; a much larger Neural Network is formed by taking many of these single neurons and stacking them together.

You can think of this neuron as a single Lego brick. You form a bigger neural network by stacking together many of these Lego bricks.

An example of a basic Neural Network with more features is illustrated in the following image. 

In the diagram above, the features are the price of the t-shirt, the shipping costs, the amount of marketing for that particular t-shirt, and the material quality. Interestingly, whether or not a t-shirt becomes a top seller actually depends on three factors:

  1. The affordability of this t-shirt,
  2. The degree of awareness that potential buyers have of this t-shirt, and,
  3. The perceived quality, i.e., whether potential buyers believe that this is a high-quality t-shirt.

Now, let us create an artificial neuron for each of these factors.

The first neuron considers affordability, which is mainly a function of price and shipping costs because the total amount of the payment is equal to the price plus the shipping costs. This is why, we are going to input the price and the shipping costs to the affordability neuron.

The second neuron determines awareness. Awareness in this case is mainly a function of the marketing of the t-shirt.

And finally, we will use a third neuron to estimate whether people perceive this product to be of high quality. This will primarily be a function of the price of the t-shirt and of the material quality.

Given these estimates of affordability, awareness, and perceived quality, we then wire the outputs of these three neurons to another neuron on the right (as shown in the image above). This neuron outputs the probability of this t-shirt being a top seller.

In the terminology of Neural Networks, we’re going to group these three neurons together into what’s called a Layer. A Layer is a group of neurons that take the same or similar features as input.

A Layer can have multiple neurons. It can also have a single neuron. This Layer on the right is also called the Output Layer because the output of this final neuron is the output probability predicted by the Neural Network.

In the terminology of Neural Networks, we also say that affordability, awareness, and perceived quality act as Activations. The term Activation comes from biological neurons, and it refers to the degree to which a biological neuron is sending a high output value, or many electrical impulses, to other neurons downstream from it. The values of affordability, awareness, and perceived quality are the Activations of the three neurons in this middle layer. In addition, the output probability is the Activation of the neuron shown on the right.

This particular Neural Network takes four numbers as input. The middle layer of the Neural Network uses these four numbers to compute three new numbers, also called Activation Values. Then, the output layer of the Neural Network uses those three numbers to compute one number.

In a Neural Network, this list of four numbers is also called the Input Layer, and the middle layer is called the Hidden Layer. This is shown in detail in the figure below.

The way a Neural Network is implemented in practice is that each neuron in a certain layer will have access to every feature or to every value from the previous layer. This is why, in the Neural Network, every input feature is connected to every neuron in the Hidden Layer, as shown in the following image.
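This fully connected structure can be sketched with a short forward pass in NumPy, following the t-shirt example: 4 input features feed 3 hidden neurons, whose activations feed 1 output neuron. The feature values and weights below are random placeholders (a real network would learn the weights from data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up, normalized features: price, shipping, marketing, material quality
x = np.array([0.5, 0.2, 0.7, 0.8])

# Every hidden neuron has a weight for every input (fully connected)
W1 = rng.normal(size=(3, 4))
b1 = np.zeros(3)
# The output neuron has a weight for every hidden activation
W2 = rng.normal(size=(1, 3))
b2 = np.zeros(1)

a1 = sigmoid(W1 @ x + b1)   # hidden-layer activations: 3 numbers
a2 = sigmoid(W2 @ a1 + b2)  # predicted top-seller probability: 1 number
print(a1.shape, a2.shape)   # (3,) (1,)
```

Note that the shapes of `W1` and `W2` encode exactly the "every feature connected to every neuron" property: each row of a weight matrix belongs to one neuron and has one entry per value in the previous layer.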

So, this is what a Neural Network looks like.

However, there are many kinds of Neural Networks in Deep Learning. Let’s learn about some of them in the next section.

4. Types of Neural Networks

Now, let’s focus on three important types of Neural Networks that form the basis for most of the pre-trained models in Deep Learning. These three types are:

  • Standard or Artificial Neural Networks (ANN)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

In the following image, we can see the representations of the above three Neural Networks.


Different types of Neural Networks are useful for different applications.

  • For applications such as real estate price prediction, we use a Standard (fully connected) Neural Network architecture.
  • For image applications, we often use a Convolutional Neural Network (CNN).
  • Audio, most of the time, is naturally represented as a one-dimensional time series or a one-dimensional temporal sequence. Hence, for such sequence data, we often use a Recurrent Neural Network (RNN).
  • Be it English or Chinese, in any language, letters or words appear one at a time. To represent language as sequential data, RNNs are often used.
  • For more complex applications such as autonomous driving, where you have both image and radar inputs, you may end up with a more custom or complex Hybrid Neural Network architecture.

We’ve reached the end of this tutorial post where we learned all about Artificial Neural Networks. These Neural Networks are kind of like the mathematical representation of a human brain and our nervous system. We also studied the architecture of a Neural Network and learned about the various types of Neural Networks along with the kind of applications they are often used for. In the upcoming posts, we’ll go into further detail about how Neural Networks work with some in-depth mathematics. Before that, let’s quickly revise what we learnt today.

Machine Learning – Introduction to Neural Networks

  • The original motivation behind Neural Networks is to mimic the workings of the human brain
  • A Neural Network has three types of layers – input, hidden (processing), and output
  • Neural Networks are often used to predict the answer to a particular problem or question
  • In a Neural Network, every input feature is connected to every neuron in the Hidden Layer
  • The three most popular Neural Networks are Standard Neural Networks (SNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN)

Summary

Friends, are you liking our new blog series on Machine Learning? We’ve curated the most common, the most useful and the most interesting of topics in the line-up of tutorials. We hope you are learning a great deal, but most importantly, we hope you are practising even more by going to the DataHacker YouTube channel to learn more about such interesting stuff. Do let us know in the comments section, if you have any doubts, any queries or even if you simply want to chat with us. We’ll be happy to talk to you. We’ll be back on the blog with our new post on Machine Learning. Till then, take care! 🙂

References:

Advanced Learning Algorithms by Andrew Ng