#003 PyTorch – Shallow Neural Network in PyTorch 1.5

Highlights: Welcome everyone! In this post we will learn how to use PyTorch to build a shallow neural network. If needed, we recommend that you first go through a more detailed theoretical explanation of neural networks, and of what a shallow neural network is. So, let's start implementing one in PyTorch.

Tutorial Overview:

  1. Define our imports
  2. Generating Data
  3. Visualization
  4. Splitting Dataset
  5. Define Model Structure
  6. Loss Function (Criterion) and Optimizer
  7. Model Training
  8. Make Predictions
  9. Visualize our Predictions
  10. Testing our Model

Download Code

Before we go over the explanation, you can download code from our GitHub repo

https://github.com/maticvl/dataHacker/blob/master/CNN/%23003%20PyTorch%20-%20Shallow%20Neural%20Network%20in%20PyTorch%201.3.ipynb

1. Define our imports

Building a shallow neural network using PyTorch is relatively simple. First, let's import the necessary libraries. We will import torch, which will be used to build our model; NumPy, for generating our input features and target vector; and Matplotlib, for visualization. Finally, we will use scikit-learn for splitting our dataset and measuring the accuracy of our model.

# Necessary imports
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline

# This line detects if we have gpu support on our system
device = ("cuda" if torch.cuda.is_available() else "cpu")

2. Generating Data

As the next step, we will use NumPy to generate random samples representing two different classes. The independent features of our dataset are stored in the variable X, and the target variable in y.

x1 = np.random.randn(2000)*0.5+3
x2 = np.random.randn(2000)*0.5+2

x3 = np.random.randn(2000) *0.5 + 4
x4 = np.random.randn(2000) *0.5 + 5

# Creating a Matrix
X_1 = np.vstack([x1, x2])
X_2 = np.vstack([x3, x4])
X = np.hstack([X_1, X_2]).T

# Creating a Vector that contains classes (0, 1)
y = np.hstack([np.zeros(2000), np.ones(2000)])

print(X.shape)
print(y.shape)
Output: 
(4000, 2) 
(4000,)

3. Visualization

To visualize the created dataset, Matplotlib has a built-in function for scatter plots called scatter(). A scatter plot shows the data as a collection of points, where the position of each point is given by its two-dimensional coordinates on the horizontal and vertical axes. The parameter c controls the marker color; here we passed in the argument y, so the color is determined by the value of our target vector. Also, edgecolors sets the edge color of the markers, which in our case is 'w', short for white.

plt.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm, edgecolors='w');
plt.title('Dataset')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
Output:
(Scatter plot of the generated two-class dataset)

4. Splitting Dataset

Next, we need to split our input features X into two separate sets, X_train and X_test. We will also split our target vector y into two sets, y_train and y_test. Doing this with the sklearn library is straightforward. Let's look at the code:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

# converting from numpy arrays into PyTorch tensors (float features, long targets)
X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)

y_train = torch.from_numpy(y_train.squeeze()).type(torch.LongTensor)
y_test = torch.from_numpy(y_test.squeeze()).type(torch.LongTensor)

# checking the shape
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
Output: 
torch.Size([3200, 2]) 
torch.Size([800, 2]) 
torch.Size([3200]) 
torch.Size([800])

As you may have noticed above, the .squeeze() function is used to remove single-dimensional entries from the shape of an array, reducing it to a rank-1 array.
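
As a small illustration (this snippet is not from the original post), squeeze() turns an array of shape (n, 1) into one of shape (n,):

# Hypothetical example: squeeze() removes axes of length 1
a = np.random.randn(3200, 1)
print(a.shape)            # (3200, 1)
print(a.squeeze().shape)  # (3200,)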

5. Define Model Structure

To define our model structure we will use nn.Module to build our neural network. We give the class the name ShallowNeuralNetwork and subclass it from nn.Module. Once that's done, we need to call the super().__init__() method. By doing this, PyTorch will be able to keep track of what we are adding to the neural network.

class ShallowNeuralNetwork(nn.Module):
    def __init__(self, input_num, hidden_num, output_num):
        super(ShallowNeuralNetwork, self).__init__()
        self.hidden = nn.Linear(input_num, hidden_num) # hidden layer
        self.output = nn.Linear(hidden_num, output_num) # output layer
        self.sigmoid = nn.Sigmoid() # sigmoid activation function
        self.relu = nn.ReLU() # relu activation function
    
    def forward(self, x):
        x = self.relu(self.hidden(x)) 
        out = self.output(x)
        return out
    
    def predict(self, x):
        # apply sigmoid to the output
        predictions = self.sigmoid(self.forward(x))
        result = []
        # pick the class with the maximum weight
        for current_value in predictions:
            if current_value[0] > current_value[1]:
                result.append(0)
            else:
                result.append(1)
        return result

input_num = 2
hidden_num = 2
output_num = 2 # The output should be the same as the number of classes

model = ShallowNeuralNetwork(input_num, hidden_num, output_num)
model.to(device) # send our model to gpu if available else cpu. 
print(model)
Output:
ShallowNeuralNetwork(
  (hidden): Linear(in_features=2, out_features=2, bias=True)
  (output): Linear(in_features=2, out_features=2, bias=True)
  (sigmoid): Sigmoid()
  (relu): ReLU()
)

The nn.Linear() module applies a linear transformation: it takes our input feature matrix X, multiplies it by a weight matrix and adds a bias term. These parameters are created by the module itself when it is instantiated; all we need to do is specify the size of the input and the output.
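
As a quick illustration (the sizes and values here are arbitrary, not from the original post), a linear layer simply computes xW^T + b for every row of the input:

# Hypothetical example: a linear layer mapping 2 input features to 3 outputs
layer = nn.Linear(2, 3)
x = torch.randn(5, 2)   # a batch of 5 samples with 2 features each
out = layer(x)          # equivalent to x @ layer.weight.T + layer.bias
print(out.shape)        # torch.Size([5, 3])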

Next, we also create a ReLU function as the activation for the hidden layer, and a sigmoid that will be used on the outputs when making predictions, since this is a binary classification problem.

Now, all we need to do is create a forward method, which receives an input tensor and passes it through our hidden layer. This is the linear transformation we explained earlier, followed by a ReLU activation and then another linear layer, our output layer. The predict method additionally applies a sigmoid activation to these outputs and picks the class with the larger value. Finally, we can print the structure we have defined for our network: how many neurons there are in each layer, along with the activation functions.

6. Loss Function (Criterion) and Optimizer

After the forward pass, a loss is computed from the target y_train and the prediction y_pred in order to update the weights and improve the model in the next step. Setting up the loss function is a fairly simple step in PyTorch. Here, we will use the cross-entropy loss, or log loss. It measures the performance of a classification model whose output is a probability value between 0 and 1. We should note that the cross-entropy loss increases as the predicted probability diverges from the actual label.

Next, we will use the Adam optimizer to update the parameters. The function model.parameters() provides the parameters to the optimizer, and lr=0.01 defines the learning rate for the Adam algorithm.

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
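
To see this behaviour concretely, here is a small sketch (with made-up logits, not from the original post) showing that the loss is small when the correct class gets the highest score and large when it does not. Note that CrossEntropyLoss applies a softmax internally, which is why our forward method returns raw scores:

# Hypothetical example: CrossEntropyLoss expects raw logits and class indices
logits_good = torch.tensor([[4.0, -2.0]])   # strongly favours class 0
logits_bad  = torch.tensor([[-2.0, 4.0]])   # strongly favours class 1
target      = torch.tensor([0])             # the true class is 0

print(criterion(logits_good, target).item())  # small loss (about 0.0025)
print(criterion(logits_bad, target).item())   # large loss (about 6.0)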

7. Model Training

Our model is now ready to train. We begin by setting the number of epochs. An epoch is a single pass through the training dataset. In the example below, the number of epochs is set to 1000, meaning that there will be 1000 passes over the training data, each followed by a weight update.

# transfer our tensors from CPU to GPU if CUDA is available
if torch.cuda.is_available():
    X_train = Variable(X_train).cuda()
    y_train = Variable(y_train).cuda()
    X_test = Variable(X_test).cuda()
    y_test = Variable(y_test).cuda()

num_epochs = 1000

for epoch in range(num_epochs):
    # forward propagation
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    
    # back propagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if epoch % 200 == 0:
        print('Epoch [{}/{}], Loss: {:.5f}'.format(epoch, num_epochs, loss.item()))
print('\nTraining Complete')
Output:
Epoch [0/1000], Loss: 0.67416 
Epoch [200/1000], Loss: 0.06665 
Epoch [400/1000], Loss: 0.02735 
Epoch [600/1000], Loss: 0.01536 
Epoch [800/1000], Loss: 0.01033 

Training Complete

After the forward pass and the loss computation are done, we do a backward pass, which refers to the process of learning and updating the weights. First, we need to set the gradients to zero with optimizer.zero_grad(). This is because every time a variable is backpropagated through the network, the gradient is accumulated instead of replaced from the previous training step, which would prevent our network from updating its parameters properly. Then, we run the backward pass with loss.backward(), and optimizer.step() updates our parameters based on the current gradients.
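
Here is a small sketch (with an arbitrary toy parameter, not part of the original post) of what would happen without zero_grad(): calling backward() twice adds the gradients together instead of replacing them.

# Hypothetical example: gradients accumulate across backward() calls
w = torch.tensor(1.0, requires_grad=True)

(w * 2).backward()
print(w.grad)      # tensor(2.)

(w * 2).backward() # without zeroing, the new gradient is added to the old one
print(w.grad)      # tensor(4.)

w.grad.zero_()     # this is what optimizer.zero_grad() does for every parameter
print(w.grad)      # tensor(0.)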

8. Make Predictions

Now that our model is trained, we can make new predictions simply by passing the X_test feature matrix to our model:

model_prediction = model.predict(X_test)

X_test = X_test.cpu().numpy() # We are moving our tensors to cpu now
y_test = y_test.cpu().numpy()

model_prediction = np.array(model_prediction) # convert our predictions from list to numpy.array()
print("Accuracy Score on test data ==>> {}%".format(accuracy_score(model_prediction, y_test) * 100))
Output:
Accuracy Score on test data ==>> 100.0%

Finally, we get an accuracy score of 100%. If you are running this on your personal computer or through the interactive Google Colab notebook, this accuracy score may vary from run to run, since the input features are randomly generated.

9. Visualize our Predictions

Finally, we can plot the result and visually compare the actual classes with the predictions that our model has made.

fig, ax = plt.subplots(2, 1, figsize=(12, 10))

# Actual classes
ax[0].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='Class 0', cmap=cm.coolwarm)
ax[0].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='Class 1', cmap=cm.coolwarm)
ax[0].set_title('Actual Predictions')
ax[0].legend()

# Model predictions
ax[1].scatter(X_test[model_prediction==0, 0], X_test[model_prediction==0, 1], label='Class 0', cmap=cm.coolwarm)
ax[1].scatter(X_test[model_prediction==1, 0], X_test[model_prediction==1, 1], label='Class 1', cmap=cm.coolwarm)
ax[1].set_title('Our Model Predictions')
ax[1].legend()
Output:
(Scatter plots comparing the actual classes with our model's predictions)

10. Testing our Model

We can also test our model by picking two points from the diagram and passing their values into it to see which classes it predicts.
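
For instance, here is a minimal sketch of such a check. The two test points below are arbitrary values chosen from the two clusters for illustration, not from the original post:

# Hypothetical example: two hand-picked points, one from each cluster
test_points = torch.FloatTensor([[3.0, 2.0],   # should fall in class 0
                                 [4.0, 5.0]])  # should fall in class 1
if torch.cuda.is_available():
    test_points = test_points.cuda()           # keep the input on the same device as the model

print(model.predict(test_points))              # a well-trained model should print [0, 1]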

Summary

To sum it up, we have learned how to train a neural network model in PyTorch to make accurate predictions for our two-class classification problem. We hope you enjoyed it. In the next post we will be working with the handwritten digit dataset.

We also provide an interactive Colab notebook, which can be found here: Run in Google Colab
