## #003 PyTorch – Shallow Neural Network in PyTorch 1.5

*Highlights*: Welcome everyone! In this post we will learn how to use PyTorch to build a shallow neural network. If needed, we recommend that you first go through a more detailed theoretical explanation of neural networks, and of what a shallow neural network is. So, let's start implementing one in PyTorch.

Tutorial Overview:

- Define our imports
- Generating Data
- Visualization
- Splitting Dataset
- Define Model Structure
- Loss Function (Criterion) and Optimizer
- Model Training
- Make Predictions
- Visualize our Predictions
- Testing our Model

### Download Code

Before we go over the explanation, you can download the code from our GitHub repo.

## 1. Define our imports

Building a shallow neural network using PyTorch is relatively simple. First, let's import the necessary libraries. We will import torch, which will be used to build our model, NumPy for generating our input features and target vector, and matplotlib for visualization. Finally, we will use sklearn to split our dataset and to measure the accuracy of our model.

```
# Necessary imports
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
# This line detects if we have gpu support on our system
device = ("cuda" if torch.cuda.is_available() else "cpu")
```

## 2. Generating Data

As the next step, we will use NumPy to create randomly generated samples representing 2 different classes. The independent features of our dataset are stored in the variable `X`, and the target variable in `y`.

```
x1 = np.random.randn(2000) * 0.5 + 3
x2 = np.random.randn(2000) * 0.5 + 2
x3 = np.random.randn(2000) * 0.5 + 4
x4 = np.random.randn(2000) * 0.5 + 5
# Creating a Matrix
X_1 = np.vstack([x1, x2])
X_2 = np.vstack([x3, x4])
X = np.hstack([X_1, X_2]).T
# Creating a Vector that contains classes (0, 1)
y = np.hstack([np.zeros(2000), np.ones(2000)])
print(X.shape)
print(y.shape)
```

```
Output:
(4000, 2)
(4000,)
```

## 3. Visualization

To visualize the created dataset, we can use matplotlib's built-in function for creating scatter plots, `scatter()`. A **scatter plot** is a type of **plot** that shows the data as a collection of points. The position of a point depends on its two-dimensional coordinates, where each value is a position on either the horizontal or the vertical axis. The parameter `c` sets the marker color. Here, we passed in the argument `y`, so the color will be determined by the value of our target vector. Also, `edgecolors` sets the edge color of the markers, where `'w'` is short for **white**.

```
plt.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm, edgecolors='w');
plt.title('Dataset')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
```

`Output:` a scatter plot of the dataset, with the two classes shown in different colors.

## 4. Splitting Dataset

Next, we need to split our input features `X` into two separate sets, `X_train` and `X_test`. We will also split our target vector `y` into two sets, `y_train` and `y_test`. Doing this with the sklearn library is straightforward. Let's look at the code:

```
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# converting the datatypes from numpy array into tensors of type float
X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)
y_train = torch.from_numpy(y_train.squeeze()).type(torch.LongTensor)
y_test = torch.from_numpy(y_test.squeeze()).type(torch.LongTensor)
# checking the shape
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
```

```
Output:
torch.Size([3200, 2])
torch.Size([800, 2])
torch.Size([3200])
torch.Size([800])
```

If you noticed above, the `.squeeze()` function is used to remove single-dimensional entries from the shape of an array, reducing it to a rank-1 array.
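As a quick sketch of what `.squeeze()` does (the shapes here are just illustrative), consider a column vector with a singleton axis:

```python
import numpy as np

a = np.zeros((3200, 1))        # a column vector with a singleton axis
print(a.shape)                 # (3200, 1)
print(a.squeeze().shape)       # (3200,) -- the singleton axis is removed
```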

## 5. Define Model Structure

To define our model structure, we subclass `nn.Module` and give our class the name `ShallowNeuralNetwork`. Inside its `__init__()` method, we first need to call `super().__init__()`. By doing so, PyTorch will be able to keep track of what we are adding to the neural network.

```
class ShallowNeuralNetwork(nn.Module):
    def __init__(self, input_num, hidden_num, output_num):
        super(ShallowNeuralNetwork, self).__init__()
        self.hidden = nn.Linear(input_num, hidden_num)  # hidden layer
        self.output = nn.Linear(hidden_num, output_num)  # output layer
        self.sigmoid = nn.Sigmoid()  # sigmoid activation function
        self.relu = nn.ReLU()  # relu activation function

    def forward(self, x):
        x = self.relu(self.hidden(x))
        out = self.output(x)
        return out

    def predict(self, x):
        # apply sigmoid to the output scores
        predictions = self.sigmoid(self.forward(x))
        result = []
        # pick the class with the maximum weight
        for current_value in predictions:
            if current_value[0] > current_value[1]:
                result.append(0)
            else:
                result.append(1)
        return result

input_num = 2
hidden_num = 2
output_num = 2  # the output size should equal the number of classes
model = ShallowNeuralNetwork(input_num, hidden_num, output_num)
model.to(device)  # send our model to GPU if available, else CPU
print(model)
```

```
Output:
ShallowNeuralNetwork(
  (hidden): Linear(in_features=2, out_features=2, bias=True)
  (output): Linear(in_features=2, out_features=2, bias=True)
  (sigmoid): Sigmoid()
  (relu): ReLU()
)
```

The `nn.Linear()` module computes a linear transformation. It takes our input feature matrix `X`, multiplies it by a weight matrix, and adds a bias vector. These parameters are created by the module itself when it is instantiated; all we need to do is specify the size of the input and the output.
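To make this concrete, here is a small sketch (with arbitrary sizes) verifying that `nn.Linear` computes exactly the affine map `x @ W.T + b`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 3)                    # 2 input features -> 3 outputs
x = torch.randn(5, 2)                      # a batch of 5 samples
manual = x @ layer.weight.T + layer.bias   # the same transformation by hand
print(torch.allclose(layer(x), manual))    # True
```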

Next, we also create a `relu` function as the hidden-layer activation and a `sigmoid` for the output, since this is a binary classification problem.

Now, all we need to do is create a `forward` method, which receives an input tensor and passes it through our hidden layer. That linear transformation, which we explained earlier, goes through a `relu` activation and then through another linear layer, our output layer; the sigmoid activation is applied in `predict()`. Then, we can look at the structure that we have defined for our network, that is, how many neurons we have in each layer, along with the activation functions.
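As a quick sanity check, a self-contained sketch with an equivalent layer stack (the same 2 → 2 → 2 sizes as above) shows that a batch of inputs produces one raw score per class:

```python
import torch
import torch.nn as nn

# An equivalent stack to ShallowNeuralNetwork's forward pass: 2 -> 2 -> 2
net = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 2))
x = torch.randn(4, 2)     # a dummy batch of 4 samples with 2 features
out = net(x)
print(out.shape)          # torch.Size([4, 2]) -- one score per class
```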

## 6. Loss Function (Criterion) and Optimizer

After the forward pass, a loss is computed from the target `y_train` and the prediction `y_pred`, and it is used to update the weights in the following step. Setting up the loss function is a fairly simple step in PyTorch. Here, we will use the **cross-entropy loss**, or log **loss**. It measures the performance of a classification model whose output is a probability value between 0 and 1. We should note that the cross-entropy loss increases as the predicted probability diverges from the actual label.
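A small sketch (with made-up logits) illustrates this behavior of `nn.CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
target = torch.tensor([1])                  # the true class is 1
confident = torch.tensor([[-2.0, 2.0]])     # scores strongly favouring class 1
wrong = torch.tensor([[2.0, -2.0]])         # scores strongly favouring class 0
print(criterion(confident, target).item())  # small loss (~0.02)
print(criterion(wrong, target).item())      # much larger loss (~4.02)
```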

Next, we will use the Adam optimizer to update the model parameters. The function `model.parameters()` provides the parameters to the optimizer, and `lr=0.01` defines the learning rate for the Adam algorithm.

```
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.01)
```

## 7. Model Training

Our model is now ready to train. We begin by setting the number of epochs. An epoch is a single pass through the training dataset. In the example below, the number of epochs is set to 1000, meaning that there will be 1000 passes over the training data, each followed by a weight update.

```
# transfer our tensors from CPU to GPU if CUDA is available
if torch.cuda.is_available():
    X_train = Variable(X_train).cuda()
    y_train = Variable(y_train).cuda()
    X_test = Variable(X_test).cuda()
    y_test = Variable(y_test).cuda()

num_epochs = 1000
for epoch in range(num_epochs):
    # forward propagation
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    # back propagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 200 == 0:
        print('Epoch [{}/{}], Loss: {:.5f}'.format(epoch, num_epochs, loss.item()))

print('\nTraining Complete')
```

```
Output:
Epoch [0/1000], Loss: 0.67416
Epoch [200/1000], Loss: 0.06665
Epoch [400/1000], Loss: 0.02735
Epoch [600/1000], Loss: 0.01536
Epoch [800/1000], Loss: 0.01033
Training Complete
```

After the forward pass and the loss computation are done, we do a backward pass, which refers to the process of learning and updating the weights. First, we need to set the gradients to zero with `optimizer.zero_grad()`. This is because every time a tensor is backpropagated through the network, its gradient is accumulated rather than replaced, so gradients left over from the previous training step would leak into the current one and prevent our network from updating its parameters properly. Then, we run the backward pass with `loss.backward()` and call `optimizer.step()`, which updates our parameters based on the current gradients.
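A minimal sketch of why the zeroing matters: without it, repeated backward passes add their gradients together.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
(2 * w).sum().backward()
print(w.grad)          # tensor([2.])
(2 * w).sum().backward()
print(w.grad)          # tensor([4.]) -- accumulated, not replaced
w.grad.zero_()         # what optimizer.zero_grad() does for every parameter
print(w.grad)          # tensor([0.])
```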

## 8. Make Predictions

Now that our model is trained, we can make predictions simply by passing the `X_test` feature matrix into our model:

```
model_prediction = model.predict(X_test)
X_test = X_test.cpu().numpy() # We are moving our tensors to cpu now
y_test = y_test.cpu().numpy()
model_prediction = np.array(model_prediction) # convert our predictions from list to numpy.array()
print("Accuracy Score on test data ==>> {}%".format(accuracy_score(model_prediction, y_test) * 100))
```

```
Output:
Accuracy Score on test data ==>> 100.0%
```

Finally, we get an accuracy score of 100%. If you run this on your own computer or in the interactive Google Colab notebook, the accuracy score will vary from run to run, since the input features are randomly generated.
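If you want repeatable numbers between runs, you can fix the random seeds before generating the data and building the model (the seed value 42 here is an arbitrary choice):

```python
import numpy as np
import torch

np.random.seed(42)       # makes the NumPy-generated dataset repeatable
torch.manual_seed(42)    # makes PyTorch's weight initialization repeatable

a = np.random.randn(3)
np.random.seed(42)
b = np.random.randn(3)
print(np.array_equal(a, b))   # True -- the same seed yields the same samples
```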

## 9. Visualize our Predictions

Finally, we can plot the result and visually compare the actual labels with the predictions that our model has made.


## 10. Testing our Model

We can also test our model by plotting the test points colored by their actual classes, and again colored by the classes our model predicted, so that the two diagrams can be compared directly:

```
fig, ax = plt.subplots(2, 1, figsize=(12, 10))
# Actual classes
ax[0].scatter(X_test[y_test==0, 0], X_test[y_test==0, 1], label='Class 0')
ax[0].scatter(X_test[y_test==1, 0], X_test[y_test==1, 1], label='Class 1')
ax[0].set_title('Actual Classes')
ax[0].legend()
# Model's predictions
ax[1].scatter(X_test[model_prediction==0, 0], X_test[model_prediction==0, 1], label='Class 0')
ax[1].scatter(X_test[model_prediction==1, 0], X_test[model_prediction==1, 1], label='Class 1')
ax[1].set_title('Our Model Predictions')
ax[1].legend()
```
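We can also spot-check individual inputs. Here is a self-contained sketch (a smaller re-creation of the same setup; the hidden size, step count, and seeds are assumptions) that trains a small network and asks it about one hand-picked point near each class center:

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)
# Regenerate a small version of the training data
# (class 0 around (3, 2), class 1 around (4, 5))
X0 = np.random.randn(200, 2) * 0.5 + np.array([3.0, 2.0])
X1 = np.random.randn(200, 2) * 0.5 + np.array([4.0, 5.0])
X = torch.tensor(np.vstack([X0, X1]), dtype=torch.float32)
y = torch.tensor([0] * 200 + [1] * 200)

net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
opt = torch.optim.Adam(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
for _ in range(300):
    opt.zero_grad()
    loss = criterion(net(X), y)
    loss.backward()
    opt.step()

# Two hand-picked points, one near each class center
points = torch.tensor([[3.0, 2.0], [4.0, 5.0]])
print(torch.argmax(net(points), dim=1))   # tensor([0, 1])
```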

## Summary

To sum it up, we have learned how to train a neural network model to make accurate predictions for a binary classification problem in PyTorch. We hope you enjoyed it. In the next post we will be working with the handwritten digit dataset.

We also provide an interactive Colab notebook, which can be found here: Run in Google Colab