#010 B How to train a shallow Neural Network with Gradient Descent?
In this post we will see how to build a shallow Neural Network in Python.
A Shallow Neural Network
First, we will import all the libraries that we will use in this code.
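A minimal import block might look as follows (assuming NumPy for computation, Matplotlib for plotting, and scikit-learn for dataset generation, which the rest of the post relies on):

```python
import numpy as np
import matplotlib.pyplot as plt

# Dataset generators and a train/test splitting helper from scikit-learn
from sklearn.datasets import make_circles, make_moons
from sklearn.model_selection import train_test_split
```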
Then we will define our dataset. We will work with two linearly non-separable datasets; to generate them we can use either the make_circles or the make_moons function from scikit-learn.
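As a sketch, the circles dataset could be generated and split like this (the sample size, noise level and split ratio are illustrative values, not prescribed ones):

```python
# Generate a linearly non-separable dataset; make_moons works analogously
X, Y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=1)

# Split the data and reshape it to the column-per-example convention,
# i.e. X has shape (n_features, m) and Y has shape (1, m)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
X_train, X_test = X_train.T, X_test.T
Y_train, Y_test = Y_train.reshape(1, -1), Y_test.reshape(1, -1)
```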
We also need to define the activation functions that we will use in our code.
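Assuming the network uses a tanh hidden layer and a sigmoid output layer (a common choice for binary classification), only the sigmoid needs to be written by hand, since NumPy already provides np.tanh:

```python
def sigmoid(z):
    # Squashes its input into (0, 1); used as the output-layer activation
    return 1 / (1 + np.exp(-z))
```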
The following function initializes the parameters that the algorithm needs to learn. We will initialize the parameters \( W_1 \) and \( W_2 \) with small random values to break symmetry, while the parameters \(b_1\) and \(b_2\) can safely be initialized with zeros.
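A sketch of such a function (here called initialize_parameters; the 0.01 scaling is an illustrative choice to keep the initial weights small):

```python
def initialize_parameters(n_x, n_h, n_y):
    """n_x, n_h, n_y: sizes of the input, hidden and output layers."""
    # Small random weights break the symmetry between hidden units,
    # so the biases can start at zero
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```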
To make a forward pass through the neural network we will define the following function.
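A sketch, here called forward_pass to mirror backward_pass below; under the tanh/sigmoid assumption above, it returns the output \(\textbf{A}^{[2]}\) together with a cache of intermediate values that backpropagation will need:

```python
def forward_pass(X, parameters):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    # Hidden layer (tanh), then output layer (sigmoid)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
```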
The function compute_cost calculates the cost when our neural network outputs \(\textbf{A}^{[2]}\) and our ground truth labels are \(\textbf{Y}\).
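Assuming the usual cross-entropy cost for binary classification, \( J = -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\log a^{[2](i)} + (1-y^{(i)})\log(1-a^{[2](i)}) \right) \), a sketch:

```python
def compute_cost(A2, Y):
    m = Y.shape[1]  # number of training examples
    # Cross-entropy cost, averaged over all examples
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return float(np.squeeze(cost))
```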
The function backward_pass performs the backpropagation step in the neural network; its output is a dictionary grads containing the gradients needed to update the parameters.
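A sketch of that step, using the standard gradients for a sigmoid output with cross-entropy cost and a tanh hidden layer:

```python
def backward_pass(parameters, cache, X, Y):
    m = X.shape[1]
    W2 = parameters["W2"]
    A1, A2 = cache["A1"], cache["A2"]

    # Output layer: for sigmoid + cross-entropy, dZ2 simplifies to A2 - Y
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Hidden layer: tanh'(z) = 1 - tanh(z)^2 = 1 - A1^2
    dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```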
To update the parameters we will use the function update_parameters. This function updates the parameters in every iteration of training.
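A sketch of the gradient descent update (the default learning rate of 1.2 is an illustrative value):

```python
def update_parameters(parameters, grads, learning_rate=1.2):
    # Gradient descent: parameter <- parameter - learning_rate * gradient
    # (learning_rate default is an illustrative choice)
    for key in ("W1", "b1", "W2", "b2"):
        parameters[key] = parameters[key] - learning_rate * grads["d" + key]
    return parameters
```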
Now, we will define a function NN_model, which calls all of the functions previously defined.
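A sketch that ties the previous functions into one training loop (the default of 10,000 iterations is illustrative):

```python
def NN_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    n_x, n_y = X.shape[0], Y.shape[0]
    parameters = initialize_parameters(n_x, n_h, n_y)

    for i in range(num_iterations):
        A2, cache = forward_pass(X, parameters)             # forward propagation
        cost = compute_cost(A2, Y)                          # current cost
        grads = backward_pass(parameters, cache, X, Y)      # backpropagation
        parameters = update_parameters(parameters, grads)   # gradient descent step
        if print_cost and i % 1000 == 0:
            print(f"Cost after iteration {i}: {cost:.6f}")

    return parameters
```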
To make predictions we define the following function.
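A sketch of such a function (here called predict); it thresholds the network's output probability at 0.5:

```python
def predict(parameters, X):
    A2, _ = forward_pass(X, parameters)
    # Probabilities above 0.5 are classified as class 1, the rest as class 0
    return (A2 > 0.5).astype(int)
```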
Training our neural network means learning the parameters \(W_1, W_2, b_1 \) and \(b_2\). The function NN_model allows us to propagate through the neural network and to update the parameters in every iteration. When this function finishes we get the final values of these parameters, and we can then use them on the test set to see how well our neural network classifies unseen data. Here is the code to get the trained parameters.
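For instance (the hidden layer size of 4 and the 10,000 iterations are illustrative choices):

```python
parameters = NN_model(X_train, Y_train, n_h=4, num_iterations=10000, print_cost=True)
```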
We will now make predictions on both the training and the test set, and check the results.
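A sketch of that check, reporting accuracy as the fraction of correctly classified examples:

```python
predictions_train = predict(parameters, X_train)
predictions_test = predict(parameters, X_test)

print("Train accuracy: {:.2f}%".format(100 * np.mean(predictions_train == Y_train)))
print("Test accuracy:  {:.2f}%".format(100 * np.mean(predictions_test == Y_test)))
```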
Now, we will define a few functions so that we can easily make some plots.
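One such helper could be a decision boundary plot (here called plot_decision_boundary, an illustrative sketch): it evaluates the model on a dense grid over the input space and colours each region by the predicted class.

```python
def plot_decision_boundary(model, X, Y):
    # Build a grid that covers the data with a small margin
    x_min, x_max = X[0, :].min() - 0.5, X[0, :].max() + 0.5
    y_min, y_max = X[1, :].min() - 0.5, X[1, :].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))

    # Predict a class for every grid point and colour the regions accordingly
    Z = model(np.c_[xx.ravel(), yy.ravel()].T)
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[0, :], X[1, :], c=Y.ravel(), cmap=plt.cm.Spectral, edgecolors="k")
    plt.show()
```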
First, we will plot how the algorithm classifies the training set, so we will see how the examples that were used to learn the parameters \(W_1\), \(W_2\), \(b_1\) and \(b_2\) are classified. Then we will see how the algorithm does on the test set.
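Both plots can be produced with the plot_decision_boundary sketch above:

```python
# Decision boundary over the examples used for training ...
plot_decision_boundary(lambda x: predict(parameters, x), X_train, Y_train)

# ... and over the held-out test examples
plot_decision_boundary(lambda x: predict(parameters, x), X_test, Y_test)
```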
Now, we will vary the number of iterations to find the optimal number of iterations for training the parameters.
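A hypothetical experiment along those lines (the iteration counts are illustrative):

```python
# Retrain with different iteration budgets and compare test accuracy
for num_iterations in [100, 1000, 5000, 10000, 50000]:
    parameters = NN_model(X_train, Y_train, n_h=4, num_iterations=num_iterations)
    accuracy = 100 * np.mean(predict(parameters, X_test) == Y_test)
    print(f"{num_iterations:>6} iterations: test accuracy {accuracy:.2f}%")
```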
Then we will vary the number of units in the hidden layer and examine the results.
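And analogously for the hidden layer size (again with illustrative values):

```python
# Retrain with different hidden layer sizes and compare test accuracy
for n_h in [1, 2, 3, 4, 5, 10, 20]:
    parameters = NN_model(X_train, Y_train, n_h=n_h, num_iterations=10000)
    accuracy = 100 * np.mean(predict(parameters, X_test) == Y_test)
    print(f"{n_h:>2} hidden units: test accuracy {accuracy:.2f}%")
```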
You can see the complete code here.
In the next post we will learn why it is important to initialize parameters with small random values.