## #012 B Building a Deep Neural Network from scratch in Python

In this post we will see how to implemet a deep Neural Network in Python from scratch. It isn’t something that we will do often in praxis, but it is good way to understand the inner workings of a Deep Learning.

First we will import libraries we will use in the following code.

In the following code we will define activation functions: \(sigmoid \) , \(ReLU\) and \(tanh\) we will also save values that we will need for the backward propagation step and that are \(Z \) values, and after that we will define function which will output \(\textbf{dZ}\). So, to be clear, when we calculate activation of any hidden unit or of a hidden layer and also caches the value of \( Z^{[l]} \) and we have set of functions called “backward” which outputs \( \textbf{dZ} \) values.

To calculate \(\textbf{dZ}^{[l]} \) we will use following formula, where \( g’^{[l]} \) is derivative of activation function in the layer \(l \):

\(\textbf{dZ}^{[l]} = \textbf{A}^{[l]} *g’^{[l]}(\textbf{Z}^{[l]}) \)

Function initialize_parameters_deep accepts a list of numbers of hidden units in each layer in our deep neural network. We will initialize all parameters with small random values. Dimension of a matrix \( \textbf{W}^{[l]} \) is \( (list[l], list[l-1])\), and dimension of \(b \) is \((list[l],1)\). In the previous post we have defined that dimension of matrices in the following manner:

\(W^{[l]} \) | \(( n[l], n[l-1] )\) |

\(b^{[l]} \) | \((n[l], 1)\) |

*Tabel of dimensions for parameters \( W^{[l]} \) and \(b^{[l]} \)*

So we need to iterate through a for loop to initialize values of parameters in every layer. We will put all values of parameters in a dictionary.

*Functions which makes calculations in a single node of a layer*

Function linear_forward(A_prev, W, b) takes activations from the previous layer and calculates Z value in the current layer. If we are calculating \(Z\) value in the first hidden layer then \(A_{prev} = A^{[0]} = X \) This function also chaces all values we used in this calculation. So we will put A, W and b in a tuple cache. We can say that this function makes calculation for the first half of each neuron.

Function linear_actiovation_forward takes as inputs activation from the previous layers A_prev, parameter for the layer in which we are calculating Z value, and type of activation function used in that layer. This function uses (or calls) two previously defined functions linear forward and linear_activation forward. this function returns calculated activations in the current layer and caches values: activations from the previous layer A_prev, parameters W and b used in that layer calculations and value Z.

To do a complete forward pass through a Neural Network we need to implement following finction.

If you are a bit confused about what each function does look at the following picture.

*L_model_forward*

The following function compute_cost when algorithm outputs AL values and ground thruth labels are Y.

To make calculations in the backward pass we will use the following equations:

\(\textbf{dW}^{[l]} = \textbf{dZ}^{[l]} \textbf{A}^{[l-1]T}\)

\(db^{[l]} = \frac{1}{m}np.sum(\textbf{dZ}^{[l]}, axis = 1, keepdims = True) \)

\(\textbf{dA}^{[l-1]} = \textbf{W}^{[l]T} \textbf{dZ}^{[l]} \)

Function linear_backward takes as input \(dZ\) and cached values from the back propagation step and outputs \(\textbf{dA_{prev}}, \textbf{dW}\) and \(db\).

To make calculations in the backward pass we will use following equations:

The following function represents one backward proapagation through a neural network. It outputs predictions calculated in the last (or the output) layer of a neural network. This function firstly calculates \(\textbf{dA^{[l]}} \) because we are doing a binary classification so in the output layer we have the \(sigmoid \) activation function and in all the other layers we will use the same activation function so that we can make backpropagation calculations with a for loop.

Notice also that a list of cached values is equal to the number of layers in a neural network, so we cached \(L \) tuples. When we start a backpropagation step we first use the last value from the list of tuples of cached values, and because indexing in Python goes from 0 then we use \((L-1)^{th} \) tuple, sto the last one.

To clariffy how these functions work look at the following picture.

*L_model_backward function *

To update parameters we will use update_parameters function.

We will define following functions to easily make plots.

Following code makes L-layes neural network as it call all previously defined functions.

To make predictions with learned parameters we will use function predict which is define in the following code.

In the next post, we will learn about computer vision.