## #009 TF An implementation of a Convolutional Neural Network in tf.keras – MNIST dataset

In this post we will see how to classify handwritten digits using a Convolutional Neural Network implemented in TensorFlow 2.0.

Required packages:

- NumPy
- Matplotlib
- TensorFlow
- scikit-learn (sklearn)
- Seaborn


### 1. Load the digit dataset

Let's start by importing all the necessary libraries.

After the imports, we can use the imported module to load the MNIST data. The *load_data()* function will automatically download the data and split it into train and test sets.

Let us check the shape of the loaded data. We can also plot some digits to see how they look.
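The loading and inspection steps can be sketched as follows, assuming the standard `tf.keras.datasets.mnist` loader. The helper names `load_mnist` and `plot_digits` are hypothetical and chosen for illustration. Nothing is downloaded until `load_mnist()` is called.

```python
import matplotlib.pyplot as plt
import tensorflow as tf


def load_mnist():
    """Download MNIST on first use and return the train and test splits."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    return (X_train, y_train), (X_test, y_test)


def plot_digits(images, labels, n=10):
    """Draw the first n images in one row, each titled with its label."""
    fig, axes = plt.subplots(1, n, squeeze=False, figsize=(1.5 * n, 2))
    for ax, image, label in zip(axes[0], images, labels):
        ax.imshow(image, cmap="gray")
        ax.set_title(str(label))
        ax.axis("off")
    return fig
```

After `(X_train, y_train), (X_test, y_test) = load_mnist()`, printing `X_train.shape` and `X_test.shape` should give `(60000, 28, 28)` and `(10000, 28, 28)`. Calling `plot_digits(X_train, y_train)` followed by `plt.show()` displays the first ten digits.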

Now it is important to reshape our *X_train* and *X_test* arrays to be of the shapes (60000, 28, 28, 1) and (10000, 28, 28, 1) respectively. This is done because the network expects every image to carry an explicit channel dimension: a single image is passed as an array of shape (1, 28, 28, 1), and the channel dimension cannot be **None**.

Many machine learning algorithms cannot work with categorical data directly, so the labels have to be converted to numbers. For classification, each class gets its own output neuron whose target value is either 0 or 1. Hence, since our digit classes go from “0” to “9”, we will use 10 output neurons. This is known as **one-hot encoding** [1]. For example, if the output should be the digit 5, the 6th neuron should output 1, and all the remaining ones should output 0. Note that the first neuron is active for the digit “zero”.
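As a small illustration of both preparation steps, the sketch below reshapes a stand-in array and one-hot encodes a few labels with `tf.keras.utils.to_categorical`. The arrays `X` and `y` are made-up placeholders with the same layout as MNIST, not the real data.

```python
import numpy as np
import tensorflow as tf

# Stand-in data with the same layout as MNIST: four 28x28 grayscale images.
X = np.zeros((4, 28, 28), dtype=np.uint8)
y = np.array([5, 0, 9, 3])

# Add the trailing channel axis: (4, 28, 28) -> (4, 28, 28, 1).
X = X.reshape(-1, 28, 28, 1)

# One-hot encode the labels: each row has a single 1 at the label's index.
y_one_hot = tf.keras.utils.to_categorical(y, num_classes=10)

print(X.shape)       # (4, 28, 28, 1)
print(y_one_hot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

The first label is 5, so the 1 sits at index 5, which is the 6th position.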

### 2. Implementing a Neural Network

When all the data is loaded and prepared, it is time to create a model. We will use the simple Sequential API to do this. The MNIST dataset is not too complicated, so there is no need for a complicated network. We will use just two convolutional layers, each followed by a max-pooling layer and batch normalization. To make predictions, we will flatten the output of the last of these layers and add two fully connected layers, with the softmax activation function on the last one.
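One possible way to write such a network with the Sequential API is sketched below. The filter counts (32 and 64), the kernel size, and the width of the first dense layer (128) are assumptions chosen for illustration, and `build_model` is a hypothetical helper name.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model():
    """Two convolutional blocks followed by two dense layers (illustrative sizes)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        # First block: convolution, max-pooling, batch normalization.
        layers.Conv2D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.BatchNormalization(),
        # Second block with more filters.
        layers.Conv2D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.BatchNormalization(),
        # Classifier head: flatten, then two fully connected layers.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])


model = build_model()
model.summary()
```

The last layer has 10 units, one per digit, and softmax turns its outputs into class probabilities that sum to 1.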

To make this work in Keras we need to compile the model. An important choice to make is the loss function. We use the **categorical_crossentropy** loss because it measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). In other words, this loss function is used to solve a multi-class classification problem.
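To give a feel for what this loss measures, here is a plain NumPy sketch of categorical cross-entropy for a single three-class example. It is a simplified stand-in for the Keras loss, and the function name and the clipping constant `eps` are assumptions made for this illustration.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


y_true = np.array([[0.0, 0.0, 1.0]])        # the true class is the third one
confident = np.array([[0.05, 0.05, 0.90]])  # most probability on the true class
unsure = np.array([[0.30, 0.30, 0.40]])     # probability spread across classes

print(categorical_crossentropy(y_true, confident))  # about 0.105
print(categorical_crossentropy(y_true, unsure))     # about 0.916
```

The loss shrinks as more probability is assigned to the correct class, which is exactly what training tries to achieve.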

For the optimization algorithm, let’s choose **Adam**, which combines ideas from RMSprop and Stochastic Gradient Descent with momentum.
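A compile call with these choices might look like the sketch below. The tiny model is only a stand-in so that the snippet runs on its own, and the learning rate is Adam's default of 0.001 written out explicitly. Tracking accuracy as a metric is an assumption based on the evaluation step later in the post.

```python
import tensorflow as tf

# Tiny stand-in model so the snippet is self-contained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",  # expects one-hot encoded labels
    metrics=["accuracy"],
)
```

Because the labels are one-hot encoded, `categorical_crossentropy` is the matching loss. With integer labels, `sparse_categorical_crossentropy` would be used instead.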

After creating the model, we need to train its parameters so that it can make useful predictions. Let’s train the model for a given number of epochs.

After training, we can use our model to make predictions. Evaluating the model on the training and test sets will give us both loss and accuracy values.
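The training and evaluation calls can be sketched as follows. To keep the example quick and self-contained it uses random stand-in data and a tiny model, so the printed numbers are meaningless. The epoch count and batch size are arbitrary assumptions.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data shaped like the prepared MNIST arrays.
rng = np.random.default_rng(0)
X_train = rng.random((64, 28, 28, 1)).astype("float32")
y_train = tf.keras.utils.to_categorical(rng.integers(0, 10, size=64), num_classes=10)
X_test = rng.random((32, 28, 28, 1)).astype("float32")
y_test = tf.keras.utils.to_categorical(rng.integers(0, 10, size=32), num_classes=10)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Train for a fixed number of epochs; history stores the per-epoch metrics.
history = model.fit(X_train, y_train, epochs=2, batch_size=16, verbose=0)

# evaluate() returns the loss followed by the compiled metrics.
train_loss, train_acc = model.evaluate(X_train, y_train, verbose=0)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"train: loss={train_loss:.3f}, accuracy={train_acc:.3f}")
print(f"test:  loss={test_loss:.3f}, accuracy={test_acc:.3f}")
```

With the real data, a large gap between training and test accuracy would be a sign of overfitting.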

To see where our model makes the most mistakes, let’s use a confusion matrix, which is also known as an error matrix. Here, the true values are on the left side, and the predicted values are on the bottom. Each cell counts how many samples have that pair of true and predicted values. The higher the diagonal values of the confusion matrix the better, since they count the correct predictions.
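A minimal sketch of computing and plotting a confusion matrix with scikit-learn and Seaborn is shown below. The label arrays are a made-up three-class toy example, and the colormap is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Toy integer labels (not MNIST results): true classes and model predictions.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

# Rows correspond to true labels, columns to predicted labels.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Draw the matrix as an annotated heatmap.
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
```

For the trained network, the integer labels would typically be recovered from the one-hot arrays with `y_test.argmax(axis=1)` and `model.predict(X_test).argmax(axis=1)`, and `plt.show()` displays the figure.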

We can also save the weights of our trained model for later use by calling the **save_weights** method.
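Saving and restoring weights might look like the sketch below. The tiny model, the helper name `build_small_model`, and the file name are placeholders. The `.weights.h5` suffix is used because recent Keras versions expect it, and weights can only be loaded into a model with the same architecture.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_small_model():
    """Stand-in architecture; the restored model must match the saved one."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


model = build_small_model()

# Write the weights to a temporary directory (placeholder location).
weights_path = os.path.join(tempfile.mkdtemp(), "mnist_cnn.weights.h5")
model.save_weights(weights_path)

# Later: rebuild the same architecture and load the stored weights into it.
restored = build_small_model()
restored.load_weights(weights_path)
```

Note that `save_weights` stores only the parameters, not the architecture or the optimizer state.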

### 3. Visualization and Testing

Now we can plot some predictions to see how our model works.
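One way to visualize predictions is sketched below, assuming the images are a NumPy array of shape (N, 28, 28, 1) and the predictions are the softmax outputs of the model. The helper name `plot_predictions` and the green/red coloring are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(images, probabilities, true_labels, n=5):
    """Show the first n images with predicted and true labels in the title."""
    predicted = np.argmax(probabilities, axis=1)  # most probable class per image
    fig, axes = plt.subplots(1, n, squeeze=False, figsize=(2 * n, 2.5))
    for i, ax in enumerate(axes[0]):
        ax.imshow(images[i].squeeze(), cmap="gray")  # drop the channel axis
        color = "green" if predicted[i] == true_labels[i] else "red"
        ax.set_title(f"pred {predicted[i]} / true {true_labels[i]}", color=color)
        ax.axis("off")
    return fig
```

A typical call would be `plot_predictions(X_test, model.predict(X_test), y_test.argmax(axis=1))` followed by `plt.show()`. Misclassified digits stand out through their red titles.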

### Summary

In the next post we will learn how to improve model performance using data augmentation techniques.