Build Your First Neural Network: Basic Image Classification Using Keras

Image classification is one of the most important problems in machine learning. It underpins a variety of computer vision tasks, such as face recognition, character recognition, object avoidance in autonomous vehicles and many others. Convolutional Neural Networks (CNNs) have been used for image classification and other computer vision problems since their inception. They take their name from the convolutional layer, their core building block. Keras is a high-level library that provides an easy way to get started with machine learning and neural networks. We will use it here to implement a CNN that classifies the handwritten digits of the MNIST dataset.

Image classification is the process of determining which of the given classes an input image belongs to. CNNs represent a huge breakthrough in image classification. In most cases, a CNN outperforms other image classification methods and comes close to human-level accuracy. A CNN model does not simply output the name of the class the input image belongs to; rather, it gives a list of probabilities. Each entry in the list shows the likelihood that the input image belongs to a certain class. For example, if we have two classes in a dataset of "cats and dogs" images, a CNN model gives us two probabilities: one for the likelihood that the input image belongs to the "dog" class, and the other for the likelihood that it belongs to the "cat" class.
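As a rough illustration of that list of probabilities, the sketch below turns two raw scores into probabilities with the softmax function, the same activation used in the output layer of the model built later in this post. The scores and the class order are made-up values for illustration, not the output of a real model.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the largest score first for numerical stability.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for one image, in the order ["cat", "dog"].
classes = ["cat", "dog"]
probs = softmax([1.0, 3.0])

# The predicted class is the one with the highest probability.
predicted = classes[probs.index(max(probs))]
print(dict(zip(classes, probs)), "->", predicted)
```

The model's answer is simply the class with the largest probability, which here is "dog".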

There are four basic parts of any neural network model.

Network architecture
Loss function
Optimizer
Regularizer.

1. Network architecture

Network architecture refers to the organization of layers in the network and the structure of each layer. It also describes the connectivity between the nodes of one layer and the nodes of the next layer. A node is a basic functional unit used repeatedly in a layer. A CNN model usually has convolutional layers, pooling layers, dropout layers and fully connected layers.

Convolutional layers extract features from images at different levels; their outputs are called activations or feature maps. Pooling layers downsample and summarize these features. A dropout layer applies a regularization technique that helps prevent the model from overfitting the training data.
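To show what "downsample and summarize" means in practice, here is a minimal pure-Python sketch of 2x2 max pooling on a single small feature map. The function name max_pool_2x2 and the input values are made up for illustration; a real pooling layer works on batches of multi-channel tensors.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by keeping the max of each 2x2 block."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled

# A made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and each output value keeps only the strongest response in its region.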

2. Loss function

The loss function, also called the cost function, calculates the cost of the network during each iteration of the training phase. The cost or loss of a neural network measures the difference between the actual (expected) output and the output predicted by the model. It tells us how well the network performed during that iteration. The purpose of the training phase is to minimize this loss value. The way to do that is to change the weights in each layer of the network, which is done with the help of an optimizer.

Examples of loss functions include Mean Squared Error, which is commonly used for regression, and Cross-Entropy loss, which usually gives the best performance on classification problems.
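To make the cross-entropy loss concrete, the sketch below computes it by hand for one sample with three classes. The label and the two predictions are made-up values, and categorical_cross_entropy is an illustrative helper, not the Keras function itself.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1]              # the correct class is index 2
confident = [0.05, 0.05, 0.90]  # most of the probability on the right class
unsure = [0.40, 0.40, 0.20]     # most of the probability on the wrong classes

loss_confident = categorical_cross_entropy(y_true, confident)
loss_unsure = categorical_cross_entropy(y_true, unsure)
print(round(loss_confident, 3), round(loss_unsure, 3))  # 0.105 1.609
```

The loss is small when the model puts high probability on the correct class and grows as that probability shrinks.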

3. Optimizer

An optimizer is an optimization algorithm that minimizes or maximizes an objective function. In neural networks it is used to find a minimum of the loss function. Gradients of the loss with respect to the weights are calculated by propagating the error back through the network (backpropagation). They tell us in which direction (positive or negative) to update each weight and, together with the learning rate, by how much. The optimizer then uses these gradients to update the weights.

There are different types of optimizers. A few popular ones are Adam and the different variations of the Gradient Descent algorithm. Each is suitable for different scenarios. However, Adam (adaptive moment estimation) is widely used for classification problems because it usually converges quickly to a good local minimum of the loss function.
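The basic idea behind all of these optimizers can be sketched with plain gradient descent on a toy loss with a single weight. This is not Adam, only the simplest member of the family, and the loss (w - 3)^2, the starting weight and the learning rate are all chosen purely for illustration.

```python
def gradient(w):
    """Derivative of the toy loss (w - 3)^2 with respect to w."""
    return 2 * (w - 3)

w = 0.0              # initial weight (arbitrary starting point)
learning_rate = 0.1  # step size, chosen for illustration

for step in range(50):
    # The sign of the gradient gives the direction; the learning rate
    # scales how far the weight moves in that direction.
    w -= learning_rate * gradient(w)

print(round(w, 4))
```

After enough steps the weight settles very close to 3, the value that minimizes the toy loss.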

4. Regularizer

A regularizer is not a mandatory component of a neural network, but it is good practice to use one because it helps prevent the model from overfitting. Overfitting means a larger generalization error: an overfit model is extremely accurate on the training data but performs poorly on data that it has never seen before. There are different regularization techniques, such as dropout and L1 and L2 regularization. To keep our model from overfitting the training data, we will add a dropout layer to it.
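A minimal sketch of what a dropout layer does during training, assuming the common "inverted dropout" formulation in which the surviving values are scaled up by 1 / (1 - rate). The function and the input values below are illustrative only; at test time a dropout layer passes its input through unchanged.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and scale the survivors
    so that the expected sum of the output matches the input."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)    # fixed seed so the run is repeatable
activations = [1.0] * 10  # made-up activations from a previous layer
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

Because a different random subset of nodes is silenced at every step, the network cannot rely too heavily on any single node.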

That's enough theory. Let's go through the code step by step.

1. Import keras library:

import keras

2. Load MNIST dataset:

Keras provides an easy-to-use API to download basic datasets such as MNIST, CIFAR-10, CIFAR-100 and Fashion MNIST. It takes just two lines to load the entire dataset into local memory.

mnist = keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

3. Define some global variables

batch_size = 200

epochs = 5

input_shape = [-1, 28, 28, 1]

4. Pre-process data

In pre-processing, we will normalize the images, convert the labels to categorical format (also called one-hot encoding), and reshape the images. Normalization brings the pixel values into the range 0-1. It is not strictly necessary, but it helps to improve accuracy. The labels, however, need to be converted to categorical format, because there are 10 classes in MNIST and, as discussed in the introduction, a CNN gives a list of probabilities, one per class.

x_test = x_test/255.0

x_train = x_train/255.0

MNIST labels are single digits ranging from 0 to 9. In one-hot encoding, each digit is converted to an array of 10 values that has a 1 only at the index equal to the digit itself. For example, 2 is converted to [0,0,1,0,0,0,0,0,0,0] and 3 is converted to [0,0,0,1,0,0,0,0,0,0].

One-hot encoding tells the model that, for an image of the digit 3 for instance, it should give the maximum probability at index 3. It sounds a little hard, but Keras has a utils module which saves us time.

y_train = keras.utils.to_categorical(y_train)

y_test = keras.utils.to_categorical(y_test)
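For readers who want to see the encoding itself, here is a small pure-Python sketch of one-hot encoding. The helper one_hot is written only for illustration; to_categorical does the same job for a whole array of labels at once and returns floating-point values.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1 at position `label`."""
    encoded = [0] * num_classes
    encoded[label] = 1
    return encoded

print(one_hot(2))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(one_hot(3))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```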

A CNN also takes the number of channels into account in its convolution operations, while MNIST images are provided in 28x28 format. All of these images are grayscale, so they have only one channel, and we will reshape them to [-1, 28, 28, 1]. The -1 tells reshape to infer that dimension, here the number of images, from the size of the array.

If you don't understand, don't worry about it—Legendary Andrew Ng

x_train = x_train.reshape(input_shape)

x_test = x_test.reshape(input_shape)
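If the -1 still feels mysterious, the following sketch mimics how it is resolved. The helper resolve_shape is hypothetical and written only for illustration; NumPy performs the equivalent calculation internally when reshape is called.

```python
def resolve_shape(total_values, shape):
    """Replace a single -1 in `shape` with the size that makes it fit."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if total_values % known != 0:
        raise ValueError("shape does not fit the data")
    return [total_values // known if dim == -1 else dim for dim in shape]

# The MNIST training set holds 60,000 images of 28x28 pixels.
total_values = 60000 * 28 * 28
print(resolve_shape(total_values, [-1, 28, 28, 1]))  # [60000, 28, 28, 1]
```

The same shape therefore works for both the training set and the smaller test set, because the first dimension is worked out from the data each time.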

5. Build model

Here is where we define our network architecture. Keras' Sequential model API is pretty easy to understand. It creates a model by stacking layers on top of each other in the order they are provided. All we need to do is create an object of the Sequential class and add layers to it using the add method. There is also an option to pass the layers to the constructor, but I prefer the add method because it gives a clue about how the input passes through the network.

model = keras.Sequential()

model.add(keras.layers.Conv2D(6, (3,3), activation=keras.activations.relu,  

           input_shape=[28,28,1]))

model.add(keras.layers.MaxPool2D())

model.add(keras.layers.Conv2D(16, (3,3), activation=keras.activations.relu))

model.add(keras.layers.MaxPool2D())

model.add(keras.layers.Conv2D(120, (3,3), activation=keras.activations.relu))

model.add(keras.layers.Flatten())

model.add(keras.layers.Dense(84, activation=keras.activations.relu))

model.add(keras.layers.Dropout(0.5))

model.add(keras.layers.Dense(10, activation=keras.activations.softmax))
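To follow how an image shrinks as it passes through these layers, the sketch below recomputes the spatial size after each one. It assumes the Keras defaults used above: convolutions with no padding and a stride of 1, and 2x2 max pooling. The helper names are made up for illustration.

```python
def conv_output(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool

size = 28
size = conv_output(size)   # Conv2D(6)   -> 26
size = pool_output(size)   # MaxPool2D   -> 13
size = conv_output(size)   # Conv2D(16)  -> 11
size = pool_output(size)   # MaxPool2D   -> 5
size = conv_output(size)   # Conv2D(120) -> 3

# Flatten turns the final 3x3x120 volume into one long vector for Dense(84).
flattened = size * size * 120
print(size, flattened)  # 3 1080
```

Calling model.summary() on the real model prints the output shape of every layer, which is a quick way to check these numbers.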

Remember what an optimizer and a loss function are? We need an optimizer to update the weights and a loss function to calculate the cost or loss of the network during the training phase.

optimizer = keras.optimizers.Adam()

model.compile(optimizer=optimizer, loss=keras.losses.categorical_crossentropy,

              metrics=['accuracy'])

6. Training

Our model is now ready to enter the training phase. We will call the fit function and provide the training data we want our model to fit. It also needs some other information, such as the batch size, the number of epochs and the verbosity level.

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1)
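As a quick sanity check on what this call does, the arithmetic below works out how many weight updates the training run performs, using the batch size and number of epochs defined in step 3 and the 60,000 images of the MNIST training set.

```python
import math

num_train_images = 60000  # size of the MNIST training set
batch_size = 200
epochs = 5

# One weight update happens per batch; one epoch is a full pass over the data.
steps_per_epoch = math.ceil(num_train_images / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 300 1500
```

With verbose set to 1, the progress bar for each epoch should therefore count up to 300 steps.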

7. Testing

Once all the epochs are completed and the training phase ends, we evaluate our model to see how good it is at classification.

results = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=0)

# results holds [loss, accuracy], in the order set up by compile()

print('loss: {:.2f}, accuracy: {:.2f}'.format(results[0], results[1]))
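The accuracy reported by evaluate can be understood with a small hand-rolled version: for each sample, take the index of the highest predicted probability and compare it with the index of the 1 in the one-hot label. The predictions and labels below are made-up values with three classes, and the helper functions are illustrative only.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, one_hot_labels):
    """Fraction of samples whose most likely class matches the label."""
    correct = sum(
        argmax(pred) == argmax(label)
        for pred, label in zip(predictions, one_hot_labels)
    )
    return correct / len(predictions)

# Made-up probabilities for three samples and three classes.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.4, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
acc = accuracy(predictions, labels)
print('accuracy: {:.2f}'.format(acc))  # accuracy: 0.67
```

The first two samples are classified correctly and the third is not, so the accuracy is two out of three.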

8. Save trained model

To use the trained model for classification later, it needs to be saved; retraining a model every time we need it would be a waste of time.

model.save('model.h5')

To use the already trained and saved model, load it with Keras' load_model function. If you have a saved model, you don't need steps 5 and 6.

new_model = keras.models.load_model('model.h5')

Note: In this post, I have skipped some details to make things easy to understand. However, we will see those details in upcoming posts.

If you have any issues with the code, feel free to ask in the comments. I will try to reply promptly.

Tech Doodling
