{"id":1563,"date":"2019-03-08T02:54:47","date_gmt":"2019-03-08T02:54:47","guid":{"rendered":"http:\/\/kusuaks7\/?p=1168"},"modified":"2023-06-28T12:13:42","modified_gmt":"2023-06-28T12:13:42","slug":"deep-learning-for-beginners-practical-guide-with-python-and-keras","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/deep-learning-for-beginners-practical-guide-with-python-and-keras\/","title":{"rendered":"Deep Learning for Beginners &#8211; Practical Guide with Python and Keras"},"content":{"rendered":"<p id=\"3813\">This post will show how the example of digit recognition, presented in a <a href=\"https:\/\/www.experfy.com\/blog\/neural-networks-basic-concepts-for-beginners\">previous post<\/a>\u00a0(I strongly recommend reading it first), is coded with Keras to offer the reader a first practical contact with Deep Learning using this Python library.<\/p>\n<h3 id=\"f196\">Environment set\u00a0up<\/h3>\n<h4 id=\"5dd0\">Why Keras?<\/h4>\n<p id=\"4269\"><a href=\"https:\/\/keras.io\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\">Keras<\/a>\u00a0is the recommended library for beginners, since its learning curve is very smooth compared to others, and it is currently one of the most popular libraries for implementing neural networks. Keras is a Python library that makes it simple to create a wide range of Deep Learning models, using other libraries such as TensorFlow, Theano or CNTK as a backend. It was developed and is maintained by\u00a0<a href=\"https:\/\/twitter.com\/fchollet\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/twitter.com\/fchollet\" data->Fran\u00e7ois Chollet<\/a>, an engineer at Google, and its code has been released under the permissive MIT license. 
Also important: Keras is included in\u00a0<a href=\"https:\/\/www.tensorflow.org\/guide\/keras\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.tensorflow.org\/guide\/keras\" data->TensorFlow as an API<\/a>. Although Keras currently ships inside the TensorFlow package, it can also be used as a standalone Python library, and to get started I consider this second option the most appropriate.<\/p>\n<p id=\"21c3\">The code in this post is available in the form of Jupyter notebooks on GitHub (<a href=\"https:\/\/github.com\/JordiTorresBCN\/DEEP-LEARNING-practical-introduction-with-Keras\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/JordiTorresBCN\/DEEP-LEARNING-practical-introduction-with-Keras\" data->https:\/\/github.com\/JordiTorresBCN\/DEEP-LEARNING-practical-introduction-with-Keras<\/a>), although it can also be run as normal Python programs if the reader so wishes.<\/p>\n<h4 id=\"d648\">Colaboratory environment<\/h4>\n<p id=\"c860\">In this post, we will use\u00a0<a href=\"https:\/\/colab.research.google.com\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/colab.research.google.com\"><em>Colaboratory<\/em><\/a>, offered by Google.<\/p>\n<figure id=\"c8db\"><canvas width=\"75\" height=\"37\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*P3wZkM9uYQ6R9zWeWxKXKA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*P3wZkM9uYQ6R9zWeWxKXKA.png\" \/><\/figure>\n<p id=\"79c5\">It is a Google research project created to help disseminate Machine Learning education and research. It is a Jupyter notebook environment that requires no configuration and runs completely in the cloud, allowing the use of Keras, TensorFlow and PyTorch. The most important feature that distinguishes Colab from other free cloud services is that it provides a GPU, totally free of charge. 
Detailed information about the service can be found on the\u00a0<a href=\"https:\/\/research.google.com\/colaboratory\/faq.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/research.google.com\/colaboratory\/faq.html\" data->FAQ page<\/a>.<\/p>\n<p id=\"0436\">Notebooks are stored in Google Drive and can be shared as you would with Google Docs. The environment is free to use and only requires a Google account. In addition, it allows the use of an NVIDIA K80 GPU free of charge.<\/p>\n<p id=\"f049\">When entering for the first time you will see a window like the one shown below. In this window you should select the GITHUB tab and fill in the URL field with \u201cJordiTorresBCN\u201d and the Repository field with \u201cjorditorresBCN \/ DEEP-LEARNING-practical-introduction-with-Keras\u201d.<\/p>\n<figure id=\"f927\"><canvas width=\"75\" height=\"37\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*XoUAfNCObTiOWpX9hLBK3A.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*XoUAfNCObTiOWpX9hLBK3A.png\" \/><\/figure>\n<p id=\"45cb\">To load a notebook, click on the button that appears on its right (open notebook in new tab):<\/p>\n<figure id=\"74dc\"><canvas width=\"75\" height=\"53\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*hrl4tR8TQ5KqzSdvC-UVqQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*hrl4tR8TQ5KqzSdvC-UVqQ.png\" \/><\/figure>\n<p id=\"8bc1\">By default, Colab notebooks run on CPU. You can switch your notebook to run with GPU. 
To obtain access to a GPU, we need to choose the Runtime tab and then select \u201cChange runtime type\u201d as shown in the following figure:<\/p>\n<figure id=\"fe9d\"><canvas width=\"75\" height=\"34\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*euE7nGZ0uJQcgvkpgvkoQg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*euE7nGZ0uJQcgvkpgvkoQg.png\" \/><\/figure>\n<p id=\"b1d9\">When the pop-up window appears, select GPU. Ensure that \u201cHardware accelerator\u201d is set to GPU (the default is CPU).<\/p>\n<blockquote id=\"422f\"><p>A warning may appear indicating that the code is not created by Google. I hope that you trust my code and run it anyway!\u00a0\ud83d\ude09<\/p><\/blockquote>\n<p id=\"e57f\">Afterwards, ensure that you are connected to the runtime (there is a green check next to \u201cconnected\u201d in the menu ribbon):<\/p>\n<figure id=\"cb25\"><canvas width=\"75\" height=\"9\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*s7yXmKtrfm1Rda255lw-Ug.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*s7yXmKtrfm1Rda255lw-Ug.png\" \/><\/figure>\n<p id=\"360d\">Now you are able to run the GitHub repo in Google Colab. Enjoy!<\/p>\n<h3 id=\"b816\">Data to feed a neural\u00a0network<\/h3>\n<h4 id=\"c796\">Dataset for training, validation and\u00a0testing<\/h4>\n<p id=\"a88f\">Before presenting the Keras implementation of the previous example, let\u2019s review how we should distribute the available data in order to configure and evaluate the model correctly.<\/p>\n<p id=\"57e0\">For the configuration and evaluation of a model in Machine Learning, and therefore in Deep Learning, the available data are usually divided into three sets: training data, validation data, and test data. 
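As a sketch of how such a three-way division might be made in practice (the 80/10/10 proportions and the NumPy-based shuffling are merely illustrative, not something fixed by this post's example):

```python
import numpy as np

# Illustrative three-way split of 1000 hypothetical labeled examples
n = 1000
indices = np.random.default_rng(0).permutation(n)  # shuffle before splitting

n_train, n_val = int(0.8 * n), int(0.1 * n)
train_idx = indices[:n_train]                  # training data: fit the model parameters
val_idx = indices[n_train:n_train + n_val]     # validation data: guide hyperparameter tuning
test_idx = indices[n_train + n_val:]           # test data: reserved for final evaluation only

print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```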
The training data are those used by the learning algorithm to obtain the parameters of the model with the iterative method that we have already mentioned.<\/p>\n<p id=\"ac0d\">If the model does not fit the input data well (for example, if it presents overfitting), we modify the value of certain hyperparameters, train it again with the training data, and evaluate it again with the validation data. We can make these hyperparameter adjustments guided by the validation data until we obtain validation results that we consider correct. If we have followed this procedure, we must be aware that, in fact, the validation data have influenced the model, so that it also fits the validation data. For this reason, we always reserve a set of test data for the final evaluation of the model; it will only be used at the end of the whole process, when we consider the model fine-tuned and will no longer modify any of its hyperparameters.<\/p>\n<p id=\"d3ab\">Given the introductory nature of this post, and since we will not go into the details of hyperparameter tuning, the examples will ignore the validation data and use only the training and test data.<\/p>\n<h4 id=\"2ef1\">Preloaded data in\u00a0Keras<\/h4>\n<p id=\"30c4\">In Keras the MNIST dataset is preloaded in the form of four Numpy arrays and can be obtained with the following code:<\/p>\n<p id=\"81be\"><span style=\"font-family: courier new,courier,monospace;\">import keras<\/span><\/p>\n<p><span style=\"font-family: courier new,courier,monospace;\">from keras.datasets import mnist<br \/>\n(x_train, y_train), (x_test, 
y_test) = mnist.load_data()<\/span><\/p>\n<p><em>x_train<\/em>\u00a0and\u00a0<em>y_train<\/em>\u00a0contain the training set, while\u00a0<em>x_test<\/em>\u00a0and\u00a0<em>y_test<\/em>\u00a0contain the test data. The images are encoded as Numpy arrays, and their corresponding labels are digits ranging from 0 to 9. Following the post\u2019s strategy of introducing the concepts gradually, as we have indicated, we will not yet see how to separate a part of the training data to use as validation data; we will only take into account the training and test data.<\/p>\n<p id=\"de26\">If we want to check what values we have loaded, we can choose any of the images of the MNIST set, for example image 8, and display it using the following Python code:<\/p>\n<p id=\"8222\"><span style=\"font-family: courier new,courier,monospace;\">import matplotlib.pyplot as plt<br \/>\nplt.imshow(x_train[8], cmap=plt.cm.binary)<\/span><\/p>\n<p id=\"e4e8\">We get the following image:<\/p>\n<figure id=\"4c42\"><canvas width=\"75\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Tntk5mPqHTlnjGDDnxSyaQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Tntk5mPqHTlnjGDDnxSyaQ.png\" \/><\/figure>\n<p id=\"e59d\">And if we want to see its corresponding label we can do it through:<\/p>\n<pre id=\"e3c6\">print(y_train[8])<\/pre>\n<pre id=\"0a8c\">1<\/pre>\n<p id=\"06d7\">As we can see, it returns the value \u201c1\u201d, as expected.<\/p>\n<h4 id=\"98f0\">Data representation in\u00a0Keras<\/h4>\n<p id=\"14f5\">Keras, which as we have seen uses a multidimensional Numpy array as its basic data structure, calls this data structure a\u00a0<em>tensor<\/em>. 
In short, we could say that a tensor has three main attributes:<\/p>\n<ul>\n<li id=\"067a\"><em>Number of axes<\/em>\u00a0(<em>Rank<\/em>): a tensor containing a single number is called a scalar (or a 0-dimensional tensor, or 0D tensor). An array of numbers is called a vector, or 1D tensor. An array of vectors is a matrix, or 2D tensor. If we pack this matrix into a new array, we get a 3D tensor, which we can interpret visually as a cube of numbers. By packing a 3D tensor into an array, we can create a 4D tensor, and so on. In the Python Numpy library this is called the tensor\u2019s\u00a0<em>ndim<\/em>.<\/li>\n<li id=\"92ef\"><em>Shape<\/em>: it is a tuple of integers that describes the size of the tensor along each axis. In the Numpy library this attribute is called\u00a0<em>shape<\/em>.<\/li>\n<li id=\"b20e\"><em>Data type<\/em>:\u00a0this attribute indicates the type of data contained in the tensor, which can be for example\u00a0<em>uint8<\/em>,\u00a0<em>float32<\/em>,\u00a0<em>float64<\/em>, etc. In the Numpy library this attribute is called\u00a0<em>dtype<\/em>.<\/li>\n<\/ul>\n<p id=\"2c6b\">I propose that we obtain the number of axes and dimensions of the tensor\u00a0<em>x_train<\/em>\u00a0from our previous example:<\/p>\n<pre id=\"9f56\">print(x_train.ndim)<\/pre>\n<pre id=\"f110\">3<\/pre>\n<pre id=\"ca58\">print(x_train.shape)<\/pre>\n<pre id=\"b60f\">(60000, 28, 28)<\/pre>\n<p id=\"afc8\">And if we want to know what type of data it contains:<\/p>\n<pre id=\"c830\">print(x_train.dtype)<\/pre>\n<pre id=\"788f\">uint8<\/pre>\n<h4 id=\"9495\">Data normalization in\u00a0Keras<\/h4>\n<p id=\"37ac\">These MNIST images of 28\u00d728 pixels are represented as an array of numbers whose values, of type uint8, range over the interval [0, 255]. But it is usual to scale the input values of neural networks to certain ranges. In the example of this post the input values should be scaled to values of type float32 within the interval [0, 1]. 
We can achieve this transformation with the following lines of code:<\/p>\n<p id=\"c55e\"><span style=\"font-family: courier new,courier,monospace;\">x_train = x_train.astype('float32')<br \/>\nx_test = x_test.astype('float32')<\/span><\/p>\n<p><span style=\"font-family: courier new,courier,monospace;\">x_train \/= 255<br \/>\nx_test \/= 255<\/span><\/p>\n<p id=\"9456\">On the other hand, to facilitate the entry of data into our neural network (we will see that in convolutional networks it is not necessary), we must transform the tensor (image) from 2 dimensions (2D) into a 1-dimensional vector (1D). That is, the matrix of 28\u00d728 numbers can be represented by a vector (array) of 784 numbers (concatenating row by row), which is the format that a densely connected neural network like the one in this post accepts as input. In Python, converting every image of the MNIST dataset to a vector with 784 components can be accomplished as follows:<\/p>\n<p id=\"0d96\"><span style=\"font-family: courier new,courier,monospace;\">x_train = x_train.reshape(60000, 784)<br \/>\nx_test = x_test.reshape(10000, 784)<\/span><\/p>\n<p id=\"bc16\">After executing these Python instructions, we can verify that\u00a0<em>x_train.shape<\/em>\u00a0takes the form of (60000, 784) and\u00a0<em>x_test.shape<\/em>\u00a0takes the form of (10000, 784), where the first dimension indexes the image and the second indexes the pixel in each image (now the intensity of the pixel is a value between 0 and 1):<\/p>\n<pre id=\"b25a\">print(x_train.shape)<\/pre>\n<pre id=\"c0a2\">(60000, 784)<\/pre>\n<pre id=\"ed1a\">print(x_test.shape)<\/pre>\n<pre id=\"7acd\">(10000, 784)<\/pre>\n<p id=\"1b95\">In addition to that, we have the labels for each input datum (remember that in our case they are numbers between 0 and 9 that indicate which digit the image represents, that is, to which class it is associated). 
In this example, and as we have already advanced, we will represent this label with a vector of 10 positions, where the position corresponding to the digit that the image represents contains a 1 and the remaining positions of the vector contain the value 0.<\/p>\n<p id=\"b58d\">For this we will use the\u00a0<em>one-hot encoding<\/em>\u00a0already mentioned, which consists of transforming the labels into a vector with as many zeros as the number of different labels, containing the value 1 at the index that corresponds to the value of the label. Keras offers many support functions, including\u00a0<em>to_categorical<\/em>\u00a0to perform precisely this transformation, which we can import from\u00a0<em>keras.utils<\/em>:<\/p>\n<pre id=\"548a\">from keras.utils import to_categorical<\/pre>\n<p id=\"e225\">To see the effect of the transformation, we can inspect the values before and after applying\u00a0<em>to_categorical<\/em>:<\/p>\n<pre id=\"4f61\">print(y_test[0])<\/pre>\n<pre id=\"a0d2\">7<\/pre>\n<pre id=\"7d87\">print(y_train[0])<\/pre>\n<pre id=\"38bb\">5<\/pre>\n<pre id=\"5120\">print(y_train.shape)<\/pre>\n<pre id=\"b979\">(60000,)<\/pre>\n<pre id=\"6273\">print(y_test.shape)<\/pre>\n<pre id=\"f68d\">(10000,)<\/pre>\n<p id=\"d3b1\"><span style=\"font-family: courier new,courier,monospace;\">y_train = to_categorical(y_train, num_classes=10)<br \/>\ny_test = to_categorical(y_test, num_classes=10)<\/span><\/p>\n<pre id=\"632d\">print(y_test[0])<\/pre>\n<pre id=\"68c0\">[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]<\/pre>\n<pre id=\"8d11\">print(y_train[0])<\/pre>\n<pre id=\"faee\">[0. 0. 0. 0. 0. 1. 0. 0. 0. 
0.]<\/pre>\n<pre id=\"d7e6\">print(y_train.shape)<\/pre>\n<pre id=\"bcb0\">(60000, 10)<\/pre>\n<pre id=\"9d08\">print(y_test.shape)<\/pre>\n<pre id=\"2f55\">(10000, 10)<\/pre>\n<p id=\"94b6\">Now we have the data ready to be used in our simple model example that we are going to program in Keras in the next section.<\/p>\n<h3 id=\"38f5\">Densely connected networks in\u00a0Keras<\/h3>\n<p id=\"251e\">In this section, we will present how to specify in Keras the model that we have defined in the previous sections.<\/p>\n<h4 id=\"1c66\">Sequential class in\u00a0Keras<\/h4>\n<p id=\"31cb\">The main data structure in Keras is the\u00a0<em>Sequential<\/em>\u00a0class, which allows the creation of a basic neural network. Keras\u00a0<a href=\"https:\/\/keras.io\/getting-started\/functional-api-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/getting-started\/functional-api-guide\/\" data->also offers an API<\/a>\u00a0that allows implementing more complex models in the form of a graph that can have multiple inputs, multiple outputs, and arbitrary connections in between, but that is beyond the scope of this post.<\/p>\n<p id=\"9b0c\">The\u00a0<a href=\"https:\/\/keras.io\/models\/sequential\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/models\/sequential\/\" data-><em>Sequential<\/em>\u00a0class<\/a>\u00a0of the Keras library is a wrapper for the sequential neural network model that Keras offers and can be created in the following way:<\/p>\n<p id=\"b8a4\"><span style=\"font-family: courier new,courier,monospace;\">from keras.models import Sequential<br \/>\nmodel = Sequential()<\/span><\/p>\n<p id=\"aaef\">In this case, the model in Keras is considered a sequence of layers, each of which gradually \u201cdistills\u201d the input data to obtain 
the desired output. In Keras we can find all the required types of layers, which can be easily added to the model through the\u00a0<em>add()<\/em>\u00a0method.<\/p>\n<h4 id=\"c825\">Defining the\u00a0model<\/h4>\n<p id=\"d329\">The construction in Keras of our model to recognize the images of digits could be the following:<\/p>\n<p id=\"c704\"><span style=\"font-family: courier new,courier,monospace;\">from keras.models import Sequential<br \/>\nfrom keras.layers.core import Dense, Activation<\/span><\/p>\n<p><span style=\"font-family: courier new,courier,monospace;\">model = Sequential()<br \/>\nmodel.add(Dense(10, activation='sigmoid', input_shape=(784,)))<br \/>\nmodel.add(Dense(10, activation='softmax'))<\/span><\/p>\n<p id=\"ef7c\">Here, the neural network has been defined as a sequence of two layers that are densely connected (or fully connected), meaning that all the neurons in each layer are connected to all the neurons in the next layer. 
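As a side note on what "densely connected" means computationally: each neuron j of such a layer computes activation(sum over i of x_i*w_ij + b_j). A minimal NumPy sketch of what the first Dense layer computes (the random weights here are purely illustrative; in reality Keras initializes them and then learns them during training):

```python
import numpy as np

def sigmoid(z):
    # logistic activation: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 10))  # weights w_ij: 784 inputs x 10 neurons
b = np.zeros(10)                           # one bias b_j per neuron

x = rng.random(784)      # one flattened, normalized 28x28 image (made up here)
out = sigmoid(x @ W + b) # what Dense(10, activation='sigmoid') computes per image
print(out.shape)         # (10,)
```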
Visually we could represent it in the following way:<\/p>\n<figure id=\"2635\"><canvas width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*c0fsl5kacv8nEwdIpji4bw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*c0fsl5kacv8nEwdIpji4bw.png\" \/><\/figure>\n<p id=\"0855\">In the previous code we explicitly express in the\u00a0<em>input_shape<\/em>\u00a0argument of the first layer what the input data is like: a tensor indicating that we have 784 features of the model (in fact, the tensor being defined is\u00a0<em>(None, 784)<\/em>, as we will see later).<\/p>\n<p id=\"0287\">A very interesting characteristic of the Keras library is that it will automatically deduce the shape of the tensors between layers after the first one. This means that the programmer only has to establish this information for the first layer. Also, for each layer we indicate the number of nodes that it has and the activation function that we will apply in it (in this example,\u00a0<em>sigmoid<\/em>).<\/p>\n<p id=\"812e\">The second layer in this example is a\u00a0<em>softmax<\/em>\u00a0layer of 10 neurons, which means that it will return a vector of 10 probability values representing the 10 possible digits (in general, the output layer of a classification network will have as many neurons as classes, except in a binary classification, where only one neuron is needed). 
Each value will be the probability that the current digit image belongs to each one of them.<\/p>\n<p id=\"b446\">A very useful method that Keras provides to check the architecture of our model is\u00a0<em>summary()<\/em>:<\/p>\n<pre id=\"a6f5\">model.summary()<\/pre>\n<p>_________________________________________________________________<br \/>\n<span style=\"font-family: courier new,courier,monospace;\">Layer (type) Output Shape Param #<br \/>\n=================================================================<br \/>\ndense_1 (Dense) (None, 10) 7850<br \/>\n_________________________________________________________________<br \/>\ndense_2 (Dense) (None, 10) 110<br \/>\n=================================================================<br \/>\nTotal params: 7,960<br \/>\nTrainable params: 7,960<br \/>\nNon-trainable params: 0<\/span><\/p>\n<p id=\"3777\">Later we will go into more detail about the information that the\u00a0<em>summary()<\/em>\u00a0method returns, because this accounting of the parameters and data sizes of a neural network becomes very valuable when we start to build very large network models. For our simple example, we see that it indicates that 7,960 parameters are required (column\u00a0<em>Param #<\/em>), which correspond to 7,850 parameters for the first layer and 110 for the second.<\/p>\n<p id=\"8026\">In the first layer, for each neuron\u00a0<em>i<\/em>\u00a0(between 0 and 9) we require 784 parameters for the weights\u00a0<em>wij<\/em>, and therefore 10\u00d7784 parameters to store the weights of the 10 neurons, plus 10 additional parameters for the 10\u00a0<em>bj<\/em>\u00a0biases corresponding to each one of them. In the second layer, being a softmax layer, each of its 10 neurons must be connected with the 10 neurons of the previous layer. 
Therefore 10&#215;10\u00a0<em>wij<\/em>\u00a0parameters are required, plus 10\u00a0<em>bj<\/em>\u00a0biases corresponding to each node.<\/p>\n<p id=\"1882\">The details of the arguments that we can indicate for the\u00a0<a href=\"https:\/\/keras.io\/layers\/core\/#dense\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/layers\/core\/#dense\" data-><em>Dense<\/em>\u00a0layer<\/a>\u00a0can be found in the Keras manual. In our example, the most relevant ones appear: the first argument indicates the number of neurons in the layer; the second is the activation function that we will use in it. In the next post (soon\u00a0\ud83d\ude42) we will discuss in more detail other possible activation functions beyond the two presented here: sigmoid and softmax.<\/p>\n<p id=\"75b6\">The initialization of the weights is also often indicated as an argument of the\u00a0<em>Dense<\/em>\u00a0layers. The initial values must be adequate for the optimization problem to converge as quickly as possible. The various\u00a0<a href=\"https:\/\/keras.io\/initializers\/#usage-of-initializers\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/initializers\/#usage-of-initializers\" data->initialization options<\/a>\u00a0can also be found in the Keras manual.<\/p>\n<h3 id=\"cdad\">Basic steps to implement a neural network in\u00a0Keras<\/h3>\n<p id=\"4f2c\">Next, we will present a brief description of the steps we must perform to implement a basic neural network and, in the following posts (soon), we will gradually introduce more details about each of these steps.<\/p>\n<h4 id=\"80d3\">Configuration of the learning\u00a0process<\/h4>\n<p id=\"4924\">From the\u00a0<em>Sequential<\/em>\u00a0model, we can define the layers in a simple way with the\u00a0<em>add()<\/em>\u00a0method, as we have advanced in the previous section. 
Once we have our model defined, we can configure how its learning process will work with the\u00a0<em>compile()<\/em>\u00a0method, whose arguments let us specify some properties.<\/p>\n<p id=\"1478\">The first of these arguments is the\u00a0<em>loss function<\/em>\u00a0that we will use to evaluate the degree of error between the calculated outputs and the desired outputs of the training data. On the other hand, we specify an\u00a0<em>optimizer<\/em>, which, as we will see, is the optimization algorithm that allows the neural network to calculate the weights of the parameters from the input data and the defined loss function. More detail on the exact purpose of the loss function and the optimizer will be presented in the next post (soon).<\/p>\n<p id=\"db0b\">And finally we must indicate the metric that we will use to monitor the learning process (and test) of our neural network. In this first example we will only consider the\u00a0<em>accuracy<\/em>\u00a0(fraction of images that are correctly classified). 
For example, in our case we can specify the following arguments in the\u00a0<em>compile()<\/em>\u00a0method to try it on our computer:<\/p>\n<p id=\"5ce6\"><span style=\"font-family: courier new,courier,monospace;\">model.compile(loss='categorical_crossentropy',<br \/>\noptimizer='sgd',<br \/>\nmetrics=['accuracy'])<\/span><\/p>\n<p id=\"c191\">In this example we specify that the loss function is\u00a0<em>categorical_crossentropy<\/em>, the optimizer used is\u00a0<em>stochastic gradient descent (sgd)<\/em>\u00a0and the metric is\u00a0<em>accuracy<\/em>, with which we will evaluate the percentage of correct guesses.<\/p>\n<h4 id=\"c2a2\">Model training<\/h4>\n<p id=\"7fe0\">Once our model has been defined and the learning method configured, it is ready to be trained. For this we can train or \u201cadjust\u201d the model to the available training data by invoking the\u00a0<em>fit()<\/em>\u00a0method of the model:<\/p>\n<p id=\"0488\"><span style=\"font-family: courier new,courier,monospace;\">model.fit(x_train, y_train, batch_size=100, epochs=5)<\/span><\/p>\n<p id=\"7ca7\">In the first two arguments we have indicated the data with which we will train the model, in the form of Numpy arrays. The\u00a0<em>batch_size<\/em>\u00a0argument indicates the number of samples that we will use for each update of the model parameters, and with\u00a0<em>epochs<\/em>\u00a0we indicate the number of times we will use all the data in the learning process. These last two arguments will be explained in much more detail in the next post (soon).<\/p>\n<p id=\"10a5\">This method finds the value of the parameters of the network through the iterative training algorithm that we mentioned and will present in a bit more detail in the next post (soon). 
Roughly, in each iteration this algorithm takes training data from\u00a0<em>x_train<\/em>, passes them through the neural network (with the values that its parameters have at that moment), compares the obtained result with the expected one (indicated in\u00a0<em>y_train<\/em>) and calculates the\u00a0<em>loss<\/em>\u00a0to guide the adjustment process of the model parameters, which intuitively consists of applying the optimizer specified above in the\u00a0<em>compile()<\/em>\u00a0method to calculate a new value for each of the model parameters (weights and biases) in such a way that the loss is reduced.<\/p>\n<p id=\"3e45\">This is the method that, as we will see, may take the longest, and Keras allows us to follow its progress using the\u00a0<em>verbose<\/em>\u00a0argument (by default, equal to 1), in addition to indicating an estimate of how long each\u00a0<em>epoch<\/em>\u00a0takes:<\/p>\n<p id=\"d500\"><span style=\"font-family: courier new,courier,monospace;\">Epoch 1\/5<br \/>\n60000\/60000 [========] \u2014 1s 15us\/step \u2014 loss: 2.1822 \u2014 acc: 0.2916<br \/>\nEpoch 2\/5<br \/>\n60000\/60000 [========] \u2014 1s 12us\/step \u2014 loss: 1.9180 \u2014 acc: 0.5283<br \/>\nEpoch 3\/5<br \/>\n60000\/60000 [========] \u2014 1s 13us\/step \u2014 loss: 1.6978 \u2014 acc: 0.5937<br \/>\nEpoch 4\/5<br \/>\n60000\/60000 [========] \u2014 1s 14us\/step \u2014 loss: 1.5102 \u2014 acc: 0.6537<br \/>\nEpoch 5\/5<br \/>\n60000\/60000 [========] \u2014 1s 13us\/step \u2014 loss: 1.3526 \u2014 acc: 0.7034<br \/>\n10000\/10000 [========] \u2014 0s 22us\/step<\/span><\/p>\n<p id=\"a89e\">This is a simple example so that, by the end of the post, the reader has already been able to program their first neural network; but, as we will see, the\u00a0<em>fit()<\/em>\u00a0method allows many more arguments that have a very important impact on the learning outcome. 
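The iterative loop just described can be caricatured in pure NumPy for a model with a single weight w (an illustrative sketch of gradient descent on a toy problem, not what Keras does internally):

```python
import numpy as np

# Toy data: one input feature, targets generated by y = 2*x
rng = np.random.default_rng(0)
x = rng.random(100)
y = 2.0 * x

w = 0.0   # the single model parameter, to be learned
lr = 0.5  # learning rate of our "optimizer"
for epoch in range(50):
    y_pred = w * x                          # pass the data through the "network"
    loss = np.mean((y_pred - y) ** 2)       # compare with the expected output
    grad = np.mean(2.0 * (y_pred - y) * x)  # gradient d(loss)/dw
    w -= lr * grad                          # adjust the parameter so the loss is reduced

print(round(w, 3))  # converges close to 2.0
```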
Furthermore, this method returns a\u00a0<em>History<\/em>\u00a0object that we have omitted in this example. Its\u00a0<em>History.history<\/em>\u00a0attribute is the record of the\u00a0<em>loss<\/em>\u00a0values for the training data and other metrics in successive\u00a0<em>epochs<\/em>, as well as other metrics for the validation data if they have been specified.<\/p>\n<h4 id=\"b8f4\">Model evaluation<\/h4>\n<p id=\"ebdd\">At this point, the neural network has been trained and its behavior with new test data can now be evaluated using the\u00a0<em>evaluate()<\/em>\u00a0method. This method returns two values:<\/p>\n<p id=\"c042\"><span style=\"font-family: courier new,courier,monospace;\">test_loss, test_acc = model.evaluate(x_test, y_test)<\/span><\/p>\n<p id=\"ae1e\">These values indicate how well or badly our model behaves with new data that it has never seen. These data were stored in\u00a0<em>x_test<\/em>\u00a0and\u00a0<em>y_test<\/em>\u00a0when we performed\u00a0<em>mnist.load_data()<\/em>, and we pass them to the method as arguments. In the scope of this post we will only look at one of them, the accuracy:<\/p>\n<p id=\"148b\"><span style=\"font-family: courier new,courier,monospace;\">print('Test accuracy:', test_acc)<\/span><\/p>\n<p><span style=\"font-family: courier new,courier,monospace;\">Test accuracy: 0.9018<\/span><\/p>\n<p id=\"a743\">The accuracy is telling us that the model we have created in this post, applied to data that the model has never seen before, classifies 90% of them correctly.<\/p>\n<p id=\"a907\">The reader should note that, in this example, to evaluate the model we have only focused on its accuracy, that is, the ratio of correct predictions to total predictions, regardless of category. 
However, although in this case it is sufficient, sometimes it is necessary to delve a little more and take into account the types of correct and incorrect predictions made by the model in each of its categories.<\/p>\n<p id=\"a5e6\">In Machine Learning, a very useful tool for evaluating models is the confusion matrix, a table with rows and columns that count the predictions in comparison with the real values. We use this table to better understand how well the model behaves and it is very useful to show explicitly when one class is confused with another. A confusion matrix for a binary classifier like the one explained in the\u00a0<a href=\"https:\/\/towardsdatascience.com\/basic-concepts-of-neural-networks-1a18a7aa2bd2\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/towardsdatascience.com\/basic-concepts-of-neural-networks-1a18a7aa2bd2\" data->previous post<\/a>\u00a0has this structure:<\/p>\n<figure id=\"e7be\"><canvas width=\"75\" height=\"34\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*UIFVpCx4h1yW1WqRS-2C2w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*UIFVpCx4h1yW1WqRS-2C2w.png\" \/><\/figure>\n<p id=\"b3b6\">True positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), are the four different possible outcomes of a single prediction for a two-class case with classes \u201c1\u201d (\u201cpositive\u201d) and \u201c0\u201d (\u201cnegative\u201d).<\/p>\n<p id=\"2529\">A false positive is when the outcome is incorrectly classified as positive, when it is in fact negative. A false negative is when the outcome is incorrectly classified as negative when it is in fact positive. 
True positives and true negatives are obviously correct classifications.<\/p>\n<p id=\"39fb\">With this confusion matrix, the accuracy can be calculated by adding the values of the diagonal and dividing them by the total:<\/p>\n<p id=\"7583\"><em>Accuracy<\/em>\u00a0= (TP + TN) \/ (TP + FP + FN + TN)<\/p>\n<p id=\"6d21\">Nonetheless, the accuracy can be misleading about the quality of the model because it does not distinguish between false positive and false negative errors, as if both had the same importance. For example, think of a model that predicts whether a mushroom is poisonous. In this case, the cost of a false negative, that is, a poisonous mushroom given for consumption, could be dramatic. In contrast, a false positive has a very different cost.<\/p>\n<p id=\"66ef\">For this reason we have another metric called\u00a0<em>Sensitivity<\/em>\u00a0(or\u00a0<em>recall<\/em>) that tells us how well the model avoids false negatives:<\/p>\n<p id=\"62ee\"><em>Sensitivity<\/em>\u00a0= TP \/ (TP + FN)<\/p>\n<p id=\"140b\">In other words, of all the positive observations (poisonous mushrooms), how many the model detects.<\/p>\n<p id=\"6e92\">From the confusion matrix, several other metrics can be obtained for other cases, as shown in\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Confusion_matrix\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Confusion_matrix\" data->this link<\/a>, but going into more detail is beyond the scope of this post. Which metric is most appropriate will depend on each particular case and, in particular, on the \u201ccost\u201d associated with each classification error of the model.<\/p>\n<p id=\"d877\">But the reader will wonder what this confusion matrix looks like in our classifier, where there are 10 classes instead of 2. 
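Before moving on, the two formulas above can be checked with a short sketch; the counts below are invented purely for illustration:<\/p>\n<pre># Hypothetical confusion-matrix counts, invented only to illustrate
# the Accuracy and Sensitivity formulas above
TP, TN, FP, FN = 85, 90, 10, 15

accuracy = (TP + TN) / (TP + FP + FN + TN)
sensitivity = TP / (TP + FN)  # how well false negatives are avoided

print(accuracy)     # 0.875
print(sensitivity)  # 0.85<\/pre>\n<p>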
In this case, I suggest using the\u00a0<a href=\"http:\/\/scikit-learn.org\/stable\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/scikit-learn.org\/stable\/\" data-><em>Scikit-learn<\/em>\u00a0package<\/a>\u00a0to evaluate the quality of the model by\u00a0<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.confusion_matrix.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.confusion_matrix.html\" data->calculating the confusion matrix<\/a>, presented in the following figure:<\/p>\n<figure id=\"0772\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*k5CgjgX964OPqCRlegGV8Q.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*k5CgjgX964OPqCRlegGV8Q.png\" \/><\/figure>\n<p id=\"f072\">Here, the elements of the diagonal represent the number of points for which the label predicted by the model coincides with the actual label, while the other values indicate the cases in which the model has classified incorrectly. Therefore, the higher the values of the diagonal, the better the prediction will be. 
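A minimal sketch of how such a matrix might be computed with Scikit-learn\u2019s confusion_matrix, here with small invented label lists standing in for\u00a0<em>y_test<\/em>\u00a0and for the model\u2019s argmax predictions:<\/p>\n<pre>import numpy as np
from sklearn.metrics import confusion_matrix

# Invented labels standing in for y_test and for
# np.argmax(model.predict(x_test), axis=1)
y_true = [6, 1, 2, 6, 1, 2, 6, 0]
y_pred = [6, 1, 2, 5, 1, 2, 6, 0]

cm = confusion_matrix(y_true, y_pred)

# The diagonal counts the correct predictions, so the diagonal sum
# divided by the total reproduces the accuracy
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.875<\/pre>\n<p>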
In this example, if the reader calculates the sum of the values of the diagonal divided by the total values of the matrix, he or she will see that it matches the accuracy that the\u00a0<em>evaluate()<\/em>\u00a0method has returned.<\/p>\n<p id=\"2ebf\">In the\u00a0<a href=\"https:\/\/github.com\/JordiTorresBCN\/DEEP-LEARNING-practical-introduction-with-Keras\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/JordiTorresBCN\/DEEP-LEARNING-practical-introduction-with-Keras\" data->GitHub repository of the post, the reader can find the code used<\/a>\u00a0to calculate this confusion matrix.<\/p>\n<h4 id=\"91a5\">Generate predictions<\/h4>\n<p id=\"ca9a\">Finally, readers need to know how we can use the model trained in the previous section to make predictions. In our example, this consists of predicting which digit an image represents. To do this, Keras supplies the\u00a0<em>predict()<\/em>\u00a0method.<\/p>\n<p id=\"ce95\">To test this method we can choose any element. For ease, let\u2019s take one from the test dataset\u00a0<em>x_test<\/em>. 
For example, let\u2019s choose element 11 of this dataset\u00a0<em>x_test<\/em>.<\/p>\n<p id=\"e560\">Before seeing the prediction, let\u2019s display the image so we can check for ourselves whether the model makes a correct prediction (using the data as they were before the earlier reshape):<\/p>\n<p id=\"d719\"><span style=\"font-family: courier new,courier,monospace;\">plt.imshow(x_test[11], cmap=plt.cm.binary)<\/span><\/p>\n<figure id=\"5fc2\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*y22Cq1iMdzL6qpBqxTi07A.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*y22Cq1iMdzL6qpBqxTi07A.png\" \/><\/figure>\n<p id=\"aef6\">I think the reader will agree that in this case it corresponds to the number 6.<\/p>\n<p id=\"dde6\">Now let\u2019s check, by executing the following code, that the model\u2019s\u00a0<em>predict()<\/em>\u00a0method correctly predicts the value we have just estimated.<\/p>\n<pre id=\"bb4d\">predictions = model.predict(x_test)<\/pre>\n<p id=\"3d97\">The predict() method returns a vector with the predictions for all the elements of the dataset. We can find the class with the highest probability using NumPy\u2019s argmax function, which returns the index of the position that contains the highest value of the vector. Specifically, for item 11:<\/p>\n<pre id=\"6af2\">np.argmax(predictions[11])<\/pre>\n<pre id=\"87c8\">6<\/pre>\n<p id=\"2a31\">We can check this by printing the vector returned by the method:<\/p>\n<pre id=\"0399\">print(predictions[11])<\/pre>\n<pre id=\"3d1f\">[0.06 0.01 0.17 0.01 0.05 0.04 0.54 0. 0.11 0.02]<\/pre>\n<p id=\"b2ec\">We see that the highest value in the vector is at position 6. We can also verify that the result of the prediction is a vector whose components sum to 1, as expected. 
For this we can use:<\/p>\n<pre id=\"5c6a\">np.sum(predictions[11])<\/pre>\n<pre id=\"159e\">1.0<\/pre>\n<p id=\"e9e9\">So far, the reader has been able to create a first model in Keras that correctly classifies the MNIST digits 90% of the time. In the next post (soon), we will present how the learning process works, as well as several of the hyperparameters that we can use in a neural network to improve these results.<\/p>\n<p id=\"6d4d\">In a future post (soon) we will see how we can improve these classification results using convolutional neural networks for the same example.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Keras is the recommended library for beginners since its learning curve is very smooth compared to others. Keras is a Python library that provides, in a simple way, the creation of a wide range of Deep Learning models using other libraries such as TensorFlow, Theano or CNTK as a backend.&nbsp;Although Keras is currently included in the TensorFlow package, it can also be used as a standalone Python library. 
To get started with the subject, I consider this second option the most appropriate.<\/p>\n","protected":false},"author":475,"featured_media":4085,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[92],"ppma_author":[3001],"class_list":["post-1563","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-machine-learning"],"authors":[{"term_id":3001,"user_id":475,"is_guest":0,"slug":"jordi-torres","display_name":"Jordi Torres","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Torres","first_name":"Jordi","job_title":"","description":"<a href=\"https:\/\/torres.ai\/\">Jordi Torres<\/a> is a Professor at UPC Barcelona Tech and the Barcelona Supercomputing Center, with 30 years of experience in teaching and research in high-performance computing, with important scientific publications and R&amp;D projects in companies and institutions. He is currently a Board Member of iThinkUPC and acts as a trainer, mentor, and expert for various organizations and companies. 
He has also written several technical books, gives lectures and has collaborated with different media, radio, and television."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1563","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/475"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1563"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1563\/revisions"}],"predecessor-version":[{"id":28906,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1563\/revisions\/28906"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/4085"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1563"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1563"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1563"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1563"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}