{"id":1013,"date":"2018-12-04T04:08:37","date_gmt":"2018-12-04T01:08:37","guid":{"rendered":"http:\/\/kusuaks7\/?p=618"},"modified":"2021-11-29T10:07:09","modified_gmt":"2021-11-29T10:07:09","slug":"learning-ai-if-you-suck-at-math-part5-deep-learning-and-convolutional-neural-nets-in-plain-english","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/learning-ai-if-you-suck-at-math-part5-deep-learning-and-convolutional-neural-nets-in-plain-english\/","title":{"rendered":"Learning AI if You Suck at Math\u200a &#8211; Part 5\u200a &#8211; \u200aDeep Learning and Convolutional Neural Nets in Plain English!"},"content":{"rendered":"<p><strong><em>Ready to learn Artificial Intelligence? <a href=\"https:\/\/www.experfy.com\/training\/courses\">Browse courses<\/a>\u00a0like\u00a0 <a href=\"https:\/\/www.experfy.com\/training\/courses\/uncertain-knowledge-and-reasoning-in-artificial-intelligence\">Uncertain Knowledge and Reasoning in Artificial Intelligence<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<p>Welcome to part five of Learning AI if You Suck at Math. 
If you missed\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part-1\">part 1<\/a>,\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part-two-practical-projects\">part 2<\/a>,\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part3-building-an-ai-dream-machine\">part 3<\/a> and <a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part4-tensors-illustrated-with-cats\">part 4<\/a>,\u00a0be sure to check them out.<\/p>\n<p id=\"5f75\"><strong>Today, we\u2019re going to write our own Python image recognition program.<\/strong><\/p>\n<p id=\"ba05\"><strong>To do that, we\u2019ll explore a powerful deep learning architecture called a deep convolutional neural network (DCNN).<\/strong><\/p>\n<p id=\"6d00\">Convnets are the workhorses of computer vision. They power everything from self-driving cars to Google\u2019s image search. At TensorFlow Summit 2017,\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=toK1OSLep3s\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=toK1OSLep3s\" data->a researcher showed how they\u2019re using a convnet to detect skin cancer<\/a>\u00a0as accurately as a dermatologist, using nothing but a smartphone!<\/p>\n<p id=\"fc28\">So why are neural networks so powerful? 
One key reason:<\/p>\n<p id=\"138a\"><strong>They do\u00a0<em>automatic pattern recognition<\/em>.<\/strong><\/p>\n<p id=\"2271\">So what\u2019s pattern recognition, and why do we care if it\u2019s automatic?<\/p>\n<p id=\"0b00\">Patterns come in many forms, but let\u2019s take two critical examples:<\/p>\n<ul>\n<li id=\"b039\">The features that define a physical form<\/li>\n<li id=\"ac3f\">The steps it takes to do a task<\/li>\n<\/ul>\n<h3 id=\"ef46\">Computer Vision<\/h3>\n<p id=\"c846\">In image processing, pattern recognition is known as<strong>\u00a0<\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Feature_extraction\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Feature_extraction\" data-><strong><em>feature extraction<\/em><\/strong><\/a><strong>.<\/strong><\/p>\n<p id=\"7e38\">When you look at a photo or something in the real world, you\u2019re selectively picking out the key features that allow you to make sense of it. This is something you do unconsciously.<\/p>\n<p id=\"b1c7\">When you see the picture of my cat Dove you think \u201ccat\u201d or \u201cawwwwww\u201d, but you don\u2019t really know\u00a0<em>how<\/em>\u00a0you do that. You just do it.<\/p>\n<p id=\"01c9\"><strong>You don\u2019t know how you do it because it\u2019s happening\u00a0<em>automatically<\/em>\u00a0and\u00a0<em>unconsciously<\/em>.<\/strong><\/p>\n<figure id=\"406c\"><canvas width=\"75\" height=\"72\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 699px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*jT34jY1zYQ8DXXMzRTjGPw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*jT34jY1zYQ8DXXMzRTjGPw.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">My beautiful cat Dove. 
Your built in neural network knows this is a\u00a0cat.<\/p>\n<p id=\"c4f2\">It seems simple to you because you do it every day, but that\u2019s because the complexity is hidden away from you.<\/p>\n<p id=\"c225\">Your brain is a black box. You come with no instruction manual.<\/p>\n<p id=\"d685\">Yet if you really stop to think about it, what you just did in a fraction of a second involved a massive number of steps. On the surface it\u2019s deceptively simple, but it\u2019s actually incredibly complex.<\/p>\n<ul>\n<li id=\"5b69\">You moved your eyes.<\/li>\n<li id=\"bb3d\">You took in light and you processed that light into component parts which sent signals to your brain.<\/li>\n<li id=\"8e80\">Then your brain went to work, doing its magic, converting that light to electro-chemical signals.<\/li>\n<li id=\"55a1\">Those signals fired through your built in neural network, activating different parts of it, including memories, associations and feelings.<\/li>\n<li id=\"f8f2\">At the most \u201cbasic\u201d level your brain highlighted low-level patterns (ears, whiskers, tail) that it combined into higher-order patterns (animal).<\/li>\n<li id=\"62db\">Lastly, you made a classification, which means you turned it into a word, which is a symbolic representation of the real-life thing, in this case a \u201ccat.\u201d<\/li>\n<\/ul>\n<p id=\"e58b\"><strong>All of that happened in the blink of an eye.<\/strong><\/p>\n<p id=\"889a\">If you tried to teach a computer to do that, where would you even begin?<\/p>\n<ul>\n<li id=\"79df\">Could you tell it how to detect ears?<\/li>\n<li id=\"434e\">What are ears?<\/li>\n<li id=\"0901\">How do you describe them?<\/li>\n<li id=\"0754\">Why are cat ears different from human ears or bat ears (or Batman)?<\/li>\n<li id=\"af24\">What do ears look like from various angles?<\/li>\n<li id=\"15c4\">Are all cat ears the same (nope, check out a Scottish Fold)?<\/li>\n<\/ul>\n<p id=\"f061\">The problems go on and on.<\/p>\n<p id=\"c979\">If you 
couldn\u2019t come up with a good answer for how to teach a computer all those steps with some C++ or Python, don\u2019t feel bad, because it stumped computer scientists for 50 years!<\/p>\n<p id=\"7446\"><strong>What you do naturally is one of the key jobs of a deep learning neural network: acting as a \u201cclassifier\u201d, in this case an image classifier.<\/strong><\/p>\n<p id=\"21cc\">In the beginning, AI researchers tried to do the exercise we just went through. They attempted to define all the steps manually. For example, when it comes to natural language processing, or NLP, they assembled the best linguists and said \u201cwrite down all the \u2018rules\u2019 for languages.\u201d They called these early AIs \u201cexpert systems.\u201d<\/p>\n<p id=\"4e68\">The linguists sat down and puzzled out a dizzying array of if, then, unless, except statements:<\/p>\n<ul>\n<li id=\"575a\">Does a bird fly?<\/li>\n<\/ul>\n<p id=\"03f6\">Yes<\/p>\n<p id=\"2a9b\">Unless it\u2019s:<\/p>\n<ul>\n<li id=\"1608\">Dead<\/li>\n<li id=\"dd5f\">Injured<\/li>\n<li id=\"13fb\">A flightless bird like a penguin<\/li>\n<li id=\"1530\">Missing a wing<\/li>\n<\/ul>\n<p id=\"0453\">These lists of rules and exceptions are endless. Unfortunately they\u2019re also terribly brittle and prone to all kinds of errors. 
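<\/p>
<p>To make the brittleness concrete, here is roughly what those bird rules look like as hand-written code. This is a toy sketch; the function name and rule fields are invented for illustration:<\/p>

```python
def bird_can_fly(bird):
    """A toy 'expert system' rule chain: every exception is another branch."""
    if bird["dead"]:
        return False
    if bird["injured"]:
        return False
    if bird["species"] in ("penguin", "ostrich", "kiwi"):  # flightless birds
        return False
    if bird["missing_wing"]:
        return False
    # ...and on, forever, for every exception nobody thought of yet
    return True

sparrow = {"dead": False, "injured": False,
           "species": "sparrow", "missing_wing": False}
print(bird_can_fly(sparrow))  # True
```

<p>Every new edge case means another hand-written branch, which is why these systems never stopped growing and never stopped breaking.<\/p>
<p>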
They\u2019re time-consuming to create, subject to debate and bias, hard to figure out, etc.<\/p>\n<p id=\"9e09\"><strong>Deep neural networks represent a real breakthrough because instead of you having to figure out all the steps, you can let the machine\u00a0<em>extract<\/em>\u00a0the\u00a0<em>key features<\/em>\u00a0of a cat\u00a0<em>automatically<\/em>.<\/strong><\/p>\n<p id=\"5c49\">\u201cAutomatically\u201d is essential because we bypass the impossible problem of trying to figure out all those thousands or millions of hidden steps we take to do any complex action.<\/p>\n<p id=\"c785\"><strong>We can let the computer figure it out for itself!<\/strong><\/p>\n<h3 id=\"076a\">The Endless Steps of Everything<\/h3>\n<p id=\"d595\">Let\u2019s look at the second example: figuring out the steps to do a task.<\/p>\n<p id=\"a66d\">Today we do this manually and define the steps for a computer. It\u2019s called programming. Let\u2019s say you want to find all the image files on your hard drive and move them to a new folder.<\/p>\n<p id=\"6e2e\">For most tasks the programmer is the neural network. He\u2019s the intelligence. He studies the task, decomposes it into steps and then defines each step for the computer one by one. 
He describes it to the computer with a symbolic representation known as a computer programming language.<\/p>\n<p id=\"6d32\">Here\u2019s an example in Python, from\u00a0<a href=\"http:\/\/stackoverflow.com\/questions\/11903037\/copy-all-jpg-file-in-a-directory-to-another-directory-in-python\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/stackoverflow.com\/questions\/11903037\/copy-all-jpg-file-in-a-directory-to-another-directory-in-python\" data->\u201cJolly Jumper\u201d on Stack Exchange<\/a>:<\/p>\n<pre id=\"0ebc\">import glob\nimport shutil\nimport os\n\nsrc_dir = \"your\/source\/dir\"\ndst_dir = \"your\/destination\/dir\"\n\nfor jpgfile in glob.iglob(os.path.join(src_dir, \"*.jpg\")):\n    shutil.move(jpgfile, dst_dir)<\/pre>\n<p id=\"19ff\">Jolly Jumper figured out all the steps and translated them for the computer, such as:<\/p>\n<ul>\n<li id=\"648d\">We need to know the source directory<\/li>\n<li id=\"a650\">Also, we need a destination<\/li>\n<li id=\"92f4\">We need a way of classifying the types of files we want, in this case a \u201cjpg\u201d file<\/li>\n<li id=\"fd12\">Lastly we go into the directory, search it for 
any jpgs and move them from the source to the destination directory<\/li>\n<\/ul>\n<p id=\"b8ac\">This works well for simple and even moderately complex problems. Operating systems are some of the most complex software on Earth, composed of hundreds of millions of lines of code. Each line is an explicit instruction for how computers do tasks (like draw things on the screen, store and update information) as well as how people do tasks (copy files, input text, send email, view photos, chat with others, etc.).<\/p>\n<p id=\"a5d7\">But as we evolve to try and solve more challenging problems, we\u2019re running into the limits of our ability to manually define the steps of the problem.<\/p>\n<p id=\"caef\">For example, how do you define driving a car?<\/p>\n<p id=\"37e1\">There are hundreds of millions of tiny steps that we take to do this mind-numbingly complex task. We have to:<\/p>\n<ul>\n<li id=\"9ae1\">Stay in the lines<\/li>\n<li id=\"0519\">Know what a line is and be able to recognize it<\/li>\n<li id=\"85cb\">Navigate from one place to another<\/li>\n<li id=\"518c\">Recognize obstructions like walls, people, debris<\/li>\n<li id=\"9625\">Classify objects as helpful (a street sign) or a threat (a pedestrian crossing on a green light)<\/li>\n<li id=\"8caf\">Constantly assess where all the drivers around us are<\/li>\n<li id=\"3072\">Make split-second decisions<\/li>\n<\/ul>\n<p id=\"105f\">In machine learning this is known as a\u00a0<em>decision making<\/em>\u00a0problem. 
Examples of complex decision making problems are:<\/p>\n<ul>\n<li id=\"f6fe\">Robot navigation and perception<\/li>\n<li id=\"7ed6\">Language translation systems<\/li>\n<li id=\"199f\">Self-driving cars<\/li>\n<li id=\"c4cc\">Stock trading systems<\/li>\n<\/ul>\n<h3 id=\"8745\"><strong>The Secret Inner Life of Neural\u00a0Networks<\/strong><\/h3>\n<p id=\"eb4f\">Let\u2019s see how deep learning helps us solve the insane complexity of the real world by doing automatic feature extraction!<\/p>\n<figure id=\"a7ba\" data-scroll=\"native\"><canvas width=\"57\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*XAktSZg3tmE-VkqZzqmrcg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*XAktSZg3tmE-VkqZzqmrcg.png\" \/><\/figure>\n<p id=\"e317\">If you\u2019ve ever read the excellent book\u00a0<a href=\"http:\/\/amzn.to\/2lE2PqW\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/amzn.to\/2lE2PqW\" data-><strong>Think Like a Programmer<\/strong>, by V. Anton Spraul<\/a>\u00a0(and you should), you know that\u00a0<strong>programming is about problem solving<\/strong>. The programmer\u00a0<strong>decomposes a problem into smaller problems, creates an action plan<\/strong>\u00a0to solve it\u00a0<strong>and then writes code<\/strong>\u00a0to make it happen.<\/p>\n<p id=\"a459\">Deep learning solves problems for us, but AI still needs humans (thank God) to design and test AI architectures, at least for now. So let\u2019s decompose a neural net into its parts and build a program to recognize that the picture of my cat Dove is a cat.<\/p>\n<h3 id=\"0751\"><strong>The Deep in Deep\u00a0Learning<\/strong><\/h3>\n<p id=\"0fb1\">Deep learning is a subfield of machine learning. 
Its name comes from the idea that we stack together a bunch of different<strong>\u00a0layers<\/strong>\u00a0to learn increasingly meaningful representations of data.<\/p>\n<p id=\"096c\">Each of those layers is a\u00a0<strong>neural network<\/strong>, which consists of\u00a0<strong>linked connections between artificial neurons<\/strong>.<\/p>\n<p id=\"1dd7\">Before we had powerful GPUs to do the math for us, we could only build very small \u201ctoy\u201d neural nets. They couldn\u2019t do very much. Today we can\u00a0<strong>stack many layers<\/strong>\u00a0together, hence the \u201c<strong>deep<\/strong>\u201d in\u00a0<strong>deep learning<\/strong>.<\/p>\n<p id=\"0791\">Neural nets were inspired by biological research into the human brain in the 1950s. Researchers created a mathematical representation of a neuron, which you can see below (<a href=\"http:\/\/cs231n.github.io\/neural-networks-1\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/cs231n.github.io\/neural-networks-1\/\" data->courtesy of the awesome open courseware on Convolutional Neural Nets from Stanford<\/a>\u00a0and Wikimedia Commons):<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 700px; height: 298px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Mz0a4EEsdJYsbvf5M_u-Sw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Mz0a4EEsdJYsbvf5M_u-Sw.png\" \/><\/p>\n<p style=\"text-align: center;\">Biological neuron<\/p>\n<figure id=\"4ead\"><canvas width=\"75\" height=\"41\"><\/canvas><img decoding=\"async\" style=\"width: 659px; height: 376px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Yf6BWJq0kdHTumErO99bUQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Yf6BWJq0kdHTumErO99bUQ.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">Math model of a\u00a0neuron.<\/p>\n<p id=\"f443\"><strong>Forget about all the more complex math symbols for now, because you don\u2019t need 
them.<\/strong><\/p>\n<p id=\"2ddc\">The basics are super simple. Data, represented by\u00a0<strong>x0,<\/strong>\u00a0travels through the connections between the neurons. The strength of the connections is represented by their weights (<strong>w0x0, w1x1<\/strong>, etc.). If the signal is strong enough, it fires the neuron via its \u201c<strong>activation function<\/strong>\u201d and makes the neuron \u201c<strong>active<\/strong>.\u201d<\/p>\n<p id=\"cacf\">Here is an example of a three-layer deep neural net:<\/p>\n<figure id=\"e053\"><canvas width=\"75\" height=\"27\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 275px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*jUydtMleiUS-6uQzVuttKw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*jUydtMleiUS-6uQzVuttKw.png\" \/><\/figure>\n<p id=\"9d39\">By activating some neurons and not others, and by strengthening the connections between neurons, the system learns what\u2019s important about the world and what\u2019s not.<\/p>\n<figure id=\"79b8\"><canvas width=\"75\" height=\"61\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 577px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*3R4Z-JIOB_QV2XQCc_oapg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*3R4Z-JIOB_QV2XQCc_oapg.jpeg\" \/><\/figure>\n<h3 id=\"f430\"><strong>Building and Training a Neural\u00a0Network<\/strong><\/h3>\n<p id=\"7ceb\">Let\u2019s take a deeper look at deep learning and write some code as we go.\u00a0<a href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/tree\/master\/Deep%20Learning%20Examples\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/tree\/master\/Deep%20Learning%20Examples\" data-><strong>All the code is available on my GitHub here<\/strong><\/a>.<\/p>\n<p id=\"1d88\">The essential characteristics of the system are:<\/p>\n<ul>\n<li 
id=\"6a5e\"><strong>Training<\/strong><\/li>\n<li id=\"a91b\"><strong>Input data<\/strong><\/li>\n<li id=\"8498\"><strong>Layers<\/strong><\/li>\n<li id=\"5b2d\"><strong>Weights<\/strong><\/li>\n<li id=\"5b59\"><strong>Targets<\/strong><\/li>\n<li id=\"ff44\"><strong>Loss function<\/strong><\/li>\n<li id=\"cd14\"><strong>Optimizer function<\/strong><\/li>\n<li id=\"363c\"><strong>Predictions<\/strong><\/li>\n<\/ul>\n<h3 id=\"b936\"><strong>Training<\/strong><\/h3>\n<p id=\"56d7\">Training is how we teach a neural network what we want it to learn. It follows a simple five-step process:<\/p>\n<ol>\n<li id=\"1639\">Create a\u00a0<strong>training data set<\/strong>, which we will call\u00a0<strong>x<\/strong>,\u00a0and load its\u00a0<strong>labels as targets y<\/strong><\/li>\n<li id=\"5d51\"><strong>Feed the x data forward<\/strong>\u00a0through the network with the\u00a0<strong>result being predictions y\u2019<\/strong><\/li>\n<li id=\"3d12\">Figure out the\u00a0<strong>\u201closs\u201d of the network<\/strong>, which is the\u00a0<strong>difference between the predictions y\u2019 and the correct targets y<\/strong><\/li>\n<li id=\"203c\">Compute the\u00a0<strong>\u201cgradient\u201d of the loss (l)<\/strong>, which tells us how fast we\u2019re moving towards or away from the correct targets<\/li>\n<li id=\"0f87\"><strong>Adjust the weights<\/strong>\u00a0of the network in the\u00a0<strong>opposite direction of the gradient<\/strong>\u00a0and go back to step two to try again<\/li>\n<\/ol>\n<figure id=\"0c99\" data-scroll=\"native\"><canvas width=\"75\" height=\"33\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 324px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*6SuGYTmD0Gg3R8hOgmfE2A.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*6SuGYTmD0Gg3R8hOgmfE2A.jpeg\" \/><\/figure>\n<h3 id=\"2adb\"><strong>Input Data<\/strong><\/h3>\n<p id=\"613b\">In this case the input data to a DCNN is a bunch of images. 
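<\/p>
<p>It helps to remember what an image actually is to the network: just a grid of numbers, height by width by three color channels. Here is a quick sketch with made-up random pixel values (NumPy assumed):<\/p>

```python
import numpy as np

# A stand-in 32x32 RGB "image": one integer from 0-255 per channel per pixel
image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(image.shape)  # (32, 32, 3)

# Networks train more smoothly on small floats, so pixel values
# are usually rescaled from 0-255 down to the 0-1 range first
scaled = image.astype("float32") / 255.0
print(float(scaled.max()) <= 1.0)  # True
```

<p>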
The more images the better. Unlike people, computers need a lot of examples to learn how to classify them. AI researchers are working on ways to learn with far less data, but that\u2019s still a cutting-edge problem.<\/p>\n<p id=\"a5dd\">A famous example is the\u00a0<a href=\"http:\/\/www.image-net.org\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/www.image-net.org\/\" data-><strong>ImageNet<\/strong><\/a>\u00a0data set. It consists of lots of hand-labeled images. In other words, they crowd-sourced humans, who used their built-in neural nets to look at all the images and give the data meaning. People uploaded their photos and labeled them with tags, like \u201cdog\u201d, or a specific type of dog like a \u201cBeagle.\u201d<\/p>\n<p id=\"1b44\">Those\u00a0<strong>labels represent accurate predictions<\/strong>\u00a0for the network. The closer the network gets to\u00a0<strong>matching the hand-labeled data (y)<\/strong>\u00a0with its\u00a0<strong>predictions (y\u2019)<\/strong>\u00a0the more accurate the network becomes.<\/p>\n<p id=\"4834\"><strong>The data is broken into two pieces, a training set and a testing set<\/strong>. The training set is the input that we feed to our neural network. It learns the key features of various kinds of objects and then we test whether it can accurately find those objects in the random images of the test set.<\/p>\n<p id=\"d5e3\">In our program we\u2019ll use the well-known<strong>\u00a0<\/strong><a href=\"http:\/\/www.cs.toronto.edu\/~kriz\/cifar.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/www.cs.toronto.edu\/~kriz\/cifar.html\" data-><strong>CIFAR-10 dataset<\/strong><\/a>\u00a0which was developed by the Canadian Institute for Advanced Research.<\/p>\n<p id=\"9caa\">CIFAR-10 has 60000 32&#215;32 color images in 10 classes, with 6000 images per class. 
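<\/p>
<p>You can check those numbers yourself. Assuming the same Keras helper we use later in this article, one call downloads the dataset (on first use) and hands back the standard split:<\/p>

```python
from keras.datasets import cifar10

# Returns the standard (training, test) split of CIFAR-10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(len(x_train))  # 50000 training images
print(len(x_test))   # 10000 test images
```

<p>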
We get 50000 training images and 10000 test images.<\/p>\n<p id=\"c24c\">When I first started working with CIFAR I mistakenly assumed it would be an easier challenge than working with the larger images of the ImageNet challenge. It turns out CIFAR-10 is more challenging because the images are so tiny and there are far fewer of them, so they have fewer identifiable characteristics for our neural network to lock in on.<\/p>\n<p id=\"93c4\">While some of the biggest and baddest DCNN architectures like\u00a0<a href=\"https:\/\/github.com\/KaimingHe\/deep-residual-networks\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/KaimingHe\/deep-residual-networks\" data->ResNet<\/a>\u00a0can hit 97% accuracy on ImageNet, they can only hit about 87% on CIFAR-10, in my experience. The current state of the art on CIFAR-10 is\u00a0<a href=\"https:\/\/github.com\/titu1994\/DenseNet\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/titu1994\/DenseNet\" data->DenseNet<\/a>, which can hit around 95% with a monstrous 250 layers and 15 million parameters! I link to those frameworks at the bottom of the article for further exploration. But it\u2019s best to start with something simpler before diving into those complex systems.<\/p>\n<figure id=\"8103\" data-scroll=\"native\"><canvas width=\"61\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*ZCtQWsq9kPAU768IsM82Pg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*ZCtQWsq9kPAU768IsM82Pg.jpeg\" \/><\/figure>\n<p id=\"0e98\">Enough theory! Let\u2019s write code.<\/p>\n<p id=\"85d8\">If you\u2019re not comfortable with Python, I highly, highly, highly recommend\u00a0<a href=\"http:\/\/amzn.to\/2ldE3fs\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/amzn.to\/2ldE3fs\" data-><strong>Learning Python by Fabrizio Romano<\/strong><\/a>. This book explains everything so well. 
I\u2019ve never found a better Python book and I have a bunch of them that failed to teach me much.<\/p>\n<p id=\"74d4\"><strong>The code for our DCNN is based on\u00a0<\/strong><a href=\"https:\/\/github.com\/fchollet\/keras\/tree\/master\/examples\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/fchollet\/keras\/tree\/master\/examples\" data-><strong>the Keras example code on Github<\/strong><\/a><strong>.<\/strong><\/p>\n<p id=\"93e1\"><strong>You can find\u00a0<\/strong><a href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/tree\/master\/Deep%20Learning%20Examples\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/tree\/master\/Deep%20Learning%20Examples\" data-><strong>my modifications here<\/strong><\/a><strong>.<\/strong><\/p>\n<p id=\"5c7b\">I\u2019ve adjusted the architecture and parameters, as well as added TensorBoard to help us visualize the network.<\/p>\n<p id=\"b984\">Let\u2019s initialize our Python program, import the dataset and the various classes we\u2019ll need to build our DCNN. 
Luckily, Keras already knows how to get this dataset automatically, so we don\u2019t have too much work to do.<\/p>\n<pre id=\"ec25\">from __future__ import print_function\nimport numpy as np\n\nfrom keras.datasets import cifar10\nfrom keras.callbacks import TensorBoard\nfrom keras.models import Sequential\nfrom keras.layers import Dense, Dropout, Activation, Flatten\nfrom keras.layers import Convolution2D, MaxPooling2D\nfrom keras.utils import np_utils\nfrom keras import backend as K<\/pre>\n<p id=\"12b6\">Our neural net starts off with a random configuration. 
It\u2019s as good a starting place as any, but we shouldn\u2019t expect it to start off very smart. Then again, it\u2019s possible that some random configuration gives us amazing results completely by accident, so we seed the random number generator to make sure that we don\u2019t end up with state-of-the-art results by sheer dumb luck!<\/p>\n<pre id=\"37aa\">np.random.seed(1337) # Very l33t<\/pre>\n<h3 id=\"17e0\"><strong>Layers<\/strong><\/h3>\n<p id=\"2129\">Now we\u2019ll add some layers.<\/p>\n<p id=\"be67\">Most neural networks use\u00a0<strong>fully connected layers<\/strong>. That means they connect every neuron to every other neuron.<\/p>\n<p id=\"86ad\">Fully connected layers are fantastic for solving all kinds of problems. Unfortunately they don\u2019t scale very well for image recognition.<\/p>\n<p id=\"61de\">So we\u2019ll build our system using\u00a0<strong>convolutional layers<\/strong>, which are unique because\u00a0<strong>they don\u2019t connect all the neurons together<\/strong>.<\/p>\n<p id=\"9f09\">Let\u2019s see\u00a0<a href=\"http:\/\/cs231n.github.io\/convolutional-networks\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/cs231n.github.io\/convolutional-networks\/\" data->what the Stanford course on computer vision has to say about convnet scaling<\/a>:<\/p>\n<blockquote id=\"3779\"><p>\u201cIn CIFAR-10, the images are merely 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32*32*3 = 3072 weights. This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200*200*3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! 
Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.\u201d<\/p><\/blockquote>\n<p id=\"355b\"><strong>Overfitting<\/strong>\u00a0is when you train the network so well that it kicks ass on the training data but sucks when you show it images it\u2019s never seen. In other words it\u2019s not much use in the real world.<\/p>\n<p id=\"bbac\">It\u2019s as if you played the same game of chess over and over and over again until you had it perfectly memorized. Then someone makes a different move in a real game and you have no idea what to do. We\u2019ll look at overfitting more later.<\/p>\n<p id=\"525e\">Here\u2019s how data flows through a DCNN. It looks at only a small subset of the data, hunting for patterns. It then builds those observations up into higher order understandings.<\/p>\n<figure id=\"eced\" data-scroll=\"native\"><canvas width=\"75\" height=\"33\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 322px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*3rECTefgSkJJ6Sni5sxptA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*3rECTefgSkJJ6Sni5sxptA.png\" \/><\/figure>\n<p style=\"text-align: center;\">A visual representation of a convolutional neural net from the mNeuron plugin created for MIT\u2019s computer vision courses\/teams.<\/p>\n<p id=\"c663\">Notice how the first few layers are simple patterns like edges and colors and basic shapes.<\/p>\n<p id=\"3a76\">As the information flows through the layers, the system finds more and more complex patterns, like textures, and eventually it deduces various object classes.<\/p>\n<figure id=\"3c85\" data-scroll=\"native\"><canvas width=\"75\" height=\"27\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 278px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*mMoJS0rzLztM9hklSJC8ng.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*mMoJS0rzLztM9hklSJC8ng.png\" \/><\/figure>\n<p 
id=\"f3e3\">The ideas were based on experiments on cat vision that showed that different cells responded to only certain kinds of stimuli such as an edge or a particular color.<\/p>\n<figure id=\"1eeb\" data-scroll=\"native\"><canvas width=\"75\" height=\"61\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 574px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*3sThpKdW6V8iQxyYeoBjKA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*3sThpKdW6V8iQxyYeoBjKA.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.youtube.com\/watch?v=PlhFWT7vAEw\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=PlhFWT7vAEw\" data->Slides from the excellent Deep Learning open course at\u00a0Oxford<\/a>.<\/p>\n<p id=\"2ace\">The same is true for humans. Our visual cells respond only to very specific features.<\/p>\n<figure id=\"9792\" data-scroll=\"native\"><canvas width=\"75\" height=\"58\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 561px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*h28Re9Ug6STaptCcdSUwCw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*h28Re9Ug6STaptCcdSUwCw.jpeg\" \/><\/figure>\n<p id=\"e530\">Here is a typical DCNN architecture diagram:<\/p>\n<figure id=\"5021\" data-scroll=\"native\"><canvas width=\"75\" height=\"19\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 190px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*N4h1SgwbWNmtrRhszM9EJg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*N4h1SgwbWNmtrRhszM9EJg.png\" \/><\/figure>\n<p id=\"3f0d\">You\u2019ll notice a third kind of layer in there, a<strong>\u00a0pooling layer<\/strong>. 
You can find all kinds of detail in the\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=bEUX_56Lojc\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.youtube.com\/watch?v=bEUX_56Lojc\" data->Oxford lectures<\/a>\u00a0and the\u00a0<a href=\"http:\/\/cs231n.github.io\/convolutional-networks\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/cs231n.github.io\/convolutional-networks\/\" data->Stanford lectures<\/a>. However, I\u2019m going to skip a lot of the granular detail because most people just find it confusing. I know I did when I first tried to make sense of it.<\/p>\n<p id=\"8434\">Here\u2019s what you need to know about pooling layers. Their goal is simple. They do\u00a0<strong>subsampling<\/strong>. In other words, they\u00a0<strong>shrink the input image<\/strong>, which reduces the computational load and memory usage. With less information to crunch we can work with the images more easily.<\/p>\n<p id=\"919b\">They also help reduce a second kind of overfitting where the network zeros in on anomalies in the training set that really have nothing to do with picking out dogs or birds or cats. For example, there may be some garbled pixels or some lens flares on a bunch of the images. 
The network may then decide that lens flare and dog go together, when they\u2019re about as closely related as an asteroid and a baby rattle.<\/p>\n<p id=\"9a88\">Lastly, most DCNNs add a few\u00a0<strong>densely connected<\/strong>, aka\u00a0<strong>fully connected layers<\/strong>\u00a0to process all the feature maps detected in earlier layers and make predictions.<\/p>\n<p id=\"287d\">So let\u2019s add a few layers to our convnet.<\/p>\n<p id=\"80af\">First we add some variables that we will pull into our layers.<\/p>\n<div id=\"093d\"><span style=\"font-family: courier new,courier,monospace;\"># Defines how many images we will process at once<br \/>\nbatch_size = 128<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># Defines how many types of objects we can detect in this set.\u00a0 Since CIFAR-10 only contains 10 kinds of objects, we set this to 10.<br \/>\nnb_classes = 10<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># The number of epochs defines how long we train the system.\u00a0 Longer is not always better.\u00a0 After a period of time we reach the point of diminishing returns.\u00a0 Adjust this as necessary.<br \/>\nnb_epoch = 45<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># Here we put in the image dimensions.\u00a0 We know the images are 32 x 32.\u00a0 They are already preprocessed for us to be nicely uniform to work with at this point.<br \/>\nimg_rows, img_cols = 32, 32<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># Here we set the number of convolutional filters to use<br \/>\nnb_filters = 32<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># size of pooling area for max pooling<br \/>\npool_size = (2, 2)<br \/>\n# convolution kernel size<br 
\/>\nkernel_size = (3, 3)<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"a1d8\">The\u00a0<strong>kernel<\/strong>\u00a0and\u00a0<strong>pooling size<\/strong>\u00a0define how the convolutional network passes over the image looking for features. The smallest kernel size would be 1&#215;1, which means we think key features are only 1 pixel wide. Typical kernel sizes check for useful features over 3 pixels at a time and then pool those features down to a 2&#215;2 grid.<\/p>\n<p id=\"0f0f\">The 2&#215;2 grid pulls the features out of the image and stacks them up like trading cards. This disconnects them from a specific spot on the image and allows the system to look for straight lines or swirls anywhere, not just in the spot it found them in the first place.<\/p>\n<p id=\"6d24\">Most tutorials describe this as dealing with \u201c<a href=\"http:\/\/stats.stackexchange.com\/questions\/208936\/what-is-translation-invariance-in-computer-vision-and-convolutional-netral-netwo\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/stats.stackexchange.com\/questions\/208936\/what-is-translation-invariance-in-computer-vision-and-convolutional-netral-netwo\" data->translation invariance<\/a>.\u201d<\/p>\n<p id=\"4613\">What the heck does that mean? Good question.<\/p>\n<p id=\"7689\">Take a look at this image again:<\/p>\n<figure id=\"7568\" data-scroll=\"native\"><canvas width=\"75\" height=\"27\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 278px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*mMoJS0rzLztM9hklSJC8ng.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*mMoJS0rzLztM9hklSJC8ng.png\" \/><\/figure>\n<p id=\"c6ca\">Without yanking the features out, like you see in layer 1 or layer 2, the system might decide that the circle of a cat\u2019s nose was only important right smack in the center of the image where it found it.<\/p>\n<p id=\"7e98\">Let\u2019s see how that works with my Dove. 
If the system originally finds a circle in her eye then it might mistakenly assume that the position of the circle in an image is relevant to detecting cats.<\/p>\n<figure id=\"c1c2\"><canvas width=\"75\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 700px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*I3X7isryYS0M12qdliY8ag.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*I3X7isryYS0M12qdliY8ag.jpeg\" \/><\/figure>\n<p id=\"1f5b\">Instead the system should look for circles wherever they may roam, as we see below.<\/p>\n<figure id=\"02bb\"><canvas width=\"75\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 700px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*AarVv_DnjzvyFnvRK8Uz5w.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*AarVv_DnjzvyFnvRK8Uz5w.jpeg\" \/><\/figure>\n<p id=\"e32d\">Before we can add the layers we need to load and process the data.<\/p>\n<div id=\"f589\"><span style=\"font-family: courier new,courier,monospace;\"># This splits the data into training and test sets and loads the data.\u00a0 CIFAR-10 is a standard test dataset for Keras so it can download it automatically.\u00a0 It&#8217;s about 186MB expanded.<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">(X_train, y_train), (X_test, y_test) = cifar10.load_data()<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># Unfortunately, TensorFlow and Theano want their tensor parameters in a different order, so we check for the backend from the JSON initialization file and set them accordingly.<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">if K.image_dim_ordering() == &#8216;th&#8217;:<br \/>\nX_train = X_train.reshape(X_train.shape[0], 3, img_rows, img_cols)<br \/>\nX_test = X_test.reshape(X_test.shape[0], 3, img_rows, img_cols)<br \/>\ninput_shape = (3, img_rows, img_cols)<br \/>\nelse:<br 
\/>\nX_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 3)<br \/>\nX_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 3)<br \/>\ninput_shape = (img_rows, img_cols, 3)<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">X_train = X_train.astype(&#8216;float32&#8217;)<br \/>\nX_test = X_test.astype(&#8216;float32&#8217;)<br \/>\nX_train \/= 255<br \/>\nX_test \/= 255<br \/>\nprint(&#8216;X_train shape:&#8217;, X_train.shape)<br \/>\nprint(X_train.shape[0], &#8216;train samples&#8217;)<br \/>\nprint(X_test.shape[0], &#8216;test samples&#8217;)<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\"># convert class vectors to binary class matrices<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">Y_train = np_utils.to_categorical(y_train, nb_classes)<br \/>\nY_test = np_utils.to_categorical(y_test, nb_classes)<\/span><\/div>\n<p>Ok, now we\u2019re finally ready to add some layers to our program:<\/p>\n<div><span style=\"font-family: courier new,courier,monospace;\">model = Sequential()<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],<br \/>\nborder_mode=&#8217;valid&#8217;,<br \/>\ninput_shape=input_shape))<br \/>\nmodel.add(Activation(&#8216;relu&#8217;))<br \/>\nmodel.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))<br \/>\nmodel.add(Activation(&#8216;relu&#8217;))<br \/>\nmodel.add(MaxPooling2D(pool_size=pool_size))<br \/>\nmodel.add(Dropout(0.25))<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"9b83\">The layers are stacked as follows:<\/p>\n<ul>\n<li id=\"d3d0\">Convolution<\/li>\n<li id=\"509b\">Activation<\/li>\n<li id=\"5422\">Convolution<\/li>\n<li id=\"df40\">Activation<\/li>\n<li id=\"01f9\">Pooling<\/li>\n<li id=\"b5fe\">Dropout<\/li>\n<\/ul>\n<p id=\"ffe9\">We\u2019ve already 
discussed most of these layer types except for two of them,\u00a0<strong>dropout<\/strong>\u00a0and\u00a0<strong>activation<\/strong>.<\/p>\n<p id=\"288a\">Dropout is the easiest to understand. Basically, it\u2019s the percentage of neurons to randomly switch off during each training pass. This is similar to how Netflix uses\u00a0<a href=\"http:\/\/techblog.netflix.com\/2012\/07\/chaos-monkey-released-into-wild.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/techblog.netflix.com\/2012\/07\/chaos-monkey-released-into-wild.html\" data->Chaos Monkey<\/a>. They have scripts that turn off random servers in their network to ensure the network can survive with its built-in resilience and redundancy. The same is true here. We want to make sure the network is not too dependent on any one feature.<\/p>\n<p id=\"b2f8\">The activation layer is a way to decide if the neuron \u201cfires\u201d or gets \u201cactivated.\u201d There are dozens of activation functions at this point. ReLU is one of the most successful because of its computational efficiency. Here is\u00a0<a href=\"https:\/\/keras.io\/activations\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/activations\/\" data->a list of all the different kinds of activation functions available in Keras<\/a>.<\/p>\n<p id=\"4712\">We\u2019ll also add a second stack of convolutional layers that mirrors the first one. If we were rewriting this program for efficiency we would create a model generator and do a for loop to create however many stacks we want. 
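<\/p>
<p>That generator might look something like the sketch below. To keep it runnable anywhere, this version just records layer descriptions as plain Python tuples; in the real program each tuple would become one of the model.add(...) calls we use in this post. The conv_stack function is my own invention, not part of Keras:<\/p>

```python
def conv_stack(nb_filters, kernel_size, pool_size, dropout):
    # One reusable stack: conv -> relu -> conv -> relu -> pool -> dropout.
    # In the real program, each tuple here would be a model.add(...) call.
    return [
        ("Convolution2D", nb_filters, kernel_size),
        ("Activation", "relu"),
        ("Convolution2D", nb_filters, kernel_size),
        ("Activation", "relu"),
        ("MaxPooling2D", pool_size),
        ("Dropout", dropout),
    ]

layers = []
for _ in range(2):  # however many stacks we want; our model uses two
    layers += conv_stack(32, (3, 3), (2, 2), 0.25)

print(len(layers))  # 12 layer definitions from a two-line loop
```

<p>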
But in this case we will just cut and paste the layers from above, violating<a href=\"http:\/\/wiki.c2.com\/?PythonPhilosophy\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/wiki.c2.com\/?PythonPhilosophy\" data->\u00a0the Zen rules of Python<\/a>\u00a0for expediency\u2019s sake.<\/p>\n<div id=\"4af8\"><span style=\"font-family: courier new,courier,monospace;\">model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))<br \/>\nmodel.add(Activation(&#8216;relu&#8217;))<br \/>\nmodel.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))<br \/>\nmodel.add(Activation(&#8216;relu&#8217;))<br \/>\nmodel.add(MaxPooling2D(pool_size=pool_size))<br \/>\nmodel.add(Dropout(0.25))<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"fc89\">Lastly, we flatten all the feature maps, then add the dense layers and some more dropout.<\/p>\n<div id=\"147e\"><span style=\"font-family: courier new,courier,monospace;\">model.add(Flatten())<br \/>\nmodel.add(Dense(256))<br \/>\nmodel.add(Activation(&#8216;relu&#8217;))<br \/>\nmodel.add(Dropout(0.5))<br \/>\nmodel.add(Dense(nb_classes))<br \/>\nmodel.add(Activation(&#8216;softmax&#8217;))<\/span><\/div>\n<p id=\"66bd\">We use a different kind of activation called softmax on the last layer, because it defines a probability distribution over the classes.<\/p>\n<h3 id=\"9823\"><strong>Weights<\/strong><\/h3>\n<p id=\"5caa\">We talked briefly about what weights were earlier but now we\u2019ll look at them in depth.<\/p>\n<p id=\"becd\"><strong>Weights are the strength of the connection between the various neurons<\/strong>.<\/p>\n<p id=\"f453\">We have parallels for this in our own minds. In your brain, you have a series of\u00a0<strong>biological neurons<\/strong>. They\u2019re connected to other neurons with electrical\/chemical signals passing between them.<\/p>\n<p id=\"6b2c\">But the connections are not static. 
Over time\u00a0<strong>some of those connections get stronger and some weaker<\/strong>.<\/p>\n<p id=\"817a\">The more electro-chemical signals flowing between two biological neurons, the stronger those connections get. In essence, your brain rewires itself constantly as you have new experiences. It encodes your memories and feelings and ideas about those experiences by strengthening the connections between some neurons.<\/p>\n<figure id=\"1c5e\"><canvas width=\"61\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 872px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*LXZzdoZGTyA63OW0z7edGg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*LXZzdoZGTyA63OW0z7edGg.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">Source: U.S. National Institute of Health \u2014 Wikimedia Commons.<\/p>\n<p id=\"5ab5\">Computer-based neural networks are inspired by biological ones. We call them\u00a0<strong>Artificial Neural Networks<\/strong>\u00a0or\u00a0<strong>ANN<\/strong>s for short. Usually when we say \u201cneural network\u201d what we really mean is ANN. ANNs don\u2019t function exactly the same as biological ones, so don\u2019t make the mistake of thinking an ANN is some kind of simulated brain. It\u2019s not. For example, in a biological neural network (BNN), every neuron does\u00a0<em>not<\/em>\u00a0connect to every other neuron, whereas in an ANN every neuron in one layer generally connects to every neuron in the next layer.<\/p>\n<p id=\"b500\">Below is an image of a BNN showing connections between various neurons. 
Notice they\u2019re\u00a0<em>not<\/em>\u00a0all linked.<\/p>\n<figure id=\"dc42\"><canvas width=\"41\" height=\"75\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 1272px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*KbG5-_CcuzALE32qbBG_YQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*KbG5-_CcuzALE32qbBG_YQ.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"http:\/\/www.plosone.org\/article\/info%3Adoi%2F10.1371%2Fjournal.pone.0057831\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/www.plosone.org\/article\/info%3Adoi%2F10.1371%2Fjournal.pone.0057831\" data->Wikimedia Commons<\/a>: Soon-Beom Hong, Andrew Zalesky, Luca Cocchi, Alex Fornito, Eun-Jung Choi, Ho-Hyun Kim, Jeong-Eun Suh, Chang-Dai Kim, Jae-Won Kim, Soon-Hyung Yi<\/p>\n<p id=\"4a4a\">Though there are many differences, there are also very strong parallels between BNNs and ANNs.<\/p>\n<p id=\"2a46\">Just like the neurons in your head form stronger or weaker connections, the weights in our artificial neural network define the strength of the connections between neurons. Each neuron knows a little bit about the world. Wiring them together allows them to have a more comprehensive view of the world when taken together. 
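<\/p>
<p>Stripped of everything else, a single artificial neuron is just a weighted sum of its inputs plus a bias, passed through an activation function. Here is a toy sketch of that idea (my own, for illustration only; it is not how Keras implements neurons internally):<\/p>

```python
def relu(x):
    # The activation used throughout our convnet: negative signals are silenced.
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # The weights decide how strongly each input "speaks" to this neuron.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(total)

# Two inputs: the first connection is stronger (0.5) than the second (0.25).
print(neuron([1.0, 2.0], [0.5, 0.25], 0.5))  # 1.5
```

<p>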
The ones that have stronger connections are considered more important for the problem we\u2019re trying to solve.<\/p>\n<p id=\"25b3\">Let\u2019s look at several screenshots of\u00a0<a href=\"http:\/\/playground.tensorflow.org\/#activation=tanh&amp;batchSize=10&amp;dataset=circle&amp;regDataset=reg-plane&amp;learningRate=0.03&amp;regularizationRate=0&amp;noise=0&amp;networkShape=4,2&amp;seed=0.45414&amp;showTestData=false&amp;discretize=false&amp;percTrainData=50&amp;x=true&amp;y=true&amp;xTimesY=false&amp;xSquared=false&amp;ySquared=false&amp;cosX=false&amp;sinX=false&amp;cosY=false&amp;sinY=false&amp;collectStats=false&amp;problem=classification&amp;initZero=false&amp;hideText=false\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/playground.tensorflow.org\/#activation=tanh&amp;batchSize=10&amp;dataset=circle&amp;regDataset=reg-plane&amp;learningRate=0.03&amp;regularizationRate=0&amp;noise=0&amp;networkShape=4,2&amp;seed=0.45414&amp;showTestData=false&amp;discretize=false&amp;percTrainData=50&amp;x=true&amp;y=true&amp;xTimesY=false&amp;xSquared=false&amp;ySquared=false&amp;cosX=false&amp;sinX=false&amp;cosY=false&amp;sinY=false&amp;collectStats=false&amp;problem=classification&amp;initZero=false&amp;hideText=false\" data-><strong>the Neural Network Playground, a visualizer for TensorFlow<\/strong><\/a>\u00a0to help understand this better.<\/p>\n<p id=\"d6a2\">The first network shows a simple six layer system. What the network is trying to do is\u00a0<strong>cleanly separate the blue dots from the orange dots in the picture on the far right<\/strong>. It\u2019s looking for the best pattern that separates them with a high degree of accuracy.<\/p>\n<p id=\"4fc3\">I have not yet started training the system here. Because of that we can see weights between neurons are mostly equal. The thin dotted lines are weak connections and the thicker lines are strong connections. 
The network is initialized with random weights as a starting point.<\/p>\n<figure id=\"6ba1\" data-scroll=\"native\"><canvas width=\"75\" height=\"44\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 418px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*8w8KRINjTkIvWAZexHzZaA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*8w8KRINjTkIvWAZexHzZaA.jpeg\" \/><\/figure>\n<p id=\"2ada\">Now let\u2019s take a look at the network after we\u2019ve trained it.<\/p>\n<figure id=\"9c94\" data-scroll=\"native\"><canvas width=\"75\" height=\"44\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 435px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*zDadr-0UuBv-gGear4ueAQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*zDadr-0UuBv-gGear4ueAQ.jpeg\" \/><\/figure>\n<p id=\"6970\">First notice the picture on the far right. It now has a nice blue area in the middle around the blue dots and orange around the rest of the picture. As you can see it\u2019s done pretty well, with a high degree of accuracy. This happened over 80 \u201cepochs\u201d or training rounds.<\/p>\n<p id=\"57e3\"><strong>Also notice that many of the weights have strong blue dotted lines between various neurons<\/strong>. The weights have increased and now the system is trained and ready to take on the world!<\/p>\n<h3 id=\"4dac\">Training Our Neural Net and Optimizing It<\/h3>\n<p id=\"3998\">Now let\u2019s have the model crunch some numbers. 
To do that we compile it and set its\u00a0<strong>optimizer<\/strong>\u00a0function.<\/p>\n<div id=\"3467\"><span style=\"font-family: courier new,courier,monospace;\">model.compile(loss=&#8217;categorical_crossentropy&#8217;,<br \/>\noptimizer=&#8217;adam&#8217;,<br \/>\nmetrics=[&#8216;accuracy&#8217;])<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"b523\">It took me a long time to understand the optimizer function because I find most explanations miss the \u201cwhy\u201d behind the \u201cwhat.\u201d<\/p>\n<p id=\"f5a3\">In other words, why the heck do I need an optimizer?<\/p>\n<p id=\"dfb7\">Remember that a network has\u00a0<strong>target labels y<\/strong>\u00a0and as it\u2019s trained over many epochs it makes new\u00a0<strong>predictions y\u2019<\/strong>. The system tests these predictions against a random sample from the test dataset and that determines the system\u2019s\u00a0<strong>validation accuracy<\/strong>. A system can end up 99% accurate on the training data and only hit 50% or 70% on test images, so the real name of the game is validation accuracy, not accuracy.<\/p>\n<p id=\"ef12\"><strong>The optimizer calculates the gradient (also known as partial derivatives in math speak) of the error function with respect to the model weights<\/strong>.<\/p>\n<p id=\"9377\">What does that mean? 
Think of the weights distributed across a 3D hilly landscape (like you see below), which is called the \u201cerror landscape.\u201d The \u201ccoordinates\u201d of the landscape represent specific weight configurations (like coordinates on a map), while the \u201caltitude\u201d of the landscape corresponds to the total error\/cost for the different weight configurations.<\/p>\n<figure id=\"447b\"><canvas width=\"75\" height=\"61\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 575px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*YTDwPXrnfbPndXxNw77O-w.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*YTDwPXrnfbPndXxNw77O-w.png\" \/><\/figure>\n<p style=\"text-align: center;\">Error landscape<\/p>\n<p id=\"f6c5\">The\u00a0<strong>optimizer<\/strong>\u00a0serves one important function. It figures out<strong>\u00a0how to adjust the weights to try to minimize the errors<\/strong>. It does this by taking a page from the book of calculus.<\/p>\n<p id=\"18e8\">What is calculus? Well if you turn to any math text book you\u2019ll find some super unhelpful explanations such as it\u2019s all about calculating derivatives or differentials. 
But what the heck does that mean?<\/p>\n<figure id=\"f922\" data-scroll=\"native\"><canvas width=\"53\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*WCsum9lcn6MPfkewWIHPog.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/540\/1*WCsum9lcn6MPfkewWIHPog.jpeg\" \/><\/figure>\n<p id=\"aeff\">I didn\u2019t understand it until I read\u00a0<a href=\"http:\/\/amzn.to\/2lOCeJT\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/amzn.to\/2lOCeJT\" data-><strong>Calculus Better Explained, by Kalid Azad<\/strong><\/a>.<\/p>\n<p id=\"4e0b\">Here\u2019s what nobody bothers to explain.<\/p>\n<p id=\"6fbf\"><strong>Calculus does two things:<\/strong><\/p>\n<ul>\n<li id=\"209f\"><strong>Breaks things down into smaller chunks, aka a circle into rings.<\/strong><\/li>\n<li id=\"eb18\"><strong>Figures out rates of change.<\/strong><\/li>\n<\/ul>\n<p id=\"cd37\">In other words if I slice up a circle into rings:<\/p>\n<figure id=\"ecce\"><canvas width=\"75\" height=\"38\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*IZ7PhVFiz_nOJ1tCxfMIHg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*IZ7PhVFiz_nOJ1tCxfMIHg.png\" \/><\/figure>\n<p style=\"text-align: center;\">Courtesy of the awesome\u00a0<a href=\"https:\/\/betterexplained.com\/calculus\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/betterexplained.com\/calculus\/\" data->Calculus Explained website<\/a>.<\/p>\n<p id=\"b7d7\">I can unroll the rings to do some simple math on it:<\/p>\n<figure id=\"7197\"><canvas width=\"75\" height=\"63\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*kmqnNplLKSTZglJ6qcigAg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*kmqnNplLKSTZglJ6qcigAg.png\" \/><\/figure>\n<p id=\"3fe3\">Bam!<\/p>\n<p id=\"883a\">In our case we run a bunch of tests, adjust the weights of the network but did we 
actually get any closer to a better solution to the problem? The optimizer tells us that!<\/p>\n<p id=\"70d6\">You can read about\u00a0<strong>gradient descent<\/strong>\u00a0in an incredible amount of detail in the\u00a0<a href=\"http:\/\/cs231n.github.io\/optimization-1\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/cs231n.github.io\/optimization-1\/\" data->Stanford course<\/a>\u00a0but you\u2019ll probably find, like I did, that they\u2019re long on detail and light on the crucial question of why.<\/p>\n<p id=\"08aa\">In essence, what you\u2019re trying to do is minimize the errors. It\u2019s a bit like driving around in the fog. In an earlier version of this post, I characterized gradient descent as a way to find an optimal solution. But actually, there is really no way to know if we have an \u201coptimal\u201d solution at all. If we knew what that was, we would just go right to it. Instead we are trying to find a \u201cbetter\u201d solution that works. This is a bit like evolution. We find something that is fit enough to survive but that doesn\u2019t mean we created Einstein!<\/p>\n<figure id=\"a10a\" data-scroll=\"native\"><canvas width=\"75\" height=\"52\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 496px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*dU9pqAQk4ZeGjKo8tebzIw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*dU9pqAQk4ZeGjKo8tebzIw.jpeg\" \/><\/figure>\n<p id=\"9396\"><strong>Think of gradient descent like when you played Marco Polo as a kid<\/strong>.<\/p>\n<p id=\"1308\">You closed your eyes and all your friends spread out in the pool. You shouted out \u201cMarco\u201d and all the kids had to answer \u201cPolo.\u201d You used your ears to figure out if you were getting closer or farther away. If you were farther away you adjusted and tried a different path. If you were closer you kept going in that direction. 
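<\/p>
<p>You can play Marco Polo with a single weight in a few lines of Python. This is a bare-bones sketch of gradient descent that I cooked up for illustration; real optimizers like adam are far smarter about step sizes, but the swimming-toward-the-voice loop is the same:<\/p>

```python
# Minimize a toy error function, error(w) = (w - 3) ** 2.
# Its gradient, 2 * (w - 3), is the "Polo!" that tells us which way to swim.
def gradient(w):
    return 2 * (w - 3)

w = 0.0                # a random-ish starting weight
learning_rate = 0.1    # how big a step we take each round
for _ in range(100):   # 100 training rounds
    w -= learning_rate * gradient(w)

print(round(w, 4))     # 3.0 -- we descended to the bottom of the error valley
```

<p>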
Here we\u2019re figuring out how best to adjust the weights of the network to help them get closer to understanding the world.<\/p>\n<p id=\"86cf\">We chose the \u201cadam\u201d optimizer\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1412.6980\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/abs\/1412.6980\" data->described in this paper<\/a>. I\u2019ve found through brute-force experimentation with my program that it seems to produce the best results. This is the art of data science. There is no one algorithm to rule them all. If I changed the architecture of the network, I might find a different optimizer worked better.<\/p>\n<p id=\"c786\">Here is a list of all<a href=\"https:\/\/keras.io\/optimizers\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/keras.io\/optimizers\/\" data->\u00a0the various optimizers in Keras<\/a>.<\/p>\n<p id=\"a160\">Next we set up TensorBoard so we can visualize how the network performs.<\/p>\n<div id=\"918f\"><span style=\"font-family: courier new,courier,monospace;\"># Set up TensorBoard<\/span><br \/>\n<span style=\"font-family: courier new,courier,monospace;\">tb = TensorBoard(log_dir=&#8217;.\/logs&#8217;)<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"3179\">All we did was create a log directory. 
Now we will train the model and point TensorBoard at the logs.<\/p>\n<div id=\"6adf\"><span style=\"font-family: courier new,courier,monospace;\">model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_test, Y_test), callbacks=[tb])<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">score = model.evaluate(X_test, Y_test, verbose=0)<\/span><\/div>\n<div><span style=\"font-family: courier new,courier,monospace;\">print(&#8216;Test score:&#8217;, score[0])<br \/>\nprint(&#8220;Accuracy: %.2f%%&#8221; % (score[1]*100))<\/span><\/div>\n<p>&nbsp;<\/p>\n<p id=\"586d\">All right, let\u2019s fire this bad boy up and see how it does!<\/p>\n<div id=\"3d8e\">50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4894 &#8211; acc: 0.8253 &#8211; val_loss: 0.6288 &#8211; val_acc: 0.7908<br \/>\nEpoch 89\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4834 &#8211; acc: 0.8269 &#8211; val_loss: 0.6286 &#8211; val_acc: 0.7911<br \/>\nEpoch 90\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4908 &#8211; acc: 0.8224 &#8211; val_loss: 0.6169 &#8211; val_acc: 0.7951<br \/>\nEpoch 91\/100<br \/>\n50000\/50000 [==============================] &#8211; 4s &#8211; loss: 0.4817 &#8211; acc: 0.8238 &#8211; val_loss: 0.6052 &#8211; val_acc: 0.7952<br \/>\nEpoch 92\/100<br \/>\n50000\/50000 [==============================] &#8211; 4s &#8211; loss: 0.4863 &#8211; acc: 0.8228 &#8211; val_loss: 0.6151 &#8211; val_acc: 0.7930<br \/>\nEpoch 93\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4837 &#8211; acc: 0.8255 &#8211; val_loss: 0.6209 &#8211; val_acc: 0.7964<br \/>\nEpoch 94\/100<br \/>\n50000\/50000 [==============================] &#8211; 4s &#8211; loss: 0.4874 &#8211; acc: 0.8260 &#8211; val_loss: 0.6086 &#8211; val_acc: 0.7967<br \/>\nEpoch 95\/100<br \/>\n50000\/50000 
[==============================] &#8211; 3s &#8211; loss: 0.4849 &#8211; acc: 0.8248 &#8211; val_loss: 0.6206 &#8211; val_acc: 0.7919<br \/>\nEpoch 96\/100<br \/>\n50000\/50000 [==============================] &#8211; 4s &#8211; loss: 0.4812 &#8211; acc: 0.8256 &#8211; val_loss: 0.6088 &#8211; val_acc: 0.7994<br \/>\nEpoch 97\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4885 &#8211; acc: 0.8246 &#8211; val_loss: 0.6119 &#8211; val_acc: 0.7929<br \/>\nEpoch 98\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4773 &#8211; acc: 0.8282 &#8211; val_loss: 0.6243 &#8211; val_acc: 0.7918<br \/>\nEpoch 99\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4811 &#8211; acc: 0.8271 &#8211; val_loss: 0.6201 &#8211; val_acc: 0.7975<br \/>\nEpoch 100\/100<br \/>\n50000\/50000 [==============================] &#8211; 3s &#8211; loss: 0.4752 &#8211; acc: 0.8299 &#8211; val_loss: 0.6140 &#8211; val_acc: 0.7935<br \/>\nTest score: 0.613968349266<br \/>\nAccuracy: 79.35%<\/div>\n<p>&nbsp;<\/p>\n<p id=\"bd9b\">We hit 79% accuracy after 100 epochs. Not bad for a few lines of code. Now you might think 79% is not that great, but remember that in 2011, that was better than state of the art on Imagenet and it took a decade to get there! And we did that with just some example code from the Keras Github and a few tweaks.<\/p>\n<figure id=\"e6d3\" data-scroll=\"native\"><canvas width=\"75\" height=\"41\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 393px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*LF6NWh79JPp6lRJa4JyusQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*LF6NWh79JPp6lRJa4JyusQ.jpeg\" \/><\/figure>\n<p id=\"ae7e\">You\u2019ll notice that in 2012 is when new ideas started to make an appearance.<\/p>\n<p id=\"bf8f\">AlexNet, by AI researchers Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, is the first orange dot. 
It marked the beginning of the current renaissance in deep learning. By the next year everyone was using deep learning. By 2014 the winning architecture was better than human-level image recognition.<\/p>\n<p id=\"7743\">Even so, these architectures are often very tied to certain types of problems. Several of the most popular architectures today, like\u00a0<a href=\"https:\/\/github.com\/raghakot\/keras-resnet\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/raghakot\/keras-resnet\" data-><strong>ResNet<\/strong><\/a>\u00a0and\u00a0<strong>Google\u2019s\u00a0<\/strong><a href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/inception\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/inception\" data-><strong>Inception V3<\/strong><\/a>,<strong>\u00a0do\u00a0<\/strong><a href=\"http:\/\/oduerr.github.io\/blog\/2016\/04\/06\/Deep-Learning_for_lazybones\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/oduerr.github.io\/blog\/2016\/04\/06\/Deep-Learning_for_lazybones\" data-><strong>only 88% on the tiny CIFAR10 images<\/strong><\/a>. They do even worse on the larger CIFAR100 set.<\/p>\n<p id=\"3b3b\">The current state of the art is\u00a0<strong>DenseNet<\/strong>, which won the ImageNet contest last year in 2016. It chews through CIFAR10,<strong>\u00a0hitting a killer 94.81% accuracy<\/strong> with an insanely deep 250 layers and 15.3 million connections! It is an absolute monster to run. On a single Nvidia GTX 1080, if you run it with the 40 x 12 model which hits the 93% accuracy mark you see in the chart below, it will take a month to run. 
Ouch!<\/p>\n<figure id=\"84a3\" data-scroll=\"native\"><canvas width=\"75\" height=\"30\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 292px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*wYmTbgX6nwRe-zZedtdwpA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*wYmTbgX6nwRe-zZedtdwpA.jpeg\" \/><\/figure>\n<p id=\"db19\">That said, I encourage you to explore these models in depth to see what you can learn from them.<\/p>\n<p id=\"ed26\"><strong>Through brute-force experimentation, I managed to hack together a weird architecture that achieves 81.40% accuracy using nothing but the built-in Keras layers and no custom layers.\u00a0<\/strong><a href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/blob\/master\/Deep%20Learning%20Examples\/keras-example-simple-convnet-15c6.py\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/the-laughing-monkey\/learning-ai-if-you-suck-at-math\/blob\/master\/Deep%20Learning%20Examples\/keras-example-simple-convnet-15c6.py\" data-><strong>You can find that on GitHub here<\/strong><\/a><strong>.<\/strong><\/p>\n<div id=\"5b4c\">\u00a0Epoch 70\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; loss: 0.3503 &#8211; acc: 0.8761 &#8211; val_loss: 0.6229 &#8211; val_acc: 0.8070<br \/>\nEpoch 71\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; loss: 0.3602 &#8211; acc: 0.8740 &#8211; val_loss: 0.6039 &#8211; val_acc: 0.8085<br \/>\nEpoch 72\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; loss: 0.3543 &#8211; acc: 0.8753 &#8211; val_loss: 0.5986 &#8211; val_acc: 0.8094<br \/>\nEpoch 73\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; loss: 0.3461 &#8211; acc: 0.8780 &#8211; val_loss: 0.6052 &#8211; val_acc: 0.8147<br \/>\nEpoch 74\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; 
loss: 0.3418 &#8211; acc: 0.8775 &#8211; val_loss: 0.6457 &#8211; val_acc: 0.8019<br \/>\nEpoch 75\/75<br \/>\n50000\/50000 [==============================] &#8211; 10s &#8211; loss: 0.3440 &#8211; acc: 0.8776 &#8211; val_loss: 0.5992 &#8211; val_acc: 0.8140<br \/>\nTest score: 0.599217191744<br \/>\nAccuracy: 81.40%<\/div>\n<p>&nbsp;<\/p>\n<p id=\"0ccd\">We can load up TensorBoard to visualize how we did as well.<\/p>\n<pre id=\"d4d2\">tensorboard --logdir=.\/logs<\/pre>\n<p id=\"ba24\">Now open a browser and go to the following URL:<\/p>\n<pre id=\"c63f\">localhost:6006<\/pre>\n<p id=\"ad6e\">Here is a screenshot of the training over time.<\/p>\n<figure id=\"c011\" data-scroll=\"native\"><canvas width=\"75\" height=\"47\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 445px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*J7_zbkcxImFfXjxgt7yvew.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*J7_zbkcxImFfXjxgt7yvew.jpeg\" \/><\/figure>\n<p id=\"ffb1\">You can see we hit the point of diminishing returns at around 35 epochs and 79% accuracy. The rest of the training is spent inching up to 81.40%, and anything beyond 75 epochs is likely just overfitting.<\/p>\n<p id=\"78b5\">So how would you improve this model?<\/p>\n<p id=\"6256\">Here are a few strategies:<\/p>\n<ul>\n<li id=\"5262\">Implement your own custom layers<\/li>\n<li id=\"81ae\">Do image augmentation, like flipping images, enhancing them, warping them, cloning them, etc.<\/li>\n<li id=\"c748\">Go deeper<\/li>\n<li id=\"f72e\">Change the settings on the layers<\/li>\n<li id=\"7cf1\">Read through the winning architecture papers and stack up your own model that has similar characteristics<\/li>\n<\/ul>\n<p id=\"4f9a\">And thus you have reached the real\u00a0<strong>art of data science, which is using your brain to understand the data and hand-craft a model to understand it better. 
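The image-augmentation strategy listed above is built right into Keras. Here is a minimal sketch of how it could look; note it assumes a modern TensorFlow install (so the import is tensorflow.keras) and uses a small random stand-in batch instead of the real CIFAR10 arrays, purely so it runs on its own:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in batch shaped like CIFAR10 (32x32 RGB); in the real script
# you would use the x_train loaded from keras.datasets.cifar10.
x_train = np.random.rand(8, 32, 32, 3).astype("float32")
y_train = np.arange(8)

# Randomly rotate, shift and mirror images on the fly each epoch,
# so the network rarely sees exactly the same picture twice.
datagen = ImageDataGenerator(
    rotation_range=15,       # rotate up to 15 degrees
    width_shift_range=0.1,   # shift horizontally up to 10%
    height_shift_range=0.1,  # shift vertically up to 10%
    horizontal_flip=True,    # mirror left/right
)

# Pull one augmented batch; during training you would pass
# datagen.flow(x_train, y_train, batch_size=...) to the model's
# fit method instead of the raw arrays.
batch_x, batch_y = next(datagen.flow(x_train, y_train, batch_size=4))
print(batch_x.shape)  # (4, 32, 32, 3)
```

Horizontal flips are a safe bet for CIFAR10 (a mirrored cat is still a cat), but think before enabling vertical flips, since upside-down cars and ships would only confuse the network.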
Perhaps you dig deep into CIFAR10 and notice that upping the contrast on those images would really make the objects in them stand out.\u00a0<\/strong>Do it!<\/p>\n<p id=\"41a7\">Don\u2019t be afraid to load things up in Photoshop and start messing with filters to see if images get sharper and clearer. Figure out if you can do the same thing with Keras image manipulation functions.<\/p>\n<p id=\"7ae7\">Deep learning is far from a magic bullet. It requires patience and dedication to get right.<\/p>\n<p id=\"01b7\">It can do incredible things, but you may find yourself glued to your workstation watching numbers tick by for hours until 2 in the morning, getting absolutely nowhere.<\/p>\n<p id=\"be73\"><strong>But then you hit a breakthrough!<\/strong><\/p>\n<p id=\"c3c8\">It\u2019s a bit like the trial and error a neural net goes through. Try some stuff and get closer to an answer. Try something else and get farther away.<\/p>\n<p id=\"1f4e\">I am now exploring\u00a0<a href=\"http:\/\/nn.cs.utexas.edu\/?neat\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/nn.cs.utexas.edu\/?neat\" data->how to use genetic algorithms to auto-evolve neural nets<\/a>. There\u2019s been\u00a0<a href=\"http:\/\/neat-python.readthedocs.io\/en\/latest\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/neat-python.readthedocs.io\/en\/latest\/\" data->a bunch of work done on this front<\/a>\u00a0but not enough!<\/p>\n<p id=\"1d06\">Eventually we\u2019ll hit a point where many of the architectures are baked in and easy to implement by pulling in some libraries and pre-trained weight files, but that is a few years down the road for enterprise IT.<\/p>\n<p id=\"209d\">This field is still developing fast and new ideas are coming out every day. The good news is you are on the early part of the wave. So get comfortable and start playing around with your own models.<\/p>\n<p id=\"00ee\">Study. Experiment. 
Learn.<\/p>\n<p id=\"31c7\">Do that and you can\u2019t go wrong.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, we&rsquo;re going to write our own Python image recognition program. To do that, we&rsquo;ll explore a powerful deep learning architecture called a deep convolutional neural network (DCNN). Convnets are the workhorses of computer vision. They power everything from self-driving cars to Google&rsquo;s image search.&nbsp; So why are neural networks so powerful? One key reason: They do&nbsp;automatic pattern recognition. So what&rsquo;s pattern recognition and why do we care if it&rsquo;s automatic? Patterns come in many forms but let&rsquo;s take two critical examples: The features that define a physical form.<\/p>\n","protected":false},"author":393,"featured_media":24223,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2209],"class_list":["post-1013","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2209,"user_id":393,"is_guest":0,"slug":"daniel-jeffries","display_name":"Daniel Jeffries","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Jeffries","first_name":"Daniel","job_title":"","description":"Dan Jeffries is an author, engineer and serial entrepreneur. 
During his two decades in the computer industry, he&#039;s covered a broad range of tech from Linux to networks and virtualization.&nbsp;"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1013","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/393"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1013"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1013\/revisions"}],"predecessor-version":[{"id":27954,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1013\/revisions\/27954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/24223"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1013"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}