{"id":1007,"date":"2018-11-29T02:08:44","date_gmt":"2018-11-28T23:08:44","guid":{"rendered":"http:\/\/kusuaks7\/?p=612"},"modified":"2021-11-25T08:11:01","modified_gmt":"2021-11-25T08:11:01","slug":"learning-ai-if-you-suck-at-math-part4-tensors-illustrated-with-cats","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/learning-ai-if-you-suck-at-math-part4-tensors-illustrated-with-cats\/","title":{"rendered":"Learning AI if You Suck at Math\u200a-Part 4-\u200aTensors Illustrated (with Cats!)"},"content":{"rendered":"<p><strong><em>Ready to learn Machine Learning? <a href=\"https:\/\/www.experfy.com\/training\/courses\">Browse courses<\/a>\u00a0like\u00a0<a href=\"https:\/\/www.experfy.com\/training\/courses\/machine-learning-foundations-supervised-learning\">Machine Learning Foundations: Supervised Learning<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<p id=\"8d45\">Welcome to part four of Learning AI if You Suck at Math. If you missed\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part-1\">part 1<\/a>,\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part-two-practical-projects\">part 2<\/a> and <a href=\"https:\/\/www.experfy.com\/blog\/learning-ai-if-you-suck-at-math-part3-building-an-ai-dream-machine\">part 3<\/a>,\u00a0be sure to check them out.<\/p>\n<p id=\"c179\">Maybe you\u2019ve downloaded\u00a0TensorFlow\u00a0and you\u2019re ready to get started with some deep learning?<\/p>\n<p id=\"7198\"><strong>But then you wonder: What the hell is a tensor?<\/strong><\/p>\n<p id=\"b8a5\">Perhaps you looked it up on\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Tensor\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Tensor\" data->Wikipedia<\/a>\u00a0and now you\u2019re more confused than ever. 
Maybe you found this\u00a0<a href=\"https:\/\/www.grc.nasa.gov\/www\/k-12\/Numbers\/Math\/documents\/Tensors_TM2002211716.pdf\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.grc.nasa.gov\/www\/k-12\/Numbers\/Math\/documents\/Tensors_TM2002211716.pdf\" data->NASA tutorial<\/a>\u00a0and still have no idea what it\u2019s talking about?<\/p>\n<p id=\"2c0e\">The problem is most guides talk about tensors\u00a0<em>as if you already understand all the terms they\u2019re using<\/em>\u00a0to describe the math.<\/p>\n<p id=\"5210\">Have no fear!<\/p>\n<p id=\"0707\">I hated math as a kid, so if I can figure it out, you can too! We just have to explain everything in simpler terms.<\/p>\n<p id=\"cb75\"><strong>So what is a tensor and why does it flow?<\/strong><\/p>\n<h3 id=\"935f\">Tensors = Containers<\/h3>\n<p id=\"b917\">A tensor is the basic building block of modern machine learning.<\/p>\n<p id=\"1544\">At its core it\u2019s a data container. Mostly it contains numbers. Sometimes it even contains strings, but that\u2019s rare.<\/p>\n<p id=\"33aa\"><strong>So think of it as a bucket of numbers.<\/strong><\/p>\n<p id=\"5cae\">There are multiple sizes of tensors. 
Let\u2019s go through the most basic ones that you\u2019ll run across in deep learning, which will be between 0 and 5 dimensions.<\/p>\n<p id=\"caba\">We can visualize the various types of tensors like this (cats come later!):<\/p>\n<figure id=\"a972\" data-scroll=\"native\"><canvas width=\"75\" height=\"48\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 456px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*_D5ZvufDS38WkhK9rK32hQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*_D5ZvufDS38WkhK9rK32hQ.jpeg\" \/><\/figure>\n<h3 id=\"0e34\"><strong>0D Tensors\/Scalars<\/strong><\/h3>\n<p id=\"ef7e\">Every number that goes into a tensor\/container bucket is called a \u201cscalar.\u201d<\/p>\n<p id=\"dbc9\"><strong>A scalar is a single number.<\/strong><\/p>\n<p id=\"7b15\">Why don\u2019t they just call it a number, you ask?<\/p>\n<p id=\"0b12\">I don\u2019t know. Maybe math peeps just like to sound cool? Scalar does sound cooler than number.<\/p>\n<p id=\"3351\">In fact you can have a single-number tensor, which we call a 0D tensor, aka a tensor with 0 dimensions. It\u2019s nothing more than a bucket with one number in it. Imagine a bucket with a single drop of water and you have a 0D tensor.<\/p>\n<p id=\"1d53\"><strong>In this tutorial we\u2019ll use Python, Keras and TensorFlow, as well as the Python library NumPy. 
We set all of that up in\u00a0<\/strong><a href=\"https:\/\/hackernoon.com\/learning-ai-if-you-suck-at-math-p3-building-an-ai-dream-machine-or-budget-friendly-special-d5a3023140ef#.6frka033t\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/hackernoon.com\/learning-ai-if-you-suck-at-math-p3-building-an-ai-dream-machine-or-budget-friendly-special-d5a3023140ef#.6frka033t\" data-><strong>my last tutorial, Learning AI if You Suck at Math (LAIYSAM) \u2014 Part 3<\/strong><\/a><strong>, so be sure to check that out if you want to get your deep learning workstation running fast.<\/strong><\/p>\n<p id=\"38df\">In Python, these tensors are typically stored in NumPy arrays.\u00a0<a href=\"http:\/\/www.numpy.org\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/www.numpy.org\/\" data->NumPy<\/a>\u00a0is a scientific library for manipulating numbers that is used by pretty much every AI framework on the planet.<\/p>\n<pre id=\"e01a\">import numpy as np<\/pre>\n<pre id=\"14ed\">x = np.array(5)<\/pre>\n<pre id=\"c169\">print(x)<\/pre>\n<p id=\"174a\">Our output is:<\/p>\n<pre id=\"f907\">5<\/pre>\n<p id=\"5780\">On\u00a0<a href=\"https:\/\/www.kaggle.com\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.kaggle.com\/\" data->Kaggle (the data science competition site)<\/a>\u00a0you will often see Jupyter Notebooks (also installed in LAIYSAM - Part 3) that talk about turning data into NumPy arrays. Jupyter notebooks are essentially markup documents with working code embedded. Think of them as an explanation and program rolled into one.<\/p>\n<p id=\"6526\"><strong>Why the heck would we want to turn data into a NumPy array?<\/strong><\/p>\n<p id=\"42ff\"><strong>Simple. 
Because we need to transform any input data, be that strings of text, images, stock prices, or video, into a universal standard that we can work with easily.<\/strong><\/p>\n<p id=\"f6a8\">In this case we transform that data into buckets of numbers so we can manipulate them with TensorFlow.<\/p>\n<p id=\"9cd0\">It\u2019s nothing more than organizing data into a usable format. In web programming you might represent data as XML, so you can define its features and manipulate it quickly. Same thing. In deep learning we use tensor buckets as our basic Lego block.<\/p>\n<h3 id=\"a51a\"><strong>1D Tensors\/Vectors<\/strong><\/h3>\n<p id=\"b552\">If you\u2019re a programmer, you already know about something similar to a 1D tensor:\u00a0<strong>an array<\/strong>.<\/p>\n<p id=\"dc59\">Every programming language has arrays, which are nothing but a string of data chunks in a single row or column. In deep learning this is called a 1D tensor. Tensors are defined by how many axes they have in total. A 1D tensor has exactly one axis.<\/p>\n<p id=\"4755\"><strong>A 1D tensor is called a \u201cvector.\u201d<\/strong><\/p>\n<p id=\"498f\">We can visualize a vector as a single column or row of numbers.<\/p>\n<figure id=\"94ff\" data-scroll=\"native\"><canvas width=\"13\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*cvYGAHw_MTPp7f_XPG5teg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*cvYGAHw_MTPp7f_XPG5teg.jpeg\" \/><\/figure>\n<p id=\"7f11\">If we wanted to see this in NumPy we could do the following:<\/p>\n<pre id=\"cb31\">x = np.array([1,2,3,4])<\/pre>\n<pre id=\"dd6f\">print(x)<\/pre>\n<p id=\"8ca8\">Our output is:<\/p>\n<pre id=\"c0c5\">[1 2 3 4]<\/pre>\n<p id=\"1790\">We can also check how many axes a tensor has by using NumPy\u2019s ndim attribute. 
Let\u2019s try it with a 1D tensor.<\/p>\n<pre id=\"17b6\">x.ndim<\/pre>\n<p id=\"26db\">Our output is:<\/p>\n<pre id=\"9d3b\">1<\/pre>\n<h3 id=\"29e8\"><strong>2D Tensors<\/strong><\/h3>\n<p id=\"ee0c\">You probably already know about another kind of tensor:\u00a0<strong>a matrix.<\/strong><\/p>\n<p id=\"07f2\"><strong>A 2D tensor is called a matrix.<\/strong><\/p>\n<figure id=\"027e\" data-scroll=\"native\"><canvas width=\"75\" height=\"63\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*YyHZDV98sl4VY1LjMVNRJg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*YyHZDV98sl4VY1LjMVNRJg.jpeg\" \/><\/figure>\n<p id=\"0c4b\">No, not the movie with Keanu Reeves. Think of an Excel sheet.<\/p>\n<p id=\"f1bc\">We can visualize this as a grid of numbers with rows and columns.<\/p>\n<p id=\"df27\">Those columns and rows represent two axes. A matrix is a 2D tensor, meaning it is two dimensional, aka a tensor with 2 axes.<\/p>\n<p id=\"7af5\">In NumPy we would represent that as:<\/p>\n<pre id=\"c6e6\">x = np.array([[5,10,15,30,25],<\/pre>\n<pre id=\"ee64\">[20,30,65,70,90],<\/pre>\n<pre id=\"fe19\">[7,80,95,20,30]])<\/pre>\n<p id=\"6f68\">We can store characteristics of people in a 2D tensor. For example, a typical mailing list would fit in here.<\/p>\n<p id=\"09f6\">Let\u2019s say we have 10,000 people. We also have the following features or characteristics about each person:<\/p>\n<ul>\n<li id=\"ee76\">First Name<\/li>\n<li id=\"1eef\">Last Name<\/li>\n<li id=\"1048\">Street Address<\/li>\n<li id=\"c295\">City<\/li>\n<li id=\"5ee2\">State<\/li>\n<li id=\"a6ed\">Country<\/li>\n<li id=\"92b6\">Zip<\/li>\n<\/ul>\n<p id=\"05b0\">That means we have seven characteristics for each of our ten thousand people.<\/p>\n<p id=\"3b8f\">A tensor has a \u201cshape.\u201d The shape is a bucket that fits our data perfectly and defines the maximum size of our tensor. 
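Here is a quick NumPy sketch of shape and axes (the tiny array below is purely illustrative, not the full mailing list):

```python
import numpy as np

# A tiny 2D tensor: 3 people x 2 characteristics. This is just an
# illustrative stand-in for a bigger "people x features" tensor.
people = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

print(people.ndim)   # 2 axes, so a 2D tensor
print(people.shape)  # (3, 2): 3 rows, 2 columns
```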
We can fit all the data about our people into a 2D tensor that is (10000,7).<\/p>\n<p id=\"8b72\">You might be tempted to say it has 10,000 rows and 7 columns.<\/p>\n<p id=\"005b\">Don\u2019t.<\/p>\n<p id=\"f811\">A tensor can be transformed or manipulated so that columns become rows and vice versa.<\/p>\n<h3 id=\"679f\"><strong>3D Tensors<\/strong><\/h3>\n<p id=\"5dca\">This is where tensors really start to get useful. Often we have to store a number of examples of 2D tensors in their own bucket, which gives us a 3D tensor.<\/p>\n<p id=\"4ae6\">In NumPy we could represent it as follows:<\/p>\n<pre id=\"c57b\">x = np.array([[[5,10,15,30,25],\r\n               [20,30,65,70,90],\r\n               [7,80,95,20,30]],\r\n              [[3,0,5,0,45],\r\n               [12,-2,6,7,90],\r\n               [18,-9,95,120,30]],\r\n              [[17,13,25,30,15],\r\n               [23,36,9,7,80],\r\n               [1,-7,-5,22,3]]])<\/pre>\n<p id=\"b1af\">A 3D tensor has, you guessed it, 3 axes. We can see that like so:<\/p>\n<pre id=\"7f76\">x.ndim<\/pre>\n<p id=\"66f1\">Our output is:<\/p>\n<pre id=\"2d7e\">3<\/pre>\n<p id=\"9920\">So let\u2019s take our mailing list above. Now say we have 10 mailing lists. We would store our 2D tensor in another bucket, creating a 3D tensor. Its shape would look like this:<\/p>\n<pre id=\"a53e\">(number_of_mailing_lists, number_of_people, number_of_characteristics_per_person)<\/pre>\n<pre id=\"c2be\">(10,10000,7)<\/pre>\n<figure id=\"4176\" data-scroll=\"native\"><canvas width=\"75\" height=\"63\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 601px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*vYFRBHZLsSu9gaN24OwxZQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1200\/1*vYFRBHZLsSu9gaN24OwxZQ.jpeg\" \/><\/figure>\n<p id=\"c63a\"><strong>You might have already guessed it but a 3D tensor is a cube of numbers!<\/strong><\/p>\n<p id=\"998c\">We can keep stacking cubes together to create bigger and bigger tensors to encode different types of data, aka 4D tensors, 5D tensors and so on up to N. N is used by math peeps to define an unknown number of additional units in a set continuing into the future. 
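The stacking idea can be sketched with NumPy's stack function (the tiny shapes here are illustrative stand-ins for the full (10000, 7) mailing lists):

```python
import numpy as np

# Two mailing lists, each a 2D tensor: 4 people x 7 characteristics.
list_a = np.zeros((4, 7))
list_b = np.zeros((4, 7))

# Stacking them along a new first axis creates a 3D tensor with shape
# (number_of_mailing_lists, number_of_people, characteristics).
stacked = np.stack([list_a, list_b])

print(stacked.ndim)   # 3
print(stacked.shape)  # (2, 4, 7)
```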
It could be 5, 10 or a zillion.<\/p>\n<p id=\"4de7\">Actually, a 3D tensor might be better visualized as a layer of grids, which looks something like the graphic below:<\/p>\n<figure id=\"480d\"><canvas width=\"75\" height=\"58\"><\/canvas><img decoding=\"async\" style=\"width: 631px; height: 504px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*d7q997pF9vPnWC6zFHuSzg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*d7q997pF9vPnWC6zFHuSzg.jpeg\" \/><\/figure>\n<h3 id=\"2f8b\">Common Data Stored in\u00a0Tensors<\/h3>\n<p id=\"9f25\">Here are some common types of datasets that we store in various types of tensors:<\/p>\n<ul>\n<li id=\"425e\">3D = Time series<\/li>\n<li id=\"4b8f\">4D = Images<\/li>\n<li id=\"9c2d\">5D = Videos<\/li>\n<\/ul>\n<p id=\"10c5\"><strong>In almost every one of these tensors the common thread will be sample size.\u00a0<\/strong>Sample size is the number of things in the set. That could be the number of images, the number of videos, the number of documents, or the number of tweets.<\/p>\n<p id=\"89af\">Typically, the actual data takes up one dimension less than the full tensor, because one dimension is reserved for the sample size:<\/p>\n<pre id=\"9009\">total_dimensions - sample_size_dimension = actual_dimensions_of_data<\/pre>\n<p id=\"c64c\">Think of the various dimensions in the shape as fields. We are looking for the minimum number of fields that describe the data.<\/p>\n<p id=\"1ba2\">So even though a 4D tensor typically stores images, that\u2019s only because sample size takes up the 4th field in the tensor.<\/p>\n<p id=\"f66b\">For example, an image is really represented by three fields, like this:<\/p>\n<pre id=\"548f\">(width, height, color_depth) = 3D<\/pre>\n<p id=\"eabd\">But we don\u2019t usually work with a single image or document in machine learning. We have a set. 
We might have 10,000 images of tulips, which means we have a 4D tensor, like this:<\/p>\n<pre id=\"149a\">(sample_size, width, height, color_depth) = 4D<\/pre>\n<p id=\"2623\">Let\u2019s look at multiple examples of various tensors as storage buckets.<\/p>\n<h3 id=\"9371\"><strong>Time Series\u00a0Data<\/strong><\/h3>\n<p id=\"5ab0\">3D tensors are very effective for time series data.<\/p>\n<h4 id=\"038c\">Medical Scans<\/h4>\n<p id=\"8f77\">We can encode an electroencephalogram (EEG) signal from the brain as a 3D tensor, because it can be encapsulated as 3 parameters:<\/p>\n<pre id=\"bc73\">(time, frequency, channel)<\/pre>\n<p id=\"c8e2\">The transformation would look like this:<\/p>\n<figure id=\"7cbe\" data-scroll=\"native\"><canvas width=\"75\" height=\"18\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 179px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*ggsbEHGnH6OROeQPvjRfbQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*ggsbEHGnH6OROeQPvjRfbQ.jpeg\" \/><\/figure>\n<p id=\"7ee2\">Now if we had multiple patients with EEG scans, that would become a 4D tensor, like this:<\/p>\n<pre id=\"fa38\">(sample_size, time, frequency, channel)<\/pre>\n<h4 id=\"0d1d\">Stock Prices<\/h4>\n<p id=\"73de\">Stock prices have a high, a low and a final price every minute. The New York Stock Exchange is open from 9:30 AM to 4 PM. That\u2019s 6 1\/2 hours. There are 60 minutes in an hour, so 6.5 x 60 = 390 minutes. These are typically represented by a candlestick graph.<\/p>\n<figure id=\"fdf6\"><canvas width=\"75\" height=\"45\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*-bnzWcILGQQgKZKcCnHGpQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*-bnzWcILGQQgKZKcCnHGpQ.png\" \/><\/figure>\n<p id=\"9590\">We would store the high, low and final stock price for every minute in a 2D tensor of shape (390,3). 
If we captured a typical week of trading (five days), we would have a 3D tensor with the shape:<\/p>\n<pre id=\"a82d\">(days_of_week, minutes, high_low_price)<\/pre>\n<p id=\"73f3\">That would look like this:<\/p>\n<pre id=\"3faf\">(5,390,3)<\/pre>\n<p id=\"7c9d\">If we had 10 different stocks, with one week of data each, we would have a 4D tensor with the following shape:<\/p>\n<pre id=\"6593\">(10,5,390,3)<\/pre>\n<p id=\"feea\">Let\u2019s now pretend that we had a mutual fund, which is a collection of stocks, which is represented by our 4D tensor. Perhaps we also have a collection of 25 mutual funds representing our portfolio, so now we have a collection of 4D tensors, which means we have a 5D tensor of shape:<\/p>\n<pre id=\"04e2\">(25,10,5,390,3)<\/pre>\n<h4 id=\"2a83\"><strong>Text Data<\/strong><\/h4>\n<p id=\"062e\">We can store text data in a 3D tensor too. Let\u2019s take a look at tweets.<\/p>\n<p id=\"0b16\">Tweets are 140 characters. Twitter uses the UTF-8 standard, which allows for millions of types of characters, but we are realistically only interested in the first 128 characters, as they are the same as basic ASCII. A single tweet, with each character one-hot encoded, could be encapsulated as a 2D tensor of shape (140,128).<\/p>\n<p id=\"365f\">If we downloaded 1 million Donald Trump tweets (I think he tweeted that much last week alone) we would store that as a 3D tensor of shape:<\/p>\n<pre id=\"d32c\">(number_of_tweets_captured, tweet, character)<\/pre>\n<p id=\"a427\">That means our Donald Trump tweet collection would look like this:<\/p>\n<pre id=\"ea34\">(1000000,140,128)<\/pre>\n<h3 id=\"93ab\">Images<\/h3>\n<p id=\"dd72\">4D tensors are great at storing a series of images like JPEGs. As we noted earlier, an image is stored with three parameters:<\/p>\n<ul>\n<li id=\"70a3\">Height<\/li>\n<li id=\"ad9f\">Width<\/li>\n<li id=\"4373\">Color depth<\/li>\n<\/ul>\n<p id=\"dcbc\">The image is a 3D tensor, but the set of images makes it 4D. 
Remember that fourth field is for sample_size.<\/p>\n<p id=\"2f18\">The famous MNIST data set is a series of handwritten numbers that stood as a challenge for many data scientists for decades, but it is now considered a solved problem, with machines able to achieve 99% and higher accuracy. Still, the data set remains a good way to benchmark new machine learning applications, or just to try things out for yourself.<\/p>\n<figure id=\"e93a\"><canvas width=\"75\" height=\"41\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*7HmSJOABTcRzWMVOB3fJlA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*7HmSJOABTcRzWMVOB3fJlA.png\" \/><\/figure>\n<p id=\"13a6\">Keras even allows us to automatically import the MNIST data set with the following command:<\/p>\n<pre id=\"1118\">from keras.datasets import mnist\r\n(train_images, train_labels), (test_images, test_labels) = mnist.load_data()<\/pre>\n<p id=\"521d\">The data set is split into two buckets:<\/p>\n<ul>\n<li id=\"0757\">training set<\/li>\n<li id=\"02d2\">test set<\/li>\n<\/ul>\n<p id=\"9ccd\">Each of the images in the sets has a label. 
This label gives the image the correct identification, such as the number 3 or 7 or 9, which was added by hand by a human being.<\/p>\n<p id=\"f591\">The training set is used to teach a neural net and the test set contains the data the network tries to categorize after learning.<\/p>\n<p id=\"4aa2\">The MNIST images are grayscale, which means they could be encoded as 2D tensors; however, all images are traditionally encoded as 3D tensors, with the third axis being a representation of color depth.<\/p>\n<p id=\"211c\">There are 60,000 images in the MNIST dataset. They are 28 pixels wide x 28 pixels high. They have a color depth of 1, which represents grayscale.<\/p>\n<p id=\"7f40\">TensorFlow stores image data like this:<\/p>\n<pre id=\"0aa5\">(sample_size, height, width, color_depth)<\/pre>\n<p id=\"50b0\">So we could say the 4D tensor for the MNIST dataset has a shape of:<\/p>\n<pre id=\"dffd\">(60000,28,28,1)<\/pre>\n<h3 id=\"0068\">Color Images<\/h3>\n<p id=\"f302\">Color photos can have different color depths, depending on their encoding. A typical JPG image would use RGB and so it would have a color depth of 3, one each for red, green and blue.<\/p>\n<p id=\"605a\">This is a picture of my awesome cat Dove. It\u2019s a 750 pixel x 750 pixel image. (Actually it\u2019s 751 x 750 because I cut it wrong in Photoshop, but we\u2019ll pretend it is 750 x 750). 
That means we have a 3D tensor with the following characteristics:<\/p>\n<pre id=\"f36d\">(750,750,3)<\/pre>\n<figure id=\"7852\"><canvas width=\"75\" height=\"73\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 699px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*jT34jY1zYQ8DXXMzRTjGPw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*jT34jY1zYQ8DXXMzRTjGPw.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\">My beautiful cat Dove (750 x 750\u00a0pixels)<\/p>\n<p id=\"058f\">Hence my Dove would get reduced to a series of cold equations that would look like this as it \u201ctransformed\u201d or \u201cflowed.\u201d<\/p>\n<figure id=\"28ac\" data-scroll=\"native\"><canvas width=\"75\" height=\"27\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 259px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*CoYFskpcBVPILLEv1wI64g.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*CoYFskpcBVPILLEv1wI64g.jpeg\" \/><\/figure>\n<p id=\"a0c1\">Then let\u2019s say we had a bunch of images of different types of cats (though none will be as beautiful as Dove). Perhaps we have 100,000 not-Dove cats that were 750 pixels high by 750 pixels wide. We would define that set of data to Keras as a 4D tensor of shape:<\/p>\n<pre id=\"3153\">(100000,750,750,3)<\/pre>\n<figure id=\"7f91\" data-scroll=\"native\"><canvas width=\"75\" height=\"35\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 335px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*8mE6HDs6lN3sIbfOdNjO6A.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/2000\/1*8mE6HDs6lN3sIbfOdNjO6A.jpeg\" \/><\/figure>\n<h3 id=\"912e\"><strong>5D Tensors<\/strong><\/h3>\n<p id=\"f76a\">A 5D tensor can store video data. 
In TensorFlow video data is encoded as:<\/p>\n<pre id=\"d455\">(sample_size, frames, width, height, color_depth)<\/pre>\n<p id=\"aefa\">If we took a five minute video (5 x 60 = 300 seconds), at 1080p HD, which is 1920 pixels x 1080 pixels, at 15 sampled frames per second (which gives us 300 seconds x 15 = 4500 frames), with a color depth of 3, we would store that as a 4D tensor that looks like this:<\/p>\n<pre id=\"743b\">(4500,1920,1080,3)<\/pre>\n<p id=\"9576\">The fifth field in the tensor comes into play when we have multiple videos in our video set. So if we had 10 videos just like that top one, we would have a 5D tensor of shape:<\/p>\n<pre id=\"52cb\">(10,4500,1920,1080,3)<\/pre>\n<p id=\"3ed7\"><strong>Actually this example is totally insane.<\/strong><\/p>\n<p id=\"12e8\"><strong>The size of the tensor would be absolutely ridiculous, over a terabyte.<\/strong>\u00a0But let\u2019s stick with it for a moment as there\u2019s a point to doing it. Know that in the real world, we would want to down-sample the video as much as possible to make it more realistic to deal with or we would be training this model until the end of time.<\/p>\n<p id=\"65f5\">The number of values in this 5D tensor would be:<\/p>\n<pre id=\"c086\">10 x 4500 x 1920 x 1080 x 3 = 279,936,000,000<\/pre>\n<p id=\"73fd\">Keras allows us to store the values as 32 bit or 64 bit floating point numbers via the data type (dtype) setting:<\/p>\n<pre id=\"03ad\">float32\r\nfloat64<\/pre>\n<p id=\"5a23\">Each of these values would be stored as a 32 bit number, which means we multiply the total number of values by 32 to get the size in bits, which we can then convert to terabytes.<\/p>\n<pre id=\"aa0c\">279,936,000,000 x 32 = 8,957,952,000,000 bits (about 1.12 terabytes)<\/pre>\n<figure id=\"06cb\"><canvas width=\"75\" height=\"23\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*pFyvb-IhW1lldEcl_rMerQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*pFyvb-IhW1lldEcl_rMerQ.jpeg\" 
\/><\/figure>\n<p id=\"acb2\">I don\u2019t think a tensor that size would even fit in memory on most machines (I\u2019ll let someone else do the math on that), so get down-sampling, my friend!<\/p>\n<p id=\"eb70\"><strong>Actually, I used this last insane example for a reason.<\/strong><\/p>\n<p id=\"6dcc\"><strong>You just got your first lesson in\u00a0<\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_pre-processing\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Data_pre-processing\" data-><strong>pre-processing<\/strong><\/a><strong>\u00a0and\u00a0<\/strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_reduction\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Data_reduction\" data-><strong>data-reduction<\/strong><\/a><strong>.<\/strong><\/p>\n<p id=\"2001\">You can\u2019t just hurl data at an AI model with no work on your part. You have to massage and shrink the data to make it easier to work with efficiently.<\/p>\n<p id=\"2330\">Reduce the resolution, drop duplicate data (aka deduping), limit the number of frames you use, etc, etc. That is the work of a data scientist.<\/p>\n<p id=\"e064\">If you can\u2019t munge the data, you can\u2019t do anything useful with it.<\/p>\n<h3 id=\"07a9\">Conclusion<\/h3>\n<p id=\"f799\">There you have it. Now you have a much better understanding of tensors and the types of data that fit in them.<\/p>\n<p id=\"8f22\">In the next post we\u2019ll learn how to do various transformations on the tensors, also known as math.<\/p>\n<p id=\"e786\">In other words, we\u2019ll make the tensors \u201cflow.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The problem is most guides talk about tensors&nbsp;as if you already understand all the terms they&rsquo;re using&nbsp;to describe the math. So what is a tensor and why does it flow? At its core it&rsquo;s a data container. Mostly it contains numbers. 
Sometimes it even contains strings, but that&rsquo;s rare. There are multiple sizes of tensors. Let&rsquo;s go through the most basic ones that you&rsquo;ll run across in deep learning<\/p>\n","protected":false},"author":393,"featured_media":24219,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2209],"class_list":["post-1007","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2209,"user_id":393,"is_guest":0,"slug":"daniel-jeffries","display_name":"Daniel Jeffries","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Jeffries","first_name":"Daniel","job_title":"","description":"Dan Jeffries is an author, engineer and serial entrepreneur. During his two decades in the computer industry, he&#039;s covered a broad range of tech from Linux to networks and 
virtualization.&nbsp;"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1007","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/393"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1007"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1007\/revisions"}],"predecessor-version":[{"id":27825,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1007\/revisions\/27825"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/24219"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1007"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1007"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1007"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1007"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}