{"id":1408,"date":"2019-02-15T10:32:08","date_gmt":"2019-02-15T10:32:08","guid":{"rendered":"http:\/\/kusuaks7\/?p=1013"},"modified":"2023-08-21T17:41:50","modified_gmt":"2023-08-21T17:41:50","slug":"the-deep-learning-dictionary","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/the-deep-learning-dictionary\/","title":{"rendered":"The Deep Learning Dictionary"},"content":{"rendered":"<p><strong><em>Ready to learn Artificial Intelligence? <a href=\"https:\/\/www.experfy.com\/training\/courses\">Browse courses<\/a>\u00a0like\u00a0 <a href=\"https:\/\/www.experfy.com\/training\/courses\/uncertain-knowledge-and-reasoning-in-artificial-intelligence\">Uncertain Knowledge and Reasoning in Artificial Intelligence<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<blockquote><p>Ever struggle to recall what Adam, ReLU or YOLO mean? Look no further and check out every term you need to master Deep Learning.<\/p><\/blockquote>\n<p>Surviving in the Deep Learning world means understanding and navigating through the jungle of technical terms. You\u2019re not sure what AdaGrad, Dropout, or Xavier Initialization mean? Use this guide as a reference to freshen up your memory when you stumble upon a term that you safely parked in a dusty corner in the back of your mind.<\/p>\n<p>This dictionary aims to briefly explain the most important terms of Deep Learning. It contains short explanations of the terms, accompanied by links to follow-up posts, images, and original papers. The post aims to be equally useful for Deep Learning beginners and practitioners.<\/p>\n<p>Let\u2019s open the encyclopedia of deep learning.<\/p>\n<section>\n<p id=\"a3ee\"><a href=\"https:\/\/towardsdatascience.com\/activation-functions-neural-networks-1cbd9f8d91d6\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/towardsdatascience.com\/activation-functions-neural-networks-1cbd9f8d91d6\" data-><strong>Activation Function<\/strong><\/a>\u2014 Used to create a non-linear transformation of the input. The inputs are multiplied by weights and added to a bias term. Popular Activation functions include ReLU, tanh or sigmoid.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Em576iPQRQpL21_uCknVHA.jpeg\" \/><\/p>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/bit.ly\/2GBeocg\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/bit.ly\/2GBeocg\" data->https:\/\/bit.ly\/2GBeocg<\/a><\/p>\n<p id=\"eb1c\"><a href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/machinelearningmastery.com\/adam-optimization-algorithm-for-deep-learning\/\" data-><strong>Adam Optimization<\/strong> <\/a>\u2014 Can be used instead of stochastic gradient descent optimization methods to iteratively adjust network weights. 
<p id=\"fc35\"><a href=\"https:\/\/www.youtube.com\/watch?v=8NgKbjFbwJg\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Adaptive Gradient Algorithm<\/strong><\/a> \u2014 AdaGrad is a gradient descent optimization algorithm that features an adjustable learning rate for every parameter. It updates frequently changing parameters in smaller steps than infrequently changing ones. It thus fares well on very sparse data sets, e.g. for adapting word embeddings in Natural Language Processing tasks. Read the paper\u00a0<a href=\"http:\/\/www.jmlr.org\/papers\/volume12\/duchi11a\/duchi11a.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<figure id=\"528b\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/480\/1*q0lk6B6gzvsSQSDn-20zJA.png\" \/><\/figure>\n<p id=\"2770\"><a href=\"http:\/\/ufldl.stanford.edu\/tutorial\/supervised\/Pooling\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Average Pooling<\/strong><\/a> \u2014 Averages the results of a convolutional operation. It is often used to shrink the size of an input. Average pooling was primarily used in older Convolutional Neural Network architectures, while recent architectures favor maximum pooling.<\/p>
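<p>A minimal NumPy sketch of pooling over non-overlapping patches; replacing <code>mean<\/code> with <code>max<\/code> turns it into the maximum pooling described further down.<\/p>\n<pre><code>import numpy as np\n\ndef average_pool(x, size=2):\n    # Crop so height and width divide evenly, then view the input as\n    # non-overlapping size-by-size patches and average each patch.\n    h, w = x.shape\n    x = x[:h - h % size, :w - w % size]\n    patches = x.reshape(h \/\/ size, size, w \/\/ size, size)\n    return patches.mean(axis=(1, 3))\n<\/code><\/pre>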
<p id=\"179d\"><a href=\"http:\/\/vision.stanford.edu\/teaching\/cs231b_spring1415\/slides\/alexnet_tugce_kyunghee.pdf\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>AlexNet<\/strong><\/a> \u2014 A popular CNN architecture with eight layers. It is a more extensive network architecture than LeNet and takes longer to train. AlexNet won the 2012 ImageNet image classification challenge. Read the paper\u00a0<a href=\"https:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<figure id=\"a14b\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*fHUrcQXJgvM9CNVMn9ImLA.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/goo.gl\/BVXbhL\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/goo.gl\/BVXbhL<\/a><\/p>\n<p id=\"2045\"><a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap2.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Backpropagation<\/strong><\/a> \u2014 The general framework used to adjust network weights to minimize the loss function of a neural network. The algorithm travels backward through the network, using the chain rule to compute the gradient of the loss with respect to each weight and adjusting the weights accordingly.<\/p>\n<figure id=\"a3c3\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*z__EbffsSBQI90Ox56EEMg.gif\" \/><\/figure>\n<p style=\"text-align: center;\">Backpropagation travels back through the network and adjusts the\u00a0weights<\/p>\n<p id=\"f29d\"><a href=\"https:\/\/towardsdatascience.com\/difference-between-batch-gradient-descent-and-stochastic-gradient-descent-1187f1291aa1\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Batch Gradient Descent<\/strong><\/a> \u2014 The regular gradient descent optimization algorithm. It performs parameter updates for the entire training set: the algorithm needs to calculate the gradients of the whole training set before completing a single step of parameter updates. Thus, batch gradient descent can be very slow for large training sets.<\/p>\n<p id=\"59db\"><strong>Batch Normalization<\/strong> \u2014 Normalizes the inputs of a neural network layer to zero mean and unit variance, and then rescales and shifts them with learned parameters. This helps train the neural network faster and more stably.<\/p>
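<p>A minimal NumPy sketch of the training-time forward pass of batch normalization for a batch of activations; the learned scale <code>gamma<\/code> and shift <code>beta<\/code> are assumed to be provided.<\/p>\n<pre><code>import numpy as np\n\ndef batch_norm(x, gamma, beta, eps=1e-5):\n    # Normalize each feature over the batch to zero mean and unit variance...\n    mean = x.mean(axis=0)\n    var = x.var(axis=0)\n    x_hat = (x - mean) \/ np.sqrt(var + eps)\n    # ...then let the network rescale and shift via the learned parameters.\n    return gamma * x_hat + beta\n<\/code><\/pre>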
<p id=\"1966\"><a href=\"https:\/\/datascience.stackexchange.com\/questions\/361\/when-is-a-model-underfitted\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Bias<\/strong><\/a> \u2014 Occurs when the model does not achieve high accuracy on the training set; this is also called underfitting. When a model has high bias, it will generally not yield high accuracy on the test set either.<\/p>\n<figure id=\"bb78\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*3O5pvKZ95nzJsSYamsfDLA.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/goo.gl\/htKsQS\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/goo.gl\/htKsQS<\/a><\/p>\n<p id=\"10b4\"><a href=\"https:\/\/medium.com\/fuzz\/machine-learning-classification-models-3040f71e2529\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Classification<\/strong><\/a> \u2014 A task where the target variable belongs to a distinct class rather than being a continuous variable. Image classification, fraud detection, and many natural language processing problems are examples of deep learning classification tasks.<\/p>\n<p id=\"ee46\"><a href=\"http:\/\/colah.github.io\/posts\/2014-07-Understanding-Convolutions\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Convolution<\/strong><\/a> \u2014 A mathematical operation which slides a filter over the input and multiplies the overlapping values. Convolutions are the foundation of Convolutional Neural Networks, which excel at identifying edges and objects in images.<\/p>\n<figure id=\"be9a\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*z-Q6M52pZvJu4a_83kvpyQ.gif\" \/><\/figure>
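<p>A naive NumPy sketch of this operation on a 2D input. Strictly speaking, CNN layers compute a cross-correlation (the filter is not flipped), which is what the loop below implements.<\/p>\n<pre><code>import numpy as np\n\ndef convolve2d(image, kernel):\n    # Slide the filter over the image and sum the elementwise products.\n    kh, kw = kernel.shape\n    out_h = image.shape[0] - kh + 1\n    out_w = image.shape[1] - kw + 1\n    out = np.zeros((out_h, out_w))\n    for i in range(out_h):\n        for j in range(out_w):\n            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()\n    return out\n<\/code><\/pre>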
Source:\u00a0<a href=\"https:\/\/goo.gl\/HqKdeg\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/goo.gl\/HqKdeg\" data->https:\/\/goo.gl\/HqKdeg<\/a><\/p>\n<p id=\"ef00\"><a href=\"https:\/\/stackoverflow.com\/questions\/14829785\/why-derivative-of-a-function-is-used-to-calculate-local-minimum-instead-of-the-a\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/stackoverflow.com\/questions\/14829785\/why-derivative-of-a-function-is-used-to-calculate-local-minimum-instead-of-the-a\" data-><strong>Derivative <\/strong><\/a>\u2014 The derivative is the slope of a function at a specific point. Derivatives are calculated to let the gradient descent algorithm adjust weight parameters towards the local minimum.<\/p>\n<p id=\"8b98\"><a href=\"https:\/\/machinelearningmastery.com\/dropout-regularization-deep-learning-models-keras\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/machinelearningmastery.com\/dropout-regularization-deep-learning-models-keras\/\" data-><strong>Dropout <\/strong><\/a>\u2014 A regularization technique which randomly eliminates nodes and its connections in deep neural networks. Dropout reduces overfitting and enables faster training of deep neural networks. Each parameter update cycle, different nodes are dropped during training. This forces neighboring nodes to avoid relying on each other too much and figuring out the correct representation themselves. It also improves the performance of certain classification tasks. Read the paper\u00a0<a href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/jmlr.org\/papers\/v15\/srivastava14a.html\" data->here<\/a>.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*QTG_LMHDhoTceb4lzO124Q.png\" \/><\/p>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/goo.gl\/obY4L5\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/goo.gl\/obY4L5\" data->https:\/\/goo.gl\/obY4L5<\/a><\/p>\n<p id=\"c9c5\"><a href=\"https:\/\/www.quora.com\/What-is-end-to-end-learning-in-machine-learning?utm_medium=organic&amp;utm_source=google_rich_qa&amp;utm_campaign=google_rich_qa\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\"><strong>End-to-End Learning<\/strong><\/a> \u2014 An algorithm is able to solve the entire task by itself. Additional human intervention, like model switching or new data labeling, is not necessary. For example,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1604.07316\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/arxiv.org\/abs\/1604.07316\" data->end-to-end driving<\/a>\u00a0means that the neural network figures out how to adjust the steering command just by evaluating images.<\/p>\n<p id=\"bc8f\"><a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap1.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap1.html\" data-><strong>Epoch\u00a0<\/strong><\/a>\u2014Encompasses a single forward and backward pass through the training set for\u00a0<em>every\u00a0<\/em>example. 
<p id=\"c9c5\"><a href=\"https:\/\/www.quora.com\/What-is-end-to-end-learning-in-machine-learning?utm_medium=organic&amp;utm_source=google_rich_qa&amp;utm_campaign=google_rich_qa\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>End-to-End Learning<\/strong><\/a> \u2014 An algorithm is able to solve the entire task by itself. Additional human intervention, like model switching or new data labeling, is not necessary. For example,\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1604.07316\" target=\"_blank\" rel=\"noopener noreferrer\">end-to-end driving<\/a>\u00a0means that the neural network figures out how to adjust the steering command just by evaluating images.<\/p>\n<p id=\"bc8f\"><a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap1.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Epoch<\/strong><\/a> \u2014 Encompasses a single forward and backward pass through the training set for\u00a0<em>every<\/em>\u00a0example. A single epoch therefore touches every training example once.<\/p>\n<p id=\"8950\"><a href=\"https:\/\/towardsdatascience.com\/under-the-hood-of-neural-network-forward-propagation-the-dreaded-matrix-multiplication-a5360b33426\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Forward Propagation<\/strong><\/a> \u2014 A forward pass in deep neural networks. The input travels through the activation functions of the hidden layers until it produces a result at the end. Forward propagation is also used to predict the result of an input example after the weights have been properly trained.<\/p>
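<p>A compact NumPy sketch of a forward pass through fully-connected layers with ReLU activations; in a real network the output layer would typically use a different activation, such as softmax.<\/p>\n<pre><code>import numpy as np\n\ndef forward(x, layers):\n    # Each layer is a (weights, bias) pair; ReLU is applied after each one.\n    for W, b in layers:\n        x = np.maximum(0, x @ W + b)\n    return x\n<\/code><\/pre>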
<p id=\"e3be\"><a href=\"http:\/\/cs231n.github.io\/convolutional-networks\/#fc\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Fully-Connected Layer<\/strong><\/a> \u2014 A fully-connected layer transforms an input with its weights and passes the result to the following layer. This layer has access to all inputs or activations from the previous layer.<\/p>\n<p id=\"ff72\"><a href=\"https:\/\/www.quora.com\/Are-GRU-Gated-Recurrent-Unit-a-special-case-of-LSTM\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Gated Recurrent Unit<\/strong><\/a> \u2014 A Gated Recurrent Unit (GRU) conducts multiple transformations on the given input. It is mostly used in Natural Language Processing tasks. GRUs prevent the vanishing gradients problem in RNNs, similar to LSTMs. In contrast to LSTMs, GRUs don\u2019t use a memory unit and are thus more computationally efficient while achieving similar performance. Read the paper\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1406.1078\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<figure id=\"d745\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*1m6t4gmkBZf3w8ITFxRr-w.png\" \/><\/figure>\n<p style=\"text-align: center;\">No forget gate, in contrast to LSTM. Source:\u00a0<a href=\"https:\/\/goo.gl\/dUPtdV\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/goo.gl\/dUPtdV<\/a><\/p>\n<p id=\"c50d\"><a href=\"http:\/\/datalya.com\/blog\/2017\/machine-learning-versus-human-level-performance\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Human-Level Performance<\/strong><\/a> \u2014 The best possible performance of a group of human experts. Algorithms can exceed human-level performance, and it is a valuable metric against which to compare and improve a neural network.<\/p>\n<p id=\"82c3\"><a href=\"https:\/\/www.quora.com\/What-are-hyperparameters-in-machine-learning\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Hyperparameters<\/strong><\/a> \u2014 Determine the performance of your neural network. Examples of hyperparameters are the learning rate, the number of gradient descent iterations, the number of hidden layers, and the choice of activation function. Not to be confused with parameters or weights, which the DNN learns itself.<\/p>\n<p id=\"6e32\"><a href=\"http:\/\/www.image-net.org\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>ImageNet<\/strong><\/a> \u2014 A collection of millions of images with annotated classes. It is a very useful resource for image classification tasks.<\/p>\n<figure id=\"ef8e\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*S1WhBMR7wLN4DViZTljQ5w.gif\" \/><\/figure>\n<p id=\"be3a\"><a href=\"https:\/\/stackoverflow.com\/questions\/4752626\/epoch-vs-iteration-when-training-neural-networks\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Iteration<\/strong><\/a> \u2014 A single forward and backward pass of one batch through the neural network. If your training set has 5 batches and you train for 2 epochs, the network will run 10 iterations.<\/p>\n<p id=\"4cfd\"><a href=\"https:\/\/www.kdnuggets.com\/2017\/04\/simple-understand-gradient-descent-algorithm.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Gradient Descent<\/strong><\/a> \u2014 Helps a neural network decide how to adjust its parameters to minimize the cost function. The parameters are adjusted repeatedly until a minimum is found.\u00a0<a href=\"http:\/\/ruder.io\/optimizing-gradient-descent\/\" target=\"_blank\" rel=\"noopener noreferrer\">This post<\/a>\u00a0contains a well-explained, holistic overview of different gradient descent optimization methods.<\/p>\n<figure id=\"acf3\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*JZE7sKoBk6WIVkCmTo1JDw.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/bit.ly\/2JnOeLR\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/bit.ly\/2JnOeLR<\/a><\/p>\n<p style=\"text-align: center;\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/480\/1*WdPVmEQq3EW57F-rg_VDhw.png\" \/><\/p>
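<p>The core update rule fits in a few lines. A minimal sketch with a caller-supplied gradient function (names are ours):<\/p>\n<pre><code>def gradient_descent(w, grad_fn, lr=0.01, steps=1000):\n    # Repeatedly step against the gradient to reduce the cost.\n    for _ in range(steps):\n        w = w - lr * grad_fn(w)\n    return w\n\n# Example: minimize w**2, whose gradient is 2 * w.\nw_min = gradient_descent(5.0, lambda w: 2 * w)\n<\/code><\/pre>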
<p id=\"7950\"><a href=\"http:\/\/ufldl.stanford.edu\/tutorial\/supervised\/MultiLayerNeuralNetworks\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Layer<\/strong><\/a> \u2014 A set of activation functions which transform the input. Neural networks use multiple hidden layers to create output. You generally distinguish between the input, hidden, and output layers.<\/p>\n<p id=\"f7d8\"><a href=\"https:\/\/towardsdatascience.com\/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Learning Rate Decay<\/strong><\/a> \u2014 A concept to adjust the learning rate during training, allowing for flexible learning rate schedules. In deep learning, the learning rate typically decays the longer the network is trained.<\/p>
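<p>One common schedule divides the initial rate by a factor that grows with the epoch number; a one-line sketch (the decay constant is a tunable hyperparameter):<\/p>\n<pre><code>def decayed_lr(lr0, epoch, decay_rate=1.0):\n    # The effective learning rate shrinks as training progresses.\n    return lr0 \/ (1 + decay_rate * epoch)\n<\/code><\/pre>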
<p style=\"text-align: center;\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/480\/1*sGlpE-XJCSnEkTt95fID5g.png\" \/><\/p>\n<p style=\"text-align: center;\">Max pooling.<\/p>\n<p id=\"2e82\"><a href=\"http:\/\/ufldl.stanford.edu\/tutorial\/supervised\/Pooling\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Maximum Pooling<\/strong><\/a> \u2014 Only selects the maximum values of a specific input area. It is often used in convolutional neural networks to reduce the size of the input.<\/p>\n<p id=\"9829\"><a href=\"http:\/\/colah.github.io\/posts\/2015-08-Understanding-LSTMs\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Long Short-Term Memory<\/strong><\/a> \u2014 A special form of RNN which is able to learn the context of an input. While regular RNNs suffer from vanishing gradients when corresponding inputs are located far away from each other, LSTMs can learn these long-term dependencies. Read the paper\u00a0<a href=\"http:\/\/www.bioinf.jku.at\/publications\/older\/2604.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<figure id=\"7543\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*U1Wu_6KVzlN8Wls6HNcDSw.png\" \/><\/figure>\n<p style=\"text-align: center;\">Input and Output of an LSTM unit. Source:\u00a0<a href=\"https:\/\/bit.ly\/2GlKyMF\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/bit.ly\/2GlKyMF<\/a><\/p>\n<p id=\"33c1\"><a href=\"https:\/\/machinelearningmastery.com\/gentle-introduction-mini-batch-gradient-descent-configure-batch-size\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Mini-Batch Gradient Descent<\/strong><\/a> \u2014 An optimization algorithm which runs gradient descent on smaller subsets of the training data. The method enables parallelization as different workers separately iterate through different mini-batches. For every mini-batch, the cost is computed and the weights are updated. It\u2019s an efficient combination of batch and stochastic gradient descent.<\/p>\n<p style=\"text-align: center;\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/480\/1*4Er2Lw2HtwSgd1WKzC8xoQ.jpeg\" \/><\/p>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/bit.ly\/2Iz7uob\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/bit.ly\/2Iz7uob<\/a><\/p>\n<p id=\"7386\"><a href=\"http:\/\/www.cs.bham.ac.uk\/~jxb\/NN\/l8.pdf\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Momentum<\/strong><\/a> \u2014 A gradient descent optimization algorithm to smooth the oscillations of stochastic gradient descent methods. Momentum computes a moving average of the directions of the previously taken steps and adjusts the parameter update in this direction. Imagine a ball rolling downhill, using its momentum as it adjusts to roll left or right. The ball rolling downhill is an analogy for gradient descent finding the local minimum.<\/p>
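<p>A minimal sketch of a momentum step, keeping a running velocity term between updates; this is the exponentially weighted average formulation, and the names are ours.<\/p>\n<pre><code>def momentum_update(w, grad, velocity, lr=0.01, beta=0.9):\n    # Accumulate an exponentially weighted average of past gradients...\n    velocity = beta * velocity + (1 - beta) * grad\n    # ...and step in that smoothed direction.\n    return w - lr * velocity, velocity\n<\/code><\/pre>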
<p id=\"d15c\"><a href=\"https:\/\/www.youtube.com\/watch?v=aircAruvnKk\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Neural Network<\/strong><\/a> \u2014 A machine learning model which transforms inputs. A vanilla neural network has an input, hidden, and output layer. Neural networks have become the tool of choice for finding complex patterns in data.<\/p>\n<p id=\"ae08\"><a href=\"https:\/\/www.youtube.com\/watch?v=SnYMimFnKuY\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Non-Max Suppression<\/strong><\/a> \u2014 An algorithm used as part of YOLO. It helps detect the correct bounding box of an object by eliminating overlapping bounding boxes with a lower confidence of identifying the object. Read the paper\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1705.02950\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<p style=\"text-align: center;\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Mi6IPSd0gSdtCvKUBzZi9Q.jpeg\" \/><\/p>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/bit.ly\/2H303sF\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/bit.ly\/2H303sF<\/a><\/p>\n<p id=\"8f58\"><a href=\"http:\/\/karpathy.github.io\/2015\/05\/21\/rnn-effectiveness\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Recurrent Neural Networks<\/strong><\/a> \u2014 RNNs allow the neural network to understand the context in speech, text, or music. The RNN allows information to loop through the network, thus persisting important features of the input between earlier and later layers.<\/p>\n<figure id=\"c47c\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*JluFN2aG9VvHP0I7kFVLqw.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/goo.gl\/nr7Hf8\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/goo.gl\/nr7Hf8<\/a><\/p>\n<p id=\"3697\"><a href=\"https:\/\/github.com\/Kulbear\/deep-learning-nano-foundation\/wiki\/ReLU-and-Softmax-Activation-Functions\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>ReLU<\/strong><\/a> \u2014 A Rectified Linear Unit is a simple piecewise-linear transformation: the output is zero if the input is less than zero, and equal to the input otherwise. ReLU is often the activation function of choice because it allows neural networks to train faster and mitigates the vanishing gradient problem.<\/p>\n<p id=\"8a88\"><a href=\"https:\/\/machinelearningmastery.com\/linear-regression-for-machine-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Regression<\/strong><\/a> \u2014 A form of statistical learning where the output variable is a\u00a0<em>continuous<\/em>\u00a0instead of a categorical value. While classification assigns a class to the input variable, regression assigns a value from an infinite set of possible values, typically a number. Examples are the prediction of house prices or customer age.<\/p>\n<p id=\"ce8d\"><a href=\"https:\/\/www.quora.com\/What-is-an-intuitive-explanation-of-RMSProp\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Root Mean Squared Propagation<\/strong><\/a> \u2014 RMSProp is an extension of the stochastic gradient descent optimization method. The algorithm maintains a learning rate for every parameter rather than a single learning rate for the entire training set. RMSProp adjusts the learning rates based on how quickly the parameters changed in previous iterations. It was proposed in Geoffrey Hinton\u2019s Coursera lecture notes rather than in a formally published paper.<\/p>\n<p id=\"ada5\"><a href=\"https:\/\/machinelearningmastery.com\/difference-between-a-parameter-and-a-hyperparameter\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Parameters<\/strong><\/a> \u2014 The weights of a DNN which transform the input before the activation function is applied. Each layer has its own set of parameters. The parameters are adjusted through backpropagation to minimize the loss function.<\/p>\n<figure id=\"c782\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*g1MLj3pjVEkCa8rOrv--4w.png\" \/><\/figure>\n<p style=\"text-align: center;\">Weights of a neural\u00a0network<\/p>\n<p id=\"34f2\"><a href=\"http:\/\/ufldl.stanford.edu\/tutorial\/supervised\/SoftmaxRegression\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Softmax<\/strong><\/a> \u2014 An extension of the logistic regression function which calculates the probability of the input belonging to every one of the existing classes. Softmax is often used in the final layer of a DNN. The class with the highest probability is chosen as the predicted class. It is well-suited for classification tasks with more than two output classes.<\/p>\n<figure id=\"cfef\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*a-IkzViSItE1LuCV1LOIRA.png\" \/><\/figure>\n<p style=\"text-align: center;\">Source:\u00a0<a href=\"https:\/\/bit.ly\/2HdWZHL\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">https:\/\/bit.ly\/2HdWZHL<\/a><\/p>
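<p>Softmax is short enough to state directly; a NumPy sketch for a single vector of scores, where subtracting the maximum first is a standard numerical-stability trick:<\/p>\n<pre><code>import numpy as np\n\ndef softmax(z):\n    # Exponentiate the shifted scores and normalize to probabilities.\n    e = np.exp(z - z.max())\n    return e \/ e.sum()\n<\/code><\/pre>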
<p id=\"84ce\"><a href=\"http:\/\/ufldl.stanford.edu\/tutorial\/supervised\/OptimizationStochasticGradientDescent\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Stochastic Gradient Descent<\/strong><\/a> \u2014 An optimization algorithm which performs a parameter update for every\u00a0<em>single<\/em>\u00a0training example. The algorithm usually converges much faster than batch gradient descent, which performs a parameter update only after calculating the gradients for the\u00a0<em>entire<\/em>\u00a0training set.<\/p>\n<p id=\"c5a0\"><a href=\"https:\/\/medium.com\/machine-learning-for-humans\/supervised-learning-740383a2feab\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Supervised Learning<\/strong><\/a> \u2014 A form of Deep Learning where an output label exists for every input example. The labels are used to compare the output of a DNN to the ground-truth values and minimize the cost function. Other forms of Deep Learning tasks are semi-supervised and unsupervised learning.<\/p>\n<p id=\"dac2\"><a href=\"https:\/\/machinelearningmastery.com\/transfer-learning-for-deep-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Transfer Learning<\/strong><\/a> \u2014 A technique to use the parameters from one neural network for a different task without retraining the entire network. Take the weights from a previously trained network and remove the output layer, replace the last layer with your own softmax or logistic layer, and train the network again. It works because the lower layers often detect similar things, like edges, which are useful for other image classification tasks.<\/p>\n<p id=\"d405\"><a href=\"https:\/\/medium.com\/machine-learning-for-humans\/unsupervised-learning-f45587588294\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Unsupervised Learning<\/strong><\/a> \u2014 A form of machine learning where the output class is not known. GANs or Variational Auto-Encoders are used in unsupervised Deep Learning tasks.<\/p>\n<p id=\"bc7e\"><a href=\"http:\/\/www.fast.ai\/2017\/11\/13\/validation-sets\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Validation Set<\/strong><\/a> \u2014 The validation set is used to find the optimal hyperparameters of a deep neural network. Generally, the DNN is trained with different combinations of hyperparameters, which are then evaluated on the validation set. The best-performing set of hyperparameters is applied to make the final prediction on the test set. Pay attention to balancing the validation set. If lots of data is available, use as much as 99% for the training set, 0.5% for the validation set, and 0.5% for the test set.<\/p>
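<p>A small NumPy sketch of such a 99\/0.5\/0.5 split; the proportions are the ones suggested above, not a universal rule.<\/p>\n<pre><code>import numpy as np\n\ndef split(data, train=0.99, val=0.005):\n    # Shuffle, then carve off tiny validation and test slices.\n    data = np.random.permutation(data)\n    a = int(len(data) * train)\n    b = int(len(data) * (train + val))\n    return data[:a], data[a:b], data[b:]\n<\/code><\/pre>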
<p id=\"9d3d\"><a href=\"http:\/\/neuralnetworksanddeeplearning.com\/chap5.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Vanishing Gradients<\/strong><\/a> \u2014 A problem that arises when training very deep neural networks. In backpropagation, weights are adjusted based on their gradient, or derivative. In deep neural networks, the gradients of the earlier layers can become so vanishingly small that the weights are not updated at all. The ReLU activation function is suited to address this problem because it doesn\u2019t squash the input as much as other functions. Read the paper\u00a0<a href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.24.7321\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<p id=\"a46b\"><a href=\"https:\/\/elitedatascience.com\/overfitting-in-machine-learning\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Variance<\/strong><\/a> \u2014 Occurs when the DNN overfits to the training data. The DNN fails to distinguish noise from pattern and models every fluctuation in the training data. A model with high variance usually fails to generalize accurately to new data.<\/p>\n<p id=\"f5c8\"><a href=\"https:\/\/stackoverflow.com\/questions\/38379905\/what-is-vector-in-terms-of-machine-learning\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Vector<\/strong><\/a> \u2014 A combination of values that are passed as inputs into an activation layer of a DNN.<\/p>\n<figure id=\"9111\"><img src=\"https:\/\/cdn-images-1.medium.com\/max\/480\/1*uoHIcgS-Gl0R1kpGS8o56w.png\" \/><\/figure>\n<p id=\"58d1\"><a href=\"https:\/\/www.quora.com\/What-is-the-VGG-neural-network\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>VGG-16<\/strong><\/a> \u2014 A popular network architecture for CNNs. It simplifies the architecture of AlexNet and has a total of 16 layers. There are many pretrained VGG models which can be applied to novel use cases through transfer learning. Read the paper\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1409.1556.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<p id=\"1fe9\"><a href=\"http:\/\/andyljones.tumblr.com\/post\/110998971763\/an-explanation-of-xavier-initialization\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Xavier Initialization<\/strong><\/a> \u2014 Xavier initialization assigns the initial weights of a layer so that input signals reach deep into the neural network. It scales the weights based on the number of input and output neurons of the layer. This way, it prevents the signal from either becoming too small or too large later in the network.<\/p>
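<p>A minimal NumPy sketch of the normal-distribution variant of this scheme for one weight matrix; the function name is ours.<\/p>\n<pre><code>import numpy as np\n\ndef xavier_init(n_in, n_out):\n    # Variance 2 \/ (n_in + n_out) keeps the signal scale roughly constant\n    # from layer to layer, so it neither vanishes nor explodes.\n    std = np.sqrt(2.0 \/ (n_in + n_out))\n    return np.random.randn(n_in, n_out) * std\n<\/code><\/pre>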
<p id=\"ee14\"><a href=\"https:\/\/towardsdatascience.com\/yolo-you-only-look-once-real-time-object-detection-explained-492dc9230006\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>YOLO<\/strong><\/a> \u2014 You Only Look Once is an algorithm to identify objects in an image. Convolutions are used to determine the probability of an object being in a part of an image. Non-max suppression and anchor boxes are then used to locate the objects correctly. Read the paper\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1612.08242v1.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a>.<\/p>\n<\/section>\n<section>\n<hr \/>\n<p id=\"1a03\">I hope this dictionary helped you get a clearer understanding of the terms used in the deep learning world. Keep this guide handy when taking the Coursera Deep Learning Specialization to quickly look up terms and concepts.<\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Surviving in the Deep Learning world means understanding and navigating through the jungle of technical terms. Use this guide as a reference to freshen up your memory when you stumble upon a term that you safely parked in a dusty corner in the back of your mind. This dictionary aims to briefly explain the most important terms of Deep Learning. It contains short explanations of the terms, accompanied by links to follow-up posts, images, and original papers. The post aims to be equally useful for Deep Learning beginners and practitioners.<\/p>\n","protected":false},"author":344,"featured_media":3329,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2067],"class_list":["post-1408","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2067,"user_id":344,"is_guest":0,"slug":"jan-zawadzki","display_name":"Jan Zawadzki","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Zawadzki","first_name":"Jan","job_title":"","description":"Jan Zawadzki is a Data Scientist at Volkswagen Group Services with 4 years of global experience in machine learning and management consulting."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1408","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/344"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1408"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1408\/revisions"}],"predecessor-version":[{"id":31020,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1408\/revisions\/31020"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3329"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1408"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1408"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1408"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1408"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}