{"id":25117,"date":"2021-06-25T19:00:30","date_gmt":"2021-06-25T19:00:30","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=25117"},"modified":"2023-08-19T11:12:28","modified_gmt":"2023-08-19T11:12:28","slug":"5-distinctions-between-machine-learning-and-deep-learning","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/5-distinctions-between-machine-learning-and-deep-learning\/","title":{"rendered":"5 Distinctions Between Machine Learning And Deep Learning"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"25117\" class=\"elementor elementor-25117\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-71a2119 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"4014\" data-id=\"71a2119\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-01790a8\" data-eae-slider=\"18296\" data-id=\"01790a8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f45f04d elementor-widget elementor-widget-text-editor\" data-id=\"f45f04d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-normal-font-size\"><strong>What is Deep Learning, Anyway?<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a7da18 elementor-widget elementor-widget-text-editor\" data-id=\"4a7da18\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>As machine learning has gained significant notoriety for its wide-spread use across an immense number of applications, from retailers targeting products\/marketing to individual consumers to high-frequency trading and quantitative models revolutionizing modern finance, and not to mention the seemingly constant media attention it gets in polemics surrounding privacy, user data, cybersecurity, etc., \u201cmachine-learning\u201d has now become part of the mainstream vernacular. As such, we can assume that the reader has some general familiarity with machine learning and the problems it attempts to solve, which the layperson (with some help\/prodding) might describe as: using quantitative methods on data to optimize predictions about some future\/unknown events or phenomena. Even that general description is a bit too narrow, as making predictions is a task specific to \u201csupervised\u201d machine learning, which is distinguished only by the fact that there is <em>something to predict<\/em> in the data (a label(s) that algorithms can use to \u201csupervise\u201d the performance of their predictions and optimize them). It would be a gross oversight to exclude \u201cunsupervised\u201d learning from the machine learning umbrella, which has more to do with <em>describing <\/em>events\/phenomena with quantitative methods on data (having no clear \u201clabels\u201d in the data to predict\/supervise performance, but still wanting to understand\/make inferences from features in the data like clusters, distributional properties, etc.). 
There is also \u201creinforcement learning,\u201d which falls under the general scope of machine learning and differs from both supervised and unsupervised learning in that (broadly speaking) the objective has more to do with \u201chow\u201d the machine makes decisions from data than with the features\/labels themselves (e.g., procedures for a machine to learn which sequence of decisions\/events will maximize some reward or minimize some cost).\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e326a4d elementor-widget elementor-widget-text-editor\" data-id=\"e326a4d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Deep learning is more specific in the <a href=\"http:\/\/www.experfy.com\/blog\/ai-ml\/ai-series-deep-into-deep-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI<\/a> tasks that it is used to solve. Broadly speaking, deep learning performs supervised\/unsupervised tasks with very large (\u201cdeep\u201d) Artificial Neural Networks (ANNs), ostensibly replicating not only <em>what<\/em> a human would decide, but also <em>how<\/em> humans come to those decisions. Standard examples include mimicking the firing of a critical mass of neurons in the human brain with deep Multi-Layer Perceptrons (MLPs), using Convolutional Neural Networks (CNNs) to approximate how groups of neurons are stimulated as we focus\/narrow our sight to a visual receptive field, or even mimicking how humans learn language as we experience a vast collection of words, phrases, or documents with Natural Language Processing (NLP) models that use word embeddings, encoders, transformers, Long Short-Term Memory models (LSTMs), Gated Recurrent Units (GRUs), and many others. 
While the human brain doesn\u2019t actually process information and make decisions like these ANNs do, much like machine learning itself, the architecture of these deep-learning models was inspired by the way that humans think and observe processes.\u00a0\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3b9dbd0 elementor-widget elementor-widget-text-editor\" data-id=\"3b9dbd0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Thus, before diving too deeply into what deep learning is and its technical characteristics, we have already stumbled onto our first point of \u201cdistinction\u201d between machine\/deep learning.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-89fcdd3 elementor-widget elementor-widget-text-editor\" data-id=\"89fcdd3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol class=\"has-normal-font-size\">\n<li><strong>Machine Learning is More General than Deep Learning<\/strong><\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-471557c elementor-widget elementor-widget-text-editor\" data-id=\"471557c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>It would be wrong to think of machine learning and deep learning as \u201cdifferent.\u201d Indeed, deep learning is a <em>subset<\/em> of machine learning. 
What we will attempt to do in this article is highlight key features that distinguish deep learning specifically from other machine learning procedures.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-78f547c elementor-widget elementor-widget-text-editor\" data-id=\"78f547c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Since \u201cdeep learning\u201d is synonymous with \u201cdeep neural networks,\u201d we might start with what distinguishes deep learning from other, simple neural networks that are used in machine learning, which we will call shallow neural networks \/ shallow learning. This distinction must go beyond simply the number of layers or trainable parameters in the models, as this would inevitably create an arbitrary cut-off between deep\/shallow learning.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fa9ff8c elementor-widget elementor-widget-text-editor\" data-id=\"fa9ff8c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Some of the distinction between deep learning and simpler neural networks should include the space of problems\/experiments that the network seeks to solve. In particular, deep learning has been wildly successful for tasks in computer vision, listening, and language. That\u2019s not to say that deep learning is restricted to this space of tasks, but there is huge literature in these areas on the optimal architectures and parameters for deep learning tasks. 
Much of the literature and many of the publicly available models focus on <em>supervised <\/em>learning, particularly for classification problems (e.g., standard pre-trained models in libraries like Keras are used for image classification, and Google\u2019s <a href=\"https:\/\/research.google\/pubs\/pub45611\/\" target=\"_blank\" rel=\"noreferrer noopener\">recently released<\/a> pre-trained <a href=\"https:\/\/github.com\/tensorflow\/models\/tree\/master\/research\/audioset\/vggish\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cVGGish\u201d<\/a> model mimics the VGG architecture from image classification to do audio processing, embedding, and sound classification).\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-893b423 elementor-widget elementor-widget-text-editor\" data-id=\"893b423\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>What distinguishes the set of problems associated with vision, language, and listening, and what explains why deep learning has been so successful at modeling them, is the complexity of the <em>input features <\/em>associated with these tasks. 
This leads to the second property that characterizes deep learning: deep learning not only requires a neural network architecture, but the hidden layers of that ANN sequentially generate new <em>representations <\/em>of the features inputted to the model.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f4ab08 elementor-widget elementor-widget-text-editor\" data-id=\"2f4ab08\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol class=\"has-normal-font-size\" start=\"2\">\n<li><strong>Deep Learning Uses Input <\/strong><strong><em>Tensors<\/em><\/strong><strong> to Replace Traditional Feature Selection with <\/strong><strong><em>Representation Learning<\/em><\/strong><\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2ff0cfd elementor-widget elementor-widget-text-editor\" data-id=\"2ff0cfd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A fundamental property of deep-learning algorithms is the powerful and robust way deep learning automates many traditional feature engineering and feature selection procedures with <em>representation learning. <\/em>Deep learning provides a wholesale solution to the feature selection process, including approaches to manage problems of overfitting and dependent observations. Unlike traditional machine learning models, deep learning inputs are not limited to a single table, where each row is a vector of features for a particular individual\/observation. Rather, the inputs to a deep learning model are <em>tensors<\/em>, general mathematical constructs that may have their own dependencies, geometry, feature relationships, etc. 
Thus, instead of a single row of data representing each individual for the model to learn, the individuals may themselves consist of complex, multidimensional tables.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1c08c9d elementor-widget elementor-widget-text-editor\" data-id=\"1c08c9d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>For example, in image processing, representation learning means that the entire image, represented as a <em>tensor <\/em>of RGB values with dimensions (264, 264, 3), can be directly inputted to the model and learned by the algorithms. The inputs are large, and the model architecture is designed specifically to codify and learn <em>all <\/em>relevant features, dependencies, and relationships from the raw tensors through a complex and highly specialized sequence of learning nodes tailor-made for that task.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ecf48e5 elementor-widget elementor-widget-text-editor\" data-id=\"ecf48e5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Early on, CNNs were shown to be highly effective at capturing the local (short-range) dependencies in image data, and they became the standard for deep learning vision tasks. Sliding \u201cwindows\u201d scan the images and identify patterns in a manner that is designed to mimic the stimulation of neurons from our visual receptive field when we focus our sight. 
In CNNs these windows are different sized <em>kernels<\/em> and <em>filters<\/em> that identify signals\/relationships <em>within<\/em> and <em>across<\/em> each color channel, respectively, and combinations of them can be customized to learn nearly any set of images. Large, standardized datasets like ImageNet and pre-trained models were created by researchers testing and competing for optimal learning architectures classifying those images, and nearly all of the top performers employ multiple CNNs in some form, like ResNet, VGG, Inception, etc. While other architectures have also been proven effective for these tasks, what distinguishes all of them is their ability to take in large collections of raw images and accurately <em>represent <\/em>them, detecting key features automatically.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f828485 elementor-widget elementor-widget-text-editor\" data-id=\"f828485\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>This naturally leads us to our next distinguishing property of deep neural networks, which we have already noted and seems rather obvious: they are neural networks! Neural networks address <em>how <\/em>deep learning algorithms perform feature learning. 
The example of CNNs from computer vision will be helpful because representation learning always depends on the specific inputs for the deep learning task.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-442371e elementor-widget elementor-widget-text-editor\" data-id=\"442371e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol class=\"has-normal-font-size\" start=\"3\">\n<li><strong>Deep Learning Uses Large Neural Networks and Sequential Layers<\/strong><\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-42beaa9 elementor-widget elementor-widget-text-editor\" data-id=\"42beaa9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The \u201cdeep\u201d in deep learning refers to the large number of \u201chidden layers\u201d that are common to their neural network architecture. Accurate representation learning typically requires very large <em>depth<\/em> for the neural network, often with dozens of layers, and thus, \u201cdeep neural networks\u201d was coined.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7d28cd7 elementor-widget elementor-widget-text-editor\" data-id=\"7d28cd7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The input to each layer in a neural network is either the raw input of initial features or an output from a previous layer of the model. 
Each layer of a deep neural network offers a new representation of the initial input features, with the goal of completely automating this feature learning, along with codifying and discriminating all key features and patterns. The input to the network may be a raw image, but the output of the final layer is a set of activated kernels\/neurons of lower dimensionality that the ANN uses as a representation of the initial image.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b06e7d9 elementor-widget elementor-widget-text-editor\" data-id=\"b06e7d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>So, how many layers are necessary for this representation to <em>accurately <\/em>codify all of the key characteristics of pictures, videos, sound files, or human language? It turns out that the answer is usually: quite a lot. Machine learning of any kind is notoriously prone to overfitting and bias, but there are powerful controls to aid in the generality of what\u2019s learned. Additional layers can accomplish this task, like dropout layers that randomly drop a fraction of the neurons\/kernels during training, forcing the network to generalize and not \u201cget stuck\u201d on any one detail, or batch-normalization layers that normalize each layer\u2019s inputs over the \u201cbatches\u201d fed to the model. 
In the end, accurately learning key features of our data, with the ability to generalize those characteristics to understand features of new, similar data, typically requires a very large neural network.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c92ee6a elementor-widget elementor-widget-text-editor\" data-id=\"c92ee6a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Indeed, standard models that are publicly available for tasks, like computer vision and language processing, can be enormous. The VGG16 architecture (shown below, reproduced from <a href=\"https:\/\/www.researchgate.net\/publication\/328966158_A_review_of_deep_learning_in_the_study_of_materials_degradation\" target=\"_blank\" rel=\"noreferrer noopener\">Nash, et al., 2018<\/a>) has 16 convolution\/fully connected (dense) layers alone, not including the max-pooling and softmax layers, bringing the total to 22 network layers. 
Language models get even larger, with the largest BERT architecture having <a href=\"http:\/\/jalammar.github.io\/illustrated-bert\/\" target=\"_blank\" rel=\"noreferrer noopener\">24 encoder layers<\/a> alone, with each encoder consisting of a self-attention layer followed by a feedforward network.\u00a0\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5d80577 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"51410\" data-id=\"5d80577\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-559313d\" data-eae-slider=\"77053\" data-id=\"559313d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2454348 elementor-widget elementor-widget-heading\" data-id=\"2454348\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">VGG16 (left) and BERT (right) Architectures<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-b428721 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-eae-slider=\"94647\" data-id=\"b428721\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b0a008f\" data-eae-slider=\"3905\" data-id=\"b0a008f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-51d3719 elementor-widget elementor-widget-text-editor\" data-id=\"51d3719\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>As one can imagine, deep learning requires an immense amount of data in order to generate sufficiently robust predictions. Fortunately, if such data is not available, the data scientist is not out of luck, as there are very robust and successful pre-trained models from publicly available packages or APIs that they can start with, and tune to their particular task. 
This leads right to our next distinguishing feature of deep learning.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-40f3df3 elementor-widget elementor-widget-text-editor\" data-id=\"40f3df3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol class=\"has-normal-font-size\" start=\"4\">\n<li><strong>Deep Learning Uses Transfer Learning: Pre-Trained Layers and Fine Tuning<\/strong><\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b7b9654 elementor-widget elementor-widget-text-editor\" data-id=\"b7b9654\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The importance of very large training\/test samples for deep learning cannot be overstated. For things like computer vision, a single image may be transformed in dozens of different ways (rotations\/reflections, adjusting pixel size\/granularity, hue, contrast, color quality, etc.) to greatly expand sample sizes and learn all identifiable versions of that image. For computer language, <em>vast <\/em>collections of documents from web-crawl\/cached websites, news, Wikipedia, etc. 
are used to give deep learners an ample supply of text and phrases to train large language models.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-df9f5ea elementor-widget elementor-widget-text-editor\" data-id=\"df9f5ea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>If a researcher or a working data scientist wishes to use deep learning for their own customized vision\/language tasks, but lacks the time, the resources, or the computing power required for processing\/training a model on such an enormous universe of data, hope is not lost. Deep learning models are so powerfully generalizable that data scientists can reliably turn to others who have already done much of that heavy lifting. Even when we need to classify data for new objects outside of the initial classification groups, pre-trained models and transfer learning are robust. It may seem absurd that a model trained on ImageNet to classify images of things like cats\/dogs could be useful for tasks like tumor identification or retinal scans, but such a model gives the data scientist a head start, allowing their algorithms to quickly recognize what are <em>not <\/em>tumors or retinal images. Consider the classic \u201cbird or the branch\u201d bias problem, where a deep learner is tasked with animal classification, and trained\/tested only on images of birds perched on tree branches. 
This model may seem to perform well at identifying birds when, in fact, it has only learned to detect features of the much simpler background branches rather than of the object in question.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f05addb elementor-widget elementor-widget-text-editor\" data-id=\"f05addb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Pre-trained models can enable the data scientist to employ a very deep architecture that they could not have used otherwise by <em>fine tuning <\/em>the pre-trained model to their data. Unlike many traditional statistical models, neural networks can update model parameters <em>incrementally <\/em>without re-estimating all parameters from the full training sample (thanks to <em>backpropagation<\/em>). In this way, researchers can input their own data to a pre-trained model as a new set of epochs and tune the parameters of the representation layers in the deep architecture, and even add new classification categories (like tumors) with additional dense\/softmax layers, for example.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4dad66d elementor-widget elementor-widget-text-editor\" data-id=\"4dad66d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>It wasn\u2019t until recently, with the advent of <a href=\"https:\/\/arxiv.org\/abs\/1801.06146\" target=\"_blank\" rel=\"noreferrer noopener\">ULMFiT<\/a> and <a href=\"https:\/\/ai.googleblog.com\/2018\/11\/open-sourcing-bert-state-of-art-pre.html\" target=\"_blank\" rel=\"noreferrer noopener\">BERT<\/a> in 2018, that this kind of transfer learning was possible for language models. 
Here, the task is significantly more complicated, as words are learned from their use in <em>context<\/em>, which can vary for words that have multiple meanings or for words that are specific to technical\/industry documents where they may be used in contexts never seen by the pretrained layers. Now, transfer learning and fine tuning are standard and important practices for deep learning tasks of all kinds.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e830d09 elementor-widget elementor-widget-text-editor\" data-id=\"e830d09\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol class=\"has-normal-font-size\" start=\"5\">\n<li><strong>Deep Learning is Computationally Intensive and Uses Cloud\/Distributed Programming<\/strong><\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e0247ee elementor-widget elementor-widget-text-editor\" data-id=\"e0247ee\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>We couldn\u2019t conclude an article on deep learning without pointing out the (often) extreme computational demands of training\/tuning a deep neural network. Fortunately, for the data scientist who can test\/deploy with a pre-trained model, the computational requirements are not as prohibitive, even with a few additional layers (dense\/softmax) to customize to their own data. The challenge arises when one wishes to develop a very deep architecture from scratch, which can take days to run on some of the largest cloud environments\/clusters. 
The aforementioned VGG16 has over 138 million trainable parameters, and even the base BERT model has roughly 110 million (the largest has roughly 340 million), so even simple fine tuning can take hours to run.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c4f42dc elementor-widget elementor-widget-text-editor\" data-id=\"c4f42dc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>As such, training a deep learning model on any dataset that is sufficiently large\/robust will almost surely require cloud architecture and multiple clusters. Fortunately, deep learning APIs like Keras\/TensorFlow and PyTorch make training over multiple CPUs\/GPUs relatively straightforward, and cloud providers like AWS, GCP, and Azure have their own tools and APIs to optimize training, like TPUs\/gCloud\u2019s AI Platform, AWS\u2019 SageMaker\/Deep Learning AMIs, Azure Machine Learning Notebooks, etc. One can hardly become a skilled practitioner of deep learning methods without also acquiring non-trivial skills as a cloud engineer with one or more of these tools.\u00a0<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>What is Deep Learning, Anyway? 
As machine learning has gained significant notoriety for its wide-spread use across an immense number of applications, from retailers targeting products\/marketing to individual consumers to high-frequency trading and quantitative models revolutionizing modern finance, and not to mention the seemingly constant media attention it gets in polemics surrounding privacy, user data,<\/p>\n","protected":false},"author":1168,"featured_media":25118,"comment_status":"open","ping_status":"open","sticky":false,"template":"multiple_author_template.php","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[111,206,92],"ppma_author":[3959,3960],"class_list":["post-25117","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-ai-amp-machine-learning","tag-deep-learning","tag-machine-learning"],"authors":[{"term_id":3959,"user_id":1168,"is_guest":0,"slug":"manik1","display_name":"Benjamin Wellmann","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/06\/benjamin-150x150.jpg","author_category":"","user_url":"","last_name":"Wellmann","first_name":"Benjamin","job_title":"","description":"Benjamin Wellmann is Head of the CIO Machine Learning &amp; Data Science at FIS, a Fortune 200 Fintech and Financial Services company and largest FinTech provider according to IDC. FIS is focused on innovative software platforms and services across the Banking, Capital Markets and Merchant services segments. Driving internal and client-facing innovation through Machine Learning across FIS\u2019 platforms, corporate functions and the three product segments is Benjamin\u2019s teams\u2019 top priority. 
During his time at a Big 4 Consulting firm, he has advised the largest Financial Services institutions globally on topics such as AI, Big Data, and Enterprise ML and filed numerous AI and Fintech patents."},{"term_id":3960,"user_id":1169,"is_guest":0,"slug":"gary","display_name":"Gary Duma","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/06\/gary--150x150.png","author_category":"","user_url":"","last_name":"Duma","first_name":"Gary","job_title":"","description":"Gary Duma is a Senior Manager of Machine Learning &amp; Data Science at FIS, a Fortune 200 Fintech and Financial Services company and largest FinTech provider according to IDC. With 10+ years working in Data Science, he has developed and implemented AI\/Deep Learning models for a wide array of clients and applications, particularly as they relate to NLP\/insight generation, acoustic\/speech classification, and even novel uses of CNNs\/long memory models for large time-dependent datasets, in lieu of traditional time-series methods from the auto-covariance\/spectral domains. 
He has also filed numerous AI\/Fintech patents for innovation with Deep Learning during his time at FIS."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/25117","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1168"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=25117"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/25117\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/25118"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=25117"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=25117"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=25117"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=25117"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}