{"id":22625,"date":"2021-02-16T06:34:00","date_gmt":"2021-02-16T06:34:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/5-deep-learning-advancements-keep-your-eye-2021\/"},"modified":"2023-09-04T07:41:02","modified_gmt":"2023-09-04T07:41:02","slug":"5-deep-learning-advancements-keep-your-eye-2021","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/5-deep-learning-advancements-keep-your-eye-2021\/","title":{"rendered":"5 Exciting Deep Learning Advancements to Keep Your Eye on in 2021"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22625\" class=\"elementor elementor-22625\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2a867c7d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2a867c7d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7750db49\" data-id=\"7750db49\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e1a1a3f elementor-widget elementor-widget-text-editor\" data-id=\"e1a1a3f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\"><em>The Undercurrent of Ongoing Research<\/em><\/p>\n<p id=\"6111\">2021 is here, and deep learning is as active as ever; research in the field is speeding up exponentially. There are obviously many more deep learning advancements that are fascinating and exciting. 
To me, though, the five presented demonstrate a central undercurrent in ongoing deep learning research:\u00a0<em>how necessary are massive deep learning models?<\/em><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-70d39c4 elementor-widget elementor-widget-heading\" data-id=\"70d39c4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">1. GrowNet<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c493c35 elementor-widget elementor-widget-text-editor\" data-id=\"c493c35\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"299c\"><strong>tl;dr<\/strong>: GrowNet applies gradient boosting to shallow neural networks. It has been rising in popularity, yielding superior results in classification, regression, and ranking. It may indicate research supporting larger ensembles and shallower networks on non-specialized data (i.e., data that is not images or sequences).<\/p>\n<p id=\"0b0e\">Gradient boosting has become very popular in recent years, rivalling neural networks in performance. The idea is to have an ensemble of weak (simple) learners, where each corrects the mistakes of the previous. 
For instance, an ideal 3-model gradient boosting ensemble may look like this, where the real label of the example is\u00a0<code>1<\/code>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-29544c8 elementor-widget elementor-widget-text-editor\" data-id=\"29544c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol>\n<li>Model 1 predicts\u00a0<code>0.734<\/code>. Current prediction is\u00a0<code><strong>0.734<\/strong><\/code>.<\/li>\n<li>Model 2 predicts\u00a0<code>0.464<\/code>. Current prediction is\u00a0<code>0.734+0.464=<strong>1.198<\/strong><\/code>\u00a0.<\/li>\n<li>Model 3 predicts\u00a0<code>-0.199<\/code>. Current prediction is\u00a0<code>1.198-0.199=<strong>0.999<\/strong><\/code>.<\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f1cfd5 elementor-widget elementor-widget-text-editor\" data-id=\"0f1cfd5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9110\">Each model is trained on the residual of the previous. Although each model may be individually weak, as a whole the ensemble can develop incredible complexity. 
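The additive logic of the worked example above can be sketched in a few lines of Python (a toy illustration, not any particular library's API):

```python
# Toy sketch of additive (boosted) prediction: each weak learner contributes
# a correction, and the ensemble's output is the running sum of contributions.
def ensemble_predict(model_outputs):
    prediction = 0.0
    for output in model_outputs:
        prediction += output  # each learner nudges the prediction toward the label
    return prediction

# The three-model example above, where the true label is 1:
final = ensemble_predict([0.734, 0.464, -0.199])  # sums to ~0.999
```

In a real framework, of course, each output would come from a model fitted to the residual of the learners before it; only the final summation is shown here.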
Frameworks like XGBoost apply gradient boosting to decision trees, one of the simplest machine learning models.<\/p>\n<p id=\"5908\">There has been some debate over whether neural networks are weak enough for gradient boosting; because gradient boosting is so prone to overfitting, it is crucial that each <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/ensemble-learning-bagging-boosting\/\" target=\"_blank\" rel=\"noreferrer noopener\">learner in the ensemble<\/a> be weak.<\/p>\n<p id=\"d37f\">However, recent work has shown that extremely deep neural networks can be decomposed into a collection of many smaller subnetworks. Therefore, massive neural networks may just be sophisticated ensembles of small neural networks. This challenges the idea that neural networks are too strong to be weak learners in gradient boosting.<\/p>\n<p id=\"f030\">The GrowNet ensemble consists of\u00a0<em>k<\/em>\u00a0models. Each model is fed the original features and the predictions of the previous model. The predictions of all the models are summed to produce a final output. 
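That wiring can be sketched abstractly as follows (a simplification for illustration only; the reference implementation passes richer features between models than a single scalar prediction):

```python
# Minimal sketch of the ensemble wiring described above: model k receives the
# original features plus the previous model's output, and all outputs are summed.
def grownet_predict(models, features):
    prev_output = 0.0
    total = 0.0
    for model in models:
        output = model(features + [prev_output])  # features + previous prediction
        total += output
        prev_output = output
    return total
```

Here each `model` is any callable; in GrowNet it would be a small neural network with one or two hidden layers.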
Every model can be as simple as having only one hidden layer.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ac7c102 elementor-widget elementor-widget-image\" data-id=\"ac7c102\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"369\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-1024x369.png\" class=\"attachment-large size-large wp-image-18723\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-1024x369.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-300x108.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-768x277.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-1536x554.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-2048x739.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-610x220.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-750x271.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CYyB__xpOq5QPpPzjz-s7g-1140x411.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/2002.07971.pdf\" target=\"_blank\" rel=\"noopener\">GrowNet paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9db05f6 
elementor-widget elementor-widget-text-editor\" data-id=\"9db05f6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b620\">GrowNet is easy to tune and requires less computation and training time, yet it outperforms deep neural networks in regression, classification, and ranking on multiple datasets. Data scientists have picked up on these benefits, and it is growing in popularity.<\/p>\n<p id=\"13aa\"><a href=\"https:\/\/arxiv.org\/pdf\/2002.07971.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>,\u00a0<a href=\"https:\/\/www.kaggle.com\/tmhrkt\/grownet-gradient-boosting-neural-networks\" target=\"_blank\" rel=\"noreferrer noopener\">Pytorch Implementation<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b97da75 elementor-widget elementor-widget-heading\" data-id=\"b97da75\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">2. TabNet<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a0bba14 elementor-widget elementor-widget-text-editor\" data-id=\"a0bba14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"543b\"><strong>tl;dr:\u00a0<\/strong>TabNet is a deep learning model for tabular data, designed to represent hierarchical relationships, drawing inspiration from decision-tree models. 
It has yielded superior results on many real-world tabular datasets.<\/p>\n<p id=\"b388\">Neural networks are famously bad at modelling tabular data, and the accepted explanation is that their structure \u2014 very prone to overfitting \u2014 is instead suited to recognizing the complex relationships of specialized data, like images or text.<\/p>\n<p id=\"6fdf\">Decision-tree models like XGBoost or AdaBoost have instead been popular with real-world tabular data, because they split the feature space along simple perpendicular planes. This level of separation is usually fine for most real-world datasets; even though these models, regardless of how complex, make assumptions about decision boundaries, overfitting is usually the worse problem.<\/p>\n<p id=\"a604\">Yet for many real-world datasets, decision-tree models are not enough and neural networks are too much. TabNet was created by two Google researchers to address this problem. The model relies on a fundamental neural network design, which makes decisions like a more complex decision tree.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-639fe7f elementor-widget elementor-widget-image\" data-id=\"639fe7f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"311\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-1024x311.png\" class=\"attachment-large size-large wp-image-18724\" alt=\"5 Exciting Deep Learning Advancements to Keep Your Eye on in 2021\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-1024x311.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-300x91.png 300w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-768x233.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-1536x467.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-2048x622.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-610x185.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-750x228.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1OOj0nHL0XXB6CGBDLaET2Q-1140x346.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Left: a simple decision tree-like neural network. The real TabNet architecture is deeper. Right: the left model\u2019s division of the feature space, which is much like how a decision tree would split the feature space. Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1908.07442.pdf\" target=\"_blank\" rel=\"noopener\">TabNet paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8f2873e elementor-widget elementor-widget-text-editor\" data-id=\"8f2873e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4bc8\">Furthermore, TabNet is trained in two stages. In the unsupervised pre-training stage, the model is trained to predict masked values in the data. Decision-making layers are then appended to the pretrained encoder and supervised fine-tuning takes place. 
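The pre-training objective just described can be sketched like this (a hypothetical simplification; TabNet's actual masking and reconstruction are learned and more involved):

```python
import random

# Sketch of self-supervised pre-training on tabular rows: hide a random
# subset of feature values, then train the model to reconstruct the hidden ones.
def mask_row(row, mask_prob=0.3, rng=None):
    rng = rng or random.Random(0)                 # seeded for reproducibility
    mask = [rng.random() < mask_prob for _ in row]  # which cells to hide
    masked = [0.0 if hidden else value for hidden, value in zip(mask, row)]
    return masked, mask  # the model learns to predict row[i] wherever mask[i] is True
```

During fine-tuning, the encoder trained on this reconstruction task is reused, with decision-making layers appended on top.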
This is one of the first instances of highly successful unsupervised pre-training on tabular data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-312ec56 elementor-widget elementor-widget-image\" data-id=\"312ec56\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"559\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-1024x559.png\" class=\"attachment-large size-large wp-image-18725\" alt=\"5 Exciting Deep Learning Advancements to Keep Your Eye on in 2021\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-1024x559.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-300x164.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-768x419.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-1536x838.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-2048x1117.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-610x333.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-750x409.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1pX36-z91NxtoreoPRxknVA-1140x622.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1908.07442.pdf\" target=\"_blank\" rel=\"noopener\">TabNet 
paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fc91eb7 elementor-widget elementor-widget-text-editor\" data-id=\"fc91eb7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7a26\">Critically, the model uses attention, so it can choose which features it will make a decision from. This allows it to develop a strong representation of hierarchical structures often present in real-world data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-55e1e63 elementor-widget elementor-widget-image\" data-id=\"55e1e63\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"273\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-1024x273.png\" class=\"attachment-large size-large wp-image-18726\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-1024x273.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-300x80.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-768x205.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-1536x409.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-2048x546.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-610x163.png 610w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-750x200.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15CEJ1SAtJVWjtLVcRGkacQ-1140x304.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1908.07442.pdf\" target=\"_blank\" rel=\"noopener\">TabNet paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-55bc43e elementor-widget elementor-widget-text-editor\" data-id=\"55bc43e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c9d4\">These mechanisms mean TabNet's input data needs no preprocessing whatsoever. TabNet is very quickly rising in popularity among data scientists; almost all of the top-scoring competitors in the\u00a0<a href=\"https:\/\/www.kaggle.com\/c\/lish-moa\/\" target=\"_blank\" rel=\"noreferrer noopener\">Mechanisms of Action Kaggle competition<\/a>, for instance, incorporated TabNet into their solutions. Because of its popularity, it has been implemented in a very simple and usable API.<\/p>\n<p id=\"7c2c\">This represents a broadening of deep learning past extremely specialized data types, and reveals just how universal neural networks can be. 
<\/p>\n<p id=\"279d\"><a href=\"https:\/\/arxiv.org\/pdf\/1908.07442.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>,\u00a0<a href=\"https:\/\/pypi.org\/project\/pytorch-tabnet\/\" target=\"_blank\" rel=\"noreferrer noopener\">Simple Pytorch API Implementation<\/a>,\u00a0<a href=\"https:\/\/pypi.org\/project\/tabnet\/\" target=\"_blank\" rel=\"noreferrer noopener\">Simple TensorFlow API Implementation<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-72cd018 elementor-widget elementor-widget-heading\" data-id=\"72cd018\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">3. EfficientNet<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f638699 elementor-widget elementor-widget-text-editor\" data-id=\"f638699\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b44a\"><strong>tl;dr:\u00a0<\/strong>Model scaling to improve deep CNNs can be haphazard. Compound scaling is a simple and effective method that uniformly scales the width, depth, and resolution of the network. EfficientNet is a simple network with compound scaling applied to it, and yields state-of-the-art results. The model is incredibly popular in the image recognition world.<\/p>\n<p id=\"3324\">Deep convolutional neural networks have been growing larger in an attempt to make them more powerful. Exactly\u00a0<em>how<\/em>\u00a0they become bigger, though, is actually quite arbitrary. Sometimes, the resolution of the image is increased (more pixels). 
Other times, it may be the depth (# of layers) or the width (# of neurons in each layer) that are increased.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b8d99e7 elementor-widget elementor-widget-image\" data-id=\"b8d99e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"427\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-1024x427.png\" class=\"attachment-large size-large wp-image-18727\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-1024x427.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-300x125.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-768x320.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-1536x640.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-2048x854.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-610x254.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-750x313.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xDuDmA68axdbOr3Sxi1oNw-1140x475.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1905.11946.pdf\" target=\"_blank\" rel=\"noopener\">EfficientNet Paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-63c5eea elementor-widget elementor-widget-text-editor\" data-id=\"63c5eea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d686\">Compound scaling is a simple idea: instead of scaling them arbitrarily, scale the resolution, depth, and width of the network equally.<\/p>\n<p id=\"54af\">If one wants to use 2\u00b3 times more computational resources, for example;<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7172bc6 elementor-widget elementor-widget-text-editor\" data-id=\"7172bc6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li>increase the network depth by \u03b1\u00b3 times<\/li>\n<li>increase the network width by \u03b2\u00b3 times<\/li>\n<li>increase the image size by \u03b3\u00b3 times<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b7fe6c0 elementor-widget elementor-widget-text-editor\" data-id=\"b7fe6c0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c7bd\">The values of \u03b1, \u03b2, and \u03b3 can be found through a simple grid search. Compound scaling can be applied to any network, and compound-scaled versions of models like ResNet have consistently performed better than arbitrary scaled ones.<\/p>\n<p id=\"7071\">The authors of the paper developed a baseline model, EfficientNetB0, which consists of very standard convolutions. 
Then, using compound scaling, seven scaled models \u2014 EfficientNetB1 to EfficientNetB7 \u2014 were created.<\/p>\n<p id=\"0686\">The results are amazing \u2014 EfficientNets were able to perform better than models that required 4 to 7 times more parameters and 6 to 19 times more computational resources. It seems that compound scaling is one of the most efficient ways to utilize neural network space.<\/p>\n<p id=\"f8bb\">EfficientNet has been one of the most important recent contributions. It indicates a turn in research towards more powerful but also efficient and practical neural networks.<\/p>\n<p id=\"0af7\"><a href=\"https:\/\/arxiv.org\/pdf\/1905.11946.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>,\u00a0<a href=\"https:\/\/pypi.org\/project\/efficientnet-pytorch\/\" target=\"_blank\" rel=\"noreferrer noopener\">Simple Pytorch API Implementation<\/a>,\u00a0<a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/keras\/applications\/EfficientNetB0\" target=\"_blank\" rel=\"noreferrer noopener\">Simple TensorFlow Implementation<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-285f650 elementor-widget elementor-widget-heading\" data-id=\"285f650\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">4. 
The Lottery Ticket Hypothesis<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0dc242a elementor-widget elementor-widget-text-editor\" data-id=\"0dc242a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"abc6\"><strong>tl;dr:\u00a0<\/strong>Neural networks are essentially giant lotteries; through random initialization, certain subnetworks are mathematically lucky and are recognized for their potential by the optimizer. These subnetworks (\u2018winning tickets\u2019) end up doing most of the heavy lifting, while the rest of the network doesn\u2019t do much. This hypothesis is groundbreaking in understanding how neural networks work.<\/p>\n<p id=\"40a9\">Why don\u2019t neural networks overfit? How do they generalize with so many parameters? Why do big neural networks work better than smaller ones when a common statistical principle says that more parameters mean overfitting?<\/p>\n<p id=\"b3a8\">\u201cBah! Go away and shut up!\u201d grumbles the deep learning community. \u201cWe don\u2019t care about how neural networks work as long as they work.\u201d Too long have these big questions been under-investigated.<\/p>\n<p id=\"3dd9\">One common answer is regularization. However, this doesn\u2019t seem to be the case \u2014 in a study conducted by Zhang et al., an Inception architecture without various regularization methods didn\u2019t perform much worse than one with. 
Thus, one cannot argue that regularization is the\u00a0<em>basis<\/em>\u00a0for generalization.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7502a7 elementor-widget elementor-widget-image\" data-id=\"a7502a7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"222\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-1024x222.png\" class=\"attachment-large size-large wp-image-18728\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-1024x222.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-300x65.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-768x166.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-610x132.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-750x162.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI-1140x247.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0dWE29J7z2PPA2zKI.png 1394w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>Source:\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1611.03530\" target=\"_blank\" rel=\"noopener\">Zhang et al.<\/a><\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d37fd2a elementor-widget elementor-widget-text-editor\" data-id=\"d37fd2a\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c135\">Neural network pruning offers a glimpse into one convincing answer.<\/p>\n<p id=\"2898\">With neural network pruning, over 90 percent \u2014 in some cases 95 or even 99 percent \u2014 of neural network weights and neurons can be eliminated with little to no loss on performance. How can this be?<\/p>\n<p id=\"c71e\">Imagine you want to order a pen on Amazon. When the delivery package arrives, you find it is in a large cardboard box with lots of stuffing inside it. You finally find the pen after several minutes of searching.<\/p>\n<p id=\"e66e\">After you find the pen, the stuffing doesn\u2019t matter. But before you find it, the stuffing is part of the delivery. The cardboard box with the stuffing is the neural network, and the pen is the subnetwork doing all the real work. After you locate that subnetwork, you can ditch the rest of the neural network. However, there needs to be a network in the first place to find the subnetwork.<\/p>\n<p><em>Lottery Ticket Hypothesis<\/em>: In every sufficiently deep neural network, there is a smaller subnetwork that can perform just as well as the whole neural network.<\/p>\n<p id=\"baa6\">Weights in the neural network begin randomly initialized. At this point, there are plenty of random subnetworks in the network, but some have more mathematical potential. That is, the optimizer thinks it is mathematically better to update this set of weights to lower the loss. Eventually, the optimizer has developed a subnetwork to do all the work; the other parts of the network do not serve much of a purpose.<\/p>\n<p id=\"9144\">Each subnetwork is a \u2018lottery ticket\u2019, with a random initialization. Favorable initializations are \u2018winning tickets\u2019 identified by the optimizer. The more random tickets you have, the higher probability one of them will be a winning ticket. 
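The pruning figures quoted above can be illustrated with the simplest criterion, magnitude pruning; the actual papers use more sophisticated schemes, so this is only a sketch:

```python
# Keep only the largest-magnitude fraction of weights and zero out the rest.
def prune_by_magnitude(weights, keep_fraction=0.1):
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted(abs(w) for w in weights)[-k]  # k-th largest magnitude
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

With `keep_fraction=0.1`, 90 percent of the weights are removed, mirroring the numbers cited above; the surviving weights are a candidate 'winning ticket'.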
This is why larger networks generally perform better.<\/p>\n<p id=\"9c65\">This hypothesis is particularly important in proposing an explanation for\u00a0<em>Deep Double Descent<\/em>, in which, past a certain threshold,\u00a0<em>more parameters<\/em>\u00a0yield better generalization rather than worse.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2396aa1 elementor-widget elementor-widget-image\" data-id=\"2396aa1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"510\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-1024x510.png\" class=\"attachment-large size-large wp-image-18729\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-1024x510.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-300x149.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-768x382.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-1536x765.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-2048x1019.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-610x304.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-360x180.png 360w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-750x373.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0GiLxolJ4YRBPctMC-1140x567.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption 
wp-caption-text\">Source:\u00a0<a href=\"https:\/\/openai.com\/blog\/deep-double-descent\/\" target=\"_blank\" class=\"broken_link\" rel=\"noopener\">OpenAI<\/a>.<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0dab746 elementor-widget elementor-widget-text-editor\" data-id=\"0dab746\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f18a\">The Lottery Ticket Hypothesis is one giant step toward truly understanding how deep neural networks work. Although it\u2019s still a hypothesis, there is convincing evidence for it, and such a discovery would transform how we approach innovation in deep learning.<\/p>\n<p id=\"d2c5\"><a href=\"https:\/\/arxiv.org\/pdf\/1803.03635.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a765de6 elementor-widget elementor-widget-heading\" data-id=\"a765de6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">5. 
The Top-Performing Model With Zero Training<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2b6afc0 elementor-widget elementor-widget-text-editor\" data-id=\"2b6afc0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f74c\"><strong>tl;dr:<\/strong>\u00a0Researchers developed a method to prune a completely\u00a0<em>randomly initialized<\/em>\u00a0network to achieve performance on par with trained models.<\/p>\n<p id=\"395d\">Closely related to the Lottery Ticket Hypothesis, this study explores just how much information can lie in a neural network. It\u2019s common for data scientists to see \u201c60 million parameters\u201d and underestimate how much power 60 million parameters can really store.<\/p>\n<p id=\"4757\">In support of the Lottery Ticket Hypothesis, the authors of the paper developed the edge-popup algorithm, which assesses how \u2018helpful\u2019 an edge, or a connection, would be towards prediction. 
Only the top\u00a0<em>k<\/em>% most \u2018helpful\u2019 edges are retained; the remaining ones are pruned (removed).<\/p>\n<p id=\"2e76\">Using the edge-popup algorithm on a sufficiently large random neural network yields results very close to, and sometimes better than, the performance of the trained neural network with all of its weights intact.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ce98848 elementor-widget elementor-widget-image\" data-id=\"ce98848\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"321\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-1024x321.png\" class=\"attachment-large size-large wp-image-18730\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-1024x321.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-300x94.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-768x241.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-1536x481.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-2048x642.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-610x191.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-750x235.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1O2g-qr0HnRcolZi1WexT1g-1140x357.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\"><figcaption>% of weights refers 
to\u00a0<em>k<\/em>, the percent of the model that remains after pruning. Source:\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1911.13299.pdf\" target=\"_blank\" rel=\"noopener\">Paper<\/a>.<\/figcaption>\n<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-10a882e elementor-widget elementor-widget-text-editor\" data-id=\"10a882e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d63f\">That\u2019s amazing \u2014 within a completely untrained, randomly initialized neural network, there already lies a top-performing subnetwork. This is like being told that your name can be found in a fairly short sequence of random letters.<\/p>\n<pre class=\"wp-block-preformatted\">uqhoquiwhrpugtdfdsnaoidpufehiobnfjdopafwuhibdsofpabievniawo;jkjndjkn<br \/>ajsodijaiufhuiduisafid<strong>johndoe<\/strong>ojahsiudhuidbviubdiaiupdphquiwhpeuhqiuhdpueohdpqiuwhdpiashiudhjashdiuhasiuhdibcisviywqrpiuhopfdbscjasnkuipi<\/pre>\n<p id=\"7631\">This study is more of a question than an answer. It points us toward a new area of research: getting to the bottom of exactly how neural networks work. 
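The top-<em>k</em>% selection at the heart of edge-popup can be sketched as follows. This is a hedged illustration of the selection step only: in the actual paper the per-edge scores are learned with SGD, whereas here they are random placeholders, and the shapes and value of <em>k</em> are arbitrary:

```python
# Edge-popup selection sketch: keep only the top-k% of edges by score.
# The weights stay frozen at their random initialization; in the real
# algorithm the scores are learned, here they are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal((128, 64))  # frozen random weights
scores = rng.standard_normal((128, 64))   # per-edge 'helpfulness' scores

k = 0.30                                  # retain the top 30% of edges
cutoff = np.quantile(scores, 1 - k)
mask = scores >= cutoff                   # the selected subnetwork

x = rng.standard_normal(128)
y = x @ (weights * mask)                  # forward pass uses only kept edges

print(f"active edges: {mask.mean():.0%}")
```

The key design point the sketch preserves is that gradient descent never touches `weights`; only which edges survive changes.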
If these findings are universal, surely a better training method is waiting to be discovered, one that takes advantage of this fundamental property of deep learning.<\/p>\n<p id=\"5bf0\"><a href=\"https:\/\/arxiv.org\/pdf\/1911.13299.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-adb918b elementor-widget elementor-widget-heading\" data-id=\"adb918b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Summary &amp; Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd8e9af elementor-widget elementor-widget-text-editor\" data-id=\"fd8e9af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n<li><em>GrowNet<\/em>. This application of ensemble methods to deep learning is one demonstration of combining simple subnetwork structures into a complex, sophisticated, and successful model.<\/li>\n<li><em>TabNet<\/em>. Neural network structures are exceptionally versatile, and TabNet marks the true expansion of neural networks to all sorts of data types. It strikes a careful balance between underfitting and overfitting.<\/li>\n<li><em>EfficientNet<\/em>. Part of a growing trend of packing more predictive power into less space, EfficientNet is remarkably simple yet effective. It demonstrates that there is indeed a principled structure to scaling models. This will only become more important as models continue to grow larger.<\/li>\n<li><em>The Lottery Ticket Hypothesis<\/em>. 
A fascinating perspective on how neural networks generalize, the Lottery Ticket Hypothesis is a golden key that will help us unlock greater deep learning achievements. The insight that the power of large networks comes not from their size itself but from an increased number of \u2018lottery tickets\u2019 is groundbreaking.<\/li>\n<li><em>The Top-Performing Model with Zero Training<\/em>. A vivid demonstration of just how much we underestimate the predictive power within a randomly initialized neural network.<\/li>\n<\/ul><p id=\"1607\">There is no doubt 2021 will bring many more fascinating advancements in deep learning.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>2021 is here, and deep learning is as active as ever; research in the field is speeding up exponentially. Here are the five exciting deep learning advancements that demonstrate a central undercurrent in ongoing deep learning research.<\/p>\n","protected":false},"author":884,"featured_media":18731,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97,206,92],"ppma_author":[3782],"class_list":["post-22625","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-deep-learning","tag-machine-learning"],"authors":[{"term_id":3782,"user_id":884,"is_guest":0,"slug":"andre-ye","display_name":"Andre Ye","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/08\/Andre-Ye-150x150.jpg","user_url":"https:\/\/www.critiq.tech\/","last_name":"Ye","first_name":"Andre","job_title":"","description":"Andre Ye is Cofounder at Critiq, and Editor and Writer at 
Medium"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22625","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/884"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22625"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22625\/revisions"}],"predecessor-version":[{"id":31993,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22625\/revisions\/31993"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18731"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22625"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22625"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22625"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22625"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}