{"id":26738,"date":"2021-10-01T01:58:27","date_gmt":"2021-10-01T01:58:27","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=26738"},"modified":"2023-08-16T11:03:29","modified_gmt":"2023-08-16T11:03:29","slug":"review-of-attention-3","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/review-of-attention-3\/","title":{"rendered":"Review of Attention (Vision Models) &#8211; Part 3"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"26738\" class=\"elementor elementor-26738\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-53f53d5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"53f53d5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-236fe83\" data-id=\"236fe83\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bdac48b elementor-widget elementor-widget-text-editor\" data-id=\"bdac48b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In the previous article, we reviewed several approaches to applying attention to vision models. We will continue our discussion, present a few additional vision models approaches in this article, and discuss their advantages over traditional approaches.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-815910d elementor-widget elementor-widget-text-editor\" data-id=\"815910d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><strong>Stand-Alone Self Attention<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-432186e elementor-widget elementor-widget-text-editor\" data-id=\"432186e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Ramachandran et al. 2019 proposed an attention mechanism like the 2-D attention in Image Transformer [1]. A local region of pixels within a fixed <a href=\"https:\/\/www.experfy.com\/experts\/experfy-talentclouds\/data-scientist-spatial-statistics?practice_area=ai-machine-learning\">spatial<\/a> extent around the chosen pixel is used as a memory block for attention. Since the model doesn\u2019t operate on all the pixels at the same time, it can work on high-resolution images without the need for down-sampling them to lower resolution in order to save computation.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9ba5dad elementor-widget elementor-widget-text-editor\" data-id=\"9ba5dad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><em>Fig. 4. 
Fig. 4. Stand-Alone Self Attention [1].

Figure 4 describes the self-attention module. A memory block x_ab with a spatial extent of 3 pixels around the pixel x_ij is chosen, and the query q_ij is computed as a linear transformation of x_ij. The keys k_ab and values v_ab are computed as linear transformations of x_ab. The parameters of these linear transformations are learnt during model training. Applying a softmax to the dot products of the query q_ij with the key matrix k_ab yields the attention map, which is then multiplied with the value matrix to obtain the output y_ij:

y_{ij} = \sum_{a,b} \mathrm{softmax}_{ab}\!\left( q_{ij}^{\top} k_{ab} \right) v_{ab}

where the sum runs over the pixels of the memory block.

For encoding the relative positions of the pixels, the model uses the 2-D relative distances between the pixel x_ij and the pixels in x_ab.

[Figure: relative positions of the pixels in the memory block.]

The row and column offsets are each associated with an embedding, and the two embeddings are concatenated to form r_{a-i,b-j}. The final attention output is given as:

y_{ij} = \sum_{a,b} \mathrm{softmax}_{ab}\!\left( q_{ij}^{\top} k_{ab} + q_{ij}^{\top} r_{a-i,\,b-j} \right) v_{ab}

Hence, the model encodes both the content and the relative positions of the pixels in its representation.
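To make the computation above concrete, here is a minimal, single-head NumPy sketch of stand-alone local self-attention for one query pixel. The function name, the clipping of the window at image borders, and the way the relative-position embeddings are stored are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def local_attention_at_pixel(x, i, j, Wq, Wk, Wv, rel_emb, extent=3):
    """Single-head stand-alone self-attention output y_ij for one pixel (sketch).

    x        : (H, W, C) input feature map
    Wq/Wk/Wv : (C, D) learned linear maps producing queries, keys, and values
    rel_emb  : dict mapping (row_offset, col_offset) -> (D,) relative-position embedding
    extent   : spatial extent of the memory block around (i, j)
    """
    H, W, _ = x.shape
    half = extent // 2
    q = x[i, j] @ Wq                                             # query q_ij

    scores, values = [], []
    for a in range(max(0, i - half), min(H, i + half + 1)):      # the window is simply
        for b in range(max(0, j - half), min(W, j + half + 1)):  # clipped at the borders here
            k = x[a, b] @ Wk                                     # key k_ab
            v = x[a, b] @ Wv                                     # value v_ab
            r = rel_emb[(a - i, b - j)]                          # embedding r_{a-i, b-j}
            scores.append(q @ k + q @ r)                         # content term + position term
            values.append(v)

    scores = np.array(scores)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                                           # softmax over the memory block
    return attn @ np.array(values)                               # weighted sum of values = y_ij

# Tiny usage example with random weights
rng = np.random.default_rng(0)
H, W, C, D, extent = 8, 8, 4, 4, 3
x = rng.normal(size=(H, W, C))
Wq, Wk, Wv = (rng.normal(size=(C, D)) for _ in range(3))
rel_emb = {(da, db): rng.normal(size=D)
           for da in range(-(extent // 2), extent // 2 + 1)
           for db in range(-(extent // 2), extent // 2 + 1)}
y_34 = local_attention_at_pixel(x, 3, 4, Wq, Wk, Wv, rel_emb, extent)  # shape (D,)
```

Because only the pixels of the memory block enter the softmax, the cost per output pixel stays constant as the image resolution grows, which is what allows the model to skip down-sampling.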
Sparse Transformer [2]

To address the computational challenge of handling very long sequences, Child et al. developed a sparse variant of attention [3]. This enables the transformer model to handle different modalities such as text, images, and sound.

After observing the attention patterns at different layers of the transformer, the authors posit that most layers learn only a sparse structure, while a few layers learn dynamic attention maps that stretch over the entire image. They introduce a two-dimensional factorization of the attention matrix that lets the model learn these diverse patterns while attending to every position in two steps: roughly speaking, the model reaches any desired position by attending along its row and then along its column.
Attention patterns observed in layers 19 (left) and 36 (right): the attention head attends to the pixels shown in white when generating the next pixel. The pattern in layer 19 is highly regular, making it a good candidate for sparse attention [2].

The first variant, called strided attention, is suitable for 2-D data such as images. The authors also introduce a second variant, fixed attention, for 1-D data such as text; here, the model attends to a fixed column and to the elements after the latest column element.

Overall, understanding the attention patterns and the sparsity in them helps reduce redundant computation and handle long sequences efficiently.
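As a rough illustration of how such factorized patterns can be expressed, the sketch below builds boolean attention masks for the strided pattern and a simplified fixed pattern over a flattened, causally ordered sequence. The function names and the exact mask layout are assumptions for illustration; the paper's attention kernels are more general (for example, they allow several summary positions per block).

```python
import numpy as np

def strided_masks(seq_len, stride):
    """Boolean masks for the two heads of strided sparse attention (illustrative).

    For an image flattened in raster order, `stride` is the image width:
    one head attends to the previous positions in the same row, the other to
    the same column in earlier rows, so any position is reachable in two hops.
    """
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q
    row_mask = causal & (q - k < stride)             # local window: current row
    col_mask = causal & ((q - k) % stride == 0)      # same column, earlier rows
    return row_mask, col_mask

def fixed_masks(seq_len, stride):
    """Simplified fixed sparse attention for 1-D data such as text:
    one head attends within the current block, the other to the last
    ("summary") position of every block seen so far."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q
    block_mask = causal & (q // stride == k // stride)
    summary_mask = causal & (k % stride == stride - 1)
    return block_mask, summary_mask

# Example: a 4x4 image flattened to 16 pixels (stride = image width = 4)
row_m, col_m = strided_masks(16, 4)
```

Each mask keeps only O(n * stride) of the n^2 query-key pairs, which is where the savings over dense attention come from.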
Image GPT [4]

Image GPT is another example of a transformer model being successfully adopted for vision tasks [4]. GPT-2 is a large transformer model that was trained on language to generate coherent text [5]. Chen et al. trained the same model on pixel sequences to generate coherent image completions and samples. Although no knowledge of the 2-D structure of images is explicitly incorporated, the model learns high-quality features that are comparable to those produced by top convolutional networks in unsupervised settings.

Although it is not explicitly trained with labels, the model is still able to recognize object categories and performs well on image classification tasks. It implicitly learns about image categories while learning to generate diverse samples with clearly recognizable objects.

Since no image-specific knowledge is encoded into the architecture, the model requires a large amount of computation to achieve competitive performance in an unsupervised setting.

With enough scale and computation, a large language model such as GPT-2 is thus shown to be capable of image generation without explicitly encoding additional knowledge about the structure of images.
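The data preparation behind this idea is simple: flatten each image into a 1-D sequence of discrete pixel tokens and train the transformer to predict the next token, exactly as a language model predicts the next word. The sketch below shows that step under simplifying assumptions (grayscale quantization rather than the paper's clustered color palette); the function names are illustrative.

```python
import numpy as np

def image_to_token_sequence(img, levels=16):
    """Quantize pixel intensities to a small palette and flatten to a 1-D token sequence.

    Image GPT reduces the color space before modeling (the paper clusters RGB values;
    here a grayscale image in [0, 1) is simply binned into `levels` values) and then
    reads the image in raster-scan order.
    """
    tokens = np.clip((img * levels).astype(np.int64), 0, levels - 1)
    return tokens.reshape(-1)                        # raster-scan order

def next_pixel_pairs(tokens):
    """Autoregressive training pairs: predict pixel t from all pixels before t."""
    return tokens[:-1], tokens[1:]

# Example: a random 8x8 "image"
rng = np.random.default_rng(0)
img = rng.random((8, 8))
seq = image_to_token_sequence(img)                   # shape (64,)
inputs, targets = next_pixel_pairs(seq)              # feed `inputs` to the transformer, score `targets`
```

Everything after this step is a standard autoregressive transformer; the model never sees the 2-D layout explicitly, which is why the result depends so heavily on scale and computation.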
Conclusion

In this series, we reviewed some of the works that apply attention to visual tasks. These models clearly demonstrate a strong interest in moving away from domain-specific architectures toward a homogeneous learning solution that is suitable across different applications [6]. The attention-based transformer architecture shows great potential as a possible bridge between different domains, and it is constantly evolving to address its remaining challenges and become better suited to images and several other domains.

References:

[1] P. Ramachandran, N. Parmar, A. Vaswani, I. Bello, A. Levskaya, and J. Shlens, "Stand-Alone Self-Attention in Vision Models," Jun. 2019. Accessed: Dec. 08, 2019. [Online]. Available: http://arxiv.org/abs/1906.05909

[2] "Generative Modeling with Sparse Transformers," OpenAI. Accessed: Sep. 19, 2021. [Online]. Available: https://openai.com/blog/sparse-transformer/

[3] R. Child, S. Gray, A. Radford, and I. Sutskever, "Generating Long Sequences with Sparse Transformers," Apr. 2019. Accessed: Sep. 19, 2021. [Online]. Available: https://arxiv.org/abs/1904.10509v1

[4] M. Chen et al., "Generative Pretraining from Pixels." Accessed: Sep. 19, 2021. [Online]. Available: https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf

[5] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language Models are Unsupervised Multitask Learners." Accessed: Sep. 19, 2021. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

[6] R. Bommasani et al., "On the Opportunities and Risks of Foundation Models," Aug. 2021. Accessed: Sep. 20, 2021. [Online]. Available: https://arxiv.org/abs/2108.07258v2