{"id":17887,"date":"2021-05-12T06:49:00","date_gmt":"2021-05-12T06:49:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/time-to-put-an-end-to-bertology-or-ml-dl-is-not-even-relevant-to-nlu\/"},"modified":"2023-08-22T06:40:57","modified_gmt":"2023-08-22T06:40:57","slug":"time-to-an-end-to-bertology-or-ml-dl-not-even-relevant-to-nlu","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/time-to-an-end-to-bertology-or-ml-dl-not-even-relevant-to-nlu\/","title":{"rendered":"Time To Put An End To BERTology (Or, ML\/DL Is Not Even Relevant To NLU)"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"17887\" class=\"elementor elementor-17887\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4d45e1d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4d45e1d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7b1d849\" data-id=\"7b1d849\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-519a31e elementor-widget elementor-widget-text-editor\" data-id=\"519a31e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"12d5\">There are 3 technical (read: theoretical, scientific) reasons why the data-driven\/quantitative\/statistical\/machine learning approaches (that I will collectively refer to as BERTology) are utterly hopeless and futile efforts, at least when it comes to language\u00a0<strong>understanding<\/strong>. This is a big claim, I understand, especially given the current trend, the misguided media hype, and the massive amount of money the tech giants are spending on this utterly flawed paradigm. As I have repeated this claim in my publications, in seminars and posts, I have often been told \u201cbut could all of those people be wrong?\u201d Well, for now I will simply reply with \u201cyes, they could indeed all be wrong\u201d. 
I say that armed with the wisdom of the great mathematician/logician Bertrand Russell, who once said:

> The fact that an opinion has been widely held is no evidence whatsoever that it is not utterly absurd.

Before we begin, however, it is important to emphasize that our discussion is directed at the use of BERTology in NL**U**, and the 'U' here is crucial. That is, and as will become obvious below, BERTology might be useful in some NL**P** tasks (such as text summarization, search, extraction of key phrases, text similarity and/or clustering, etc.) because these tasks are all some form of 'compression' that machine learning can be successfully applied to. However, we believe that NLP (which is essentially just text processing) is a completely different problem from NLU. Perhaps NLU should be replaced by **HuTU**, for *human thought understanding*, since [NLU](https://www.experfy.com/blog/ai-ml/nlp-vs-nlu-from-understanding-a-language-to-its-processing/) is about comprehending the thoughts behind our linguistic utterances (you may also want to read [this](https://medium.com/ontologik/nlu-is-not-nlp-617f7535a92e) short article, which discusses this specific point).

So, to summarize our introduction: the claim that we will defend here is that BERTology is a futile effort when it comes to NLU (in fact, it is irrelevant to it), and this claim is not about some NLP tasks but is specific to true understanding of ordinary spoken language: the kind of understanding we exercise daily when we engage in dialogues with people we don't even know, or with young children who do not have any domain-specific knowledge!

Now we can get down to business.
## MTP: the Missing Text Phenomenon

Let us start by describing a phenomenon that is at the heart of all challenges in natural language understanding, which we refer to as the "missing text phenomenon" (MTP).

![Linguistic communication: a speaker encodes a thought into an utterance, and a listener decodes it back into a thought](http://www.experfy.com/blog/wp-content/uploads/2021/05/1kGFr0U5GBy892ph_C9Wi_g.png)

Linguistic communication happens as shown in the image above: a thought is encoded by a speaker into some linguistic utterance (in some language), and the
listener then decodes that linguistic utterance into (hopefully) *the thought that the speaker intended to convey*! It is that "decoding" process that is the 'U' in NLU: understanding the thought behind the linguistic utterance is exactly what happens in the decoding process. Moreover, there are no approximations or degrees of freedom in this 'decoding' process; from the multitude of possible meanings of an utterance, **there is one and only one thought the speaker intended to convey in making that utterance**. And this is precisely why NLU is difficult. Let's elaborate.

In this complex communication there are two possible alternatives for optimization, or for effective communication: (*i*) the speaker can compress (and minimize) the amount of information sent in the encoding of the thought and hope that the listener will do some extra work in the decoding (uncompressing) process; or (*ii*) the speaker can do the hard work and send all the information needed to convey the thought, which would leave the listener with little to do (see [this](https://www.sciencedirect.com/science/article/pii/S0010027715000815) article for a full description of this process). The natural evolution of this process, it seems, has resulted in the right balance, where the total work of both speaker and listener is optimized. That optimization resulted in the speaker encoding the minimum possible information needed, while leaving out everything else that can safely be assumed to be available to the listener. The information we tend to leave out is usually information that we can safely assume to be available to both speaker and listener, and this is precisely the information that we usually call ***common* background knowledge**.
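To make the compression picture concrete, here is a minimal toy sketch (purely illustrative: the sentence, the `BACKGROUND` table, and the lookup rule are hypothetical stand-ins, not a model of human understanding). The listener recovers the intended thought by combining the compressed utterance with unstated background knowledge:

```python
# Toy MTP illustration: the speaker omits what is common background
# knowledge; the listener restores it to recover the intended thought.
# The knowledge table below is made up for illustration.
BACKGROUND = {
    "dropped": {"unsupported objects fall"},
    "glass":   {"glass is fragile", "fragile things break when they fall"},
}

def understand(utterance: str) -> set[str]:
    """Recover the 'thought': the utterance plus the missing text."""
    recovered = {utterance}
    for word in utterance.lower().rstrip(".").split():
        recovered |= BACKGROUND.get(word, set())
    return recovered

# The short message carries the whole thought only because the
# listener silently supplies what the speaker left out:
print(understand("She dropped the glass."))
```

A BERTology system sees only the utterance; the `BACKGROUND` half, which carries much of the meaning, is nowhere in the text.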
To appreciate the intricacies of this process, consider the following (unoptimized) communication:

![An unoptimized message, with all the assumed information spelled out](http://www.experfy.com/blog/wp-content/uploads/2021/05/1uWQBUyJ5-lrEzM-gpDWS4w.png)

It should be very obvious that we certainly do not communicate this way. In fact, the above thought is usually expressed as follows:

![The same thought, expressed the way we usually speak](http://www.experfy.com/blog/wp-content/uploads/2021/05/1uogltBOTqFALTyJwyNqw2w.png)

This much shorter message, which is how we usually speak, conveys the same *thought* as the longer one.
We do not explicitly state all the other stuff because we all know:

![The background knowledge we all share and therefore leave unstated](http://www.experfy.com/blog/wp-content/uploads/2021/05/1tZUYbO0OspmVGWdAevTBMA.png)

That is, for effective communication, we do not say what we can assume we all know! This is also precisely why we all tend to leave out the same information: because we all know what everyone knows, and that is the "common" background knowledge. This genius optimization process, which humans have developed over roughly 200,000 years of evolution, works quite well precisely because we all know what we all know. But this is where the problem lies in AI/NLU. Machines don't know what we leave out, because they don't know what we all know. The net result? NLU is very, very difficult, because a software program can only fully understand the thoughts behind our linguistic utterances if it can somehow "uncover" all that stuff that humans assume and leave out in their linguistic communication. That, really, is the NLU challenge (and not parsing, stemming, POS tagging, etc.). In fact, here are some well-known challenges in NLU, with the labels such problems are usually given in computational linguistics.
I am showing here (just some of) the missing text, highlighted in red:

![Well-known challenges in NLU, with (some of) the missing text highlighted in red](http://www.experfy.com/blog/wp-content/uploads/2021/05/1HxFQqmn_hrF2AOWR5YnuUQ.png)

All of the above well-known challenges in NLU come down to the same thing: the challenge is to discover (or uncover) the information that is missing because it is implicitly assumed to be shared, common background knowledge.

Now that we are (hopefully) convinced that NLU is difficult because of MTP (that is, because our ordinary spoken language in everyday discourse is highly, if not optimally, compressed, so that the challenge in "understanding" is in uncompressing, or uncovering, the missing text), I can state the first technical reason why BERTology is not relevant to NLU.

The equivalence between (machine) learnability (ML) and compressibility (COMP) has been mathematically established. That is, it has been established that learnability from a dataset can only happen if the data is highly compressible (i.e., it has lots of redundancies), and vice versa (see [this](https://arxiv.org/abs/1610.03592) article and the important article ["Learnability can be undecidable"](https://www.nature.com/articles/s42256-018-0002-3), which appeared in 2019 in the journal *Nature Machine Intelligence*). But MTP tells us that NLU is about uncompressing.
What we now have is the following:

![Learnability requires compressibility, while understanding (per MTP) requires uncompressing](http://www.experfy.com/blog/wp-content/uploads/2021/05/1MxbfdK2KQ3QT7VMK3ClOoA.png)
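Stated compactly (an informal rendering of the argument above, not a formal theorem):

$$
\begin{aligned}
\text{(i)}\;\; & \text{Learnable}(X) \iff \text{Compressible}(X) \\
\text{(ii)}\;\; & \text{NLU} = \text{uncompressing (recovering the missing text)} \\
\therefore\;\; & \text{NLU is not a (machine) learnability problem.}
\end{aligned}
$$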
End of proof 1.

## Intension (with an 's')

Intension is another phenomenon I want to discuss before I get to the second proof that BERTology is not even relevant to NLU. I will start with what is known as the meaning triangle, shown below with an example:

![The meaning triangle: a symbol refers to a concept, which (sometimes) has actual instances](http://www.experfy.com/blog/wp-content/uploads/2021/05/1Zk2NLHzn7embX_FSRIo3qA.png)

Thus every "thing" (or every object of cognition) has three parts: a symbol that refers to a concept, and the concept (sometimes) has actual instances. I say sometimes because the concept "unicorn" has no "actual" instances, at least in the world we live in! The concept itself is an idealized template for all its potential instances (and thus it is close to the idealized *Forms* of Plato!). You can imagine how philosophers, logicians, and cognitive scientists have debated for centuries over the nature of concepts and how they are defined. Regardless of that debate, we can agree on one thing: a concept (which is usually referred to by some symbol/label) is defined by a set of properties and attributes, perhaps with additional axioms and established facts, etc. Nevertheless, a concept is not the same as its actual (imperfect) instances. This is true even in the perfect world of mathematics. So, for example, while the arithmetic expressions below all have the same extension, they have different intensions:

![Arithmetic expressions that all evaluate to 16 yet differ in syntactic structure](http://www.experfy.com/blog/wp-content/uploads/2021/05/1BOZHoVZLFizxU-QLfO7sUw.png)

Thus, while all the expressions evaluate to 16, and thus are equal in one sense (their **VALUE**), this is only one of their attributes. In fact, the expressions above have several other attributes, such as their **syntactic structure** (which is why (a) and (d) are different), their **number of operators**, their **number of operands**, etc. The **VALUE** (which is just one attribute) is called the extension, while ***the set of all the attributes*** is the intension.
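The distinction is easy to demonstrate in a few lines of code (an illustrative sketch using Python's standard `ast` module; the two expressions are stand-ins for those in the figure):

```python
import ast

# Two expressions with the same extension (VALUE = 16) but different
# intensions (different structure, different number of operators).
exprs = ["8 + 8", "2 * (5 + 3)"]

for expr in exprs:
    tree = ast.parse(expr, mode="eval")
    n_ops = sum(isinstance(node, ast.BinOp) for node in ast.walk(tree))
    print(expr, "-> value:", eval(expr), "| operators:", n_ops)

# Equal as extensions, unequal as intensions:
print(eval(exprs[0]) == eval(exprs[1]))  # True
print(ast.dump(ast.parse(exprs[0], mode="eval")) ==
      ast.dump(ast.parse(exprs[1], mode="eval")))  # False
```

Any system that represents an expression only by its value (its extension) has, by construction, discarded everything that distinguishes (a) from (d).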
While in the applied sciences (engineering, economics, etc.) we can safely consider these objects to be equal if they are equal in the **VALUE** attribute alone, in cognition (and especially in language understanding) this equality fails! Here's one simple example:

![Replacing '16' in a true sentence by an expression with the same value yields a false one](http://www.experfy.com/blog/wp-content/uploads/2021/05/1lQ8jdpa_FZFglXEWoh4LOQ.png)

Suppose that (1) is true, that is, suppose (1) actually happened and we saw it/witnessed it. Still, that does not mean we can assume (2) is true, although all we did was replace '16' in (1) by a value that is (supposedly) equal to it. So what happened? We replaced one object in a true statement by an object that is supposedly equal to it, and we inferred something false from something that is true! What happened is this: while in the physical sciences we can easily replace an object by another that is equal to it in some attribute, this does not work in cognition! Here's another example:

![Replacing 'the tutor of Alexander the Great' by 'Aristotle' turns a sensible statement into a ridiculous one](http://www.experfy.com/blog/wp-content/uploads/2021/05/1kyhE6R8o5XGab4QxFVkRlg.png)

We obtained (2), which is ridiculous, by simply replacing '*the tutor of Alexander the Great*' by a value that is equal to it, namely *Aristotle*.
Again, while '*the tutor of Alexander the Great*' and '*Aristotle*' are equal in one sense (they both have the same value as a referent), these two objects of thought are different in many other respects.

I'll stop here with the discussion of what 'intension' is and why it is important in high-level reasoning, and specifically in NLU. The interested reader can look at [this](https://medium.com/ontologik/in-nlu-you-ignore-intension-at-your-peril-dd173670660d) short article, where I give references to additional material.

So, what is the point of this discussion of 'intension'? Natural language is rampant with *intensional* phenomena, since the objects of thought that language conveys have an intensional aspect that cannot be ignored. But BERTology, in all its variants, is a purely *extensional* system: it can deal only with extensions (**numeric values, tensors/vectors**), so it cannot model or account for intensions, and thus it cannot model various phenomena in natural language.
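Here is a toy rendering of the failure of substitution in such contexts (the belief set and the string-based representation below are deliberately crude and purely illustrative; they are not a proposal for how to model belief):

```python
# Extensionally, the two descriptions denote the same individual:
referent = {
    "Aristotle": "Aristotle",
    "the tutor of Alexander the Great": "Aristotle",
}
assert referent["Aristotle"] == referent["the tutor of Alexander the Great"]

# Intensionally, a belief is tied to the description used, so
# substituting co-referring terms does not preserve truth:
johns_beliefs = {"the tutor of Alexander the Great was a philosopher"}

claim_1 = "the tutor of Alexander the Great was a philosopher"
claim_2 = "Aristotle was a philosopher"  # same referent substituted in

print(claim_1 in johns_beliefs)  # True
print(claim_2 in johns_beliefs)  # False: the substitution is not licensed
```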
End of proof 2.

> Incidentally, the fact that BERTology is a purely extensional paradigm that cannot account for 'intension' is the source of the so-called 'adversarial examples' in DL. The problem is related to the fact that once tensors (high-dimensional vectors) are composed into one tensor, the resulting tensor can be decomposed into components in an infinite number of ways (meaning the decomposition is undecidable); that is, once the input tensors are composed, we lose the original structure (in simple terms: 10 can be the value of 2 * 5, but also the result of 8 + 1 + 1, of 9 + 1 + 0, etc.). Neural networks can **always** be hit with adversarial examples because, by optimizing in reverse, we can always get the outputs expected at any layer but from components other than the ones expected. But this is a discussion for another place and another time.
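The composition-loses-structure point can be seen numerically in a few lines (a deliberately simple sketch with small random vectors; real networks compose high-dimensional tensors, but the arithmetic is the same):

```python
import numpy as np

# Once vectors are composed (here, summed), infinitely many different
# input pairs explain the same output: the original structure is gone.
rng = np.random.default_rng(0)

x1, x2 = rng.normal(size=4), rng.normal(size=4)
composed = x1 + x2

# An entirely different pair with the same composition:
delta = rng.normal(size=4)
y1, y2 = x1 + delta, x2 - delta

print(np.allclose(composed, y1 + y2))  # True: same composed output
print(np.allclose(x1, y1))             # False: different components
```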
## Statistical (In)Significance

One of the main issues with BERTology regarding *statistical significance* is the issue of **function words**, which in BERTology must be ignored and are labelled "stop-words". These words have the same probability in every context, and thus they must be taken out because they would disrupt the entire probability space. But, whether BERTologists like it or not, **function words are the words that in the end glue together the final meaning**. Just consider the difference between the pair of sentences below:

![A pair of sentences that differ only in their function words](http://www.experfy.com/blog/wp-content/uploads/2021/05/1XePjGwxrhPPyOnhrvnby8g.png)

In (2a) we are referring to 50 groups, while in (2b) to only one. How we interpret quantifiers, prepositions, modals, etc. changes the target (and intended) meaning considerably, so there cannot be any true language understanding without taking function words into account, and in BERTology they cannot be (appropriately) modelled.
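To see how much is lost, here is a small sketch (illustrative: the sentence pair and the miniature stop-word list are mine, standing in for the pair in the figure): two sentences with very different meanings collapse into the same bag of content words once function words are stripped.

```python
# After stop-word removal, sentences that differ only in function
# words (quantifiers, prepositions, ...) become indistinguishable.
STOP_WORDS = {"a", "the", "every", "some", "of", "in", "to"}

def content_words(sentence: str) -> frozenset[str]:
    return frozenset(w for w in sentence.lower().split()
                     if w not in STOP_WORDS)

s1 = "every student in some class passed"   # possibly many classes
s2 = "some student in every class passed"   # a very different claim

print(content_words(s1) == content_words(s2))  # True: the difference is gone
```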
We could have stopped here, and that would have been the end of proof 3 that BERTology is not even relevant to NLU. But there's more…

BERTology is essentially a paradigm based on finding patterns (correlations) in the data. The hope in that paradigm is therefore that there are statistically significant differences between the various phenomena in natural language; otherwise they will be considered essentially the same. But consider the following (see [this](https://cs.nyu.edu/faculty/davise/papers/WSKR2012.pdf) and [this](https://arxiv.org/ftp/arxiv/papers/1810/1810.00324.pdf) for a discussion of this example as it relates to the Winograd Schema Challenge):

![A Winograd schema pair, where swapping 'small' for 'big' flips the referent of "it"](http://www.experfy.com/blog/wp-content/uploads/2021/05/1chz7xOnHTJLctcs_PtMwfQ.png)

Note that antonyms/opposites such as '*small*' and '*big*' (or 'open' and 'close', etc.) occur in the same contexts with equal probabilities. As such, (1*a*) and (1*b*) are statistically equivalent, yet even for a four-year-old (1*a*) and (1*b*) are considerably different: "it" in (1*a*) refers to "the suitcase" while in (1*b*) it refers to "the trophy". Basically, and in simple language, (1*a*) and (1*b*) are statistically equivalent although semantically they are far from it. Thus, statistical analysis cannot model (nor even approximate) semantics; it is that simple!
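A toy check makes the equivalence concrete (illustrative: I am reconstructing the trophy/suitcase sentences from the discussion above, and a real distributional model would use corpus-wide co-occurrence counts rather than one sentence pair):

```python
from collections import Counter

s1a = "the trophy did not fit in the suitcase because it was too small"
s1b = "the trophy did not fit in the suitcase because it was too big"

c1a, c1b = Counter(s1a.split()), Counter(s1b.split())

# The sentences differ in a single antonym...
print((c1a - c1b) + (c1b - c1a))  # Counter({'small': 1, 'big': 1})

# ...and since antonyms occur in the same contexts with (roughly)
# equal probabilities, their distributional statistics match: the
# surrounding context is literally identical.
strip = lambda s: [w for w in s.split() if w not in ("small", "big")]
print(strip(s1a) == strip(s1b))  # True: statistically equivalent,
                                 # yet "it" resolves differently
```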
But let us see how many examples we would need if one insists on using BERTology to learn how to correctly resolve "it" in such structures. First of all, in BERTology there is no notion of **type** (and no symbolic knowledge whatsoever, for that matter). Thus the following are all different:

![Variants of the same schema built from different objects, each of which a data-driven system must see separately](http://www.experfy.com/blog/wp-content/uploads/2021/05/1FSOIDeKTZgPBhCFsBTWDqA.png)

That is, in BERTology there is no type hierarchy where we can make generalized statements about a 'bag', a 'suitcase', a 'briefcase', etc., with all of them considered subtypes of the general type 'container'. Thus each of the above, in a purely data-driven paradigm, is different and must be 'seen' separately. If we add to the semantic differences all the minor syntactic variations on the above pattern (say, changing 'because' to 'although', which also changes the correct referent of "it"), then a rough calculation tells us that a BERTology system would need to see something like 40,000,000 variations of the above, and all of this just to resolve a reference like "it" in structures like the one in (1). If anything, this is computationally implausible. As Fodor and Pylyshyn once famously quoted the renowned cognitive scientist George Miller: to capture all the syntactic and semantic variations that an NLU system would require, the number of features a neural network might need is more than the number of atoms in the universe! (I recommend this classic and brilliant paper to anyone interested in cognitive science; it is available [here](http://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf).)

To conclude this section: often there is no statistical significance in natural language that can explain the different interpretations, precisely because the information needed to bring out the statistical significance **is not in the data** but is available elsewhere. In the above example, the needed information is something like this: if **not**(FIT(x, y)), then LARGER(y, x) is more likely than LARGER(x, y).
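A minimal sketch of how such a symbolic rule resolves the reference (the scaffolding is mine and purely illustrative; I read FIT(x, y) as 'y fits in x' so that the rule matches the formula above):

```python
# The commonsense rule stated above, with FIT(x, y) read as
# "y fits in(to) x". None of this information is in the text itself.
def resolve_it(container: str, obj: str, reason: str) -> str:
    """Resolve 'it' in: 'The <obj> did not fit in the <container>
    because it was too <reason>'."""
    # not FIT(container, obj) holds, so LARGER(obj, container) is the
    # likely reading: the object is the bigger of the two.
    if reason == "big":
        return obj        # the too-big thing is the (larger) object
    if reason == "small":
        return container  # the too-small thing is the container
    raise ValueError(f"no rule for reason: {reason!r}")

print(resolve_it("suitcase", "trophy", "small"))  # suitcase  (1a)
print(resolve_it("suitcase", "trophy", "big"))    # trophy    (1b)
```

The point is not this toy function but where its two lines of 'knowledge' come from: they encode facts about fitting and relative size that no corpus statistic over the sentence pair can supply.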
In short, the only source of information in BERTology is the data, but very often the information required for a correct interpretation is not even in the data, and you can't find what is not even there.

End of proof 3.
## Conclusion

I have discussed three reasons that prove BERTology is not even relevant to NLU (although it might be used in text-processing tasks that are essentially compression tasks). Each of these three reasons is enough on its own to put an end to this runaway train called BERTology.

Natural language is not (**just**) data!