{"id":22541,"date":"2021-01-05T10:59:56","date_gmt":"2021-01-05T10:59:56","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/bert-model-transferring-knowledge-cross-encoders-bi-encoders\/"},"modified":"2023-09-13T12:25:12","modified_gmt":"2023-09-13T12:25:12","slug":"bert-model-transferring-knowledge-cross-encoders-bi-encoders","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/bert-model-transferring-knowledge-cross-encoders-bi-encoders\/","title":{"rendered":"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22541\" class=\"elementor elementor-22541\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6b52a05 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6b52a05\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-76e9c4b\" data-id=\"76e9c4b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1fa873e elementor-widget elementor-widget-text-editor\" data-id=\"1fa873e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Data Augmentation Method to improve SBERT Bi-Encoders for Pairwise Sentence Scoring Tasks (Semantic sentence tasks)<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f0075ab elementor-widget elementor-widget-heading\" 
data-id=\"f0075ab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Background and challenges<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b562258 elementor-widget elementor-widget-text-editor\" data-id=\"b562258\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"66eb\">Currently, state-of-the-art NLP models usually reuse as their baseline the BERT model, which was pre-trained on large text corpora such as Wikipedia and the Toronto Books Corpus [1]. By fine-tuning the deep pre-trained BERT, many alternative architectures such as DeBERTa, RetriBERT, and RoBERTa were developed, achieving substantial improvements on benchmarks for a variety of language understanding tasks. Among common NLP tasks, pairwise sentence scoring has a wide range of applications in information retrieval, question answering, duplicate question detection, and clustering. Generally, two typical approaches are proposed: <strong>Bi-encoders<\/strong> and <strong>Cross-encoders<\/strong>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9b825c elementor-widget elementor-widget-text-editor\" data-id=\"b9b825c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><strong>Cross-encoders [2]: <\/strong>perform full (cross) self-attention over a given input and label candidate, and tend to attain much higher accuracies than their counterparts. 
However, they must recompute the encoding for each input and label; as a result, they cannot be used for end-to-end information retrieval, because they do not yield independent representations for the inputs, and they are prohibitively slow at test time. For example, clustering 10,000 sentences has quadratic complexity and requires about 65 hours [4].<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8a8a351 elementor-widget elementor-widget-image\" data-id=\"8a8a351\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"656\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-1024x656.png\" class=\"attachment-large size-large wp-image-18334\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-1024x656.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-300x192.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-768x492.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-610x391.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-750x481.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg-1140x731.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15j-mV8zYNKbb9pGf1dZexg.png 1504w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption 
wp-caption-text\">Cross-encoders<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-048049f elementor-widget elementor-widget-text-editor\" data-id=\"048049f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><strong>Bi-encoders [3]:<\/strong> perform self-attention over the input and candidate label separately, map them to a dense vector space, and then combine them at the end for a final representation. Therefore, Bi-encoders are able to index the encoded candidates and compare these representations for each input, resulting in fast <a href=\"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/what-the-future-has-in-stock-for-us-20-predictions-for-2020-part-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">prediction <\/a>times. For the same task of clustering 10,000 sentences, the time is reduced from about 65 hours to about 5 seconds [4]. An advanced Bi-encoder BERT model, called <strong>Sentence-BERT<\/strong> (SBERT), was presented by the Ubiquitous Knowledge Processing Lab (UKP-TUDA). 
For more details, see this hands-on tutorial on using <a href=\"https:\/\/towardsdatascience.com\/a-complete-guide-to-transfer-learning-from-english-to-other-languages-using-sentence-embeddings-8c427f8804a9\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><strong>SBert Bi-encoders<\/strong><\/a>.<\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2c3177e elementor-widget elementor-widget-image\" data-id=\"2c3177e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"518\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-1024x518.png\" class=\"attachment-large size-large wp-image-18335\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-1024x518.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-300x152.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-768x388.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-1536x776.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-610x308.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-750x379.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw-1140x576.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Zjgpwe0fNO6Hxq4ZBquFNw.png 1642w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption 
class=\"widget-image-caption wp-caption-text\">Bi-Encoder<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-693ecca elementor-widget elementor-widget-text-editor\" data-id=\"693ecca\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6b60\">On the other hand, no methodology is perfect in all aspects, and <strong>Bi-encoders<\/strong> are no exception. The <strong>Bi-encoder<\/strong> method usually achieves lower performance than the <strong>Cross-encoder<\/strong> method and requires a large amount of training data. The reason is that <strong>Cross-encoders<\/strong> can compare both inputs simultaneously, while <strong>Bi-encoders<\/strong> have to independently map the inputs to a meaningful vector space, which requires a sufficient number of training examples for fine-tuning.<\/p>\n\n<p id=\"15f0\">To solve this problem, <strong>Poly-encoders<\/strong> were invented [5]. <strong>Poly-encoders<\/strong> utilize two separate transformers, but attention between the two inputs is applied only at the top layer, resulting in better performance than <strong>Bi-encoders<\/strong> and large speed gains over <strong>Cross-encoders. 
<\/strong>However, <strong>Poly-encoders<\/strong> still have some drawbacks: their score function is asymmetric, so they cannot be applied to tasks with symmetric similarity relations, and their representations cannot be efficiently indexed, which causes issues for retrieval tasks over large corpora.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-89a75d1 elementor-widget elementor-widget-image\" data-id=\"89a75d1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"598\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-1024x598.png\" class=\"attachment-large size-large wp-image-18336\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-1024x598.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-300x175.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-768x448.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-1536x896.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-610x356.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-750x438.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg-1140x665.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1eeV-adbX6kyRM83VZBsg.png 1669w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption 
wp-caption-text\">Poly-encoders<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0b67fe1 elementor-widget elementor-widget-text-editor\" data-id=\"0b67fe1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"437f\">In this article, I want to introduce a new approach that uses both Cross-encoders and Bi-encoders in an effective way &#8211; data augmentation. This strategy is known as <strong>Augmented SBERT<\/strong> (AugSBERT) [6], which uses <strong>BERT cross-encoders<\/strong> to label a larger set of input pairs and thereby augment the training data for <strong>SBERT bi-encoders<\/strong>. Then, the <strong>SBERT bi-encoder<\/strong> is fine-tuned on this larger augmented training set, which yields a significant performance increase. The idea is very similar to <a href=\"https:\/\/towardsdatascience.com\/train-without-labeling-data-using-self-supervised-learning-by-relational-reasoning-b0298ad818f9\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><strong>Self-Supervised Learning by Relational Reasoning<\/strong><\/a> in Computer Vision. In simple terms, we can therefore think of it as Self-Supervised Learning in Natural Language Processing. 
More details are presented in the next section.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bf314c8 elementor-widget elementor-widget-heading\" data-id=\"bf314c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Technique highlight<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0dac514 elementor-widget elementor-widget-text-editor\" data-id=\"0dac514\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"da2f\">There are three major scenarios for the Augmented SBERT approach, covering either pairwise sentence regression or classification tasks.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-96dc1bb elementor-widget elementor-widget-heading\" data-id=\"96dc1bb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Scenario 1: Fully annotated datasets (all labeled sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-611106a elementor-widget elementor-widget-text-editor\" data-id=\"611106a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"dd35\">In this scenario, a straightforward data augmentation strategy is applied to prepare and extend the labeled dataset. 
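As a tiny illustration of this strategy, here is a minimal word-level synonym-replacement sketch in plain Python. The synonym table is a toy stand-in for a real source such as WordNet or PPDB, and the gold pair and its score are made-up examples:

```python
import random

# Toy synonym table for illustration only; a real pipeline would draw
# synonyms from WordNet/PPDB or insert/substitute words with a
# contextual model such as BERT.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "film": ["movie", "picture"],
}

def augment(sentence):
    """Return a silver variant of `sentence` by swapping in synonyms."""
    out = []
    for word in sentence.split():
        key = word.lower()
        out.append(random.choice(SYNONYMS[key]) if key in SYNONYMS else word)
    return " ".join(out)

# A gold pair keeps its similarity label; only the surface form changes.
gold = [("A quick film review", "I liked this film a lot", 0.8)]
silver = [(augment(s1), augment(s2), score) for s1, s2, score in gold]
```

The silver pairs produced this way are appended to the gold data before fine-tuning the bi-encoder.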
There are three most common levels: Character, Word, Sentence.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4c9e2af elementor-widget elementor-widget-image\" data-id=\"4c9e2af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"393\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-1024x393.png\" class=\"attachment-large size-large wp-image-18337\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-1024x393.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-300x115.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-768x295.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-610x234.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-750x288.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ-1140x438.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1-lIpcq0Z8ausjWdQAMG_xQ.png 1310w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Text augmentation level<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3f4f875 elementor-widget elementor-widget-text-editor\" data-id=\"3f4f875\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"37e6\">However, the word level is the most suitable for the sentence-pair task. Based on the performance of trained Bi-Encoders, a few methodologies are suggested: inserting or substituting words using contextual word embeddings (BERT, DistilBERT, RoBERTa, or XLNet), or substituting words with synonyms (WordNet, PPDB). After the augmented text data is created, it is combined with the original data and fed into the Bi-Encoders.<\/p>\n\n<p id=\"752d\">However, with few labeled examples or in special cases, simple word replacement or insertion strategies such as these are not helpful for data augmentation in sentence-pair tasks, and can even lead to worse performance than models trained without augmentation.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f9a854 elementor-widget elementor-widget-heading\" data-id=\"4f9a854\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">In short, the straightforward data augmentation strategy involves three steps:<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7921396 elementor-widget elementor-widget-text-editor\" data-id=\"7921396\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Step 1: Prepare the fully labeled Semantic Text Similarity dataset (gold data)<\/li><li>Step 2: Replace words with synonyms in the sentence pairs (silver data)<\/li><li>Step 3: Train a bi-encoder (SBERT) on the extended (gold + silver) training dataset<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-d135e93 elementor-widget elementor-widget-image\" data-id=\"d135e93\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"472\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-1024x472.png\" class=\"attachment-large size-large wp-image-18338\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-1024x472.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-300x138.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-768x354.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-610x281.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-750x346.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA-1140x525.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CDgkxnRIdUXsVIlh0-A6YA.png 1450w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Training SBert Bi-encoders with full annotated datasets<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-995a0f0 elementor-widget elementor-widget-heading\" data-id=\"995a0f0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title 
elementor-size-default\">Scenario 2: Limited or small annotated datasets (few labeled sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b43f41b elementor-widget elementor-widget-text-editor\" data-id=\"b43f41b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"26b7\">In this case, because the labeled dataset (gold dataset) is limited, a pre-trained <strong>Cross-encoder<\/strong> is used to weakly label unlabeled data from the same domain. However, randomly selecting two sentences usually leads to a dissimilar (negative) pair, while positive pairs are extremely rare. This skews the label distribution of the silver dataset heavily towards negative pairs. Therefore, two appropriate sampling approaches are suggested:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3867658 elementor-widget elementor-widget-text-editor\" data-id=\"3867658\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>BM25 Sampling (BM25): the algorithm is based on lexical overlap and is commonly used as a scoring function by many search engines [7]. The top k most similar sentences are queried and retrieved from the indexed unique sentences.<\/li><li>Semantic Search Sampling (SS): pre-trained <strong>Bi-encoders (SBERT)<\/strong> [4] are used to retrieve the top k most similar sentences in our collection. For large collections, approximate nearest neighbor search such as Faiss can be used to quickly retrieve the k most similar sentences. 
This addresses the drawback of BM25 with synonymous sentences that have little or no lexical overlap.<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7503ddd elementor-widget elementor-widget-text-editor\" data-id=\"7503ddd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3a7d\">After that, the sampled sentence pairs are weakly labeled by the pre-trained <strong>Cross-encoder<\/strong> and merged with the gold dataset. Then, a <strong>Bi-encoder<\/strong> is trained on this extended training dataset. This model is called <strong>Augmented SBERT (AugSBERT)<\/strong>. <strong>AugSBERT<\/strong> can improve the performance of existing <strong>Bi-encoders<\/strong> and reduce the gap with <strong>Cross-encoders<\/strong>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d9bceaf elementor-widget elementor-widget-heading\" data-id=\"d9bceaf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">In summary, <strong>AugSBERT<\/strong> for a limited dataset involves three steps:<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ad9ff25 elementor-widget elementor-widget-text-editor\" data-id=\"ad9ff25\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Step 1: Fine-tune a <strong>Cross-encoder<\/strong> (BERT) on the small gold dataset<\/li><li>Step 2.1: Create pairs by recombination and reduce the pairs via BM25 or semantic search<\/li><li>Step 2.2: Weakly label the new pairs with the <strong>Cross-encoder<\/strong> 
(silver dataset)<\/li><li>Step 3: Train a <strong>Bi-encoder<\/strong> (SBERT) on the extended (gold + silver) training dataset<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8815eeb elementor-widget elementor-widget-image\" data-id=\"8815eeb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"299\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-1024x299.png\" class=\"attachment-large size-large wp-image-18339\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-1024x299.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-300x88.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-768x225.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-610x178.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-750x219.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw-1140x333.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1k8E3v1x2X7X5tMrBcUU6yw.png 1245w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Training SBert with limited annotated datasets<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7abee19 elementor-widget elementor-widget-heading\" data-id=\"7abee19\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Scenario 3: No annotated datasets (only unlabeled sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-206acc3 elementor-widget elementor-widget-text-editor\" data-id=\"206acc3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b087\">This scenario arises when we want SBERT to attain high performance on data from a different domain (without annotations). Out of the box, SBERT fails to map sentences with unseen terminology to a sensible vector space. Hence, the relevant data augmentation strategy of domain adaptation was proposed:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6711d2b elementor-widget elementor-widget-text-editor\" data-id=\"6711d2b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>Step 1: Train a <strong>Cross-encoder<\/strong> (BERT) from scratch on a source dataset for which we have annotations.<\/li><li>Step 2: Use this <strong>Cross-encoder<\/strong> (BERT) to label your target dataset, i.e. 
unlabeled sentence pairs<\/li><li>Step 3: Finally, train a <strong>Bi-encoder<\/strong> (SBERT) on the labeled target dataset<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3b7c28 elementor-widget elementor-widget-text-editor\" data-id=\"a3b7c28\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b7c3\">Generally, <strong>AugSBERT<\/strong> benefits a lot when the source domain is rather generic and the target domain is rather specific. Vice versa, when going from a specific source domain to a generic target domain, only a slight performance increase is observed.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3455f40 elementor-widget elementor-widget-image\" data-id=\"3455f40\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"330\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-1024x330.png\" class=\"attachment-large size-large wp-image-18340\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-1024x330.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-300x97.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-768x247.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-1536x495.png 1536w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-610x197.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-750x242.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ-1140x367.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1CvIsEOdi3XAcr8PE_t2LTQ.png 1620w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Training SBert Bi-encoders with no annotated datasets<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49c6c48 elementor-widget elementor-widget-heading\" data-id=\"49c6c48\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Experimental evaluation<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-44b7056 elementor-widget elementor-widget-text-editor\" data-id=\"44b7056\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3737\">In this experiment, I will introduce a demo of how to apply <strong>AugSBERT<\/strong> in the different scenarios. 
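Before the demo itself, the BM25 sampling step from Scenario 2 can be sketched in plain Python. This is a toy Okapi BM25 scorer written from scratch for illustration (real pipelines typically use a search engine such as Elasticsearch for indexing); the corpus and query below are made up:

```python
import math
from collections import Counter

def bm25_rank(query, corpus, k1=1.5, b=0.75, top_k=3):
    """Score `query` against each sentence in `corpus` with Okapi BM25
    and return the indices of the top_k highest-scoring sentences."""
    docs = [s.lower().split() for s in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency of each term across the collection
    df = Counter(t for d in docs for t in set(d))

    def idf(t):
        return math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            score += idf(t) * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: -scores[i])[:top_k]

corpus = [
    "How do I learn Python quickly",
    "What is the fastest way to learn Python",
    "Best restaurants in town",
]
# Pair a sentence with its BM25 nearest neighbours to form candidate
# pairs, which the cross-encoder then weakly labels.
candidates = bm25_rank("learn Python fast", corpus, top_k=2)
```

Semantic Search Sampling would replace the BM25 score with cosine similarity between SBERT embeddings, which also catches paraphrases with no lexical overlap.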
First, we need to import some packages<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6535028 elementor-widget elementor-widget-image\" data-id=\"6535028\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"610\" height=\"501\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-1-7.png\" class=\"attachment-large size-large wp-image-18341\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-1-7.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-1-7-300x246.png 300w\" sizes=\"(max-width: 610px) 100vw, 610px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C2\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6a4d8dd elementor-widget elementor-widget-heading\" data-id=\"6a4d8dd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Scenario 1: Full annotated datasets (all labeled sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-51fe5bd elementor-widget elementor-widget-text-editor\" data-id=\"51fe5bd\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d50b\">The main purpose of this scenario is to extend the labeled dataset with straightforward data augmentation strategies. We will therefore prepare train, dev, and test splits from the Semantic Textual Similarity dataset (<a href=\"https:\/\/sbert.net\/datasets\/stsbenchmark.tsv.gz\" target=\"_blank\" rel=\"noreferrer noopener\">link<\/a>) and define the batch size, number of epochs, and model name (you can specify any Huggingface\/transformers pre-trained model)<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a655d12 elementor-widget elementor-widget-image\" data-id=\"a655d12\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"537\" height=\"544\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-2-5.png\" class=\"attachment-large size-large wp-image-18342\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-2-5.png 537w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-2-5-296x300.png 296w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-2-5-75x75.png 75w\" sizes=\"(max-width: 537px) 100vw, 537px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C4\" target=\"_blank\" rel=\"noopener\">\/View 
File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d2e5426 elementor-widget elementor-widget-text-editor\" data-id=\"d2e5426\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Then, we will insert words with our BERT model (you can apply any other augmentation technique I mentioned in the Technique highlight section) to create a silver dataset.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2eb706c elementor-widget elementor-widget-image\" data-id=\"2eb706c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"679\" height=\"417\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-3-3.png\" class=\"attachment-large size-large wp-image-18343\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-3-3.png 679w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-3-3-300x184.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-3-3-610x375.png 610w\" sizes=\"(max-width: 679px) 100vw, 679px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C6\" target=\"_blank\" rel=\"noopener\">\/View 
File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e554840 elementor-widget elementor-widget-text-editor\" data-id=\"e554840\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d5d1\">Next, we define our <strong>Bi-encoders <\/strong>with mean pooling and train them on the combined (gold + silver) STS benchmark dataset<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6d4bc1e elementor-widget elementor-widget-image\" data-id=\"6d4bc1e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"584\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-4-6.png\" class=\"attachment-large size-large wp-image-18344\" alt=\"Bi-encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-4-6.png 512w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-4-6-263x300.png 263w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C7\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7973b92 elementor-widget elementor-widget-text-editor\" data-id=\"7973b92\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"da5d\">Finally, we will evaluate our model on the STS benchmark test set<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a626ae9 elementor-widget elementor-widget-image\" data-id=\"a626ae9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"506\" height=\"476\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-5-6.png\" class=\"attachment-large size-large wp-image-18345\" alt=\"Advance BERT Model Via Transferring Knowledge From Cross-Encoders To Bi-Encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-5-6.png 506w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-5-6-300x282.png 300w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C8\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-01fb8c7 elementor-widget elementor-widget-heading\" data-id=\"01fb8c7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Scenario 2: Limited or small annotated datasets (few labeled 
sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0d9abb8 elementor-widget elementor-widget-text-editor\" data-id=\"0d9abb8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b935\">In this scenario, we will use <strong>Cross-encoders <\/strong>trained on the limited labeled dataset (gold dataset) to soft-label the in-domain unlabeled dataset (silver dataset), then train <strong>Bi-encoders <\/strong>on both datasets (silver + gold). In this simulation, I again use the STS benchmark dataset and create new sentence pairs with a pre-trained SBERT model. First, we will define the <strong>Cross-encoders <\/strong>and <strong>Bi-encoders.<\/strong><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23903f5 elementor-widget elementor-widget-image\" data-id=\"23903f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"508\" height=\"579\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-6-5.png\" class=\"attachment-large size-large wp-image-18346\" alt=\"Limited or small annotated datasets (few labeled sentence-pairs)\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-6-5.png 508w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-6-5-263x300.png 263w\" sizes=\"(max-width: 508px) 100vw, 508px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a 
href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C11\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98f99c8 elementor-widget elementor-widget-text-editor\" data-id=\"98f99c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3a5c\">Step 1, we prepare the train, dev, and test sets as before and fine-tune our <strong>Cross-encoders<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c052240 elementor-widget elementor-widget-image\" data-id=\"c052240\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"511\" height=\"580\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-7-6.png\" class=\"attachment-large size-large wp-image-18347\" alt=\"we will prepare train, dev, test like before and fine-tune our Cross-encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-7-6.png 511w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-7-6-264x300.png 264w\" sizes=\"(max-width: 511px) 100vw, 511px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C12\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-97fea04 elementor-widget elementor-widget-text-editor\" data-id=\"97fea04\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f914\">Step 2, we use our fine-tuned <strong>Cross-encoders <\/strong>to label unlabeled datasets.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-76b3653 elementor-widget elementor-widget-image\" data-id=\"76b3653\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"550\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-8-5.png\" class=\"attachment-large size-large wp-image-18348\" alt=\"we use our fine-tuned Cross-encoders to label unlabeled datasets.\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-8-5.png 510w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-8-5-278x300.png 278w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C13\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-703b366 elementor-widget elementor-widget-text-editor\" data-id=\"703b366\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"eaf2\">Step 3, we train our <strong>Bi-encoders <\/strong>on both the gold and silver datasets<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49bf5f0 elementor-widget elementor-widget-image\" data-id=\"49bf5f0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"578\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-9-4.png\" class=\"attachment-large size-large wp-image-18349\" alt=\"we train our Bi-encoders in both gold and silver datasets\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-9-4.png 510w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-9-4-265x300.png 265w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C14\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-08cf9f2 elementor-widget elementor-widget-text-editor\" data-id=\"08cf9f2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3177\">Finally, we will evaluate our model on the STS benchmark test set.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5f0e1ae elementor-widget 
elementor-widget-image\" data-id=\"5f0e1ae\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"489\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-10-4.png\" class=\"attachment-large size-large wp-image-18350\" alt=\"we will evaluate our model in the test STS benchmark dataset.\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-10-4.png 510w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-10-4-300x288.png 300w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C15\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8b43779 elementor-widget elementor-widget-heading\" data-id=\"8b43779\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Scenario 3: No annotated datasets (Only unlabeled sentence-pairs)<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-99f5ac3 elementor-widget elementor-widget-text-editor\" data-id=\"99f5ac3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f2e0\">In this scenario, all the steps are very 
similar to those in scenario 2, but applied across domains. Thanks to the capability of our <strong>Cross-encoders<\/strong>, we will use a generic source dataset (the STS benchmark) and transfer the knowledge to a specific target dataset (<a href=\"https:\/\/sbert.net\/datasets\/quora-IR-dataset.zip\" target=\"_blank\" rel=\"noreferrer noopener\">Quora Question Pairs<\/a>)<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eba23b9 elementor-widget elementor-widget-image\" data-id=\"eba23b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"513\" height=\"421\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-11-6.png\" class=\"attachment-large size-large wp-image-18351\" alt=\"No annotated datasets (Only unlabeled sentence-pairs)\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-11-6.png 513w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-11-6-300x246.png 300w\" sizes=\"(max-width: 513px) 100vw, 513px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C17\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-27d5bfd elementor-widget elementor-widget-text-editor\" data-id=\"27d5bfd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p 
id=\"cf1e\">Then, we train our <strong>Cross-encoders.<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ee06e2f elementor-widget elementor-widget-image\" data-id=\"ee06e2f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"511\" height=\"577\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-12-4.png\" class=\"attachment-large size-large wp-image-18352\" alt=\"And train our Cross-encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-12-4.png 511w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-12-4-266x300.png 266w\" sizes=\"(max-width: 511px) 100vw, 511px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C19\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-815d013 elementor-widget elementor-widget-text-editor\" data-id=\"815d013\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d262\">Next, we label the Quora Question Pairs dataset (the silver dataset). 
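A minimal sketch of this soft-labeling step in plain Python, thresholding the Cross-encoder's continuous similarity scores into binary duplicate labels; the `to_binary_labels` helper and the 0.5 cutoff are illustrative assumptions, not values taken from the notebook:

```python
# Turn continuous cross-encoder similarity scores into binary
# duplicate / non-duplicate labels. The 0.5 threshold is illustrative.

def to_binary_labels(scored_pairs, threshold=0.5):
    return [(a, b, 1 if score >= threshold else 0)
            for a, b, score in scored_pairs]

silver = [("How do I learn Python?", "What is the best way to learn Python?", 0.91),
          ("How do I learn Python?", "How old is the Earth?", 0.04)]
print(to_binary_labels(silver))
# [('How do I learn Python?', 'What is the best way to learn Python?', 1),
#  ('How do I learn Python?', 'How old is the Earth?', 0)]
```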
In this case, the task is classification, so we have to convert the continuous scores to binary labels.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e358686 elementor-widget elementor-widget-image\" data-id=\"e358686\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"301\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-13-5.png\" class=\"attachment-large size-large wp-image-18353\" alt=\"Labeling Quora Question Pairs dataset (silver dataset)\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-13-5.png 510w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-13-5-300x177.png 300w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C20\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a15588d elementor-widget elementor-widget-text-editor\" data-id=\"a15588d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4391\">Then, we train our <strong>Bi-encoders<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a5d0db elementor-widget elementor-widget-image\" data-id=\"2a5d0db\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"516\" height=\"578\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-14-4.png\" class=\"attachment-large size-large wp-image-18354\" alt=\"Then, training our Bi-encoders\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-14-4.png 516w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-14-4-268x300.png 268w\" sizes=\"(max-width: 516px) 100vw, 516px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C21\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dcfe0b8 elementor-widget elementor-widget-text-editor\" data-id=\"dcfe0b8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9a49\">Finally, we evaluate on the Quora Question Pairs test set<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c0b7830 elementor-widget elementor-widget-image\" data-id=\"c0b7830\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"680\" height=\"449\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-15-5.png\" 
class=\"attachment-large size-large wp-image-18355\" alt=\"evaluating on test Quora Question Pairs dataset\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-15-5.png 680w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-15-5-300x198.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Untitled-15-5-610x403.png 610w\" sizes=\"(max-width: 680px) 100vw, 680px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Hosted on <a href=\"https:\/\/jovian.ai\/?utm_source=embed\" target=\"_blank\" rel=\"noopener\">Jovian<\/a><a href=\"https:\/\/jovian.ai\/vumichien\/sbert-c80af\/v\/1?utm_source=embed#C22\" target=\"_blank\" rel=\"noopener\">\/View File<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8a04bb3 elementor-widget elementor-widget-heading\" data-id=\"8a04bb3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Final Thoughts<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f185765 elementor-widget elementor-widget-text-editor\" data-id=\"f185765\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"66f8\"><strong>AugSBERT<\/strong> is a simple and effective data augmentation method for improving <strong>Bi-encoders<\/strong> on pairwise sentence scoring tasks. The idea is to label new sentence pairs with pre-trained <strong>Cross-encoders <\/strong>and add them to the training set. Selecting the right sentence pairs for soft-labeling is crucial for improving performance. 
The <strong>AugSBERT<\/strong> approach can also be used for domain adaptation, by soft-labeling data in the target domain.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dab26b4 elementor-widget elementor-widget-heading\" data-id=\"dab26b4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">References<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-05c27e0 elementor-widget elementor-widget-text-editor\" data-id=\"05c27e0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"32d1\">[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding.<\/p>\n\n<p id=\"d299\">[2] Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. TransferTransfo: A transfer learning approach for neural network-based conversational agents.<\/p>\n\n<p id=\"1d1f\">[3] Pierre-Emmanuel Mazare, Samuel Humeau, Martin Raison, and Antoine Bordes. Training millions of personalized dialogue agents.<\/p>\n\n<p id=\"a2c7\">[4] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.<\/p>\n\n<p id=\"6f72\">[5] Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, and Jason Weston. Poly-encoders: Architectures and pre-training strategies for fast and accurate multi-sentence scoring.<\/p>\n\n<p id=\"03cd\">[6] Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks.<\/p>\n\n<p id=\"c97d\">[7] Giambattista Amati. 
BM25, Springer US, Boston, MA.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>AugSBERT is a simple and effective data augmentation to improve Bi-encoders for pairwise sentence scoring tasks. The idea is based on labeling new sentence pairs by using pre-trained Cross-encoders and combining them into the training set.<\/p>\n","protected":false},"author":1019,"featured_media":18356,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[1210,1211,1212,1213],"ppma_author":[3930],"class_list":["post-22541","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-bi-encoders","tag-cross-encoders","tag-nlp-model","tag-transferring-knowledge"],"authors":[{"term_id":3930,"user_id":1019,"is_guest":0,"slug":"chien-vu","display_name":"Chien Vu","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Chien-Vu-150x150.jpeg","user_url":"http:\/\/jobs.hybrid-technologies.vn","last_name":"Vu","first_name":"Chien","job_title":"","description":"Chien Vu Ph.D., a Material Scientist is Machine learning Engineer at Hybrid Technology, a pioneer company in offshore\/ outsourcing 
industry"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22541","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1019"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22541"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22541\/revisions"}],"predecessor-version":[{"id":32861,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22541\/revisions\/32861"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18356"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22541"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22541"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}