{"id":798,"date":"2018-07-16T05:14:38","date_gmt":"2018-07-16T02:14:38","guid":{"rendered":"http:\/\/kusuaks7\/?p=403"},"modified":"2021-12-15T03:11:34","modified_gmt":"2021-12-15T03:11:34","slug":"the-seven-nlp-techniques-that-will-change-how-you-communicate-in-the-future-part-1","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/the-seven-nlp-techniques-that-will-change-how-you-communicate-in-the-future-part-1\/","title":{"rendered":"The Seven NLP Techniques That Will Change How You Communicate in The Future (Part I)"},"content":{"rendered":"<p><strong><em>Ready to learn Data Science? Browse\u00a0<a href=\"https:\/\/www.experfy.com\/training\/tracks\/data-science-training-certification\">Data Science Training and Certification<\/a> courses developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<h2>WHAT IS NLP?<\/h2>\n<p id=\"yui_3_17_2_1_1531731145853_469\"><strong>Natural Language Processing<\/strong>\u00a0(NLP)\u00a0is a field at the intersection of computer science, artificial intelligence, and linguistics. The goal is for computers to process or \u201cunderstand\u201d natural language in order to perform tasks that are useful, such as Performing Tasks, Language Translation, and Question Answering. It is certainly one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence.\u00a0Fully understanding and representing the meaning of language is an extremely difficult goal. Why? Because the human language is quite special.<\/p>\n<p>So what is special about human language? Well, a few things actually:<\/p>\n<ul>\n<li>A human language is a system specifically constructed to convey the speaker\/writer\u2019s meaning. It\u2019s not just an environmental signal but a deliberate communication. 
It also uses an encoding that little kids can quickly learn and that changes over time.<\/li>\n<li>A human language is mostly a discrete\/symbolic\/categorical signaling system, presumably because of greater signaling reliability.<\/li>\n<li>The categorical symbols of a language can be encoded as a signal for communication in several ways: sound, gesture, writing, images etc. A human language is capable of being any of those.<\/li>\n<li>Human languages are ambiguous (unlike programming and other formal languages); thus there is a high level of complexity in representing, learning and using linguistic \/ situational \/ contextual \/ word \/ visual knowledge in human language.<\/li>\n<\/ul>\n<h2>WHY STUDY NLP?<\/h2>\n<p>Well, there is a fast-growing collection of useful applications derived from this field of study. They range from simple to complex. Below are just a few of them:<\/p>\n<ul>\n<li>Spell Checking, Keyword Search, Finding Synonyms.<\/li>\n<li>Extracting information from websites, such as product prices, dates, locations, and people or company names.<\/li>\n<li>Classifying: reading level of school texts, positive\/negative sentiment of longer documents.<\/li>\n<li>Machine Translation.<\/li>\n<li>Spoken Dialog Systems.<\/li>\n<li>Complex Question Answering.<\/li>\n<\/ul>\n<p>Indeed, these applications have been used abundantly in industry: from\u00a0<strong>search<\/strong>\u00a0(written and spoken) to online advertisement\u00a0<strong>matching<\/strong>, from automated\/assisted\u00a0<strong>translation<\/strong>\u00a0to\u00a0<strong>sentiment analysis<\/strong>\u00a0for marketing or finance\/trading, from\u00a0<strong>speech recognition<\/strong>\u00a0to\u00a0<strong>chatbots\/dialog agents<\/strong>\u00a0(automating customer support, controlling devices, ordering goods).<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 422px;\" 
src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7d41575d1f53ad7ec3cb\/1529576780908\/NLP-DeepLearning.jpg?format=750w\" alt=\"NLP-DeepLearning.jpg\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7d41575d1f53ad7ec3cb\/1529576780908\/NLP-DeepLearning.jpg\" data-image-dimensions=\"1280x720\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7d41575d1f53ad7ec3cb\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7d41575d1f53ad7ec3cb\/1529576780908\/NLP-DeepLearning.jpg\" data-type=\"image\" \/><\/p>\n<h2>DEEP LEARNING<\/h2>\n<p>Most of these NLP technologies are powered by\u00a0<strong>Deep Learning<\/strong>\u00a0&#8211; a subfield of machine learning. Deep Learning only started to gain momentum since the beginning of this decade, mainly due to these circumstances:<\/p>\n<ul>\n<li>Large amount of training data.<\/li>\n<li>Faster machines and multicore CPU\/GPUs.<\/li>\n<li>New models and algorithms with advanced capabilities and improved performance:<\/li>\n<li>More flexible learning of intermediate representations.<\/li>\n<li>Effective end-to-end joint system learning.<\/li>\n<li>Effective learning methods for using contexts and transferring between tasks.<\/li>\n<li>Better regularization and optimization methods.<\/li>\n<\/ul>\n<p>While most machine learning methods work well because of human-designed representations and input features, along with weight optimization to best make a final prediction. On the other hand, in deep learning, representation learning attempts to automatically learn good features or representations from the raw inputs. Manually designed features in machine learning are often over-specified, incomplete and take a long time to design and validate. 
In contrast, deep learning\u2019s learned features are easy to adapt and fast to learn.<\/p>\n<p>Deep Learning provides a very flexible, universal, learnable framework for representing the world, both in terms of visual and linguistic information. Initially, it made the breakthrough on fields such as speech recognition and computer vision.\u00a0Recently, deep learning approaches have obtained very high performance across many different NLP tasks. These models can often be trained with a single end-to-end model and do not require traditional, task-specific feature engineering.<\/p>\n<p id=\"yui_3_17_2_1_1531731145853_465\">I recently finished Stanford\u2019s comprehensive\u00a0<a href=\"http:\/\/web.stanford.edu\/class\/cs224n\/\" target=\"_blank\" rel=\"noopener noreferrer\">CS224n course on Natural Language Processing with Deep Learning<\/a>. The course provides a thorough introduction to cutting-edge research in deep learning applied to NLP. On the model side, it covers word vector representations, window-based neural networks, recurrent neural networks, long-short-term-memory models, recursive neural networks, convolutional neural networks as well as some recent models involving a memory component. On the programming side, I learned to implement, train, debug, visualize and invent my own neural network models. In this 2-part series, I want to share the 7 major NLP techniques that I have learned as well as major deep learning models and applications using each of them.<\/p>\n<h2>TECHNIQUE 1: TEXT EMBEDDINGS<\/h2>\n<p>In traditional NLP, we regard words as discrete symbols, which then can be represented by one-hot vectors. A vector dimension is simply the number of words in the vocabulary. The problem with words as discrete symbols is that there is no natural notion of similarity for one-hot vectors. 
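To see the problem concretely, here is a minimal NumPy sketch with a toy three-word vocabulary (the vocabulary and helper are illustrative, not from any library): every pair of distinct one-hot vectors is orthogonal, so no word looks any more similar to one word than to another.

```python
import numpy as np

vocab = ["hotel", "motel", "the"]  # toy vocabulary, for illustration only

def one_hot(word):
    # Build the one-hot vector for a word: all zeros except one position.
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# The dot product between any two distinct one-hot vectors is always 0,
# so "hotel" is exactly as "similar" to "motel" as it is to "the".
sim_hotel_motel = np.dot(one_hot("hotel"), one_hot("motel"))  # 0.0
sim_hotel_the = np.dot(one_hot("hotel"), one_hot("the"))      # 0.0
```
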
Thus the alternative is to learn to encode similarity in the vectors themselves.\u00a0The core idea is that\u00a0<strong>a word\u2019s meaning is given by the words that frequently appear close-by.\u00a0<\/strong><\/p>\n<p><strong>Text Embeddings<\/strong>\u00a0are simply vectors, or more generically, real-valued representations of strings. We build a dense vector for each word, chosen so that it is similar to the vectors of words that appear in similar contexts. Word embeddings are considered a great starting point for most deep NLP tasks. They allow deep learning to be effective on smaller datasets, as they are often the first inputs to a deep learning architecture and the most popular way of transfer learning in NLP.\u00a0The most popular names in word embeddings are\u00a0<strong>Word2vec<\/strong>\u00a0by Google (Mikolov) and\u00a0<strong>GloVe<\/strong>\u00a0by Stanford (Pennington, Socher, and Manning). Let\u2019s delve deeper into these word representations:<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 263px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7daa0e2e72f37d5dd3a3\/1529576890606\/word2vec.png?format=750w\" alt=\"word2vec.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7daa0e2e72f37d5dd3a3\/1529576890606\/word2vec.png\" data-image-dimensions=\"1505x527\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7daa0e2e72f37d5dd3a3\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7daa0e2e72f37d5dd3a3\/1529576890606\/word2vec.png\" data-type=\"image\" \/><\/p>\n<p>In\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1301.3781v3.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Word2vec<\/a>, we have a large corpus of text in which every word in a fixed vocabulary is represented by a vector. 
We then go through each position t in the text, which has a center word c and context words o. Next, we use the similarity of the word vectors for c and o to calculate the probability of o given c (or vice versa). We keep adjusting the word vectors to maximize this probability.<\/p>\n<p>For efficient training of Word2vec, we can eliminate meaningless (or high-frequency) words from the dataset (such as a, the, of, then\u2026). This helps improve model accuracy and reduce training time. Additionally, we can use negative sampling for every input by updating the weights for the correct label but only for a small number of randomly sampled incorrect labels.<\/p>\n<p>Word2vec has 2 model variants worth mentioning:<\/p>\n<ol>\n<li><strong>Skip-Gram<\/strong>: We consider a context window containing k consecutive terms and train a neural network that takes the center word and predicts the words surrounding it. Therefore, if 2 words repeatedly share similar contexts in a large corpus, the embedding vectors of those terms will be close.<\/li>\n<li><strong>Continuous Bag of Words<\/strong>: We take lots and lots of sentences in a large corpus. Every time we see a word, we take the surrounding words. Then we input the context words to a neural network and predict the word in the center of this context.\u00a0Each such set of context words paired with its center word is one training instance for the neural network.\u00a0We train the neural network and finally, the encoded hidden layer output represents the embedding for a particular word. 
It so happens that when we train this over a large number of sentences, words in similar contexts get similar vectors.<\/li>\n<\/ol>\n<p><img decoding=\"async\" style=\"width: 750px; height: 354px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ddf03ce64228cc2d8f5\/1529576937076\/cbow-vs-skipgram.png?format=750w\" alt=\"cbow-vs-skipgram.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ddf03ce64228cc2d8f5\/1529576937076\/cbow-vs-skipgram.png\" data-image-dimensions=\"1587x749\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7ddf03ce64228cc2d8f5\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ddf03ce64228cc2d8f5\/1529576937076\/cbow-vs-skipgram.png\" data-type=\"image\" \/><\/p>\n<p>One grievance with both Skip-Gram and CBOW is that they are both window-based models, meaning the co-occurrence statistics of the corpus are not used efficiently, resulting in suboptimal embeddings. The\u00a0<a href=\"https:\/\/nlp.stanford.edu\/pubs\/glove.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">GloVe<\/a>\u00a0model seeks to solve this problem by capturing the meaning of one word embedding with the structure of the whole observed corpus. To do so, the model trains on global co-occurrence counts of words and makes efficient use of statistics by minimizing least-squares error and, as a result, produces a word vector space with meaningful substructure. 
This approach preserves words\u2019 similarities as vector distances.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 584px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e02f950b7ec244f078c\/1529576977987\/glove.jpg?format=750w\" alt=\"glove.jpg\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e02f950b7ec244f078c\/1529576977987\/glove.jpg\" data-image-dimensions=\"1000x779\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7e02f950b7ec244f078c\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e02f950b7ec244f078c\/1529576977987\/glove.jpg\" data-type=\"image\" \/><\/p>\n<p>Besides these 2 text embeddings, there are many more advanced models developed recently, including\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1607.04606.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">FastText<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1705.08039.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Poincare Embeddings<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1511.06388.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">sense2vec<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1506.06726.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Skip-Thought<\/a>,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1502.07257.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Adaptive Skip-Gram<\/a>. I highly encourage you to check them out.<\/p>\n<h2>TECHNIQUE 2: MACHINE TRANSLATION<\/h2>\n<p><strong>Machine Translation<\/strong>\u00a0is the classic test of language understanding. It consists of both language analysis and language generation. Big machine translation systems have huge commercial use, as the global language industry is worth\u00a0$40 billion per year. 
To give you some notable examples:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.hindustantimes.com\/tech\/10-year-old-google-translate-goes-through-100-billion-words-a-day\/story-XiIzd1U1Y49TFKI0zAtAML.html\" target=\"_blank\" rel=\"noopener noreferrer\">Google Translate<\/a>\u00a0goes through 100 billion words per day.<\/li>\n<li><a href=\"https:\/\/code.facebook.com\/posts\/289921871474277\/transitioning-entirely-to-neural-machine-translation\/\" target=\"_blank\" rel=\"noopener noreferrer\">Facebook<\/a>\u00a0uses machine translation to translate text in posts and comments automatically, in order to break language barriers and allow people around the world to communicate with each other.<\/li>\n<li><a href=\"https:\/\/www.ebayinc.com\/stories\/news\/ebays-machine-translation-technology-breaks-down-borders\/\" target=\"_blank\" rel=\"noopener noreferrer\">eBay<\/a>\u00a0uses Machine Translation tech to enable cross-border trade and connect buyers and sellers around the world.<\/li>\n<li><a href=\"https:\/\/blogs.msdn.microsoft.com\/translation\/2018\/04\/18\/microsoft-brings-ai-powered-translation-to-end-users-and-developers-whether-youre-online-or-offline\/\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft<\/a>\u00a0brings AI-powered translation to end users and developers on Android, iOS, and Amazon Fire, whether or not they have access to the Internet.<\/li>\n<li><a href=\"https:\/\/globenewswire.com\/news-release\/2016\/10\/17\/880052\/10165647\/en\/SYSTRAN-1st-software-provider-to-launch-a-Neural-Machine-Translation-engine-in-more-than-30-languages.html\" target=\"_blank\" rel=\"noopener noreferrer\">Systran<\/a>\u00a0became the 1st software provider to launch a Neural Machine Translation engine in more than 30 languages back in 2016.<\/li>\n<\/ul>\n<p>In a traditional Machine Translation system, we have to use a parallel corpus \u2014 a collection of texts, each of which is translated into one or more languages other than the original. 
For example, given the source language f (e.g. French) and the target language e (e.g. English), we need to build multiple statistical models, including a probabilistic formulation using the Bayesian rule, a translation model p(f|e) trained on the parallel corpus, and a language model p(e) trained on an English-only corpus. Needless to say, this approach skips hundreds of important details, requires a lot of human feature engineering, consists of many different &amp; independent machine learning problems, and overall is a very complex system.<\/p>\n<p><strong>Neural Machine Translation<\/strong>\u00a0is the approach of modeling this entire process via one big artificial neural network, known as a\u00a0<a href=\"http:\/\/karpathy.github.io\/2015\/05\/21\/rnn-effectiveness\/\" target=\"_blank\" rel=\"noopener noreferrer\">Recurrent Neural Network<\/a>\u00a0(RNN). An RNN is a stateful neural network: it has connections between passes, connections through time.\u00a0Neurons are fed information not just from the previous layer but also from themselves from the previous pass. 
This means that the order in which we feed the input and train the network matters: feeding it \u201cDonald\u201d and then \u201cTrump\u201d may yield different results compared to feeding it \u201cTrump\u201d and then \u201cDonald\u201d.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 389px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e2c6d2a73a146f99b33\/1529577023087\/Neural-Machine-Translation.png?format=750w\" alt=\"Neural-Machine-Translation.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e2c6d2a73a146f99b33\/1529577023087\/Neural-Machine-Translation.png\" data-image-dimensions=\"2500x1296\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7e2c6d2a73a146f99b33\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e2c6d2a73a146f99b33\/1529577023087\/Neural-Machine-Translation.png\" data-type=\"image\" \/><\/p>\n<p>Standard Neural Machine Translation is an end-to-end neural network where the source sentence is encoded by an RNN called the\u00a0<strong>encoder<\/strong>\u00a0and the target words are predicted using another RNN known as the\u00a0<strong>decoder<\/strong>.\u00a0The RNN Encoder reads a source sentence one symbol at a time, and then summarizes the entire source sentence in its last hidden state. The RNN Decoder then generates the translated version from this summary, with both networks trained jointly via back-propagation. It is amazing that Neural Machine Translation went from a fringe research activity in 2014 to the widely adopted leading way to do Machine Translation in 2016. 
So what are the big wins of using Neural Machine Translation?<\/p>\n<ol>\n<li><strong>End-to-end training<\/strong>: All parameters in NMT are simultaneously optimized to minimize a loss function on the network\u2019s output.<\/li>\n<li><strong>Distributed representations share strength<\/strong>: NMT has a better exploitation of word and phrase similarities.<\/li>\n<li><strong>Better exploration of context<\/strong>: NMT can use a much bigger context &#8211; both source and partial target text &#8211; to translate more accurately.<\/li>\n<li><strong>More fluent text generation<\/strong>: Deep learning text generation is of much higher quality than that of traditional statistical systems.<\/li>\n<\/ol>\n<p>One big problem with RNNs is the vanishing (or exploding) gradient problem where, depending on the activation functions used, information rapidly gets lost over time. Intuitively, this wouldn\u2019t seem to be much of a problem because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored; if a weight reaches a value of 0 or 1,000,000, the previous state won\u2019t be very informative.<\/p>\n<p><a href=\"http:\/\/www.bioinf.jku.at\/publications\/older\/2604.pdf\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Long \/ short term memory<\/strong><\/a><strong>\u00a0(LSTM)<\/strong>\u00a0networks try to combat the vanishing \/ exploding gradient problem by introducing gates and an explicitly defined memory cell. Each neuron has a memory cell and three gates: input, output and forget. 
The function of these gates is to safeguard the information by stopping or allowing the flow of it.<\/p>\n<ul>\n<li>The input gate determines how much of the information from the previous layer gets stored in the cell.<\/li>\n<li>The output gate takes the job on the other end and determines how much of the next layer gets to know about the state of this cell.<\/li>\n<li>The forget gate seems like an odd inclusion at first, but sometimes it\u2019s good to forget: if the network is learning a book and a new chapter begins, it may be necessary for it to forget some characters from the previous chapter.<\/li>\n<\/ul>\n<p>LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to a cell in the previous neuron, so they typically require more resources to run.\u00a0LSTMs are currently very hip and have been used a lot in machine translation. Besides that, it is the default model for most sequence labeling tasks, which have lots and lots of data.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 422px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e5c2b6a28718c493790\/1529577067232\/Long-Short-Term-Memory.jpg?format=750w\" alt=\"Long-Short-Term-Memory.jpg\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e5c2b6a28718c493790\/1529577067232\/Long-Short-Term-Memory.jpg\" data-image-dimensions=\"1280x720\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7e5c2b6a28718c493790\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e5c2b6a28718c493790\/1529577067232\/Long-Short-Term-Memory.jpg\" data-type=\"image\" \/><\/p>\n<p><a href=\"https:\/\/arxiv.org\/pdf\/1412.3555v1.pdf\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Gated recurrent 
units<\/strong><\/a><strong>\u00a0(GRU)\u00a0<\/strong>are a slight variation on LSTMs and are also used in Neural Machine Translation. They have one less gate and are wired slightly differently: instead of an input, output and a forget gate, they have an update gate and a reset gate. The update gate determines both how much information to keep from the last state and how much information to let in from the previous layer. The reset gate functions much like the forget gate of an LSTM, but it\u2019s located slightly differently. GRUs always send out their full state; they don\u2019t have an output gate. In most cases, they function very similarly to LSTMs, with the biggest difference being that GRUs are slightly faster and easier to run (but also slightly less expressive). In practice, these tend to cancel each other out, as you need a bigger network to regain some expressiveness, which then, in turn, cancels out the performance benefits. In some cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 334px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e8e575d1f53ad7ee2fd\/1529577113683\/Gated-Recurrent-Units.png?format=750w\" alt=\"Gated-Recurrent-Units.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e8e575d1f53ad7ee2fd\/1529577113683\/Gated-Recurrent-Units.png\" data-image-dimensions=\"2000x890\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7e8e575d1f53ad7ee2fd\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7e8e575d1f53ad7ee2fd\/1529577113683\/Gated-Recurrent-Units.png\" data-type=\"image\" \/><\/p>\n<p>Besides these 3 major architectures, there have been further improvements in neural machine translation systems over the past few years. 
Below are the most notable developments:<\/p>\n<ul>\n<li><a href=\"https:\/\/arxiv.org\/pdf\/1409.3215v3.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Sequence to Sequence Learning with Neural Networks<\/a>\u00a0proved the effectiveness of LSTM for Neural Machine Translation. It presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. The method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/pdf\/1409.0473v6.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Neural Machine Translation by Jointly Learning to Align and Translate<\/a>\u00a0introduced the attention mechanism in NLP (which will be covered in the next post). Acknowledging that the use of a fixed-length vector is a bottleneck in improving the performance of NMT, the authors propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.<\/li>\n<li><a href=\"https:\/\/ufal.mff.cuni.cz\/pbml\/108\/art-dakwale-monz.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Convolutional over Recurrent Encoder for Neural Machine Translation<\/a>\u00a0augments the standard RNN encoder in NMT with additional convolutional layers in order to capture wider context in the encoder output.<\/li>\n<li>Google built its own NMT system, called\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1609.08144.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Google\u2019s Neural Machine Translation<\/a>, which addresses many issues in accuracy and ease of deployment. 
The model consists of a deep LSTM network with 8 encoder and 8 decoder layers using residual connections as well as attention connections from the decoder network to the encoder.<\/li>\n<li>Instead of using Recurrent Neural Networks, Facebook AI researchers use\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1705.03122.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">convolutional neural networks<\/a>\u00a0for sequence to sequence learning tasks in NMT.<\/li>\n<\/ul>\n<h2>TECHNIQUE 3: DIALOGUE AND CONVERSATIONS<\/h2>\n<p>A lot has been written about conversational AI, and a majority of it focuses on vertical chatbots, messenger platforms, business trends, and startup opportunities (think Amazon Alexa, Apple Siri, Facebook M, Google Assistant, Microsoft Cortana). AI\u2019s capability of understanding natural language is still limited. As a result, creating fully-automated, open-domain conversational assistants has remained an open challenge. Nonetheless, the work shown below serves as a great starting point for people who want to seek the next breakthrough in conversational AI.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 321px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7eaa1ae6cfe5590cdce1\/1529577147587\/chatbots.png?format=750w\" alt=\"chatbots.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7eaa1ae6cfe5590cdce1\/1529577147587\/chatbots.png\" data-image-dimensions=\"1200x513\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7eaa1ae6cfe5590cdce1\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7eaa1ae6cfe5590cdce1\/1529577147587\/chatbots.png\" data-type=\"image\" \/><\/p>\n<p>Researchers from Montreal, Georgia Tech, Microsoft and Facebook built a\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1506.06714v1.pdf\" target=\"_blank\" rel=\"noopener 
noreferrer\">neural network that is capable of generating context-sensitive conversational responses<\/a>. This novel response generation system can be trained end-to-end on large quantities of unstructured Twitter conversations. A Recurrent Neural Network architecture is used to address sparsity issues that arise when integrating contextual information into classic statistical models, allowing the system to take into account previous dialog utterances. The model shows consistent gains over both context-sensitive and non-context-sensitive Machine Translation and Information Retrieval baselines.<\/p>\n<p>Developed in Hong Kong,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1503.02364v2.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Neural Responding Machine<\/a>\u00a0(NRM) is a neural-network-based response generator for short-text conversation. It takes the general encoder-decoder framework. First, it formalizes the generation of response as a decoding process based on the latent representation of the input text, while both encoding and decoding are realized with Recurrent Neural Networks. The NRM is trained with a large amount of one-round conversation data collected from a microblogging service. 
Empirical study shows that NRM can generate grammatically correct and content-wise appropriate responses to over 75% of the input text, outperforming the state of the art in the same setting.<\/p>\n<p><img decoding=\"async\" style=\"width: 750px; height: 348px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ec81ae6cfe5590cdffe\/1529577166563\/seq-to-seq.png?format=750w\" alt=\"seq-to-seq.png\" data-image=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ec81ae6cfe5590cdffe\/1529577166563\/seq-to-seq.png\" data-image-dimensions=\"1329x616\" data-image-focal-point=\"0.5,0.5\" data-image-id=\"5b2b7ec81ae6cfe5590cdffe\" data-image-resolution=\"750w\" data-load=\"false\" data-position-mode=\"standard\" data-src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5b2b7ec81ae6cfe5590cdffe\/1529577166563\/seq-to-seq.png\" data-type=\"image\" \/><\/p>\n<p>Last but not least, Google\u2019s\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1506.05869v3.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Neural Conversational Model<\/a>\u00a0is a simple approach to conversational modeling. It uses the sequence-to-sequence framework. The model converses by predicting the next sentence given the previous sentence(s) in a conversation. The strength of the model is that it can be trained end-to-end and thus requires far fewer hand-crafted rules. The model can generate simple conversations given a large conversational training dataset. It is able to extract knowledge from both a domain-specific dataset, and from a large, noisy, and general domain dataset of movie subtitles. On a domain-specific IT help-desk dataset, the model can find a solution to a technical problem via conversations. On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning.<\/p>\n<p>That\u2019s the end of Part I. 
In the next post, I\u2019ll go over the remaining 4 Natural Language Processing techniques, as well as discuss important limits and extensions in the field. Stay tuned!<\/p>\n<p><em>If you enjoyed this piece, I\u2019d love it if you could share it over social media<\/em>\u00a0<em>so others might stumble upon it.\u00a0You can sign up for my newsletter in the footer section below to receive my newest articles once a week.<\/em><\/p>\n<footer>Source:\u00a0<a href=\"https:\/\/heartbeat.fritz.ai\/the-7-nlp-techniques-that-will-change-how-you-communicate-in-the-future-part-i-f0114b2f0497\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/heartbeat.fritz.ai\/the-7-nlp-techniques-that-will-change-how-you-comm<\/a><\/p>\n<\/footer>\n","protected":false},"excerpt":{"rendered":"<p>Natural Language Processing&nbsp;(NLP)&nbsp;is a field at the intersection of computer science, artificial intelligence, and linguistics. The goal is for computers to process or &ldquo;understand&rdquo; natural language in order to perform useful tasks, such as language translation and question answering. It is certainly one of the most important technologies of the information age. 
Understanding complex language utterances is also a crucial part of artificial intelligence.&nbsp;&nbsp;This 2-part series shares the 7 major NLP techniques as well as major deep learning models and applications using each of them.<\/p>\n","protected":false},"author":86,"featured_media":4231,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[1842],"class_list":["post-798","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":1842,"user_id":86,"is_guest":0,"slug":"james-le","display_name":"James Le","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Le","first_name":"James","job_title":"","description":"James Le is a Software Developer with experiences in Product Management and Data Analytics. He played a pivotal role in the operation of a start-up organization at Denison 
University."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/798","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=798"}],"version-history":[{"count":2,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/798\/revisions"}],"predecessor-version":[{"id":28362,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/798\/revisions\/28362"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/4231"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=798"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=798"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=798"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=798"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}