{"id":22583,"date":"2021-02-01T11:43:00","date_gmt":"2021-02-01T11:43:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/implement-expectation-maximization-algorithm-python\/"},"modified":"2023-09-05T11:45:51","modified_gmt":"2023-09-05T11:45:51","slug":"implement-expectation-maximization-algorithm-python","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/implement-expectation-maximization-algorithm-python\/","title":{"rendered":"Implement Expectation-Maximization Algorithm(EM) in Python from Scratch"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22583\" class=\"elementor elementor-22583\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-81dcba3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"81dcba3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a6e7f94\" data-id=\"a6e7f94\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-862d990 elementor-widget elementor-widget-text-editor\" data-id=\"862d990\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\"><strong>Unsupervised and Semi-supervised&nbsp;<\/strong>Gaussian Mixture Models (GMM)<\/p>\n<p id=\"0456\">When companies launch a new product, they usually want to find out the target customers. 
If they have data on customers\u2019 purchasing history and shopping preferences, they can utilize it to predict what types of customers are more likely to purchase the new product. There are many models to solve this typical unsupervised learning problem, and the Gaussian Mixture Model (GMM) is one of them.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3bae7aa elementor-widget elementor-widget-heading\" data-id=\"3bae7aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">GMM and EM<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7d27def elementor-widget elementor-widget-text-editor\" data-id=\"7d27def\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6aac\">GMMs are probabilistic models that&nbsp;assume all the data points are generated from a mixture of several Gaussian distributions with unknown parameters. They differ from k-means clustering in that GMMs incorporate information about the center (mean) and variability (variance) of each cluster and provide posterior probabilities.<\/p>\n<p id=\"cf1f\">In the example mentioned earlier, we have 2 clusters: people who like the product and people who don\u2019t. If we know which cluster each customer belongs to (the labels), we can easily estimate the parameters (mean and variance) of the clusters, or if we know the parameters for both clusters, we can predict the labels. Unfortunately, we don\u2019t know either one. To solve this chicken-and-egg problem, the Expectation-Maximization Algorithm (EM) comes in handy.<\/p>\n<p id=\"64da\">EM is an iterative algorithm for finding maximum likelihood estimates when there are latent variables. 
The algorithm iterates between performing an expectation (E) step, which creates a heuristic of the posterior distribution and the log-likelihood using the current estimate for the parameters, and a maximization (M) step, which computes parameters by maximizing the expected log-likelihood from the E step. The parameter estimates from the M step are then used in the next E step. In the following sections, we will delve into the math behind EM and implement it in Python from scratch.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bf54270 elementor-widget elementor-widget-heading\" data-id=\"bf54270\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Mathematical Deduction<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b29ecea elementor-widget elementor-widget-text-editor\" data-id=\"b29ecea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"43b3\">We define the known variables as&nbsp;<em>x<\/em>, and the unknown label as&nbsp;<em>y<\/em>. 
We make two assumptions: the prior distribution&nbsp;<em>p(y)<\/em>&nbsp;is binomial and&nbsp;<em>p(x|y)&nbsp;<\/em>in each cluster is Gaussian.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e82f2e9 elementor-widget elementor-widget-image\" data-id=\"e82f2e9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"479\" height=\"259\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Tb_0l7Qey4Mxtr5jJeoTRg.png\" class=\"attachment-large size-large wp-image-18528\" alt=\"variables as x, and the unknown label as y\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Tb_0l7Qey4Mxtr5jJeoTRg.png 479w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Tb_0l7Qey4Mxtr5jJeoTRg-300x162.png 300w\" sizes=\"(max-width: 479px) 100vw, 479px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6de145c elementor-widget elementor-widget-text-editor\" data-id=\"6de145c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"18c9\">All parameters are randomly initialized. 
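As a rough sketch of this random initialization (the helper names get_random_psd and initialize_random_params mirror functions mentioned later, but the parameter keys phi, mu0, mu1, sigma0, sigma1 are assumptions, not the article's exact gist code):

```python
import numpy as np

def get_random_psd(n):
    # a @ a.T is always positive semi-definite, so it is a valid covariance matrix
    a = np.random.randn(n, n)
    return a @ a.T

def initialize_random_params(dim=2, seed=0):
    np.random.seed(seed)
    return {
        "phi": np.random.uniform(0, 1),      # prior p(y=1)
        "mu0": np.random.normal(0, 1, dim),  # mean of cluster y=0
        "mu1": np.random.normal(0, 1, dim),  # mean of cluster y=1
        "sigma0": get_random_psd(dim),       # covariance of cluster y=0
        "sigma1": get_random_psd(dim),       # covariance of cluster y=1
    }
```

Multiplying a random matrix by its own transpose guarantees a positive semi-definite result, so the covariance initializations are always valid.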
For simplicity, we use \u03b8 to represent all parameters in the following equations.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0293305 elementor-widget elementor-widget-image\" data-id=\"0293305\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"231\" height=\"57\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_N2802zrpbrLWvbMPimp6w.png\" class=\"attachment-large size-large wp-image-18529\" alt=\"\u03b8 to represent all parameters\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4db01a4 elementor-widget elementor-widget-text-editor\" data-id=\"4db01a4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5e26\">At the expectation (E) step, we calculate the heuristics of the posteriors. 
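A minimal sketch of this E step using Bayes' rule (the dictionary keys phi, mu0, mu1, sigma0, sigma1 are assumed names for the parameters):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(x, params):
    # Joint densities p(x, y) = p(y) * p(x | y) under the current parameter guess
    joint0 = (1 - params["phi"]) * multivariate_normal.pdf(x, params["mu0"], params["sigma0"])
    joint1 = params["phi"] * multivariate_normal.pdf(x, params["mu1"], params["sigma1"])
    evidence = joint0 + joint1              # p(x): the normalizer in Bayes' rule
    q1 = joint1 / evidence                  # heuristic Q(y=1 | x)
    avg_loglik = np.mean(np.log(evidence))  # average log-likelihood of the data
    return np.column_stack([1 - q1, q1]), avg_loglik
```

Dividing each joint density by the evidence gives the posterior heuristics; the average log of the evidence is the quantity whose convergence we later monitor.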
We call them heuristics because they are calculated with guessed parameters \u03b8.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e25553 elementor-widget elementor-widget-image\" data-id=\"7e25553\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"802\" height=\"332\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ.png\" class=\"attachment-large size-large wp-image-18530\" alt=\"E step, we calculate the heuristics of the posteriors.\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ.png 802w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ-300x124.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ-768x318.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ-610x253.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1jlyIeAGxl57EPu78hceWzQ-750x310.png 750w\" sizes=\"(max-width: 802px) 100vw, 802px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4ddef58 elementor-widget elementor-widget-text-editor\" data-id=\"4ddef58\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2a9b\">At the maximization (M) step, we find the maximizers of the log-likelihood and use them to update \u03b8. Notice that the summation inside the logarithm in equation (3) makes direct maximization of the log-likelihood intractable. 
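Restating that quantity in symbols (a sketch of the standard GMM log-likelihood referenced as equation (3), with the sum over the latent label inside the logarithm):

```latex
\ell(\theta)
  = \sum_{i=1}^{n} \log p\left(x^{(i)};\theta\right)
  = \sum_{i=1}^{n} \log \sum_{y^{(i)}} p\left(x^{(i)}, y^{(i)};\theta\right)
```

The inner sum over the latent label sits inside the logarithm, which is exactly what blocks a direct closed-form maximization.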
To move the summation out of the logarithm, we use Jensen\u2019s inequality to find the evidence lower bound (ELBO), which is tight only when Q(y|x) = P(y|x). If you are interested in the math details from equation (3) to equation (5),&nbsp;<a href=\"https:\/\/jonathan-hui.medium.com\/machine-learning-expectation-maximization-algorithm-em-2e954cb76959\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">this article<\/a>&nbsp;has a decent explanation.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4ff0428 elementor-widget elementor-widget-image\" data-id=\"4ff0428\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"712\" height=\"393\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Mw4iw3N6d6XeI-r9u35xJg.png\" class=\"attachment-large size-large wp-image-18531\" alt=\"Implement Expectation-Maximization Algorithm(EM) in Python from Scratch\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Mw4iw3N6d6XeI-r9u35xJg.png 712w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Mw4iw3N6d6XeI-r9u35xJg-300x166.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1Mw4iw3N6d6XeI-r9u35xJg-610x337.png 610w\" sizes=\"(max-width: 712px) 100vw, 712px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-265adb9 elementor-widget elementor-widget-text-editor\" data-id=\"265adb9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f10b\">Luckily, there are closed-form solutions for the maximizers in GMM.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-83459e5 elementor-widget elementor-widget-image\" data-id=\"83459e5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"702\" height=\"305\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/13P4apeUNyhxbAvrg3cfiug.png\" class=\"attachment-large size-large wp-image-18532\" alt=\"closed-form solutions for the maximizers in GMM.\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/13P4apeUNyhxbAvrg3cfiug.png 702w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/13P4apeUNyhxbAvrg3cfiug-300x130.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/13P4apeUNyhxbAvrg3cfiug-610x265.png 610w\" sizes=\"(max-width: 702px) 100vw, 702px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-14fb1e7 elementor-widget elementor-widget-text-editor\" data-id=\"14fb1e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2baf\">We use these updated parameters in the next iteration of E step, get the new heuristics and run M-step. What the EM algorithm does is repeat these two steps until the average log-likelihood converges.<\/p>\n<p id=\"51f9\">Before jumping into the code, let\u2019s compare the above parameter solutions from EM to the direct parameter estimates when the labels are known. Did you find they are very similar? 
In fact, the only difference is that the EM solutions use the heuristics of posteriors<em>&nbsp;Q<\/em>&nbsp;while the direct estimates use the true labels.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ef6540 elementor-widget elementor-widget-image\" data-id=\"6ef6540\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"752\" height=\"345\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1h5xsZpxYn0VVdc9HYOk0Jg.png\" class=\"attachment-large size-large wp-image-18533\" alt=\"EM solutions use the heuristics of posteriors Q\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1h5xsZpxYn0VVdc9HYOk0Jg.png 752w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1h5xsZpxYn0VVdc9HYOk0Jg-300x138.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1h5xsZpxYn0VVdc9HYOk0Jg-610x280.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1h5xsZpxYn0VVdc9HYOk0Jg-750x344.png 750w\" sizes=\"(max-width: 752px) 100vw, 752px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9e113cf elementor-widget elementor-widget-heading\" data-id=\"9e113cf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Python Implementation<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e5fc74f elementor-widget elementor-widget-text-editor\" data-id=\"e5fc74f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2207\">There are many packages including scikit-learn that offer high-level APIs to train GMMs with EM. In this section, I will demonstrate how to implement the algorithm from scratch to solve both unsupervised and semi-supervised problems. The complete code can be find&nbsp;<a href=\"https:\/\/github.com\/VXU1230\/Medium-Tutorials\/tree\/master\/em\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5477742 elementor-widget elementor-widget-heading\" data-id=\"5477742\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">1. Unsupervised GMM<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1a9aa69 elementor-widget elementor-widget-text-editor\" data-id=\"1a9aa69\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f383\">Let\u2019s stick with the new product example. 
Using the known personal data, we have engineered 2 features&nbsp;<em>x1, x2&nbsp;<\/em>represented by a matrix<em>&nbsp;x<\/em>, and our goal is to forecast whether each customer will like the product (y=1) or not (y=0).<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-009eda0 elementor-widget elementor-widget-text-editor\" data-id=\"009eda0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/gist.github.com\/VXU1230\/7333b0bf40256e1612d48af5080afea5\n<\/div><\/figure>\n<p id=\"c7f2\">First, we initialize all the unknown parameters. <code>get_random_psd()<\/code>&nbsp;ensures that the random initialization of the covariance matrices is positive semi-definite.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/426921e27f0dfa6fa76aeaa95faec54f#file-initialize_random_params-py<\/p>\n<p id=\"5c43\">Then we pass the initialized parameters to&nbsp;<code>e_step()<\/code>&nbsp;and calculate the heuristics&nbsp;<em>Q(y=1|x) and Q(y=0|x)<\/em>&nbsp;for every data point, as well as the average log-likelihood, which we will maximize in the M step.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/dca3f37734759c278eaeb29a92020d80#file-e_step-py<\/p>\n<p id=\"ff22\">In&nbsp;<code>m_step()<\/code>, the parameters are updated using the closed-form solutions in equations (7)~(11). 
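These closed-form updates can be sketched as weighted versions of the usual Gaussian estimates (a sketch, not the article's gist: q is assumed to be the array of E-step posteriors, with column k holding Q(y=k|x)):

```python
import numpy as np

def m_step(x, q):
    # q[:, 1] holds the posterior heuristic Q(y=1 | x) from the E step
    phi = q[:, 1].mean()                          # updated cluster prior p(y=1)
    mus, sigmas = [], []
    for k in (0, 1):
        w = q[:, k]
        mu = w @ x / w.sum()                      # posterior-weighted mean
        d = x - mu
        sigma = (w[:, None] * d).T @ d / w.sum()  # posterior-weighted covariance
        mus.append(mu)
        sigmas.append(sigma)
    return {"phi": phi, "mu0": mus[0], "mu1": mus[1],
            "sigma0": sigmas[0], "sigma1": sigmas[1]}
```

With hard 0/1 posteriors these updates collapse to ordinary per-cluster sample statistics, which is the comparison drawn earlier between the EM solutions and the direct estimates.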
Note that if there weren\u2019t closed-form solutions, we would need to solve the optimization problem using gradient ascent to find the parameter estimates.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/3f6db97cce7a9c6969b83fa9c1c7d712#file-m_step-py<\/p>\n<p id=\"0a04\">Now we can repeat running the two steps until the average log-likelihood converges.&nbsp;<code>run_em()<\/code>&nbsp;returns the predicted labels, the posteriors, and the average log-likelihoods from all training steps.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/c86fc58a1d3043a6fde6f48d6ccb1f9a#file-run_em-py<\/p>\n<p id=\"6e95\">Running the unsupervised model, we see that the average log-likelihood converged after more than 30 steps.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/ebf0771d2f9ef4635ad21eb1688dcd34#file-unsupervised-py<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-30d065d elementor-widget elementor-widget-image\" data-id=\"30d065d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1ugKILZ0sh2_FkSUc0cebRA.png\" class=\"attachment-large size-large wp-image-18534\" alt=\"Implement Expectation-Maximization Algorithm(EM) in Python from Scratch\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1ugKILZ0sh2_FkSUc0cebRA.png 640w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1ugKILZ0sh2_FkSUc0cebRA-300x225.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1ugKILZ0sh2_FkSUc0cebRA-610x458.png 610w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-353a7fa elementor-widget elementor-widget-heading\" 
data-id=\"353a7fa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">2. Semi-supervised GMM<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-58001b1 elementor-widget elementor-widget-text-editor\" data-id=\"58001b1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b9ad\">In some cases, we have a small amount of labeled data. For example, we might know some customers\u2019 preferences from surveys. Considering the potential customer base is huge, the amount of labeled data we have is insufficient for full supervised learning, yet we can learn the initial parameters from the data in a semi-supervised way.<\/p>\n<p id=\"e2e8\">We use the same unlabeled data as before, but we also have some labeled data this time.<\/p><p>https:\/\/gist.github.com\/VXU1230\/9b6648f5202522c6ef2756e6ede74ab2#file-load_labeled_data-py<\/p>\n<p id=\"0630\">In&nbsp;<code>learn_params()<\/code>&nbsp;, we learn the initial parameters from the labeled data by implementing equation (12) ~(16). These learned parameters are used in the first E step.<\/p><p>https:\/\/gist.github.com\/VXU1230\/77ca2a26190920d4724501453e84ba0c#file-learn_params-py<\/p>\n<p id=\"333d\">Other than the initial parameters, everything else is the same so we can reuse the functions defined earlier. 
Let\u2019s train the model and plot the average log-likelihoods.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/33b51d76fe736bd82d67a731ad50f441#file-semi-supervised-py<\/p>\n<p id=\"b9b2\">This time the average log-likelihoods converged in 4 steps, much faster than unsupervised learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f34fcd7 elementor-widget elementor-widget-image\" data-id=\"f34fcd7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1bnSXjrQI_uOBQCcn_WV6ZQ.png\" class=\"attachment-large size-large wp-image-18535\" alt=\"Implement Expectation-Maximization Algorithm(EM) in Python from Scratch\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1bnSXjrQI_uOBQCcn_WV6ZQ.png 640w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1bnSXjrQI_uOBQCcn_WV6ZQ-300x225.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1bnSXjrQI_uOBQCcn_WV6ZQ-610x458.png 610w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-875ab16 elementor-widget elementor-widget-text-editor\" data-id=\"875ab16\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d897\">To verify our implementation, we compare our forecasts with forecasts from the scikit-learn API. To build the model in scikit-learn, we simply call the GaussianMixture API and fit the model with our unlabeled data. 
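A sketch of this comparison call (the wrapper name gmm_sklearn and the parameter-dict keys are assumptions; GaussianMixture accepts initial weights, means, and precisions, the inverses of the covariances):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_sklearn(x, params):
    # Start scikit-learn from the same parameters as our implementation
    model = GaussianMixture(
        n_components=2,
        weights_init=[1 - params["phi"], params["phi"]],
        means_init=[params["mu0"], params["mu1"]],
        precisions_init=[np.linalg.inv(params["sigma0"]),
                         np.linalg.inv(params["sigma1"])],
    )
    model.fit(x)
    return model.predict(x), model.predict_proba(x)
```

scikit-learn's weights_init, means_init, and precisions_init are what give both runs the same starting point, so any remaining difference comes from the fitting itself.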
Don\u2019t forget to pass the learned parameters to the model so it has the same initialization as our semi-supervised implementation. <code>GMM_sklearn()<\/code> returns the forecasts and posteriors from scikit-learn.<\/p>\n\n<p>https:\/\/gist.github.com\/VXU1230\/3e3dafedb33ed8b30689957ec356b969#file-compare_sklearn-py<\/p>\n<p id=\"69ea\">Comparing the results, we see that the learned parameters from both models are very close and 99.4% of the forecasts matched. In case you are curious, the minor difference is mostly caused by parameter regularization and numerical precision in the matrix calculations.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-44b80ab elementor-widget elementor-widget-image\" data-id=\"44b80ab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"392\" height=\"396\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1kI3BZR3pnzOvSmY5T2fV4Q.png\" class=\"attachment-large size-large wp-image-18536\" alt=\"Implement Expectation-Maximization Algorithm(EM) in Python from Scratch\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1kI3BZR3pnzOvSmY5T2fV4Q.png 392w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1kI3BZR3pnzOvSmY5T2fV4Q-297x300.png 297w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1kI3BZR3pnzOvSmY5T2fV4Q-75x75.png 75w\" sizes=\"(max-width: 392px) 100vw, 392px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f77b8fb elementor-widget elementor-widget-text-editor\" data-id=\"f77b8fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d7af\">That\u2019s it! 
We just demystified the EM algorithm.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fbefa41 elementor-widget elementor-widget-heading\" data-id=\"fbefa41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusions<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2b9b298 elementor-widget elementor-widget-text-editor\" data-id=\"2b9b298\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ebf5\">In this article, we explored how to train Gaussian Mixture Models with the Expectation-Maximization Algorithm and implemented it in Python to solve unsupervised and semi-supervised learning problems. EM is a very useful method for finding maximum likelihood estimates when the model depends on latent variables and is therefore frequently used in machine learning.<\/p>\n<p id=\"9203\">I hope you had fun reading this article \ud83d\ude42<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>This article explores how to train Gaussian Mixture Models with the Expectation-Maximization Algorithm and implement it in Python to solve unsupervised and semi-supervised learning 
problems.<\/p>\n","protected":false},"author":1008,"featured_media":18537,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[1283,92,1284,1269,1021],"ppma_author":[3744],"class_list":["post-22583","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-gmm","tag-machine-learning","tag-semi-supervised-gmm","tag-semi-supervised-learning","tag-unsupervised-model"],"authors":[{"term_id":3744,"user_id":1008,"is_guest":0,"slug":"siwei-xu","display_name":"Siwei Xu","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Siwei-Xu-150x150.jpeg","user_url":"https:\/\/www.linkedin.com\/company\/walmartglobaltech%20","last_name":"Xu","first_name":"Siwei","job_title":"","description":"Siwei Xu is Software Engineer at Walmart Labs."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22583","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1008"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22583"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22583\/revisions"}],"predecessor-version":[{"id":32334,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22583\/revisions\/32334"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18537"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22583"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=225
83"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22583"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22583"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}