{"id":22624,"date":"2021-02-15T11:13:49","date_gmt":"2021-02-15T11:13:49","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/markov-clustering-algorithm\/"},"modified":"2023-09-05T06:19:34","modified_gmt":"2023-09-05T06:19:34","slug":"markov-clustering-algorithm","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/markov-clustering-algorithm\/","title":{"rendered":"Markov Clustering Algorithm"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22624\" class=\"elementor elementor-22624\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-aebb5e5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"aebb5e5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9513662\" data-id=\"9513662\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-28f71e6 elementor-widget elementor-widget-text-editor\" data-id=\"28f71e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\">Intuitive description with example and discussion<\/p><p id=\"1b1a\">In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. 
Like other graph-based clustering algorithms and unlike\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/K-means_clustering\" target=\"_blank\" rel=\"noreferrer noopener\"><em>K<\/em>-means clustering<\/a>, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].)<\/p>\n<p id=\"3186\">This algorithm is very popular in clustering bioinformatics data, specifically to cluster protein sequences and to cluster genes from co-expression data [2]. This algorithm also lends itself to distributed computing [2]. As discussed there, the algorithm was able to utilize 2000 compute nodes to cluster a graph of about 70 million nodes and about 68 billion edges in less than 2\u00bd hours.<\/p>\n<p id=\"19fe\">In this post, we have just one aim: to describe the algorithm at an intuitive level, with suitably chosen examples that bring out its distinguishing features.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9a3aef8 elementor-widget elementor-widget-heading\" data-id=\"9a3aef8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">The Random Walk Principle<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e863bf1 elementor-widget elementor-widget-text-editor\" data-id=\"e863bf1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f865\">The motivating idea in MCL is that if you start walking randomly from a node, you are more likely to move around in the same cluster than to cross clusters. This is because by definition clusters are internally dense while being separated by sparse regions. 
In <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/understanding-graph-embeddings\/\" target=\"_blank\" rel=\"noreferrer noopener\">graph clustering<\/a>, density and sparsity are defined in terms of the proportion of edge slots that have edges in them.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6d3d54c elementor-widget elementor-widget-heading\" data-id=\"6d3d54c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Let\u2019s see an example.<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-20bdd4c elementor-widget elementor-widget-image\" data-id=\"20bdd4c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"434\" height=\"187\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/00Vlng-1nVtPT2azp.png\" class=\"attachment-large size-large wp-image-18718\" alt=\"Markov Clustering Algorithm\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/00Vlng-1nVtPT2azp.png 434w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/00Vlng-1nVtPT2azp-300x129.png 300w\" sizes=\"(max-width: 434px) 100vw, 434px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A graph with two connected components (clusters)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-04458b9 elementor-widget elementor-widget-text-editor\" data-id=\"04458b9\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"08b5\">The random walk will remain within the connected component where it starts, i.e. with {<em>a<\/em>,&nbsp;<em>b<\/em>} or within {<em>c<\/em>,&nbsp;<em>d<\/em>,&nbsp;<em>e<\/em>}. There is no way to get across.<\/p>\n<p id=\"05ec\">We could interpret this behavior as discovering the connected components as clusters.<\/p>\n<p id=\"2d30\">Okay, that\u2019s a very simple example just to get us started. MCL would not be interesting if that\u2019s all it could do. Let\u2019s see a more interesting example.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c7d4ab9 elementor-widget elementor-widget-image\" data-id=\"c7d4ab9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"676\" height=\"296\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08nz3H6RZ2wWDYmbo.png\" class=\"attachment-large size-large wp-image-18719\" alt=\"Markov Clustering Algorithm\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08nz3H6RZ2wWDYmbo.png 676w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08nz3H6RZ2wWDYmbo-300x131.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08nz3H6RZ2wWDYmbo-610x267.png 610w\" sizes=\"(max-width: 676px) 100vw, 676px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Randomly walking on this graph will visit nodes e and j more frequently<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b3452d1 elementor-widget elementor-widget-text-editor\" data-id=\"b3452d1\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fd1e\">Say we start the walk at node&nbsp;<em>a<\/em>. The walker will meander around but will eventually get to node&nbsp;<em>e<\/em>. Once it arrives at&nbsp;<em>e<\/em>, it is likely to revisit&nbsp;<em>e<\/em>&nbsp;soon. To see this, note that from&nbsp;<em>e<\/em>&nbsp;the next step is either to&nbsp;<em>d<\/em>,&nbsp;<em>f<\/em>,&nbsp;<em>g<\/em>,&nbsp;<em>h<\/em>, or<em>&nbsp;i<\/em>. In 3 of these 5, the step that follows will bring the walker back to&nbsp;<em>e<\/em>. You could say that the walker has discovered a core node in a cluster, in this case&nbsp;<em>e<\/em>.<\/p>\n<p id=\"5fa5\">How could it record this information? By just maintaining a counter for each node which records the number of times that node has been visited in a certain unit of time.<\/p>\n<p id=\"58d3\">As we will see soon, there is a more efficient way to record this information. Random walking is just a convenient way to explain the algorithm\u2019s behavior. That said, we\u2019ll stick with random walking for a while longer, as it has more insights to offer.<\/p>\n<p id=\"2a3e\">Let\u2019s continue walking where we left off. Because&nbsp;<em>e<\/em>&nbsp;gets visited frequently, at some point we will walk from&nbsp;<em>e<\/em>&nbsp;to&nbsp;<em>i<\/em>. And at some point shortly thereafter, from&nbsp;<em>i<\/em>&nbsp;to&nbsp;<em>j<\/em>. 
Once&nbsp;<em>j<\/em>&nbsp;has been visited for the first time, it will get visited frequently again for the same reason that&nbsp;<em>e<\/em>&nbsp;was.&nbsp;<em>j<\/em>&nbsp;is a second core node we have discovered.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ad33d38 elementor-widget elementor-widget-heading\" data-id=\"ad33d38\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Beyond Discovering Core Nodes<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e34ea9f elementor-widget elementor-widget-text-editor\" data-id=\"e34ea9f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8164\">Encouraged by our finding that we can discover core nodes by walking randomly, let\u2019s dig deeper to see what else we can do.<\/p>\n<p id=\"8470\">Consider the example below. Think of the two 4-cliques (fully connected subgraphs on 4 nodes) as the clusters. The edge a-e crosses the two clusters. 
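Before moving on, the visit-counting idea from the previous section can be checked with a short simulation. The graph below is a hypothetical stand-in, not the exact graph from the figure: two hub nodes named e and j, joined through a bridge node i, each hub carrying several degree-1 neighbors.

```python
import random
from collections import Counter

# Hypothetical stand-in graph (an assumption, not the figure's exact graph):
# hubs e and j joined via bridge node i, with degree-1 neighbors on each hub.
edges = [("e", "d"), ("e", "f"), ("e", "g"), ("e", "h"), ("e", "i"),
         ("i", "j"), ("j", "a"), ("j", "b"), ("j", "c")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

random.seed(0)
node = "a"                           # start the walk at node a
visits = Counter()
for _ in range(50_000):
    node = random.choice(adj[node])  # one uniform random step
    visits[node] += 1

# Long-run visit frequency is proportional to a node's degree,
# so the two hubs dominate the counts.
print(visits.most_common(2))         # e and j lead
```

The counters play exactly the role described above: they record how often each node is visited, and the high-degree core nodes e and j accumulate the largest counts.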
Ignore the nodes of degree 1 for now.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c65c756 elementor-widget elementor-widget-image\" data-id=\"c65c756\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"460\" height=\"257\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08Rd35jFbCxmxKDXq.png\" class=\"attachment-large size-large wp-image-18720\" alt=\"Markov Clustering Algorithm\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08Rd35jFbCxmxKDXq.png 460w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/08Rd35jFbCxmxKDXq-300x168.png 300w\" sizes=\"(max-width: 460px) 100vw, 460px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Two clusters connected by edge a-e<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87c50be elementor-widget elementor-widget-text-editor\" data-id=\"87c50be\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3222\">Consider starting from node&nbsp;<em>a<\/em>&nbsp;and walking&nbsp;<em>k<\/em>&nbsp;steps. What is the probability distribution over the various nodes we end up at?<\/p>\n<p id=\"f460\">Before looking at this, let\u2019s define a key term, as it will keep cropping up in our discussion. The degree of a node is the number of edges that it touches.<\/p>\n<p id=\"4aee\">When&nbsp;<em>k&nbsp;<\/em>is 1, the probability of ending up at node&nbsp;<em>e<\/em>&nbsp;is \u00bc since&nbsp;<em>a<\/em>\u2019s degree is 4. 
When&nbsp;<em>k<\/em>&nbsp;is 2, the probability of ending up at&nbsp;<em>e<\/em>&nbsp;is 0 because there is no way to reach&nbsp;<em>e<\/em>&nbsp;from&nbsp;<em>a<\/em>&nbsp;in exactly two hops. When&nbsp;<em>k<\/em>&nbsp;is 3, the walks of length 3 that end at&nbsp;<em>e<\/em>&nbsp;have one of three structures:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d45462b elementor-widget elementor-widget-text-editor\" data-id=\"d45462b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\"><em>a<\/em> \u2192 { <em>b<\/em> or <em>c<\/em> or <em>d<\/em> } \u2192 <em>a<\/em> \u2192 <em>e<\/em><br><em>a<\/em> \u2192 <em>e<\/em> \u2192 { <em>f<\/em> or <em>g<\/em> or <em>h<\/em> } \u2192 <em>e<\/em><br><em>a<\/em> \u2192 <em>e<\/em> \u2192 <em>a<\/em> \u2192 <em>e<\/em><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ed9b376 elementor-widget elementor-widget-text-editor\" data-id=\"ed9b376\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f3c7\">So there are 7 such walks (the third structure backtracks along the edge <em>a<\/em>-<em>e<\/em>). Every node along these walks has degree 4, so each walk occurs with probability (1\/4)\u00b3 = 1\/64. So the probability of walking from&nbsp;<em>a<\/em>&nbsp;to&nbsp;<em>e<\/em>&nbsp;in exactly 3 steps is 7\/64. Way less than \u00bc.<\/p>\n<p id=\"dc47\">We can think of it this way. When we take just one step from&nbsp;<em>a<\/em>, all we see is&nbsp;<em>a<\/em>\u2019s neighbors. So the 1-step probability of ending up at&nbsp;<em>e<\/em>, while low, is not very low. 
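This arithmetic is easy to check numerically. The sketch below assumes a concrete reading of the figure: two 4-cliques {a,b,c,d} and {e,f,g,h} joined by the edge a-e, with one pendant leaf on each clique node other than a and e, so that every clique node has degree 4. Note that the 3-step probability also counts the backtracking walk a → e → a → e.

```python
import numpy as np

# Assumed reading of the figure: two 4-cliques joined by edge a-e, plus one
# pendant leaf on each of b, c, d, f, g, h (every clique node has degree 4).
nodes = list("abcdefgh") + [c + "1" for c in "bcdfgh"]   # "b1" is b's leaf
idx = {n: i for i, n in enumerate(nodes)}
edges = [(u, v) for clique in ("abcd", "efgh")
         for k, u in enumerate(clique) for v in clique[k + 1:]]
edges += [("a", "e")] + [(c, c + "1") for c in "bcdfgh"]

A = np.zeros((len(nodes), len(nodes)))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

P = A / A.sum(axis=1, keepdims=True)   # row-stochastic: P[u, v] = Puv(1)
P3 = np.linalg.matrix_power(P, 3)      # 3-step transition probabilities

print(P[idx["a"], idx["e"]])           # 0.25      (= 1/4)
print(P3[idx["a"], idx["e"]])          # 0.109375  (= 7/64)
```

The 1-step probability is 1/4, while the 3-step probability drops to 7/64: walking a few steps already exposes the cluster structure around a.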
If we walk a few more steps, we discover the cluster structure of&nbsp;<em>a&nbsp;<\/em>and realize that we are much more likely to stay within it than to end up at&nbsp;<em>e<\/em>.<\/p>\n<p id=\"6f9d\">That said, if we walk too long, say&nbsp;<em>k<\/em>=20 steps, the situation gets more diffuse. Simply put, whereas the first walk from&nbsp;<em>a<\/em>&nbsp;to&nbsp;<em>e<\/em>&nbsp;is a rare event, once we have walked to&nbsp;<em>e<\/em>, the probability of visiting&nbsp;<em>e<\/em>&nbsp;again in the next few steps increases as we are now in&nbsp;<em>e<\/em>\u2019s cluster.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-018f50d elementor-widget elementor-widget-heading\" data-id=\"018f50d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How can we take advantage of this behavior?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-449d7aa elementor-widget elementor-widget-text-editor\" data-id=\"449d7aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8fc1\">At this point, we\u2019ll bring in Markov chains. Consider, for every pair of nodes&nbsp;<em>u<\/em>&nbsp;and&nbsp;<em>v<\/em>, P<em>uv<\/em>(<em>k<\/em>), the probability of starting from node&nbsp;<em>u<\/em>&nbsp;and ending up at node&nbsp;<em>v<\/em>&nbsp;after walking&nbsp;<em>k<\/em>&nbsp;steps. P<em>uv<\/em>(1) is easily computed: it is 1 divided by&nbsp;<em>u<\/em>\u2019s degree if&nbsp;<em>v<\/em>&nbsp;is a neighbor of&nbsp;<em>u<\/em>, and 0 otherwise.<\/p>\n<p id=\"acea\">Now comes the key point. If we multiply the matrix&nbsp;<strong>P<\/strong>(1)=<strong>P<\/strong>&nbsp;with itself, we get&nbsp;<strong>P<\/strong>(2) =&nbsp;<strong>P<\/strong>\u00b2. 
More generally,&nbsp;<strong>P<\/strong>(k) =&nbsp;<strong>P<\/strong>^k.<\/p>\n<p id=\"d2bd\">This suggests the following procedure. We initialize&nbsp;<strong>P<\/strong>. We then compute&nbsp;<strong>P<\/strong>(k) by raising&nbsp;<strong>P&nbsp;<\/strong>to the k-th power. (k is typically 2 or 3.) If some transition probability P<em>uv<\/em>(<em>k<\/em>) is especially low, way lower than P<em>uv<\/em>(1) is, we drive it further towards 0. A way to do this is to take each probability in&nbsp;<strong>P<\/strong>(k), raise it to a power greater than 1, and renormalize. In MCL, this process is called inflation. It enhances differences. The rich get richer.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ff2b286 elementor-widget elementor-widget-heading\" data-id=\"ff2b286\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Putting this all together, the algorithm goes as follows:<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6b1da82 elementor-widget elementor-widget-text-editor\" data-id=\"6b1da82\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">Initialize <strong>P <\/strong>from the graph<br>repeat<br>  <strong>Q<\/strong> = <strong>P<\/strong>^k<br>  <strong>P<\/strong> = inflate(<strong>Q<\/strong>)<br>until happy<br>Prune away especially low-probability transitions<br>Find (strongly) connected components in the remaining directed graph<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3aaabe0 elementor-widget elementor-widget-text-editor\" data-id=\"3aaabe0\" data-element_type=\"widget\" 
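The pseudocode above can be made concrete in a few lines of NumPy. This is a minimal sketch, not a production implementation: the parameter values are illustrative assumptions, and, following the usual MCL formulation, the matrix is kept column-stochastic and self-loops are added to damp oscillation.

```python
import numpy as np

def mcl(A, power=2, inflation=2.0, iters=20, tol=1e-6):
    """Minimal Markov clustering sketch (parameter values are assumptions)."""
    # Self-loops are a common MCL convention that helps convergence.
    M = A + np.eye(len(A))
    M = M / M.sum(axis=0, keepdims=True)        # column-stochastic matrix
    for _ in range(iters):
        last = M.copy()
        M = np.linalg.matrix_power(M, power)    # expansion: take `power` steps
        M = M ** inflation                      # inflation: boost strong links
        M = M / M.sum(axis=0, keepdims=True)    # renormalize each column
        if np.abs(M - last).max() < tol:        # stop when (nearly) converged
            break
    # Read out clusters: nodes whose columns peak at the same attractor row.
    clusters = {}
    for j in range(M.shape[1]):
        clusters.setdefault(int(np.argmax(M[:, j])), set()).add(j)
    return list(clusters.values())

# Demo: two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(mcl(A))   # the two triangles come out as the two clusters
```

Real implementations, such as van Dongen's mcl tool, add pruning of tiny entries and sparse-matrix storage; the inflation exponent is the main knob controlling cluster granularity.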
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7270\">We can interpret this algorithm as trying to find transitions that cross cluster boundaries and reduce their probabilities.<\/p>\n<p id=\"173c\">Why do we iterate then? Because there is a risk of pruning away edges in the same cluster if we don\u2019t iterate enough first.<\/p>\n<p id=\"cd82\">On the other hand, if we iterate without doing an inflation inside an iteration, we may diffuse the transition probabilities too rapidly and lose whatever we gained in the first few iterations. (We already saw an example of this.) Inflating seems to make the algorithm somewhat robust against this.<\/p>\n<p id=\"bd82\">Specifically, by deflating low probability transitions, we are forcing the algorithm to explore regions of the graph it might not yet have explored. Take an extreme form of this. Say we block a low-probability transition&nbsp;<em>u<\/em>&nbsp;\u2192&nbsp;<em>v<\/em>, i.e. truncate the probability to 0. We are forcing the algorithm to try to find a different way to get from&nbsp;<em>u<\/em>&nbsp;to&nbsp;<em>v<\/em>&nbsp;if it can. If it can\u2019t, we now gain more confidence that the transition from&nbsp;<em>u<\/em>&nbsp;to&nbsp;<em>v<\/em>&nbsp;crosses cluster boundaries.<\/p>\n<p id=\"5638\">Ideally, after all the iterations have ended, the especially low-probability transitions are the ones that cross cluster boundaries. We can then clip these. This results in a directed graph whose strongly connected components are the clusters.<\/p>\n<p id=\"7b9a\">In our example, the probabilities of the transitions from&nbsp;<em>a<\/em>&nbsp;to&nbsp;<em>e<\/em>&nbsp;and&nbsp;<em>e<\/em>&nbsp;to&nbsp;<em>a<\/em>&nbsp;would both reduce. We can interpret this as deleting the edge&nbsp;<em>a<\/em>&#8211;<em>e<\/em>&nbsp;in the undirected graph. 
The clusters are now the connected components, which can be found by a standard connected-components algorithm.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0c54891 elementor-widget elementor-widget-heading\" data-id=\"0c54891\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Summary<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-91ac56c elementor-widget elementor-widget-text-editor\" data-id=\"91ac56c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"285a\">In this post, we explained, with suitably chosen examples, how the Markov clustering algorithm works. We started by explaining how random walks on a graph can discover core nodes within clusters. We then discussed how taking powers of the Markov matrix helps us distinguish between within-cluster transitions and across-cluster transitions. We discussed how inflation further enhances these distinctions. Interspersing inflation allows us to continue exploring the graph without risking diluting the transition probabilities too much.<\/p>\n<p id=\"e9b4\">This algorithm is also attractive from the point of view of implementation. At its core, it uses very simple algebraic operations: powers of a matrix, and inflation. Consequently, it is very easy to implement for small-to-moderate size problems. 
For especially large problems, we can take advantage of the existence of efficient parallel algorithms for taking powers of a sparse matrix as mentioned in [2] below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Intuitive description with example and discussion In this post, we describe an interesting and effective graph-based clustering algorithm called Markov clustering. Like other graph-based clustering algorithms and unlike\u00a0K-means clustering, this algorithm does not require the number of clusters to be known in advance. (For more on this, see [1].) This algorithm is very popular in<\/p>\n","protected":false},"author":1044,"featured_media":18721,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[342,1336,1337,1338],"ppma_author":[3691],"class_list":["post-22624","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-algorithm","tag-graph-based-clustering","tag-markov-chains","tag-markov-clustering"],"authors":[{"term_id":3691,"user_id":1044,"is_guest":0,"slug":"arun-jagota","display_name":"Arun Jagota","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Arun-Jagota-150x150.jpeg","user_url":"https:\/\/www.salesforce.com\/in\/?ir=1","last_name":"Jagota","first_name":"Arun","job_title":"","description":"Arun Jagota is Director of Data Science at Salesforce.com. A PhD in computer science, he has taught undergraduate, graduate, and continuing education courses in Computer Science at many US Universities from 1992 through 2006. 
He has written a number of books, most available at Amazon.com, 50 academic publications and has 17+ patents issued."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22624","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1044"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22624"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22624\/revisions"}],"predecessor-version":[{"id":32213,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22624\/revisions\/32213"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18721"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22624"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22624"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22624"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22624"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}