{"id":22480,"date":"2020-12-03T10:23:20","date_gmt":"2020-12-03T10:23:20","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/weisfeiler-lehman-approximate-isomorphisms-metric-embeddings\/"},"modified":"2021-05-21T03:29:57","modified_gmt":"2021-05-21T03:29:57","slug":"weisfeiler-lehman-approximate-isomorphisms-metric-embeddings","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/weisfeiler-lehman-approximate-isomorphisms-metric-embeddings\/","title":{"rendered":"Beyond Weisfeiler-Lehman: Approximate Isomorphisms And Metric Embeddings"},"content":{"rendered":"\n<p><em>In this post, I argue that the graph isomorphism setting is too limiting for analysing the expressive power of graph neural networks and suggest a broader setting based on metric embeddings.<\/em><\/p>\n\n\n\n<p id=\"3adb\"><em>This is the third in the series of posts on the expressivity of graph neural networks. <em><a rel=\"noreferrer noopener\" href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/expressive-power-graph-neural-networks-weisfeiler-lehman\/\" target=\"_blank\">See\u00a0Part 1<\/a><\/em>\u00a0describing the relation between graph neural networks and the Weisfeiler-Lehman graph isomorphism test, and\u00a0\u00a0see <a rel=\"noreferrer noopener\" href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/weisfeiler-lehman-substructures-expressive-gnn\/\" target=\"_blank\">Part 2<\/a> showing how to construct provably powerful graph neural networks based on substructure counting.<\/em><\/p>\n\n\n\n<p id=\"c618\">W<em>eisfeiler-Lehman<\/em>\u00a0(WL) test [1] is a general name for\u00a0a hierarchy of graph-theoretical polynomial-time iterative algorithms providing a necessary but insufficient condition for graph isomorphism. 
In the context of deep learning on graphs, it was shown that message passing neural networks are\u00a0<a href=\"https:\/\/towardsdatascience.com\/expressive-power-of-graph-neural-networks-and-the-weisefeiler-lehman-test-b883db3c7c49\" target=\"_blank\" rel=\"noreferrer noopener\">as powerful as the 1-WL test<\/a>\u00a0[2]. Going to higher-order and hence more powerful\u00a0<em>k<\/em>-WL tests leads to graph neural networks with high computational complexity and non-local operations [3], which are impractical in real applications.<\/p>\n\n\n\n<p id=\"9e60\">A\u00a0<a href=\"https:\/\/towardsdatascience.com\/beyond-weisfeiler-lehman-using-substructures-for-provably-expressive-graph-neural-networks-d476ad665fa3\" target=\"_blank\" rel=\"noreferrer noopener\">different approach<\/a>, which we introduced in a recent paper [4], is to break from the Weisfeiler-Lehman hierarchy and resort to a message passing mechanism that is aware of local graph structures such as cycles, cliques, and paths. This leads to graph neural networks that retain the attractive locality and low complexity of standard message-passing architectures while being provably more powerful than 2-WL and no less powerful than 3-WL. Such a view opens deep links to open problems in graph theory that have not been previously explored in the context of graph neural network expressivity.<\/p>\n\n\n\n<p id=\"43c6\">Is this the end of the story? 
I believe we can do better by changing the perspective on the problem and will discuss a few directions in this post.<\/p>\n\n\n\n<p id=\"8b3a\"><strong>Approximate isomorphisms.&nbsp;<\/strong>While important insights into the expressive power of graph neural networks have emerged from the link to the graph isomorphism problem, the setting itself is rather limited.&nbsp;<mark>Graph neural networks can be seen as attempting to find a graph embedding&nbsp;<\/mark><mark><em>f<\/em><\/mark><mark>(<\/mark><mark><em>G<\/em><\/mark><mark>), with the desired property&nbsp;<\/mark><mark><em>f<\/em><\/mark><mark>(<\/mark><mark><em>G<\/em><\/mark><mark>)=<\/mark><mark><em>f<\/em><\/mark><mark>(<\/mark><mark><em>G\u2032<\/em><\/mark><mark>) iff&nbsp;<\/mark><mark><em>G<\/em><\/mark><mark>~<\/mark><mark><em>G\u2032<\/em><\/mark>. Unfortunately, since the Weisfeiler-Lehman test is a necessary but not sufficient condition for graph isomorphism, this works only one way:&nbsp;<em>f<\/em>(<em>G<\/em>)=<em>f<\/em>(<em>G\u2032<\/em>) if&nbsp;<em>G<\/em>~<em>G\u2032<\/em>, but the converse is not necessarily true: two non-isomorphic graphs can result in equal embeddings.<\/p>\n\n\n\n<p id=\"8746\">In practical applications, we rarely deal with exactly isomorphic graphs but rather graphs that are&nbsp;<em>approximately isomorphic<\/em>, as shown in the following figure:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1N3TQW4MWdingwjXe5ZIyfw.png\" alt=\"Beyond Weisfeiler-Lehman: Approximate Isomorphisms And Metric Embeddings\"\/><figcaption>Examples of approximately isomorphic graphs: both pairs of graphs&nbsp;<em>G<\/em>,&nbsp;<em>G\u2032<\/em>&nbsp;and&nbsp;<em>G<\/em>,&nbsp;<em>G\u2033<\/em>&nbsp;are non-isomorphic. 
However,&nbsp;<em>G<\/em>&nbsp;and&nbsp;<em>G\u2032<\/em>&nbsp;differ only by one edge and thus are \u201cmore isomorphic\u201d than&nbsp;<em>G<\/em>&nbsp;and&nbsp;<em>G\u2033<\/em>, which differ by five edges.<\/figcaption><\/figure><\/div>\n\n\n\n<p id=\"f68d\">A proper way of quantifying the notion of \u201capproximate isomorphism\u201d is in the form of a metric&nbsp;<em>d<\/em>, which is large if the two graphs are dissimilar and small otherwise. Graph edit distance [5] or the Gromov-Hausdorff distance [6] are two possible examples. Importantly, the metric is defined to be&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)=0 iff&nbsp;<em>G<\/em>~<em>G\u2032<\/em>&nbsp;, implying that two isomorphic graphs are indiscernible [7]. The problem of&nbsp;<em>isometric embedding<\/em><\/p>\n\n\n\n<p id=\"b599\">|<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2032<\/em>)|=<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>) for all&nbsp;<em>G<\/em>,&nbsp;<em>G\u2032&nbsp;<\/em>(\u2660)<\/p>\n\n\n\n<p id=\"9b22\">attempts to replace the graph distance with the Euclidean distance |.| between the embeddings of the graphs in a way that the two distances are equal (the embedding&nbsp;<em>f<\/em>, in this case, is said to be \u201cisometric\u201d or \u201cdistance preserving\u201d).<\/p>\n\n\n\n<p id=\"327a\">The works on the expressive power of graph neural networks showed that it is possible to construct in polynomial time graph embeddings that satisfy (\u2660) on a family of graphs on which the Weisfeiler-Lehman test succeeds. This setting seems to be too restrictive. First, it is probably impossible to devise a local and efficient algorithm that guarantees (\u2660) for all graphs [8]. Second, the graph isomorphism problem does not provide any guarantees for approximately isomorphic graphs, in which case&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)&gt;0. 
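The figure's "one edge vs. five edges" comparison can be made concrete: when two graphs are defined on the same vertex set, the edit distance reduces to counting edge insertions and deletions (a toy sketch with a hypothetical helper `edge_edit_distance`; the general graph edit distance of [5] also optimises over vertex correspondences, which makes it NP-hard to compute):

```python
def edge_edit_distance(edges1, edges2):
    """Edit distance restricted to edge insertions/deletions between two
    graphs on the same vertex set (a special case of graph edit distance).
    Edges are given as pairs; frozenset makes them orientation-independent."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # symmetric difference = edges to delete + edges to add
    return len(e1 ^ e2)
```

Under this toy distance, the pair differing by one edge is at distance 1 and the pair differing by five edges is at distance 5, quantifying "more isomorphic" exactly as in the caption.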
It is in principle possible, for example, that more similar graphs are less similar in the embedding space, i.e.,&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)&lt;<em>d<\/em>(<em>G<\/em>,<em>G\u2033<\/em>) but |<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2032<\/em>)|&gt;|<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2033<\/em>)|.<\/p>\n\n\n\n<p id=\"a795\">The metric formulation, on the other hand, offers much more flexibility. Instead of requiring (\u2660) to hold exactly, one can relax it by bounding the&nbsp;<em>metric dilation<\/em>,<\/p>\n\n\n\n<p id=\"ecce\"><em>c<\/em>\u207b\u00b9&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>) \u2264 |<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2032<\/em>)| \u2264&nbsp;<em>c d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>) for&nbsp;<em>c<\/em>\u22651 (\u2663)<\/p>\n\n\n\n<p id=\"4f4e\">which can be expressed as the bi-Lipschitz constant&nbsp;<em>c<\/em>&nbsp;of the embedding&nbsp;<em>f<\/em>. This means that the embedding does not \u201cstretch\u201d or \u201cshrink\u201d the ground-truth graph distance&nbsp;<em>d<\/em>&nbsp;by more than a factor&nbsp;<em>c<\/em>, which should ideally be as close to 1 as possible. Such an embedding is called an&nbsp;<em>approximate isometry.<\/em>&nbsp;A result one would try to obtain is a bound&nbsp;<em>c<\/em>&nbsp;independent of the graphs, and in particular, of the number of vertices.<\/p>\n\n\n\n<p id=\"149e\">Note that graph isomorphism is a particular case of (\u2663), since for isomorphic graphs&nbsp;<em>G<\/em>~<em>G\u2032<\/em>&nbsp;one has&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)=0 and thus&nbsp;<em>f<\/em>(<em>G<\/em>)=<em>f<\/em>(<em>G\u2032<\/em>). As discussed above, this is impossible to achieve for all graphs. 
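Given a ground-truth metric <em>d</em> and an embedding <em>f</em>, the smallest constant satisfying (\u2663) on a finite sample can be measured directly (an illustrative sketch; `empirical_dilation` is a hypothetical helper, and a toy metric space of real numbers stands in for graphs):

```python
import itertools
import math

def empirical_dilation(objects, dist, embed):
    """Smallest c >= 1 with c^-1 * d(G, G') <= |f(G) - f(G')| <= c * d(G, G')
    over all sampled pairs: the empirical bi-Lipschitz constant of `embed`.
    Assumes dist and the embedding distance are positive for distinct objects."""
    c = 1.0
    for g1, g2 in itertools.combinations(objects, 2):
        d = dist(g1, g2)
        e = math.dist(embed(g1), embed(g2))  # Euclidean distance of embeddings
        c = max(c, e / d, d / e)
    return c
```

For instance, embedding real numbers by x ↦ (2x,) doubles every distance, so the empirical dilation is exactly 2; an isometry would give 1.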
Instead, model (\u2663) can be further relaxed by allowing an additional&nbsp;<em>metric distortion \u03b5<\/em>,<\/p>\n\n\n\n<p id=\"a293\"><em>c<\/em>\u207b\u00b9&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)\u2212<em>\u03b5&nbsp;<\/em>\u2264 |<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2032<\/em>)| \u2264&nbsp;<em>c d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)+<em>\u03b5<\/em>&nbsp;for&nbsp;<em>c<\/em>\u22651 (\u2666)<\/p>\n\n\n\n<p id=\"9700\">which sets a tolerance on how far apart isomorphic graphs can be. This is a small price to pay in many applications if, in turn, one can satisfy (\u2666) for a much larger family of graphs than (\u2660).<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1nOTyxpzrw4V9szMYBrA4vw.png\" alt=\"Beyond Weisfeiler-Lehman: Approximate Isomorphisms And Metric Embeddings\"\/><figcaption>The approximate isometric embedding problem.<\/figcaption><\/figure><\/div>\n\n\n\n<p id=\"4589\"><strong>Probably approximate isometry.&nbsp;<\/strong>The approximate isometry problem can be rewritten as a probabilistic statement in the \u201cprobably approximately correct\u201d (PAC) form<\/p>\n\n\n\n<p id=\"82f1\">Pr(&nbsp;<em>c<\/em>\u207b\u00b9&nbsp;<em>d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)\u2212<em>\u03b5&nbsp;<\/em>\u2264 |<em>f<\/em>(<em>G<\/em>)\u2212<em>f<\/em>(<em>G\u2032<\/em>)| \u2264&nbsp;<em>c d<\/em>(<em>G<\/em>,<em>G\u2032<\/em>)+<em>\u03b5<\/em>&nbsp;)&gt;1\u2212<em>\u03b4<\/em><\/p>\n\n\n\n<p id=\"491d\">where \u201capproximately\u201d is understood in the sense of the metric dilation&nbsp;<em>c<\/em>&nbsp;and \u201cprobably\u201d means that the bound (\u2666) holds with high probability 1\u2212<em>\u03b4<\/em>&nbsp;for some distribution of graphs [9]. 
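On a finite sample of pairs, \u03b4 can be estimated as the fraction of pairs violating the (<em>c</em>, <em>\u03b5</em>)-bound (an illustrative sketch; `violation_rate` is a hypothetical helper, and real numbers with the absolute-value metric again stand in for graphs):

```python
import math

def violation_rate(pairs, dist, embed, c, eps):
    """Empirical estimate of delta: the fraction of sampled pairs for which
    the bound  c^-1 * d - eps <= |f - f'| <= c * d + eps  fails."""
    bad = 0
    for g1, g2 in pairs:
        d = dist(g1, g2)
        e = math.dist(embed(g1), embed(g2))
        if not (d / c - eps <= e <= c * d + eps):
            bad += 1
    return bad / len(pairs)
```

A distance-doubling embedding violates the strict isometry (c=1, \u03b5=0) on every pair, but satisfies the relaxed bound with c=2, matching the dilation discussion above.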
It is possible that such a setting will turn out to be easier to analyse than the deterministic approximate isometric embedding problem.<\/p>\n\n\n\n<p id=\"74c5\"><strong>Bridging continuous and discrete structures.&nbsp;<\/strong>Last but not least, the question of graph neural network expressivity has been primarily focused on the&nbsp;<em>topology<\/em>&nbsp;of the graphs. However, in most practical applications, the graphs also have node- or edge-wise features or attributes, typically represented as vectors [10]. The metric framework naturally handles both structures by defining an appropriate distance between attributed graphs.<\/p>\n\n\n\n<p id=\"a94c\">As a final remark, one should bear in mind that the majority of the works on the expressive power of graph neural networks are very recent developments building upon classical results in graph theory. Expanding into additional fields of mathematics such as metric geometry seems a promising avenue that is likely to bear fruit in the near future.<\/p>\n\n\n\n<p id=\"73f3\">[1] B. Weisfeiler, A. Lehman,\u00a0<a href=\"https:\/\/www.iti.zcu.cz\/wl2018\/pdf\/wl_paper_translation.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">The reduction of a graph to canonical form and the algebra which appears therein<\/a>, 1968 (English translation)<\/p>\n\n\n\n<p id=\"a883\">[2] K. Xu et al.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1810.00826\" target=\"_blank\" rel=\"noreferrer noopener\">How powerful are graph neural networks?<\/a>\u00a0(2019). Proc. ICLR.<\/p>\n\n\n\n<p id=\"680a\">[3] H. Maron et al.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1905.11136\" target=\"_blank\" rel=\"noreferrer noopener\">Provably powerful graph neural networks<\/a>\u00a0(2019). Proc. NeurIPS.<\/p>\n\n\n\n<p id=\"ba28\">[4] G. 
Bouritsas et al.\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2006.09252\" target=\"_blank\" rel=\"noreferrer noopener\">Improving graph neural network expressivity via subgraph isomorphism counting<\/a>\u00a0(2020). arXiv:2006.09252.<\/p>\n\n\n\n<p id=\"33b1\">[5] A. Sanfeliu, K.-S. Fu, A distance measure between attributed relational graphs for pattern recognition (1983). IEEE Trans. Systems, Man, and Cybernetics (3):353\u2013362.<\/p>\n\n\n\n<p id=\"e85c\">[6] M. Gromov, Structures m\u00e9triques pour les vari\u00e9t\u00e9s riemanniennes (1981).<\/p>\n\n\n\n<p id=\"5d74\">[7] Technically speaking,&nbsp;<em>d<\/em>&nbsp;is a pseudo-metric and can be regarded as a metric on the quotient set of graphs modulo the isomorphism relation.<\/p>\n\n\n\n<p id=\"8cfb\">[8] I say \u201cprobably\u201d because it is still unknown whether the graph isomorphism problem has a polynomial-time solution. It seems extremely unlikely that such a hypothetical algorithm would have practical time and space complexity.<\/p>\n\n\n\n<p id=\"d2db\">[9] A version of a probabilistic metric setting was previously used to analyse graph kernels in N. M. Kriege et al.\u00a0<a href=\"https:\/\/www.ijcai.org\/Proceedings\/2018\/0325.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">A property testing framework for the theoretical expressivity of graph kernels<\/a>\u00a0(2018). Proc. 
IJCAI.<\/p>\n\n\n\n<p id=\"fe9e\">[10] While the Weisfeiler-Lehman framework technically applies to coloured graphs, it is limited to nodes with a discrete set of features.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post argues that the graph isomorphism setting is too limiting for analysing the expressive power of graph neural networks and suggests a broader setting based on metric embeddings.<\/p>\n","protected":false},"author":874,"featured_media":16921,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[561,1080,1071],"ppma_author":[3686],"class_list":["post-22480","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-graph-neural-networks","tag-metric-embeddings","tag-weisfeiler-lehman-test"],"authors":[{"term_id":3686,"user_id":874,"is_guest":0,"slug":"michael-bronstein","display_name":"Michael Bronstein","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/08\/Michael-Bronstein-150x150.jpg","user_url":"https:\/\/www.imperial.ac.uk\/people\/m.bronstein","last_name":"Bronstein","first_name":"Michael","job_title":"","description":"Michael Bronstein is Professor, Chair in Machine Learning and Pattern Recognition at Imperial College, London, besides Head of Graph ML at Twitter \/ ML Lead at ProjectCETI\/ ex Founder &amp; Chief Scientist at Fabula_ai\/ ex at Intel #AI #ML #graphs. His main expertise is in theoretical and computational geometric methods for data analysis, and his research encompasses a broad spectrum of applications ranging from machine learning, computer vision, and pattern recognition to geometry processing, computer graphics, and imaging. 
He has authored over 150 papers, the book Numerical geometry of non-rigid shapes (Springer 2008), and holds over 30 granted patents."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22480","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/874"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22480"}],"version-history":[{"count":1,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22480\/revisions"}],"predecessor-version":[{"id":23177,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22480\/revisions\/23177"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/16921"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22480"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22480"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22480"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}