{"id":2259,"date":"2020-02-14T02:50:19","date_gmt":"2020-02-13T23:50:19","guid":{"rendered":"http:\/\/kusuaks7\/?p=1864"},"modified":"2024-01-10T15:47:37","modified_gmt":"2024-01-10T15:47:37","slug":"five-more-tools-and-techniques-for-better-plotting","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/five-more-tools-and-techniques-for-better-plotting\/","title":{"rendered":"Five more tools and techniques for better plotting"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2259\" class=\"elementor elementor-2259\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-a7a6b17 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a7a6b17\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3bb8a684\" data-id=\"3bb8a684\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a8549d6 elementor-widget elementor-widget-heading\" data-id=\"a8549d6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>And getting the most out of your data<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0211043 elementor-widget elementor-widget-text-editor\" data-id=\"0211043\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"18d2\" data-selectable-paragraph=\"\">In real-life data science, plotting does matter.\u00a0In my day-to-day work, I spend more time plotting and analysing charts than doing anything else. Let me explain: I work at Ravelin Technology. Our business is data and, specifically, analyzing and predicting fraud for online merchants. The company\u2019s main product uses a combination of machine learning, network analysis, rules and human insights to predict whether a transaction might be fraudulent. We have an ad-hoc machine learning model for each of our clients, but building that model happens at the beginning of the relationship with them, and after that it mostly requires maintenance.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-58cfe5e elementor-widget elementor-widget-text-editor\" data-id=\"58cfe5e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7d6b\" data-selectable-paragraph=\"\">Maintenance how? Sometimes it\u2019s for introducing new features or because of behavioural changes in customers. However, it can also be the case that something changes in the data we receive. Or perhaps there\u2019s something we originally missed because we didn\u2019t have enough data when we first built the model. It can also happen that either the client or we spot some dodgy performance in our predictions in, for example, one specific country. Whatever the case, there\u2019s usually an extensive investigation to find out what the problem is and\/or what we could do better. And just to give a bit more context, analyzing the performance of a model usually means working with datasets of millions of rows and thousands of columns. This can only be addressed by plotting. 
It\u2019s almost impossible to find patterns or insights just by looking at the raw data. Plotting allows us to compare features\u2019 performance, see how values evolve over time, inspect their distribution, spot differences in mean and median values, and so on.<\/p>\n<p id=\"792f\" data-selectable-paragraph=\"\">As I said in my previous story, in our field we must give equal weight to explainability and interpretability. In real-life data science you rarely work alone on a project, and your workmates and\/or clients usually won\u2019t know much about the data you\u2019re using. Being able to explain your thinking process is a key part of any data-related job. That\u2019s why copying and pasting default charts is not enough and chart customization becomes key.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fdfd914 elementor-widget elementor-widget-text-editor\" data-id=\"fdfd914\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9acd\" data-selectable-paragraph=\"\">Today we\u2019ll go through five techniques for making better charts that I\u2019ve found useful in the past. Some of them are day-to-day tools, while others you\u2019ll use only every now and then. Hopefully, having this story at hand will help when the moment arrives. 
The libraries we\u2019ll be using are:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2c25631 elementor-widget elementor-widget-text-editor\" data-id=\"2c25631\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">import matplotlib.pyplot as plt<br \/>import numpy as np<br \/>import seaborn as sns<\/div>\n<p id=\"2f78\" data-selectable-paragraph=\"\">With the following style and configurations:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a20ec0e elementor-widget elementor-widget-text-editor\" data-id=\"a20ec0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">plt.style.use('fivethirtyeight')<br \/>%config InlineBackend.figure_format = 'retina'<br \/>%matplotlib inline<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b9dd1a elementor-widget elementor-widget-heading\" data-id=\"5b9dd1a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"406d\" data-selectable-paragraph=\"\">1. 
Change range and steps in axis<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f743c00 elementor-widget elementor-widget-text-editor\" data-id=\"f743c00\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"0858\" data-selectable-paragraph=\"\">The default configuration of matplotlib or seaborn for setting up the range and steps is usually good enough for visualizing our data, but sometimes we\u2019ll want to see all the steps in our axis explicitly shown. Another option I\u2019ve found useful is drawing all the data but including the axis labels just for a specific range of the y or x-axis.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f1babde elementor-widget elementor-widget-text-editor\" data-id=\"f1babde\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f9f1\" data-selectable-paragraph=\"\">For example, let\u2019s say we\u2019re plotting the distribution of our model\u2019s predictions and we want to concentrate on the values between 30 and 50, with a step every two units, without losing sight of the rest of the values. 
Our original seaborn \u2018distplot\u2019 would look like this:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-16a5c02 elementor-widget elementor-widget-image\" data-id=\"16a5c02\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1600\/0*5uWQ7C3_Z6egsVKp\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f25c0e8 elementor-widget elementor-widget-text-editor\" data-id=\"f25c0e8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"836e\" data-selectable-paragraph=\"\">We now have two options for accomplishing the idea above:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87a71ce elementor-widget elementor-widget-text-editor\" data-id=\"87a71ce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"># Option 1<br \/>ax.set_xticks(range(30, 51, 2))<br \/># Option 2<br \/>ax.xaxis.set_ticks(np.arange(30, 51, 2))<\/div>\n<p id=\"166c\" data-selectable-paragraph=\"\">In both cases, we need to specify the starting point, the end point and the step. Note that the end point is exclusive (\u2018less than\u2019 rather than \u2018less than or equal to\u2019), which is why we pass 51 to include 50. 
The result would be the following chart:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-977c89c elementor-widget elementor-widget-image\" data-id=\"977c89c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1600\/0*S_-9XKlf6teuZaBA\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-69491b9 elementor-widget elementor-widget-text-editor\" data-id=\"69491b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"2c18\" data-selectable-paragraph=\"\">Also, note that I\u2019m calling both options on the \u2018ax\u2019 object. Whenever we create any kind of chart, an axes object (\u2018ax\u2019) and a figure (\u2018fig\u2019) are created automatically. 
We can also do the same in the following way:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8b90ee7 elementor-widget elementor-widget-text-editor\" data-id=\"8b90ee7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">myplot = sns.distplot(mydata)<br \/>myplot.set_xticks(range(30, 51, 2))<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b302f5 elementor-widget elementor-widget-heading\" data-id=\"4b302f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"75c7\" data-selectable-paragraph=\"\">2. Rotate ticks<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1dfc44d elementor-widget elementor-widget-text-editor\" data-id=\"1dfc44d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"348c\" data-selectable-paragraph=\"\">This is an easy but very useful tip when, for example, we\u2019re dealing with text labels instead of numbers. 
We can do this just by using the \u2018rotation\u2019 parameter of \u2018set_xticklabels\u2019:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4270a7b elementor-widget elementor-widget-text-editor\" data-id=\"4270a7b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">ax.set_xticklabels(labels=my_labels, rotation=90)<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c30c9e1 elementor-widget elementor-widget-text-editor\" data-id=\"c30c9e1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"aa38\" data-selectable-paragraph=\"\">Note how I\u2019m also passing \u2018my_labels\u2019 to the \u2018labels\u2019 parameter, since that\u2019s mandatory when using \u2018set_xticklabels\u2019. If you\u2019re drawing a \u2018distplot\u2019, you can simply pass the range of values to be shown, while for any other chart you can pass exactly the same array you specified for the x-axis. 
Also, you can combine this with the first technique like this:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3f6b1b9 elementor-widget elementor-widget-text-editor\" data-id=\"3f6b1b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">range_step = np.arange(30, 51, 2)<br \/>ax.xaxis.set_ticks(range_step)<br \/>ax.set_xticklabels(labels=range_step, rotation=90);<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-24d62fb elementor-widget elementor-widget-text-editor\" data-id=\"24d62fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"da7c\" data-selectable-paragraph=\"\">Obtaining the following result:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4cdd5f2 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4cdd5f2\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5e09556\" data-id=\"5e09556\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fa89004 elementor-widget elementor-widget-image\" data-id=\"fa89004\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1600\/0*Yv9t3jIrRYnxiZho\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2461882 elementor-widget elementor-widget-heading\" data-id=\"2461882\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"e8cf\" data-selectable-paragraph=\"\">3. Change the space in between plots<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d80315f elementor-widget elementor-widget-text-editor\" data-id=\"d80315f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fb94\" data-selectable-paragraph=\"\">More often than not, we\u2019ll want to plot several charts at once to compare their results, visualize them all together, or perhaps just to save time and\/or space. 
In any case, we can do that by using \u2018subplots\u2019 in a very simple way:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f6223e1 elementor-widget elementor-widget-text-editor\" data-id=\"f6223e1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">fig, ax = plt.subplots(figsize=(18,10), nrows=2)<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3490a23 elementor-widget elementor-widget-text-editor\" data-id=\"3490a23\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"21ef\" data-selectable-paragraph=\"\">We specified two rows, and therefore we\u2019ll be plotting two charts:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7191776 elementor-widget elementor-widget-text-editor\" data-id=\"7191776\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">sns.distplot(mydata, ax=ax[0])<br \/>sns.lineplot(x=mydata['xaxis'], y=mydata['yaxis'], ax=ax[1])<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b5f77b1 elementor-widget elementor-widget-image\" data-id=\"b5f77b1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1600\/0*JBndMuGzJMxdHcD1\" alt=\"\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4d015c0 elementor-widget elementor-widget-text-editor\" data-id=\"4d015c0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"a908\" data-selectable-paragraph=\"\">Now, sometimes, instead of having only two charts, we might have more. And perhaps we need to include titles for all of them. We may also have some charts with text labels that need to be rotated for better readability. In cases like this, we could end up with some overlap between plots, and increasing the space between charts helps us visualize them better.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2341cba elementor-widget elementor-widget-text-editor\" data-id=\"2341cba\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">plt.subplots_adjust(hspace=0.8)<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2134c06 elementor-widget elementor-widget-image\" data-id=\"2134c06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1600\/0*EGDpKTmrvI4Gwi5b\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cb375bf elementor-widget elementor-widget-text-editor\" data-id=\"cb375bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5d2d\" data-selectable-paragraph=\"\">Mind how the height of the figure remains the same (10 in this case), but the space between charts increases. If you want to maintain your charts\u2019 size, you\u2019d have to increase the figure size through the \u2018figsize\u2019 parameter.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a2263b6 elementor-widget elementor-widget-text-editor\" data-id=\"a2263b6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d408\" data-selectable-paragraph=\"\">Also, the \u2018hspace\u2019 parameter controls the height of the space between rows of subplots. If you were drawing multiple columns instead of rows, you could accomplish the same with the \u2018wspace\u2019 parameter, which controls the width of the space between columns.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-668c828 elementor-widget elementor-widget-text-editor\" data-id=\"668c828\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f6cb\" data-selectable-paragraph=\"\">By the way, if you want to find out how to set titles for your charts, you can find that tip and some others for better plotting in my\u00a0<a href=\"https:\/\/towardsdatascience.com\/10-tips-to-improve-your-plotting-f346fa468d18?source=friends_link&amp;sk=b2f7a584a74badc44d09d5de04fe30d8\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">previous story<\/a>.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a57aa49 elementor-widget elementor-widget-text-editor\" data-id=\"a57aa49\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote>\n<p id=\"6799\" data-selectable-paragraph=\"\"><strong>NOTE<\/strong>: just like I specified two rows to be drawn above, you could also specify a fixed number of columns. With more than one row and more than one column, the charts are indexed with two indices, like ax[0, 1].<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ccf44a9 elementor-widget elementor-widget-heading\" data-id=\"ccf44a9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"a9c8\" data-selectable-paragraph=\"\">4. Customize your confusion matrix<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-92df676 elementor-widget elementor-widget-text-editor\" data-id=\"92df676\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fbe7\" data-selectable-paragraph=\"\">Unfortunately, this is not the space for explaining in depth how the confusion matrix works or what it is useful for. Nonetheless, if you fancy learning more about it, I always recommend\u00a0<a href=\"https:\/\/medium.com\/thalus-ai\/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">this story<\/a>\u00a0from M. 
Sunasra.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-77589a5 elementor-widget elementor-widget-text-editor\" data-id=\"77589a5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d8fb\" data-selectable-paragraph=\"\">Now, if you\u2019re already familiar with the concept, you might have noticed that the default heatmap created by \u2018plot_confusion_matrix\u2019 from the \u2018sklearn.metrics\u2019 module sometimes comes out with the upper and lower squares cut off, like in the following picture:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5ec7297 elementor-widget elementor-widget-image\" data-id=\"5ec7297\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/720\/0*AUpl1vYFM25dvaRG\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0888dd9 elementor-widget elementor-widget-text-editor\" data-id=\"0888dd9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">source:\u00a0<a href=\"https:\/\/gis.stackexchange.com\/\" target=\"_blank\" rel=\"noopener nofollow noreferrer\">https:\/\/gis.stackexchange.com\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-35a9460 elementor-widget elementor-widget-text-editor\" data-id=\"35a9460\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"5860\" data-selectable-paragraph=\"\">We can solve this by plotting our own confusion matrix from scratch in just a few lines. For example:<\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">fig = plt.figure(figsize=(12,10))<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f99b86 elementor-widget elementor-widget-text-editor\" data-id=\"0f99b86\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">cm = skplt.metrics.confusion_matrix(real_y, pred_y)<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">labels=[0,1,2,3,4]<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">ax = sns.heatmap(cm, annot=True, annot_kws={'size': 12}, fmt='g', cmap='Blues', xticklabels=labels, yticklabels=labels)<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">bottom, top = ax.get_ylim()<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">ax.set_ylim(bottom + 0.5, top - 0.5)<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">ax.set(ylabel='True label')<\/span><\/p>\n<p 
style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">ax.set(xlabel='Predicted label')<\/span><\/p>\n<p style=\"background: #eeeeee; border: 1px solid #cccccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">plt.show()<\/span><\/p>\n<p id=\"1dde\" data-selectable-paragraph=\"\">What we\u2019re doing here is:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ad6fe8 elementor-widget elementor-widget-text-editor\" data-id=\"7ad6fe8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol>\n \t<li id=\"09eb\" data-selectable-paragraph=\"\">We create an empty figure, wider than it is tall since we\u2019ll have the annotations to the right of the heatmap<\/li>\n \t<li id=\"b0eb\" data-selectable-paragraph=\"\">We get the values of our confusion matrix through \u2018skplt.metrics.confusion_matrix\u2019<\/li>\n \t<li id=\"9570\" data-selectable-paragraph=\"\">We specify the \u2018labels\u2019 according to the number of categories we have<\/li>\n \t<li id=\"132f\" data-selectable-paragraph=\"\">We create a heatmap using the values from point 2 and specifying: i) \u2018annot\u2019 set to True, ii) \u2018annot_kws\u2019 for the font size of the annotations (12 in this case), iii) \u2018fmt\u2019 for the string formatting code, iv) \u2018cmap\u2019 for the colour pattern, and v) the labels for both axes (in a confusion matrix both of them are the same)<\/li>\n \t<li id=\"216e\" data-selectable-paragraph=\"\">We get the y-axis view limits and set them again, shifted by 0.5 on each side (+0.5 for the bottom, -0.5 for the top)<\/li>\n \t<li id=\"e58f\" data-selectable-paragraph=\"\">Last step: we set the y and x labels to \u2018True label\u2019 and \u2018Predicted label\u2019 respectively<\/li>\n<\/ol>\n<p id=\"7022\" 
data-selectable-paragraph=\"\">The result should be something like this:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6d98da7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6d98da7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1d130ee\" data-id=\"1d130ee\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-907bbe4 elementor-widget elementor-widget-image\" data-id=\"907bbe4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/673\/0*RxIrZjWxhL5exWHo\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fa66d41 elementor-widget elementor-widget-heading\" data-id=\"fa66d41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"c186\" data-selectable-paragraph=\"\">5. 
Plot cumulative distributions<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7bdd366 elementor-widget elementor-widget-text-editor\" data-id=\"7bdd366\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"ad76\" data-selectable-paragraph=\"\">Surely I don\u2019t need to stress how useful plotting cumulative distributions can be, either for better understanding the percentage of elements up to a certain value or for comparing two different groups within our data.<\/p>\n<p id=\"90b8\" data-selectable-paragraph=\"\">You can easily get this kind of chart through Seaborn\u2019s \u2018distplot\u2019 itself, just by setting the following:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-69323dd elementor-widget elementor-widget-text-editor\" data-id=\"69323dd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre>sns.distplot(my_data, label='my label', color='red', hist_kws=dict(cumulative=True))\n<\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e0d6ba0 elementor-widget elementor-widget-image\" data-id=\"e0d6ba0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/819\/0*pCE7hIIp48_BpDGs\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7c6e5f7 elementor-widget elementor-widget-text-editor\" data-id=\"7c6e5f7\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"bc05\" data-selectable-paragraph=\"\">We can make the chart look better by setting the limits for the x-axis:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8551558 elementor-widget elementor-widget-text-editor\" data-id=\"8551558\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\">sns.distplot(my_data, label='my label', color='red', hist_kws=dict(cumulative=True)).set(xlim=(0, my_data.max()))<\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8909d6e elementor-widget elementor-widget-image\" data-id=\"8909d6e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/819\/0*ifArR7pbT0YbACi0\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c827d4f elementor-widget elementor-widget-text-editor\" data-id=\"c827d4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7614\" data-selectable-paragraph=\"\">As I said at the beginning of the story, I use some of these tools and tips all the time, and others only every now and then. 
But hopefully, knowing these quick fixes and techniques will help you to make better plots and to better understand your data itself.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Real-life Data Science never finds you working alone on a project and your workmates or clients usually won&rsquo;t know much about the data you&rsquo;ll be using. Being able to explain your thinking process is a key part of any data-related job. That&rsquo;s why copying and pasting are not enough and charts personalization becomes key. This blog goes through 5 techniques to make better charts that are useful. Some of them are day-to-day tools, while others you&rsquo;ll use them every now and then.<\/p>\n","protected":false},"author":726,"featured_media":3684,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3568],"class_list":["post-2259","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3568,"user_id":726,"is_guest":0,"slug":"gonzalo-ferreiro-volpi","display_name":"Gonzalo Volpi","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Volpi","first_name":"Gonzalo","job_title":"","description":"Gonzalo Ferreiro Volpi is Data Science Fellow for Investigations at <a href=\"https:\/\/www.linkedin.com\/company\/ravelin-technology\/\">Ravelin Technology,<\/a>&nbsp;award-winning fraud detection and prevention platform for online merchants and the payments industry in 
eCommerce.&nbsp;"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/726"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2259"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2259\/revisions"}],"predecessor-version":[{"id":35448,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2259\/revisions\/35448"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3684"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2259"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}