{"id":1856,"date":"2019-07-31T06:24:37","date_gmt":"2019-07-31T06:24:37","guid":{"rendered":"http:\/\/kusuaks7\/?p=1461"},"modified":"2024-07-24T16:44:49","modified_gmt":"2024-07-24T16:44:49","slug":"an-introduction-to-big-data-clustering","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/an-introduction-to-big-data-clustering\/","title":{"rendered":"An Introduction to Big Data: Clustering"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"1856\" class=\"elementor elementor-1856\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2428e52a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2428e52a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3ce0d169\" data-id=\"3ce0d169\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2f7a6c82 elementor-widget elementor-widget-text-editor\" data-id=\"2f7a6c82\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tINTRODUCTION\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f7f012 elementor-widget elementor-widget-text-editor\" data-id=\"4f7f012\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tClustering is a Machine Learning technique that involves the grouping of data 
points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and\/or features, while data points in different groups should have highly dissimilar properties and\/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-05ec08f elementor-widget elementor-widget-text-editor\" data-id=\"05ec08f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are many different clustering models:\n<ul data-rte-list=\"default\">\n \t<li>Connectivity models based on connectivity distance.<\/li>\n \t<li>Centroid models based on central individuals and distance.<\/li>\n \t<li>Density models based on connected and dense regions in space.<\/li>\n \t<li>Graph-based models based on cliques and their relaxations.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7cc196c elementor-widget elementor-widget-text-editor\" data-id=\"7cc196c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn this article, I will walk through 3 models: k-means (centroid), hierarchical (graph), and DBSCAN (density).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ba9820 elementor-widget elementor-widget-heading\" data-id=\"7ba9820\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title 
elementor-size-default\">1\u200a\u2014\u200aK-MEANS CLUSTERING<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6d56c8d elementor-widget elementor-widget-text-editor\" data-id=\"6d56c8d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe simplest clustering algorithm is k-means, which is a centroid-based model. Shown in the images below is a demonstration of the algorithm.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-742e0ac elementor-widget elementor-widget-image\" data-id=\"742e0ac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555027754705-VVMQ6B5UUA8WTDTG8EXI\/ke17ZwdGBToddI8pDm48kM6Sh6VprS0eFQRCGN3XfoUUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcqHfGtR9UeZ1YHoiD6g_dmhjEvk2ryGptGh9fNPQ1l6bDLW2S2MpmsXA8fohvNPhv\/k-means1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-485b463 elementor-widget elementor-widget-text-editor\" data-id=\"485b463\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWe start out with k initial \u201cmeans\u201d (in this case, k = 3), which are randomly generated within the data domain (shown in color).\u00a0<em>k<\/em>\u00a0clusters are then created by associating every observation with the nearest mean. The partitions here represent the Voronoi diagram generated by the means. 
The centroid of each of the\u00a0<em>k<\/em>\u00a0clusters becomes the new mean. Steps 2 and 3 are repeated until convergence has been reached.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a267733 elementor-widget elementor-widget-text-editor\" data-id=\"a267733\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe pseudocode of k-means clustering is shown here:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b431a2e elementor-widget elementor-widget-image\" data-id=\"b431a2e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555027781786-23UWN6Y3LG6Q9ZAHW1LP\/ke17ZwdGBToddI8pDm48kLUbs0zC8tNQWGoPqK63E_1Zw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PI7A63Fwgc8N0uzRmjp3ViDJWkEYjged0Pe3x3xEaoEm0\/k-means2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5980ec3 elementor-widget elementor-widget-text-editor\" data-id=\"5980ec3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>Example<\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-522de77 elementor-widget elementor-widget-image\" data-id=\"522de77\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" 
height=\"257\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-1024x257.png\" class=\"attachment-large size-large wp-image-36525\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-1024x257.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-300x75.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-768x193.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-1536x386.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-2048x515.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-610x153.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-750x189.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2019\/09\/image001-1140x287.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-88b4783 elementor-widget elementor-widget-text-editor\" data-id=\"88b4783\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are 9 points, including (1, 2), (1, 3), (2, 3), (2, 4), (4, 6), (5, 6), (6, 6), (6, 8), (7, 7). We will create 2 random centroids in the orange X marks at coordinates (2,8) (centroid 1) and (8, 1) (centroid 2). We will choose k = 2 and use the Manhattan distance to calculate the distance between points and the centroids. 
We would have the following results of the centroid distances:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-55c1083 elementor-widget elementor-widget-image\" data-id=\"55c1083\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028027379-V1X00Q1P6QSVBT52IA11\/ke17ZwdGBToddI8pDm48kPOpzdT5nbbUYRsLK4bWSwFZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpy5OqJHZgpiq4OBf18zvapmmLQ74INfslNkXz_xD9QNnrH8LlRcjHZq7OiG7tXDeb0\/example2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d8db9af elementor-widget elementor-widget-text-editor\" data-id=\"d8db9af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSo data point (1, 2) is 7 units away from centroid 1 and 8 units away from centroid 2; data point (1, 3) is 6 units away from centroid 1 and 9 units away from centroid 2; data point (2, 3) is 5 units away from centroid 1 and 8 units away from centroid 2, and so on. Using this data, we can subsequently update our centroids. 
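The distance table above can be reproduced with a short Python sketch. The nine points and the two initial centroids are the ones from the example; the helper name `manhattan` is just for illustration:

```python
# Assignment step of k-means with Manhattan (city-block) distance,
# using the 9 example points and the initial centroids (2, 8) and (8, 1).

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

points = [(1, 2), (1, 3), (2, 3), (2, 4), (4, 6), (5, 6), (6, 6), (6, 8), (7, 7)]
centroids = [(2, 8), (8, 1)]

for p in points:
    d = [manhattan(p, c) for c in centroids]
    nearest = d.index(min(d))  # 0-based index of the closest centroid
    print(p, d, "-> centroid", nearest + 1)
```

Each point is assigned to whichever centroid has the smaller distance, exactly as read off the table.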
Recall that the old centroids were (2, 8) and (8, 1); the new centroids are (4, 5) and (2, 2), as shown by the green X marks.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b686d6b elementor-widget__width-initial elementor-widget elementor-widget-image\" data-id=\"b686d6b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028040299-NGNXZXFQC6HR2HROQ2Z9\/ke17ZwdGBToddI8pDm48kDy1jqRvnMWEBPrlSpqIj3tZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PITnU2fyiuXICA43C4KazWKzsh5qJTPpWh907VobjmOh8\/example3.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dbb7c1c elementor-widget__width-initial elementor-widget elementor-widget-image\" data-id=\"dbb7c1c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028027379-V1X00Q1P6QSVBT52IA11\/ke17ZwdGBToddI8pDm48kPOpzdT5nbbUYRsLK4bWSwFZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpy5OqJHZgpiq4OBf18zvapmmLQ74INfslNkXz_xD9QNnrH8LlRcjHZq7OiG7tXDeb0\/example2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b7cc48 elementor-widget elementor-widget-text-editor\" data-id=\"5b7cc48\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf we keep calculating the Manhattan distance of each data 
point with respect to these 2 new centroids, we\u2019ll end up with another table of results:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-82454a0 elementor-widget elementor-widget-image\" data-id=\"82454a0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028055168-QGCOA0M0MY4H924E18NY\/ke17ZwdGBToddI8pDm48kF8Lo_hNsk3JmJk9clB6xKRZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpze8pLi_k0lv2sKsTx6MSpN_eRsKi3CLWjCKQyTDs_tbQMawYxt6_-_1Zs6i2wzxns\/example4.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0742538 elementor-widget elementor-widget-text-editor\" data-id=\"0742538\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tUsing this data, we can update the old centroids (orange X marks) at (4, 5) and (2, 2) to new centroids (green X marks) at (6, 7) and (1, 3), respectively.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3018a71 elementor-widget elementor-widget-image\" data-id=\"3018a71\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028071867-QTLRDYAVK5CBXLPD07GK\/ke17ZwdGBToddI8pDm48kNwFZBE3eOlLh2LjJhU2rxlZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIw2QfvJmtFK1KNZvUszJCwq3FedP255uYW2SUA_zSZRg\/example5.png\" alt=\"\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8afbf75 elementor-widget elementor-widget-text-editor\" data-id=\"8afbf75\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFinally, we repeat this whole process one more time to get the results table.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e6707e7 elementor-widget elementor-widget-image\" data-id=\"e6707e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028085272-OXEFVF76C1IR51CJQCVJ\/ke17ZwdGBToddI8pDm48kPvTY5QWTMxt0xPqCmC30kRZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpwThQMoKSjzYfiHa-U6eInyKLg7eZCEsT5-m7csjO4sguhHnJP7S_x8_q0EMS_Hh-0\/example6.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d2ef631 elementor-widget elementor-widget-text-editor\" data-id=\"d2ef631\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAt this time, our new centroids overlap with old centroids at (6, 7) and (1, 3). 
Thus, the algorithm stops.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4035135 elementor-widget elementor-widget-image\" data-id=\"4035135\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028097877-FMQ0KYGCKIOTXWEQVJ12\/ke17ZwdGBToddI8pDm48kCxfphzb6bGtbzbX5ZNpr09Zw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIYFCcBpjfQIoHBauPeyexA2j35SNI3OlBA7baHnSwXbo\/example7.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d96ad53 elementor-widget elementor-widget-text-editor\" data-id=\"d96ad53\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWhen choosing hyper-parameter k (the number of clusters), we need to be careful to avoid\u00a0<strong>overfitting<\/strong>. This occurs when our model is too closely tied to our training data. Usually, a simpler model is better to avoid overfitting.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cf9f602 elementor-widget elementor-widget-text-editor\" data-id=\"cf9f602\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe correct choice of\u00a0<em>k<\/em>\u00a0is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and the desired clustering resolution of the user. 
In addition, increasing\u00a0<em>k<\/em>\u00a0without penalty will always reduce the amount of error in the resulting clustering, to the extreme case of zero error if each data point is considered its own cluster (i.e., when\u00a0<em>k <\/em>equals the number of data points,\u00a0<em>n<\/em>). Intuitively then,\u00a0<em>the optimal choice of<\/em>\u00a0k\u00a0<em>will strike a balance between maximum compression of the data using a single cluster, and maximum accuracy by assigning each data point to its own cluster<\/em>. If an appropriate value of\u00a0<em>k<\/em>\u00a0is not apparent from prior knowledge of the properties of the data set, it must be chosen somehow. There are several categories of methods for making this decision.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f2c52e elementor-widget elementor-widget-image\" data-id=\"0f2c52e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028134376-96BSNTT3XU7XA28TIPOP\/ke17ZwdGBToddI8pDm48kNF9_jTVSxmUXA9ySMkOi3lZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIqDX_fwtcu288PfTMg2kD6_GNBoqRO73VxnDSKtsdhys\/SSE.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e27fb5d elementor-widget elementor-widget-text-editor\" data-id=\"e27fb5d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe\u00a0<strong>sum of squared errors<\/strong>\u00a0is a good evaluation metric to choose the number of clusters. It is calculated based on the equation below. 
Essentially, we want to choose k such that we can minimize SSE.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-81921b5 elementor-widget elementor-widget-image\" data-id=\"81921b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028163997-0NENR348NLS9K1N9UNH8\/ke17ZwdGBToddI8pDm48kFUfocj-sISRZPUF86hsAqMUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKczNSVqXlhIHi7r18c6HRjb3tcfscSLjcLq66Osf918vwxqYcMvk22tbxxRPygOFrw\/Silhouette1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-227b834 elementor-widget elementor-widget-text-editor\" data-id=\"227b834\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe average\u00a0<strong>silhouette<\/strong>\u00a0of the data is another useful criterion for assessing the natural number of clusters. The silhouette of a data instance is a measure of how closely it is matched to data within its cluster and how loosely it is matched to data of the neighboring cluster, i.e. the cluster whose average distance from the datum is lowest. A silhouette close to 1 implies the datum is in an appropriate cluster, while a silhouette close to \u22121 implies the datum is in the wrong cluster. Optimization techniques such as genetic algorithms are useful in determining the number of clusters that give rise to the largest silhouette. 
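As a minimal sketch of the silhouette computation, s(i) = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b its mean distance to the nearest other cluster. The 1-D data here is made up for illustration and is not the example in the figure:

```python
# Silhouette s(i) = (b - a) / max(a, b) for a toy 1-D clustering.
# a = mean distance to the point's own cluster (excluding itself),
# b = mean distance to the nearest other cluster.

def silhouette(point, own, others):
    a = sum(abs(point - q) for q in own if q != point) / (len(own) - 1)
    b = min(sum(abs(point - q) for q in other) / len(other) for other in others)
    return (b - a) / max(a, b)

clusters = [[0.0, 1.0], [9.0, 10.0]]  # two well-separated, made-up clusters

scores = []
for i, cl in enumerate(clusters):
    rest = clusters[:i] + clusters[i + 1:]
    for p in cl:
        scores.append(silhouette(p, cl, rest))

avg = sum(scores) / len(scores)
print(round(avg, 2))  # close to 1, i.e. a good clustering
```

For well-separated clusters like these the average silhouette comes out close to 1; overlapping clusters pull it toward 0 or below.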
It is also possible to re-scale the data in such a way that the silhouette is more likely to be maximized at the correct number of clusters.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6168737 elementor-widget elementor-widget-image\" data-id=\"6168737\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028194833-OYSCE66QSVQO5F3V2H50\/ke17ZwdGBToddI8pDm48kEWiO525_1R1Zgw7rXO6gJxZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIRCFg1hA9AO4st8SVy0ERcG2pMT0qf0U2Toa00N4lnMI\/Silhouette2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-30ada50 elementor-widget elementor-widget-text-editor\" data-id=\"30ada50\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFor the example above, the average silhouette value is 0.72.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ca09d07 elementor-widget elementor-widget-heading\" data-id=\"ca09d07\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">2\u200a\u2014\u200aHIERARCHICAL CLUSTERING<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1bf5e40 elementor-widget elementor-widget-text-editor\" data-id=\"1bf5e40\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBroadly speaking there are two ways of clustering data points based on the algorithmic structure and operation, namely agglomerative and divisive.\n<ul data-rte-list=\"default\">\n \t<li><strong>Agglomerative:<\/strong>\u00a0An agglomerative approach begins with each observation in a distinct (singleton) cluster, and successively merges clusters together until a stopping criterion is satisfied.<\/li>\n \t<li><strong>Divisive:<\/strong>\u00a0A divisive method begins with all patterns in a single cluster and performs splitting until a stopping criterion is met.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-71c65f4 elementor-widget elementor-widget-image\" data-id=\"71c65f4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028230339-M7KPT804N8KDVVVKMLM5\/ke17ZwdGBToddI8pDm48kOJ_4VutST_qlFmx-UvigGsUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcQQsCWGtOdQyPKxQ0mQWVDt5SJvrPROPsoR6cCtTTtc80Tr4bI74hicvug9TUxrlP\/agglomerative-divisive.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-161b93e elementor-widget elementor-widget-text-editor\" data-id=\"161b93e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tHierarchical clustering is an instance of the agglomerative or bottom-up approach, where we start with each data point as its own cluster and then combine clusters based on some similarity measure.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-0272d49 elementor-widget elementor-widget-text-editor\" data-id=\"0272d49\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe key operation in hierarchical agglomerative clustering is to repeatedly combine the two nearest clusters into a larger cluster. There are three key questions that need to be answered first:\n<ul data-rte-list=\"default\">\n \t<li>How do you represent a cluster of more than one point?<\/li>\n \t<li>How do you determine the \u201cnearness\u201d of clusters?<\/li>\n \t<li>When do you stop combining clusters?<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bca02b5 elementor-widget elementor-widget-image\" data-id=\"bca02b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028255410-3JQ0DAXKMK8HOJL24IPT\/ke17ZwdGBToddI8pDm48kCldU2Vlci8FbMOUt6mKs2VZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIVGOc3zT5-DyJPnB8qWz66qtlgzAfaZTeSd11i2y9Bs0\/hierarchical-clustering1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-606bf93 elementor-widget elementor-widget-text-editor\" data-id=\"606bf93\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBefore applying hierarchical clustering, let\u2019s look at how it works:\n<ol data-rte-list=\"default\">\n \t<li>It starts by calculating the distance between every pair of observation points and storing it in a distance matrix.<\/li>\n \t<li>It then puts 
every point in its own cluster.<\/li>\n \t<li>Then it merges the closest pair of clusters based on the distances in the distance matrix; as a result, the number of clusters goes down by 1.<\/li>\n \t<li>Then it recomputes the distances between the new cluster and the old ones and stores them in a new distance matrix.<\/li>\n \t<li>Lastly, it repeats steps 3 and 4 until all the clusters are merged into one single cluster.<\/li>\n<\/ol>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-29c6fe0 elementor-widget elementor-widget-image\" data-id=\"29c6fe0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028307359-3B60V8EJ48X4CF2AENG6\/ke17ZwdGBToddI8pDm48kCgLzaA76JeyjNru-bghVOlZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIG-NmnRdQD5IBC-hlpBJOP0UxiOZlOk0kahR77K1tZ3I\/hierarchical-clustering2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-77411a1 elementor-widget elementor-widget-text-editor\" data-id=\"77411a1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn hierarchical clustering, you organize the objects into a tree-like hierarchy called a\u00a0<strong><em>dendrogram<\/em><\/strong>. 
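The steps of the agglomerative procedure can be sketched in plain Python. This is a minimal single-linkage version on made-up points; a production implementation would use an optimized library routine instead:

```python
# Agglomerative (bottom-up) clustering with single linkage:
# start with singletons, repeatedly merge the two closest clusters,
# and record each merge until one cluster remains.

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def single_link(c1, c2):
    # Single linkage: minimum distance over all cross-cluster pairs.
    return min(dist(p, q) for p in c1 for q in c2)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (9, 0)]  # made-up data
clusters = [[p] for p in points]  # every point starts as its own cluster
merges = []

while len(clusters) > 1:  # repeat until one cluster remains
    # find the closest pair of clusters; distances are implicitly
    # recomputed against the merged cluster on the next pass
    (i, j) = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append((clusters[i], clusters[j]))
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

print(merges[0])  # the first merge is the closest pair of points
```

The `merges` list records the merge order, which is exactly the information a dendrogram draws.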
The distance of split or merge (called height) is shown on the top line of the dendrogram below.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-535027f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"535027f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-aecabb6\" data-id=\"aecabb6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fc8353d elementor-widget elementor-widget-image\" data-id=\"fc8353d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028324098-N1Q2OCZO45647T526N4W\/ke17ZwdGBToddI8pDm48kNIL_6YAJPHfVu6Erf7y05lZw-zPPgdn4jUwVcJE1ZvWEtT5uBSRWt4vQZAgTJucoTqqXjS3CfNDSuuf31e0tVF39dlbnNZjm0YK9tQLK5tJD4quBC_bhnEC9NDrsR3LapuG45vQwBxdpDrCGUSSl5w\/dendogram.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eadba74 elementor-widget elementor-widget-text-editor\" data-id=\"eadba74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn the left figure, at first\u00a0<strong><em>goat <\/em><\/strong>and\u00a0<strong><em>kid<\/em><\/strong>\u00a0are combined into one 
cluster, say cluster 1, since they were the closest in distance, followed by\u00a0<strong><em>chick<\/em><\/strong>\u00a0and\u00a0<strong><em>duckling<\/em><\/strong>, say cluster 2. After that,\u00a0<strong><em>turkey <\/em><\/strong>was merged into cluster 2, followed by another cluster of\u00a0<strong><em>duck<\/em><\/strong>\u00a0and\u00a0<strong><em>goose<\/em><\/strong>. Another cluster consisting of\u00a0<strong><em>lamb<\/em><\/strong>\u00a0and\u00a0<strong><em>sheep<\/em><\/strong>\u00a0was then merged into cluster 1. We keep repeating this until the clustering process stops.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c5f49bf elementor-widget elementor-widget-text-editor\" data-id=\"c5f49bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are several ways to measure the distance between clusters in order to decide the rules for clustering, and they are often called Linkage Methods. Some of the common linkage methods are:\n<ul data-rte-list=\"default\">\n \t<li><strong>Complete-linkage<\/strong>: calculates the maximum pairwise distance between points in the two clusters before merging.<\/li>\n \t<li><strong>Single-linkage<\/strong>: calculates the minimum pairwise distance between points in the two clusters before merging. This linkage can be used to detect outliers, since points far from everything else are merged last.<\/li>\n \t<li><strong>Average-linkage<\/strong>: calculates the average pairwise distance between points in the two clusters before merging.<\/li>\n \t<li><strong>Centroid-linkage<\/strong>: finds the centroid of cluster 1 and centroid of cluster 2, and then calculates the distance between the two before merging.<\/li>\n<\/ul>\nThe choice of linkage method is largely up to you; there is no hard and fast rule that will always give good results. 
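To make the four rules concrete, here is a small comparison of all four linkages on two hypothetical clusters, using Euclidean distance (the coordinates are made up for illustration):

```python
# Distance between two fixed clusters under four common linkage rules.
# Cluster coordinates are illustrative only.

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

A = [(0, 0), (0, 1)]
B = [(3, 0), (5, 0)]

pairs = [dist(p, q) for p in A for q in B]
single = min(pairs)               # closest cross-cluster pair
complete = max(pairs)             # farthest cross-cluster pair
average = sum(pairs) / len(pairs) # mean over all cross-cluster pairs

def centroid(c):
    return (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))

centroid_link = dist(centroid(A), centroid(B))  # distance between centroids

print(single, complete)  # 3.0 and sqrt(26) ≈ 5.10
```

Note that single-linkage and complete-linkage bracket the average-linkage value by construction, so the choice of rule directly controls how eagerly clusters merge.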
Different linkage methods lead to different clusters.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0291825 elementor-widget elementor-widget-text-editor\" data-id=\"0291825\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>Example<\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-46b4590 elementor-widget elementor-widget-text-editor\" data-id=\"46b4590\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe following example traces a hierarchical clustering of distances in miles between US cities. The method of clustering is\u00a0<em>single-link.\u00a0<\/em>Let\u2019s say we have the input distance matrix below:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a497dbc elementor-widget elementor-widget-image\" data-id=\"a497dbc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028383874-X78CJC4XG42VJ1OI2XNR\/ke17ZwdGBToddI8pDm48kKTuWZVMOXRxRJy6-uIhDy8UqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKctztrvX1_OW2CDFPHgLNNTs75ATis74AkxCG0ImXjgmKRb-2FMB2yxGiJrfui_Qhm\/hierarchical-example1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2093707 elementor-widget elementor-widget-text-editor\" data-id=\"2093707\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe nearest pair of cities is BOS and NY, at distance 206. These are merged into a single cluster called \u201cBOS\/NY\u201d. Then we compute the distance from this new compound object to all other objects. In single link clustering, the rule is that the distance from the compound object to another object is equal to the shortest distance from any member of the cluster to the outside object. So the distance from \u201cBOS\/NY\u201d to DC is chosen to be 233, which is the distance from NY to DC. Similarly, the distance from \u201cBOS\/NY\u201d to DEN is chosen to be 1771.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-26b2eab elementor-widget elementor-widget-text-editor\" data-id=\"26b2eab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging BOS with NY:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-98db1d3 elementor-widget elementor-widget-image\" data-id=\"98db1d3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028402665-668B93J2CO38E0PP9YM9\/ke17ZwdGBToddI8pDm48kGHleu6BdHh0DFo_fCklrwAUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcLbfA2fUGxHa28KBs8pvpoZyunSdx2AtSs5ACJa4E7NPHqeIc7b1GhwrVTeOPbgEm\/hierarchical-example2.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb20594 elementor-widget elementor-widget-text-editor\" data-id=\"eb20594\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe nearest pair of objects is BOS\/NY and DC, at distance 233. These are merged into a single cluster called \u201cBOS\/NY\/DC\u201d. Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e95159 elementor-widget elementor-widget-text-editor\" data-id=\"7e95159\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging DC with BOS\/NY:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2e181d7 elementor-widget elementor-widget-image\" data-id=\"2e181d7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028435778-VL6MEULFPVJCR2JQIS8A\/ke17ZwdGBToddI8pDm48kOGmT-ZBFUyX7iQFisd54JoUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKc7oqsfUEoZe5q6TLuYjW2MUhlrbK5uYavLUTI3Qpmn00DBMcD4gAbHKjVVHqDk-0x\/hierarchical-example3.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9b6b0b5 elementor-widget elementor-widget-text-editor\" data-id=\"9b6b0b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow, the nearest pair of objects is SF and LA, at distance 379. These are merged into a single cluster called \u201cSF\/LA\u201d. 
Then we compute the distance from this new cluster to all other objects, to get a new distance matrix.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7338ac4 elementor-widget elementor-widget-text-editor\" data-id=\"7338ac4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging SF with LA:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4e4b334 elementor-widget elementor-widget-image\" data-id=\"4e4b334\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028467594-WUOJ9UEXJK86PNXZFR8L\/ke17ZwdGBToddI8pDm48kJXulZpaJlFXZb32KnglIkkUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKccl4DapHx5BEGWa0ipV6RNI0AtN5eUxC6Sdl9OPeVu0exzahxoC3Jj3dwC7d32-U5\/hierarchical-example4.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ff1859c elementor-widget elementor-widget-text-editor\" data-id=\"ff1859c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow, the nearest pair of objects is CHI and BOS\/NY\/DC, at distance 671. These are merged into a single cluster called \u201cBOS\/NY\/DC\/CHI\u201d. 
Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aa71d5d elementor-widget elementor-widget-text-editor\" data-id=\"aa71d5d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nAfter merging CHI with BOS\/NY\/DC:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fa52a15 elementor-widget elementor-widget-image\" data-id=\"fa52a15\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028496685-172OAAQVZXRT97I1YPMQ\/ke17ZwdGBToddI8pDm48kKf5nUMZAOE4Xn86LwuEtEIUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKccq5KXqtNAJDFESt5kBzAEKe733xG6YsreE9D5lBIzmdFH4hwGcDyyVLO8CbRQCus\/hierarchical-example5.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-99f3731 elementor-widget elementor-widget-text-editor\" data-id=\"99f3731\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow, the nearest pair of objects is SEA and SF\/LA, at distance 808. These are merged into a single cluster called \u201cSF\/LA\/SEA\u201d. 
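The merge sequence traced in this walkthrough can be reproduced with a few lines of code. Below is a naive single-link sketch over five of the cities; the distances 206, 233, 671, 996, and 1771 are the ones quoted in the text, and the remaining pairwise entries are assumptions taken from the pictured matrix:

```python
# Pairwise distances (miles) for five of the cities in the walkthrough.
d = {
    ("BOS", "NY"): 206, ("BOS", "DC"): 429, ("BOS", "CHI"): 963,
    ("BOS", "DEN"): 1949, ("NY", "DC"): 233, ("NY", "CHI"): 802,
    ("NY", "DEN"): 1771, ("DC", "CHI"): 671, ("DC", "DEN"): 1616,
    ("CHI", "DEN"): 996,
}

def dist(a, b):
    # Look the pair up in either order.
    return d[(a, b)] if (a, b) in d else d[(b, a)]

def single_link(c1, c2):
    # Single link: cluster distance = closest cross-cluster pair.
    return min(dist(a, b) for a in c1 for b in c2)

clusters = [[c] for c in ["BOS", "NY", "DC", "CHI", "DEN"]]
merges = []
while len(clusters) > 1:
    # Find the closest pair of clusters and merge them.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merges.append(single_link(clusters[i], clusters[j]))
    clusters[i] += clusters.pop(j)

print(merges)  # [206, 233, 671, 996], the merge heights quoted in the walkthrough
```

Note how 233 appears as the second merge height: once BOS and NY form one cluster, the single-link distance to DC is the NY-to-DC entry, just as described above.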
Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d14c4d3 elementor-widget elementor-widget-text-editor\" data-id=\"d14c4d3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging SEA with SF\/LA:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7f11047 elementor-widget elementor-widget-image\" data-id=\"7f11047\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028507785-K9X8W5B5MLHC7GNOBQKT\/ke17ZwdGBToddI8pDm48kHorDMEA7WryRNpRhXQwTM8UqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcezi6UxER7BLhCT82I2e7ndDRT4vlq425Q-ngrxBBjqybIeFUSJpQoOKTuZ_jf6NE\/hierarchical-example6.png?format=750w\" alt=\"\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b978b26 elementor-widget elementor-widget-text-editor\" data-id=\"b978b26\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow, the nearest pair of objects is DEN and BOS\/NY\/DC\/CHI, at distance 996. These are merged into a single cluster called \u201cBOS\/NY\/DC\/CHI\/DEN\u201d. Then we compute the distance from this new cluster to all other clusters, to get a new distance matrix.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b566ae2 elementor-widget elementor-widget-text-editor\" data-id=\"b566ae2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging DEN with BOS\/NY\/DC\/CHI:\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3095707 elementor-widget elementor-widget-image\" data-id=\"3095707\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028539073-U2FM4UXJGLNXKI9MVTVV\/ke17ZwdGBToddI8pDm48kH0hZO-CasUmpV9Bq4QSySkUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcxv2aOtcFDauUCxWZO_zcu2tcHqjqHC8RJL8FinSRAcrKpbEyhJBWu1Uvy-PIAIgF\/hierarchical-example7.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a97a880 elementor-widget elementor-widget-text-editor\" data-id=\"a97a880\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow, the nearest pair of objects is BOS\/NY\/DC\/CHI\/DEN and SF\/LA\/SEA, at distance 1059. These are merged into a single cluster called \u201cBOS\/NY\/DC\/CHI\/DEN\/SF\/LA\/SEA\u201d. Then we compute the distance from this new compound object to all other objects, to get a new distance matrix.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4221ba6 elementor-widget elementor-widget-text-editor\" data-id=\"4221ba6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter merging SF\/LA\/SEA with BOS\/NY\/DC\/CHI\/DEN:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3f2797e elementor-widget elementor-widget-image\" data-id=\"3f2797e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028565435-NSEPRM2TT6B8I795X7W3\/ke17ZwdGBToddI8pDm48kDgB9Is93GcCTBXB7wbWUZMUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcHXB6VYVJ6bYh9--dXOqWaT97O8mgYysMJpUHANDJ_xyqB4XoTGD2dUpacm1cv-ZY\/hierarchical-example8.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7821329 elementor-widget elementor-widget-heading\" data-id=\"7821329\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">3\u200a\u2014\u200aDBSCAN<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1e1d37d elementor-widget elementor-widget-text-editor\" data-id=\"1e1d37d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tDBSCAN is a density-based clustering model, in which we group points that lie in regions of similar density. It does a great job of finding areas of the data that have a high density of observations, versus areas that are sparse. DBSCAN can also sort data into clusters of varying shapes, another strong advantage. 
DBSCAN works as such:\n<ul data-rte-list=\"default\">\n \t<li>Divides the dataset into\u00a0<em>n<\/em>\u00a0dimensions.<\/li>\n \t<li>For each point in the dataset, DBSCAN forms an\u00a0<em>n<\/em>-dimensional shape around that data point and then counts how many data points fall within that shape.<\/li>\n \t<li>DBSCAN counts this shape as a\u00a0<em>cluster<\/em>. It then iteratively expands the cluster by going through each individual point within the cluster and counting the number of other data points nearby.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9df003f elementor-widget elementor-widget-image\" data-id=\"9df003f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028690970-2S2XRXHS085XCPMZ4W1Z\/ke17ZwdGBToddI8pDm48kJj674Vd9nvHZkCowe6dMVt7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z4YTzHvnKhyp6Da-NYroOW3ZGjoBKy3azqku80C789l0jeh37-sdiTHoeh6x3oATgAYiNDW_iVh62WRF-oMupih2XI1iljEpSVCHOYLzfHeKA\/image-asset.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7c33a58 elementor-widget elementor-widget-text-editor\" data-id=\"7c33a58\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIllustrated in the graphic above, the\u00a0<em>epsilon<\/em>\u00a0is the radius used to test the distance between data points. If a point falls within the\u00a0<em>epsilon<\/em>\u00a0distance of another point, those two points will be in the same cluster. Furthermore, the\u00a0<em>minimum number of points needed<\/em>\u00a0is set to 4 in this scenario. When going through each data point, as long as DBSCAN finds 4 points within\u00a0<em>epsilon<\/em>\u00a0distance of each other, a cluster is formed.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3044361 elementor-widget elementor-widget-text-editor\" data-id=\"3044361\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tDBSCAN does\u00a0<em>NOT<\/em>\u00a0necessarily categorize every data point, and it is therefore terrific at handling outliers in the dataset. 
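The behaviour described above, where clusters are grown outward from dense neighbourhoods while sparse points are left unlabelled as noise, can be sketched in plain Python. This is a toy implementation for illustration; the point set, eps, and min_pts values are my own, and a real workload would use an optimized library implementation:

```python
import math

def region_query(points, i, eps):
    # Indices of all points within eps of points[i] (including i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = unvisited, -1 = noise
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # not a core point: mark as noise for now
            continue
        labels[i] = cluster_id  # core point: start a new cluster
        seeds = list(neighbors)
        while seeds:  # expand through density-connected points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point, rescued from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(region_query(points, j, eps)) >= min_pts:
                seeds.extend(region_query(points, j, eps))  # j is also core
        cluster_id += 1
    return labels

# Two dense groups of 4 points each, plus one isolated outlier.
pts = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4),
       (5, 5), (5.4, 5), (5, 5.4), (5.3, 5.3),
       (10, 10)]
print(dbscan(pts, eps=1.0, min_pts=4))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With min_pts set to 4, as in the scenario above, each dense group of four mutually close points becomes a cluster, and the lone point at (10, 10) never finds enough neighbours within eps, so it stays labelled -1: an outlier.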
Let&#8217;s examine the graphic below:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0ba2908 elementor-widget elementor-widget-image\" data-id=\"0ba2908\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1555028708361-P38E0BTEA9JN1MM1EXN0\/ke17ZwdGBToddI8pDm48kD39m-L6tE8WujE5yM9OgA0UqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYy7Mythp_T-mtop-vrsUOmeInPi9iDjx9w8K4ZfjXt2dtH3q-PjPKQ0EIZHEffM8_76C_H2M9Y8KGGJeJswGIxSCjLISwBs8eEdxAxTptZAUg\/image-asset.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0819855 elementor-widget elementor-widget-text-editor\" data-id=\"0819855\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe left image depicts a more traditional clustering method, such as K-Means, that does not account for multi-dimensionality. 
The right image, by contrast, shows how DBSCAN can fit clusters of varying shapes to the data. We also notice in the right image that the points along the outer edge of the dataset are not classified, suggesting they are outliers.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c4f38e elementor-widget elementor-widget-text-editor\" data-id=\"3c4f38e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnd that\u2019s the end of this post on clustering! I hope you found it helpful and now have a good grasp of the basics of clustering.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>INTRODUCTIONClustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. 
In theory, data points that are in the same group should have similar properties and\/or features, while data points in different<\/p>\n","protected":false},"author":86,"featured_media":3480,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[95],"ppma_author":[1842],"class_list":["post-1856","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-big-data-amp-technology"],"authors":[{"term_id":1842,"user_id":86,"is_guest":0,"slug":"james-le","display_name":"James Le","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Le","first_name":"James","job_title":"","description":"James Le is a Software Developer with experiences in Product Management and Data Analytics. He played a pivotal role in the operation of a start-up organization at Denison University."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1856","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1856"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1856\/revisions"}],"predecessor-version":[{"id":36947,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1856\/revisions\/36947"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3480"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1856"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ww
w.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1856"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1856"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1856"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}