{"id":11039,"date":"2020-11-06T11:30:06","date_gmt":"2020-11-06T11:30:06","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=11039"},"modified":"2023-10-09T19:56:28","modified_gmt":"2023-10-09T19:56:28","slug":"big-data-analysis-spark-hadoop","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/big-data-analysis-spark-hadoop\/","title":{"rendered":"Big Data Analysis: Spark and Hadoop"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"11039\" class=\"elementor elementor-11039\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4c3de59 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4c3de59\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0cd632e\" data-id=\"0cd632e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5c6f26d elementor-widget elementor-widget-text-editor\" data-id=\"5c6f26d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-normal-font-size\"><strong><em>Introduction to Big Data and the different techniques employed to handle it such as MapReduce, Apache Spark and Hadoop.<\/em><\/strong><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-dbb3887 
elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dbb3887\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d626d5d\" data-id=\"d626d5d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9a674cf elementor-widget elementor-widget-heading\" data-id=\"9a674cf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Big Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-d4c5f48 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d4c5f48\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-09ec1b5\" data-id=\"09ec1b5\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cc2d5d4 elementor-widget elementor-widget-text-editor\" data-id=\"cc2d5d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>According to Forbes, about 2.5 
quintillion bytes of data are generated every day. Moreover, this number is projected to keep increasing in the coming years (90% of the data stored today was produced within the last two years) [1].<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a744b04 elementor-widget elementor-widget-text-editor\" data-id=\"a744b04\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>What makes Big Data different from any other large amount of data stored in relational databases is its heterogeneity. The data comes from different sources and has been recorded using different formats.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-de4e645 elementor-widget elementor-widget-text-editor\" data-id=\"de4e645\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Three different ways of formatting data are commonly employed:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list -->\n<ul>\n<li><strong>Unstructured<\/strong>\u00a0= unorganised data (e.g. videos).<\/li>\n<li><strong>Semi-structured<\/strong>\u00a0= the data is organised in a flexible, non-fixed format (e.g. JSON).<\/li>\n<li><strong>Structured<\/strong>\u00a0= the data is stored in a fixed, predefined format (e.g. 
RDBMS).<\/li>\n<\/ul>\n<!-- \/wp:list -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-204ccf3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"204ccf3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-97e3df3\" data-id=\"97e3df3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d0e411d elementor-widget elementor-widget-text-editor\" data-id=\"d0e411d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Big Data is defined by three properties:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1982cd2 elementor-widget elementor-widget-text-editor\" data-id=\"1982cd2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><!-- wp:list {\"ordered\":true} --><\/p>\n<p>\u00a0<\/p>\n<p><strong>Volume\u00a0<\/strong>= because of the large amount of data, storing data on a single machine is impossible. 
How can we process data across multiple machines while ensuring fault tolerance?<\/p>\n<p>\u00a0<\/p>\n<p><strong>Variety<\/strong> = How can we deal with data that comes from varied sources and has been formatted using different schemas?<\/p>\n<p>\u00a0<\/p>\n<p><strong>Velocity\u00a0<\/strong>= How can we quickly store and process new data?<\/p>\n<!-- \/wp:list -->\n<p>\u00a0<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5178ffe elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5178ffe\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7ce6900\" data-id=\"7ce6900\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a80667b elementor-widget elementor-widget-text-editor\" data-id=\"a80667b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Big Data can be analysed using two different processing techniques:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4140f8 elementor-widget elementor-widget-text-editor\" data-id=\"f4140f8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>\u00a0<\/p>\n<ul>\n<li><strong>Batch processing<\/strong> = usually used if we are concerned with the volume and 
variety of our data. We first store all the needed data and then process it in one go (this can lead to high latency). A common application is calculating monthly payroll summaries.<\/li>\n<li><strong>Stream processing<\/strong> = usually employed if we are interested in fast response times. We process our data as soon as it is received (low latency). An example application is determining whether a bank transaction is fraudulent.<\/li>\n<\/ul>\n<p><!-- \/wp:list --><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-3440f9c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3440f9c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4a2a26d\" data-id=\"4a2a26d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9cf3d6d elementor-widget elementor-widget-text-editor\" data-id=\"9cf3d6d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Big Data can be processed using different tools such as MapReduce, Spark, Hadoop, Pig, Hive, Cassandra and Kafka. 
Each of these tools has its own advantages and disadvantages, which determine how companies might decide to employ them [2].<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-7f26312 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7f26312\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-aae97b6\" data-id=\"aae97b6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-eb2d34f elementor-widget elementor-widget-image\" data-id=\"eb2d34f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"552\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-1024x552.png\" class=\"attachment-large size-large wp-image-33350\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-1024x552.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-300x162.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-768x414.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-1536x828.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-610x329.png 
610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-750x404.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q-1140x614.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_OxQRy91ZgWWgaO0RHgqO1Q.png 1843w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-952b1b4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"952b1b4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8eb71d3\" data-id=\"8eb71d3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-227d00e elementor-widget elementor-widget-text-editor\" data-id=\"227d00e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Big Data analysis is now commonly used by many companies to predict market trends, personalise customer experiences, speed up company workflows, and more.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-fa12d18 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fa12d18\" data-element_type=\"section\" 
data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4090c23\" data-id=\"4090c23\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-73dbf90 elementor-widget elementor-widget-heading\" data-id=\"73dbf90\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">MapReduce<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-ab45d53 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ab45d53\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5267385\" data-id=\"5267385\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-4e04916 elementor-widget elementor-widget-text-editor\" data-id=\"4e04916\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>When we work with a large amount of data and run out of resources, there are two possible solutions: scaling horizontally or 
vertically.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a020c4f elementor-widget elementor-widget-text-editor\" data-id=\"a020c4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In horizontal scaling, we solve this problem by adding more machines of the same capacity and distributing the workload. In vertical scaling, we instead add more computational power to a single machine (e.g. CPU, RAM).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a6abc4a elementor-widget elementor-widget-text-editor\" data-id=\"a6abc4a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Vertical scaling is easier to manage and control than horizontal scaling and is efficient for relatively small problems. Horizontal scaling, however, is generally less expensive and faster than vertical scaling when working with a large problem.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f58e9f8 elementor-widget elementor-widget-text-editor\" data-id=\"f58e9f8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>MapReduce is based upon horizontal scaling. 
In MapReduce, a cluster of computers is used for parallelization, which makes it easier to handle Big Data.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ffd7fc2 elementor-widget elementor-widget-text-editor\" data-id=\"ffd7fc2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In MapReduce, we take the input data and divide it into many parts. Each part is then sent to a different machine to be processed (the map step), and the partial results are finally aggregated according to a specified <em>groupby<\/em> function (the reduce step).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-065054f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"065054f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a3774c3\" data-id=\"a3774c3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e80ddf6 elementor-widget elementor-widget-image\" data-id=\"e80ddf6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"831\" height=\"438\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw.jpeg\" class=\"attachment-large size-large wp-image-33351\" alt=\"\" 
srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw.jpeg 831w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw-300x158.jpeg 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw-768x405.jpeg 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw-610x322.jpeg 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_MrHPIUqCywZJx5QDuFospw-750x395.jpeg 750w\" sizes=\"(max-width: 831px) 100vw, 831px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-47b3fe9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"47b3fe9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ffb0f3e\" data-id=\"ffb0f3e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cb39a2e elementor-widget elementor-widget-heading\" data-id=\"cb39a2e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Apache Spark<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-e4fb3f6 elementor-section-boxed 
elementor-section-height-default elementor-section-height-default\" data-id=\"e4fb3f6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a1d1ae2\" data-id=\"a1d1ae2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ffe304c elementor-widget elementor-widget-text-editor\" data-id=\"ffe304c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The Apache Spark framework was developed as an advancement of MapReduce. What makes Spark stand out from its competitors is its execution speed, which can be up to 100 times faster than MapReduce for in-memory workloads (intermediate results are kept in memory rather than written to disk).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0d1792b elementor-widget elementor-widget-text-editor\" data-id=\"0d1792b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Apache Spark is commonly used for:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-865e1cb elementor-widget elementor-widget-text-editor\" data-id=\"865e1cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>\u00a0<\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li>Reading stored and real-time data.<\/li>\n<li>Preprocessing a large amount of data 
(SQL).<\/li>\n<li>Analysing data using Machine Learning and processing graph networks.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>\u00a0<\/p>\n<p><!-- \/wp:list --><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-dcc2e50 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dcc2e50\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-65d9980\" data-id=\"65d9980\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-db80d34 elementor-widget elementor-widget-image\" data-id=\"db80d34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"570\" height=\"268\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_n62DivImmGFUwAU7C1nTNw.png\" class=\"attachment-large size-large wp-image-33352\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_n62DivImmGFUwAU7C1nTNw.png 570w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_n62DivImmGFUwAU7C1nTNw-300x141.png 300w\" sizes=\"(max-width: 570px) 100vw, 570px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-e1ad542 elementor-section-boxed 
elementor-section-height-default elementor-section-height-default\" data-id=\"e1ad542\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b24b432\" data-id=\"b24b432\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b4dfadf elementor-widget elementor-widget-text-editor\" data-id=\"b4dfadf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>When using Spark, our <a href=\"https:\/\/www.experfy.com\/hire\/big-data-management\">Big Data is parallelized<\/a> using Resilient Distributed Datasets (RDDs). RDDs are Apache Spark\u2019s most basic abstraction: they take our original data and divide it across the different nodes (workers) of a cluster. 
RDDs are fault tolerant, which means they are able to recover lost data if any of the workers fail.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd3caa5 elementor-widget elementor-widget-text-editor\" data-id=\"fd3caa5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>RDDs can be used to perform two types of operations in Spark: Transformations and Actions (Figure 4).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-ece4a00 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ece4a00\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6df520c\" data-id=\"6df520c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-47fbd17 elementor-widget elementor-widget-image\" data-id=\"47fbd17\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"764\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_C0kdjWUggJqdKgWvjlWxqA.png\" class=\"attachment-large size-large wp-image-33353\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_C0kdjWUggJqdKgWvjlWxqA.png 512w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_C0kdjWUggJqdKgWvjlWxqA-201x300.png 201w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-b703aa4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"b703aa4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4cec256\" data-id=\"4cec256\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8e8faad elementor-widget elementor-widget-text-editor\" data-id=\"8e8faad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Transformations create a new dataset from an existing RDD and return an RDD as a result (e.g. the map, filter and reduceByKey operations). 
All transformations are lazy: they are not executed immediately but are placed in an execution plan and performed only when an Action is called.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-851188b elementor-widget elementor-widget-text-editor\" data-id=\"851188b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Actions, instead, are used to get our analysis results out of Apache Spark and return a value to our Python\/R application (e.g. the collect and take operations).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-56f7352 elementor-widget elementor-widget-text-editor\" data-id=\"56f7352\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>To store key\/value pairs in Spark, Pair RDDs are used. A Pair RDD is an RDD whose elements are two-element tuples. 
The first element of each tuple stores the key and the second stores the value: (key, value).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-7f73a62 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7f73a62\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-bac610f\" data-id=\"bac610f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d4ff989 elementor-widget elementor-widget-heading\" data-id=\"d4ff989\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Hadoop<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6d4cd98 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6d4cd98\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-798bbb6\" data-id=\"798bbb6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2864ffa elementor-widget elementor-widget-text-editor\" data-id=\"2864ffa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Hadoop is a set of open-source programs written in Java that can be used to perform operations on large amounts of data. Hadoop is a scalable, distributed and fault-tolerant ecosystem. The main components of Hadoop are [6]:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-12284de elementor-widget elementor-widget-text-editor\" data-id=\"12284de\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><li><strong>Hadoop YARN<\/strong> = manages and schedules the resources of the system, dividing the workload across a cluster of machines.<\/li><\/p>\n<p><li><strong>Hadoop Distributed File System (HDFS)<\/strong> = a clustered file storage system designed to be fault-tolerant and to offer high throughput and high bandwidth. 
It can additionally store any type of data in any format.<\/li><\/p>\n<p><li><strong>Hadoop MapReduce<\/strong> = loads data from a database, formats it and performs a quantitative analysis on it.<\/li><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-dc33345 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dc33345\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0aa60ca\" data-id=\"0aa60ca\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2f7cfe9 elementor-widget elementor-widget-image\" data-id=\"2f7cfe9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-1024x536.jpeg\" class=\"attachment-large size-large wp-image-33354\" alt=\"\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-1024x536.jpeg 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-300x157.jpeg 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-768x402.jpeg 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-610x319.jpeg 610w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg-750x392.jpeg 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/11\/1_WV4svFnRNAPPUzIxMbsNvg.jpeg 1080w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-85b14c9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"85b14c9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3e98fbc\" data-id=\"3e98fbc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a8251b9 elementor-widget elementor-widget-text-editor\" data-id=\"a8251b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Some example applications of Hadoop are: search (e.g. Yahoo), log processing\/data warehousing (e.g. Facebook) and video\/image analysis (e.g. 
New York Times).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a8ec381 elementor-widget elementor-widget-text-editor\" data-id=\"a8ec381\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Hadoop was the first system to make MapReduce available on a large scale, although Apache Spark is nowadays the preferred framework for many companies thanks to its greater execution speed.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-57128b0 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"57128b0\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a12c6d2\" data-id=\"a12c6d2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ea8a25f elementor-widget elementor-widget-heading\" data-id=\"ea8a25f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-a907be7 elementor-section-boxed elementor-section-height-default 
elementor-section-height-default\" data-id=\"a907be7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ad26e29\" data-id=\"ad26e29\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8e9000c elementor-widget elementor-widget-text-editor\" data-id=\"8e9000c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The term Big Data was initially coined to describe a problem: we are generating more data than we can actually process. After years of research and technological advances, Big Data is now seen instead as an opportunity. 
Thanks to Big Data, recent advancements in Artificial Intelligence and Deep Learning have been possible, enabling machines to perform tasks that seemed impossible just a few years ago.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1342287 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1342287\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-96b14f7\" data-id=\"96b14f7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9cf6383 elementor-widget elementor-widget-heading\" data-id=\"9cf6383\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\">Bibliography<\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-95717ae elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"95717ae\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a552fbf\" data-id=\"a552fbf\" data-element_type=\"column\" 
data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-17003d9 elementor-widget elementor-widget-text-editor\" data-id=\"17003d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[1] What is Big Data? \u2014 A Beginner\u2019s Guide to the World of Big Data. Anushree Subramaniam, edureka! Accessed at: <a href=\"https:\/\/www.edureka.co\/blog\/what-is-big-data\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.edureka.co\/blog\/what-is-big-data\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-de9d5fd elementor-widget elementor-widget-text-editor\" data-id=\"de9d5fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[2] See some Best-Known Big Data tools, their Advantages and Disadvantages to Analyze your Data. House of Bots. Accessed at: https:\/\/www.houseofbots.com\/news-detail\/12023-1-see-some-best-known-big-data-tools,-there-advantages-and-disadvantages-to-analyze-your-data<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2b5f476 elementor-widget elementor-widget-text-editor\" data-id=\"2b5f476\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[3] What is MapReduce? Shana Pearlman, Talend. 
Accessed at: <a href=\"https:\/\/www.talend.com\/resources\/what-is-mapreduce\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.talend.com\/resources\/what-is-mapreduce\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-078e6f7 elementor-widget elementor-widget-text-editor\" data-id=\"078e6f7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[4] Apache Spark Documentation. Accessed at: <a href=\"https:\/\/spark.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/spark.apache.org\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-321a4b5 elementor-widget elementor-widget-text-editor\" data-id=\"321a4b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[5] How Apache Spark\u2019s Transformations And Action works\u2026. Alex Anthony, Medium. Accessed at: <a href=\"https:\/\/medium.com\/@aristo_alex\/how-apache-sparks-transformations-and-action-works-ceb0d03b00d0\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">https:\/\/medium.com\/@aristo_alex\/how-apache-sparks-transformations-and-action-works-ceb0d03b00d0<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5946d74 elementor-widget elementor-widget-text-editor\" data-id=\"5946d74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[6] Apache Hadoop Explained in 5 Minutes or Less. CREDERA, Shravanthi Denthumdas. 
Accessed at: <a href=\"https:\/\/www.credera.com\/blog\/technology-insights\/open-source-technology-insights\/apache-hadoop-explained-5-minutes-less\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.credera.com\/blog\/technology-insights\/open-source-technology-insights\/apache-hadoop-explained-5-minutes-less\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7a7f9e6 elementor-widget elementor-widget-text-editor\" data-id=\"7a7f9e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>[7] Hadoop Ecosystem and Their Components \u2014 A Complete Tutorial. Data Flair. Accessed at: <a href=\"https:\/\/data-flair.training\/blogs\/hadoop-ecosystem-components\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">https:\/\/data-flair.training\/blogs\/hadoop-ecosystem-components\/<\/a><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>The term Big Data was initially coined to describe a problem: we are generating more data than we can actually process. After years of research and technological advances, Big Data is now seen instead as an opportunity. 
Thanks to Big Data, recent advancements in Artificial Intelligence and Deep Learning have been possible, enabling machines to perform tasks that seemed impossible just a few years ago.<\/p>\n","protected":false},"author":952,"featured_media":11041,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[97,122,206,143,922,921],"ppma_author":[3676],"class_list":["post-11039","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-artificial-intelligence","tag-big-data","tag-deep-learning","tag-hadoop","tag-mapreduce","tag-spark"],"authors":[{"term_id":3676,"user_id":952,"is_guest":0,"slug":"pier-paolo-ippolito","display_name":"Pier Paolo Ippolito","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Pier-Paolo-Ippolito-150x150.jpg","user_url":"https:\/\/pierpaolo28.github.io\/","last_name":"Paolo Ippolito","first_name":"Pier","job_title":"","description":"Pier Paolo Ippolito is a Data Scientist and MSc in Artificial Intelligence graduate with an interest in research areas such as Data Science, Machine Learning, and Cloud Development. 
Aside from his work activities, he is a freelancer and technical writer."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11039","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/952"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=11039"}],"version-history":[{"count":8,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11039\/revisions"}],"predecessor-version":[{"id":33357,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11039\/revisions\/33357"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/11041"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=11039"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=11039"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=11039"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=11039"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}