{"id":2204,"date":"2020-01-21T01:13:06","date_gmt":"2020-01-21T01:13:06","guid":{"rendered":"http:\/\/kusuaks7\/?p=1809"},"modified":"2024-01-25T15:33:23","modified_gmt":"2024-01-25T15:33:23","slug":"how-gpus-are-beginning-to-displace-clusters-for-big-data-and-data-science","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/how-gpus-are-beginning-to-displace-clusters-for-big-data-and-data-science\/","title":{"rendered":"How GPUs are Beginning to Displace Clusters for Big Data &amp; Data Science"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2204\" class=\"elementor elementor-2204\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-d7275f5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d7275f5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ae5bf28\" data-id=\"ae5bf28\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-657133da elementor-widget elementor-widget-text-editor\" data-id=\"657133da\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<em>More recently on my data science journey, I have been using a low-grade consumer GPU (an NVIDIA GeForce 1060) to accomplish things that were previously only realistically achievable on a cluster &#8211; here is why I think this is the direction data science will go in the next 5 
years.<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cc859af elementor-widget elementor-widget-heading\" data-id=\"cc859af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Clusters Clusters Clusters!<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-095ad2a elementor-widget elementor-widget-text-editor\" data-id=\"095ad2a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow let me preface this article by saying that I don&#8217;t think GPUs will replace clusters for\u00a0<strong>ALL<\/strong>\u00a0HPC use cases; however, I do think we will see a shift to a more &#8216;GPU First&#8217; mindset over the coming years.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3a987a4 elementor-widget elementor-widget-text-editor\" data-id=\"3a987a4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA standard workflow for many big-data companies is to develop some kind of pipeline that takes in some data, mangles it in some way &#8211; combining it with other data or running some statistical analysis on it &#8211; and then outputs the results to some kind of BI dashboard to produce insights.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1c04403 elementor-widget elementor-widget-text-editor\" data-id=\"1c04403\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTo do this, someone with experience would need to spin up and configure a cluster (Spark, MapReduce or similar) and provide an interface for departments to execute this on. The cluster could be anywhere from 3 nodes up to 1000+ nodes depending on the workflow.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ddf43df elementor-widget elementor-widget-text-editor\" data-id=\"ddf43df\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tEven at conservative estimates this gets expensive quickly, and although cloud platforms like AWS and GCP make it easier than ever before, there is still a learning curve to getting it right, and these platforms can cost serious money &#8211; especially when things don&#8217;t go right the first time\u00a0<em>(99%-completion data skew, anyone?)<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-79fbcd7 elementor-widget elementor-widget-heading\" data-id=\"79fbcd7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">GPUs in place of a traditional Cluster<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8dddb85 elementor-widget elementor-widget-text-editor\" data-id=\"8dddb85\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe reasons above are why I think many are turning to GPUs. 
As developers (especially freelancers) we often don&#8217;t have the spare thousands of dollars to run a proof of concept for a pipeline like this at real scale. Almost everyone, however, has access to a GPU in some form or other &#8211; and after all, what is a GPU really\u00a0<em>other than a\u00a0<\/em><strong><em>self-contained cluster<\/em><\/strong>?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dc28f65 elementor-widget elementor-widget-text-editor\" data-id=\"dc28f65\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNVIDIA GPUs contain chips that have what are called \u201cCUDA cores\u201d; each of these cores is a miniature processor that can execute code.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-546e280 elementor-widget elementor-widget-text-editor\" data-id=\"546e280\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA popular consumer GPU \u2014 the GTX 1080 Ti \u2014 is illustrated below; it shows that this card has 3584 CUDA cores that can process data in parallel. 
If that doesn&#8217;t look like a multi-floor data-center to you then I don&#8217;t know what to say.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-528533b elementor-widget elementor-widget-text-editor\" data-id=\"528533b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tYou, reading this article right now, have some form of GPU in the machine you are using; if you are the kind of person who requires any kind of graphics performance, there might even be an NVIDIA card in there.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2f06ee5 elementor-widget elementor-widget-text-editor\" data-id=\"2f06ee5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<blockquote>Note: although I am referring to NVIDIA in this article, other GPUs are also capable of performing the same tasks; unfortunately, however, the tooling isn&#8217;t as mature as what NVIDIA provides with the CUDA toolkit.<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2944aaa elementor-widget elementor-widget-heading\" data-id=\"2944aaa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Comparing performance of a Cluster with a GPU<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e46c59c elementor-widget elementor-widget-text-editor\" data-id=\"e46c59c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBear with me here, as this isn&#8217;t as simple as comparing two systems &#8211; it is, to a degree, apples and oranges &#8211; so let&#8217;s focus on the outcomes.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-690b394 elementor-widget elementor-widget-text-editor\" data-id=\"690b394\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne task that your average data engineer often needs to complete is converting row-based data to a columnar format (such as ORC or Parquet). This is also one of those tasks that can be performed on anything from a single node right up to a 1000-node cluster, with speed increasing (though rarely linearly) the more nodes you add. We do this because columnar formats have many benefits over row-based formats (this is for another article!).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fc8efae elementor-widget elementor-widget-heading\" data-id=\"fc8efae\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Cluster speed comparisons for converting CSV to Columnar<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-799173d elementor-widget elementor-widget-text-editor\" data-id=\"799173d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA big-data consultant whom I have followed for some time has helpfully produced benchmarks of several cluster systems performing this very task (<a href=\"https:\/\/tech.marksblogg.com\/faster-csv-to-orc-conversions.html\" 
rel=\"noopener\">https:\/\/tech.marksblogg.com\/faster-csv-to-orc-conversions.html<\/a>)\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-06f9a90 elementor-widget elementor-widget-text-editor\" data-id=\"06f9a90\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nMark compared what I consider to be the current generation of tools \u2014 <strong>Hive<\/strong>, <strong>Presto<\/strong> and <strong>Spark<\/strong> \u2014 using a 21-node cluster on AWS EMR.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0ecb92e elementor-widget elementor-widget-text-editor\" data-id=\"0ecb92e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tHe used a 100GB New York taxi-rides dataset as the basis for his benchmark. The fastest conversion result was that of Presto (no surprise &#8211; I love Presto!) 
which came out at 37 minutes.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8f9fe70 elementor-widget elementor-widget-text-editor\" data-id=\"8f9fe70\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAccording to the AWS cost calculator, such a cluster costs\u00a0<strong>$430 USD per month<\/strong>\u00a0if used for a maximum of\u00a0<strong>2 hours per day<\/strong>\u00a0(Mark&#8217;s longest conversion).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-435aff8 elementor-widget elementor-widget-heading\" data-id=\"435aff8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">GPU speed comparisons for converting CSV to Columnar<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-65af0d5 elementor-widget elementor-widget-text-editor\" data-id=\"65af0d5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tI didn&#8217;t have access to the same dataset Mark used, so I substituted a similar-sized one of our own.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-263f940 elementor-widget elementor-widget-text-editor\" data-id=\"263f940\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe dataset comes in at just over 2 billion rows and has 41 fields. The total size of this data is\u00a0<strong>397GB uncompressed<\/strong>\u00a0or around\u00a0<strong>127GB gzip compressed<\/strong>. 
This is about 25% larger than the dataset used for the cluster tests.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-172ae8a elementor-widget elementor-widget-text-editor\" data-id=\"172ae8a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tUsing the excellent RAPIDS.ai framework (<a href=\"https:\/\/rapids.ai\/\" rel=\"noopener\">https:\/\/rapids.ai<\/a>\u00a0&#8211; supported by NVIDIA) I deployed the RAPIDS Docker container on my QNAP NAS\u00a0<em>(32GB, i7-6700, 14TB SATA, 2TB NVMe, GeForce GTX 1060 6GB)<\/em>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9cc2d55 elementor-widget elementor-widget-text-editor\" data-id=\"9cc2d55\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nThen, using the provided Jupyter notebook and my datasets, I created a basic script to handle the conversion:\n<pre><code>%%time\n# Read the gzipped CSV files into a GPU-backed Dask dataframe\nimport dask_cudf as dc\n\nddf = dc.read_csv('\/data\/Data Files\/Vegas\/datafiles\/csv\/*.csv.gz', compression='gzip')\n\nCPU times: user 1.82 s, sys: 870 ms, total: 2.69 s\nWall time: 6.99 s\n\n%%time\n# Split the dataframe into 3000 partitions for the conversion\nddf = ddf.repartition(npartitions=3000)\n\nCPU times: user 60.2 ms, sys: 159 \u00b5s, total: 60.4 ms\nWall time: 57.6 ms\n\n%%time\n# Write the data back out in the ORC columnar format\nddf.to_orc('\/data\/Data Files\/Vegas\/datafiles\/orc\/')\n\nCPU times: user 1h 4min 4s, sys: 30min 19s, total: 1h 34min 23s\nWall time: 41min 57s<\/code><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c99f9e6 elementor-widget elementor-widget-text-editor\" data-id=\"c99f9e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis produced a total time of\u00a0<strong>~42 minutes<\/strong>\u00a0&#8211; not too shabby, but I felt we could do better by utilising the NVMe drives I have in my NAS.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b02e8ef elementor-widget elementor-widget-text-editor\" data-id=\"b02e8ef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tI ran the same test, except this time I read from and wrote back to the NVMe drive, which came out at\u00a0<strong>~31 minutes<\/strong>\u00a0&#8211; a bit disappointing, if I am honest, considering the vast difference in read and write speeds. I think, because I was reading and writing to a single drive, it probably had some IO blocking going on &#8211; once I can get a second NVMe drive in the NAS I will try this again.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d8dff88 elementor-widget elementor-widget-heading\" data-id=\"d8dff88\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How did a single $200 GPU beat a massive 21-node cluster at this task?<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d54809e elementor-widget elementor-widget-text-editor\" data-id=\"d54809e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWell, as I said at the start, the test isn&#8217;t entirely fair; there are deficiencies here that, with the current software, are still a way off being testable &#8211; for example, the repartition stage isn&#8217;t ideal, as it 
produces 3000 files, whereas columnar storage formats do better with a smaller number of files (there is active work happening on RAPIDS.ai to solve this, however!).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5915faa elementor-widget elementor-widget-text-editor\" data-id=\"5915faa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tMostly, though, I would say it is because the computation is well suited to a GPU in this case: with the &#8220;nodes&#8221; located so close together, no networking or other inefficiencies are introduced and much of the overhead is removed. I am sure there are cases that more advanced data scientists come across where a GPU isn&#8217;t as suitable (comment below, as I am interested in what they are!).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9cf4a02 elementor-widget elementor-widget-heading\" data-id=\"9cf4a02\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-27f823c elementor-widget elementor-widget-text-editor\" data-id=\"27f823c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAs a developer reading this (and the one writing it), I know that I would be much more likely to try out a proof of concept if I knew I could do it on my local computer at no additional cost.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8fea4e6 elementor-widget 
elementor-widget-text-editor\" data-id=\"8fea4e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnalytics India Magazine recently reported a large uptick in the number of developers using GPU processing in their jobs:\n<blockquote>Another significant change seen this year is the increase in the use of GPUs at work. While most of the data scientists still use PCs and similar models, the second-favourite product is Nvidia GeForce GTX 9 Series GPU. The number of people using it has grown from a mere\u00a08%\u00a0last year, to\u00a028%\u00a0in 2019 &#8211; <a href=\"https:\/\/analyticsindiamag.com\/data-science-skills-study-2019-by-aim-imarticus-learning\/\" rel=\"noopener\">https:\/\/analyticsindiamag.com\/data-science-skills-study-2019-by-aim-imarticus-learning\/<\/a><\/blockquote>\nI think that as the data-science tooling for GPUs gets better and the price of GPUs falls, even older-model GPUs such as the one I am using can be used to demonstrate &#8211; and get executive buy-in for &#8211; a GPU-based strategy going forward.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a2561a1 elementor-widget elementor-widget-text-editor\" data-id=\"a2561a1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf my little QNAP can produce results like this on a 4-year-old GPU, imagine what the latest Turing-based Tesla cards and P100 models can produce for a few bucks an hour on any cloud provider. 
Then imagine putting multiple GPUs into an instance to accomplish things even faster.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7f6e10a elementor-widget elementor-widget-text-editor\" data-id=\"7f6e10a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWe truly are entering the age of GPU data processing for the masses.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-484d5b2 elementor-widget elementor-widget-text-editor\" data-id=\"484d5b2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis article was originally published on <a href=\"https:\/\/hackernoon.com\/how-gpus-are-beginning-to-displace-clusters-for-data-science-opbn36pv\" rel=\"noopener\">Hackernoon<\/a>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>A significant change seen in 2019 is the increase in the use of GPUs at work. While most of the data scientists still use PCs and similar models, the second-favourite product is Nvidia GeForce GTX 9 Series GPU. The number of people using it has grown from a mere&nbsp;8%&nbsp;last year, to&nbsp;28%&nbsp;in 2019. 
As the data-science tooling for GPUs gets better and the price of GPUs falls, even older-model GPUs can be used to demonstrate and get executive buy-in for a GPU-based strategy going forward.<\/p>\n","protected":false},"author":710,"featured_media":3398,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3524],"class_list":["post-2204","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3524,"user_id":710,"is_guest":0,"slug":"dan-voyce","display_name":"Dan Voyce","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Voyce","first_name":"Dan","job_title":"","description":"Dan Voyce is Chief Technology Officer at LOCALLY. He is a Linux Foundation Certified Systems Engineer and an accomplished systems 
architect."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2204","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/710"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2204"}],"version-history":[{"count":6,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2204\/revisions"}],"predecessor-version":[{"id":35660,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2204\/revisions\/35660"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3398"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2204"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2204"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2204"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2204"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}