{"id":1105,"date":"2019-02-15T10:31:58","date_gmt":"2019-02-15T10:31:58","guid":{"rendered":"http:\/\/kusuaks7\/?p=710"},"modified":"2023-09-19T14:10:36","modified_gmt":"2023-09-19T14:10:36","slug":"where-can-i-get-relevant-data","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/where-can-i-get-relevant-data\/","title":{"rendered":"The Case for Using Data Simulators to Drive Big Data Success"},"content":{"rendered":"<p><em><strong>Ready to learn Big Data?\u00a0<a href=\"https:\/\/www.experfy.com\/training\/tracks\/big-data-training-certification\">Browse courses<\/a>\u00a0developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/strong><\/em><\/p>\n<h3><strong><span style=\"font-size: 18px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">Business Rationale for a Data Simulator<\/span><\/span><\/strong><\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">While a lot has been said and written about the business value of unifying\u00a0data silos for insights, Big Data solution providers\u00a0often encounter problems\u00a0in convincing their\u00a0customers to break\u00a0down these silos. \u00a0It is not that customers are unwilling to act by either finding data in-house\u00a0or procuring\u00a0third\u00a0party data that can be brought together, they need to be convinced that this extra effort will result in significantly enhanced business outcomes. The speed of vendor access to data in a complex corporate\u00a0environment\u00a0is directly proportional to the number of legacy systems and proprietary mechanisms in the company&#8217;s IT operations.\u00a0Case studies and generic demos only help\u00a0in starting a conversation\u00a0but not in closing the deal. \u00a0As a solutions provider trying to get work done, this became a\u00a0challenging\u00a0issue, and\u00a0to address it we started using Data Simulators to break the impasse. 
The use of a simulator allowed us to generate synthetic data in numerous types, shapes, and values to suit most business cases. It also helped our Proofs of Concept (POCs) focus more on insights and action, and less on ingestion and ETL processes. While ingestion of data is a real problem that needs to be solved, we found that our customers prefer deferring it to the production phase rather than the POC phase. This way the insight horse was always ahead of the ingestion cart! Data simulators are also accelerating our deployments in newer domains where there is little historical data or where data is hard to get \u2013 wearables, the industrial internet, expensive third-party data, and more.<\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">Now that I have highlighted the business value of a Data Simulator, I present below a deep dive into the architecture and implementation of our simulator &#8211; BigSim.<\/span><\/span><\/p>\n<h3><strong><span style=\"font-size: 18px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">Under The Hood \u2013 BigSim<\/span><\/span><\/strong><\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">BigSim is designed to provide flexibility and control in generating large data sets through templates and minimal coding. 
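<\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">To make this concrete, the sketch below shows what such a template could look like. BigSim&#8217;s actual schema is not reproduced in this article, so every element and attribute name in the sketch is an illustrative assumption, not the real syntax.<\/span><\/span><\/p>\n<pre><code>&lt;!-- Hypothetical template sketch; names are illustrative, not actual BigSim syntax --&gt;\n&lt;dataset name=\"patientVitals\" volume=\"1000000\" velocity=\"500rps\"&gt;\n  &lt;field name=\"patientId\" semanticType=\"uuid\"\/&gt;\n  &lt;field name=\"heartRate\" semanticType=\"integer\" min=\"45\" max=\"180\" shape=\"gaussian\"\/&gt;\n  &lt;field name=\"recordedAt\" semanticType=\"datetime\" start=\"2019-01-01\" range=\"30d\"\/&gt;\n&lt;\/dataset&gt;<\/code><\/pre>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">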
Users just need to provide the data specifications in an XML template defining the semantic type, range, volume, velocity, and shape. Since much of the data generation process is an independent task, multiple simulator instances can run in parallel on different machines, creating large data sets that can be pushed to a common data store or streamed. These simulated data sets can be used for capacity planning, what-if scenario testing, extrapolating small data sets with a certain amount of randomness to simulate real-world data, filling in missing data in incomplete data sets, and more.<\/span><\/span><\/p>\n<h3><strong><span style=\"font-size: 18px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">Key Features of BigSim<\/span><\/span><\/strong><\/h3>\n<h3>Extensibility and Adaptability<\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">The simulator can easily be extended and adapted to generate custom data patterns using a library of pre-built primitive and user-defined types. The XML snippets below show examples of how this can be done.<\/span><\/span><\/p>\n<h3>Fine Grain Control<\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">A robust simulation platform should allow easy control of the volume and velocity of the data to serve multiple usage scenarios. Smart grids, Black Friday sales, high-frequency trading, and the Twitter firehose all generate data of varying types, volumes, and velocities. 
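<\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">A load profile for such a scenario can be sketched as follows; the element and attribute names here are assumptions for illustration rather than BigSim&#8217;s actual schema. The sketch distributes an hour of generation across 15-minute slices with a different record count per slice.<\/span><\/span><\/p>\n<pre><code>&lt;!-- Hypothetical load-distribution sketch; names are illustrative --&gt;\n&lt;loadDistribution duration=\"1h\" slice=\"15m\"&gt;\n  &lt;slice index=\"0\" records=\"1000\"\/&gt;   &lt;!-- quiet start --&gt;\n  &lt;slice index=\"1\" records=\"20000\"\/&gt;  &lt;!-- burst, e.g. a Black Friday spike --&gt;\n  &lt;slice index=\"2\" records=\"5000\"\/&gt;\n  &lt;slice index=\"3\" records=\"2000\"\/&gt;\n&lt;\/loadDistribution&gt;<\/code><\/pre>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">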
BigSim provides adequate dials and knobs to deal with such needs.<\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">A load distribution template, for instance, generates data records for an hour with varying loads distributed across different time slices.<\/span><\/span><\/p>\n<h3>Support for Data in Motion and Data at Rest<\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">With streaming analytics gaining popularity alongside batch analytics, simulators are expected to generate large volumes of data to support both forms of analytics. BigSim can push data into a CSV file and into various SQL and NoSQL databases. It can also stream the generated data in real time or at desired intervals for consumption by stream-based services.<\/span><\/span><\/p>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">The snippet below shows the configuration for batch (CSV, Cassandra) and streaming data generation.<\/span><\/span><\/p>\n<h3><big><strong><span style=\"font-size: 18px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">Conclusion<\/span><\/span><\/strong><\/big><\/h3>\n<p><span style=\"font-size: 14px;\"><span style=\"font-family: arial,helvetica,sans-serif;\">For a long time now, simulators have played a vital role in the engineering domain with offerings such as wind tunnels, flight simulators, and load and stress testers. 
These have, without a doubt, helped bring innovative and safer products to market faster. Our experience has shown that rolling out data-driven products and services targeting both enterprises and consumers can be accelerated through a robust data simulator. Big Data projects no longer have to be stymied by <em>not enough data, cannot access data, missing data, or incorrect data.<\/em><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data simulators can be an important asset to a company because they can emulate the data you don&#8217;t have access to, and help you gauge the compatibility of your tools accordingly.<\/p>\n","protected":false},"author":27,"featured_media":4073,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[95],"ppma_author":[2453],"class_list":["post-1105","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-big-data-amp-technology"],"authors":[{"term_id":2453,"user_id":27,"is_guest":0,"slug":"ravi-condamoor","display_name":"Ravi Condamoor","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Condamoor","first_name":"Ravi","job_title":"","description":"After working at industry leaders such as IBM and Oracle, Ravi has co-founded multiple successful companies in the Big Data &amp; Analytics industry. 
He has experience building scalable products in domains including Healthcare, Ad Tech, Media, and Industrial Internet."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1105","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1105"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1105\/revisions"}],"predecessor-version":[{"id":33033,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1105\/revisions\/33033"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/4073"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1105"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1105"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}