{"id":1299,"date":"2019-02-15T10:32:03","date_gmt":"2019-02-15T07:32:03","guid":{"rendered":"http:\/\/kusuaks7\/?p=904"},"modified":"2023-08-29T16:59:44","modified_gmt":"2023-08-29T16:59:44","slug":"sixteen-useful-advices-for-aspiring-data-scientists","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/sixteen-useful-advices-for-aspiring-data-scientists\/","title":{"rendered":"Sixteen useful Advices for Aspiring Data Scientists"},"content":{"rendered":"<p><strong><em>Ready to learn Data Science? <a href=\"https:\/\/www.experfy.com\/training\/courses\">Browse courses<\/a>\u00a0like\u00a0<a href=\"https:\/\/www.experfy.com\/training\/tracks\/data-science-training-certification\">Data Science Training and Certification<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<p>Why is data science sexy? It has something to do with so many new applications and entire new industries come into being from the judicious use of copious amounts of data. Examples include speech recognition, object recognition in computer vision, robots and self-driving cars, bioinformatics, neuroscience, the discovery of exoplanets and an understanding of the origins of the universe, and the assembling of inexpensive but winning baseball teams. In each of these instances, the data scientist is central to the whole enterprise. He\/she must combine knowledge of the application area with statistical expertise and implement it all using the latest in computer science ideas.<\/p>\n<p>In the end, sexiness comes down to being effective. 
I recently read\u00a0<a href=\"https:\/\/www.amazon.com\/Data-Scientists-Work-Sebastian-Gutierrez\/dp\/1430265981\" target=\"_blank\" rel=\"noopener noreferrer\"><strong><em>Sebastian Gutierrez\u2019s \u201cData Scientists at Work\u201d<\/em><\/strong><\/a>, in which he interviewed 16 data scientists across 16 different industries to understand both how they think about it theoretically and also very practically what problems they\u2019re solving, how data\u2019s helping, and what it takes to be successful. All 16 interviewees are at the forefront of understanding and extracting value from data across an array of public and private organizational types \u2014 from startups and mature corporations to primary research groups and humanitarian nonprofits \u2014 and across a diverse range of industries \u2014 advertising, e-commerce, email marketing, enterprise cloud computing, fashion, industrial internet, internet television and entertainment, music, nonprofit, neurobiology, newspapers and media, professional and social networks, retail, sales intelligence, and venture capital.<\/p>\n<p>In particular, Sebastian asked open-ended questions so that the personalities and spontaneous thought processes of each interviewee would shine through clearly and accurately. 
The practitioners in this book share their thoughts on what data science means to them and how they think about it, their suggestions on how to join the field, and their wisdom won through experience on what a data scientist must understand deeply to be successful within the field.<\/p>\n<p>In this post, I want to share the best answers that these data scientists gave for the question:<\/p>\n<p><strong><em>\u201cWhat advice would you give to someone starting out in data science?\u201d<\/em><\/strong><\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 650px; height: 296px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5a6caeda8165f5ccc7762a5c\/1517072096055\/cover.jpeg?format=750w\" alt=\"experfy-blog\" \/><\/p>\n<p><strong>1\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/wiggins\/\" target=\"_blank\" rel=\"noopener noreferrer\">CHRIS WIGGINS<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>CHIEF DATA SCIENTIST AT\u00a0THE NEW YORK TIMES AND ASSOCIATE PROFESSOR OF APPLIED MATHEMATICS AT COLUMBIA<\/em><\/strong><\/p>\n<p>\u201cCreativity and caring. You have to really like something to be willing to think about it hard for a long time. Also, some level of skepticism.\u00a0So that\u2019s one thing I like about PhD students \u2014 five years is enough time for you to have a discovery, and then for you to realize all of the things that you did wrong along the way. It\u2019s great for you intellectually to go back and forth from thinking \u201ccold fusion\u201d to realizing, \u201cOh, I actually screwed this up entirely,\u201d and thus making a series of mistakes and fixing them. I do think that the process of going through a PhD is useful for giving you that skepticism about what looks like a sure thing, particularly in research. 
I think that\u2019s useful because, otherwise, you could easily too quickly go down a wrong path \u2014 just because your first encounter with the path looked so promising.<\/p>\n<p>And although it\u2019s a boring answer, the truth is you need to actually have technical depth. Data science is not yet a field, so there are no credentials in it yet. It\u2019s very easy to get a Wikipedia-level understanding of, say, machine learning. For actually doing it, though, you really need to know what the right tool is for the right job, and you need to have a good understanding of all the limitations of each tool.\u00a0There\u2019s no shortcut for that sort of experience.\u00a0You have to make many mistakes. You have to find yourself shoehorning a classification problem into a clustering problem, or a clustering problem into a hypothesis testing problem.<\/p>\n<p>Once you find yourself trying something out, confident that it\u2019s the right thing, then finally realizing you were totally dead wrong, and experiencing that many times over \u2014 that\u2019s really a level of experience that unfortunately there\u2019s not a shortcut for. You just have to do it and keep making mistakes at it, which is another thing I like about people who have been working in the field for several years. It takes a long time to become an expert in something. It takes years of mistakes. This has been true for centuries. 
There\u2019s a quote from the famous physicist Niels Bohr, who posits that the way you become an expert in a field is to make every mistake possible in that field.\u201d<\/p>\n<p><strong>2\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/caitlinsmallwood\/\" target=\"_blank\" rel=\"noopener noreferrer\">CAITLIN SMALLWOOD<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>VICE PRESIDENT OF SCIENCE AND ALGORITHMS AT\u00a0NETFLIX<\/em><\/strong><\/p>\n<p>\u201cI would say to always bite the bullet with regard to understanding the basics of the data first before you do anything else, even though it\u2019s not sexy and not as fun. In other words,\u00a0put effort into understanding how the data is captured, understand exactly how each data field is defined, and understand when data is missing. If the data is missing, does that mean something in and of itself? Is it missing only in certain situations? These little, teeny nuanced data gotchas will really get you. They really will.<\/p>\n<p>You can use the most sophisticated algorithm under the sun, but it\u2019s the same old junk-in\u2013junk-out thing. You cannot turn a blind eye to the raw data, no matter how excited you are to get to the fun part of the modeling. 
Dot your i\u2019s, cross your t\u2019s, and check everything you can about the underlying data before you go down the path of developing a model.<\/p>\n<p>Another thing I\u2019ve learned over time is that a mix of algorithms is almost always better than one single algorithm in the context of a system, because different techniques exploit different aspects of the patterns in the data, especially in complex large data sets.\u00a0So while you can take one particular algorithm and iterate and iterate to make it better, I have almost always seen that a combination of algorithms tends to do better than just one algorithm.\u201d<\/p>\n<p><strong>3\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/yann-lecun-0b999\/\" target=\"_blank\" rel=\"noopener noreferrer\">YANN LECUN<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>DIRECTOR OF AI RESEARCH AT FACEBOOK AND PROFESSOR OF DATA SCIENCE\/COMPUTER SCIENCE\/NEUROSCIENCE AT NYU<\/em><\/strong><\/p>\n<p>\u201cI always give the same advice, as I get asked this question often. My take on it is that if you\u2019re an undergrad, study a specialty where you can take as many math and physics courses as you can. And it has to be the right courses, unfortunately. What I\u2019m going to say is going to sound paradoxical, but majors in engineering or physics are probably more appropriate than say math, computer science, or economics. Of course, you need to learn to program, so you need to take a large number of classes in computer science to learn the mechanics of how to program. Then, later, do a graduate program in data science. Take undergrad machine learning, AI, or computer vision courses, because you need to get exposed to those techniques. Then, after that, take all the math and physics courses you can take. 
Especially the continuous applied mathematics courses like optimization, because they prepare you for what\u2019s really challenging.<\/p>\n<p>It depends where you want to go because there are a lot of different jobs in the context of data science or AI. People should really think about what they want to do and then study those subjects. Right now the hot topic is deep learning, and what that means is learning and understanding classic work on neural nets, learning about optimization, learning about linear algebra, and similar topics. This helps you learn the underlying mathematical techniques and general concepts we confront every day.\u201d<\/p>\n<p><strong>4\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/erinshellman\/\" target=\"_blank\" rel=\"noopener noreferrer\">ERIN SHELLMAN<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>DATA SCIENCE MANAGER AT ZYMERGEN, EX-DATA SCIENTIST AT NORDSTROM DATA LAB AND AWS S3<\/em><\/strong><\/p>\n<p>\u201cFor the person still deciding what to study I would say STEM fields are no-brainers, and in particular the \u2018TEM ones. Studying a STEM subject will give you tools to test and understand the world. That\u2019s how I see math, statistics, and machine learning. I\u2019m not super interested in math per se, I\u2019m interested in using math to describe things. These are tool sets after all, so even if you\u2019re not stoked on math or statistics, it\u2019s still super worth it to invest in them and think about how to apply it in the things you\u2019re really passionate about.<\/p>\n<p>For the person who\u2019s trying to transition like I did, I would say, for one, it\u2019s hard. Be aware that it\u2019s difficult to change industries and you are going to have to work hard at it. That\u2019s not unique to data science \u2014 that\u2019s life. Not having any connections in the field is tough but you can work on it through meet-ups and coffee dates with generous people. 
My number-one rule in life is \u201cfollow up.\u201d If you talk to somebody who has something you want, follow up.<\/p>\n<p>Postings for data scientists can be pretty intimidating because most of them read like a data science glossary. The truth is that the technology changes so quickly that no one possesses experience of everything liable to be written on a posting. When you look at that, it can be overwhelming, and you might feel like, \u201cThis isn\u2019t for me. I don\u2019t have any of these skills and I have nothing to contribute.\u201d I would encourage against that mindset as long as you\u2019re okay with change and learning new things all the time.<\/p>\n<p>Ultimately, what companies want is a person who can rigorously define problems and design paths to a solution. They also want people who are good at learning. I think those are the core skills.\u201d<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 650px; height: 471px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5a6caf0c085229bac5de7c0e\/1517072147604\/pic1.jpg?format=750w\" alt=\"experfy-blog\" \/><\/p>\n<p><strong>5\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/dtunkelang\/\" target=\"_blank\" rel=\"noopener noreferrer\">DANIEL TUNKELANG<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>CHIEF SEARCH EVANGELIST AT TWIGGLE, EX-HEAD OF SEARCH QUALITY AT<\/em><\/strong><strong>\u00a0<em>LINKEDIN<\/em><\/strong><\/p>\n<p>\u201cTo someone coming from math or the physical sciences, I\u2019d suggest investing in learning software skills \u2014 especially Hadoop and R, which are the most widely used tools. Someone coming from software engineering should take a class in machine learning and work on a project with real data, lots of which is available for free. As many people have said, the best way to become a data scientist is to do data science. 
The data is out there and the science isn\u2019t that hard to learn, especially for someone trained in math, science, or engineering.<\/p>\n<p>Read \u201cThe Unreasonable Effectiveness of Data\u201d \u2014 a classic essay by Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira. The essay is usually summarized as \u201cmore data beats better algorithms.\u201d It is worth reading the whole essay, as it gives a survey of recent successes in using web-scale data to improve speech recognition and machine translation. Then for good measure, listen to what\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=F7iopLnhDik\" target=\"_blank\" rel=\"noopener noreferrer\">Monica Rogati<\/a>\u00a0has to say about how better data beats more data. Understand and internalize these two insights, and you\u2019re well on your way to becoming a data scientist.\u201d<\/p>\n<p><strong>6\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/johnwforeman\/\" target=\"_blank\" rel=\"noopener noreferrer\">JOHN FOREMAN<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>VICE PRESIDENT OF PRODUCT MANAGEMENT AND EX-CHIEF DATA SCIENTIST AT MAILCHIMP<\/em><\/strong><\/p>\n<p>\u201cI find it tough to find and hire the right people. It\u2019s actually a really hard thing to do, because when we think about the university system as it is, whether undergrad or grad school, you focus in on only one thing. You specialize. But data scientists are kind of like the new Renaissance folks, because data science is inherently multidisciplinary.<\/p>\n<p>This is what leads to the big joke of how a data scientist is someone who knows more stats than a computer programmer and can program better than a statistician. What is this joke saying? It\u2019s saying that a data scientist is someone who knows a little bit about two things. But I\u2019d say they know about more than just two things. They also have to know to communicate. 
They also need to know more than just basic statistics; they\u2019ve got to know probability, combinatorics, calculus, etc. Some visualization chops wouldn\u2019t hurt. They also need to know how to push around data, use databases, and maybe even a little OR. There are a lot of things they need to know. And so it becomes really hard to find these people because they have to have touched a lot of disciplines and they have to be able to speak about their experience intelligently. It\u2019s a tall order for any applicant.<\/p>\n<p>It takes a long time to hire somebody, which is why I think people keep talking about how there is not enough talent out there for data science right now. I think that\u2019s true to a degree. I think that some of the degree programs that are starting up are going to help. But even still, coming out of those degree programs, for MailChimp we would look at how you articulate and communicate to us how you\u2019ve used the data science chops across many disciplines that this particular program taught you. That\u2019s something that\u2019s going to weed out so many people. I wish more programs would focus on the communication and collaboration aspect of being a data scientist in the workplace.\u201d<\/p>\n<p><strong>7\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/rehrenberg\/\" target=\"_blank\" rel=\"noopener noreferrer\">ROGER EHRENBERG<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>MANAGING PARTNER OF IA VENTURES<\/em><\/strong><\/p>\n<p>&#8220;I think the areas where the biggest opportunities are also have the most challenges. Healthcare data obviously has some of the biggest issues with PII and privacy concerns. Added to that, you\u2019ve also got sclerotic bureaucracies, fossilized infrastructures, and data silos that make it very hard to solve hard problems requiring integration across multiple data sets. 
It will happen, and I think a lot of the technologies we\u2019ve talked about here are directly relevant to making health care better, more affordable, and more distributed. I see this representing a generational opportunity.<\/p>\n<p>Another huge area in its early days is risk management \u2014 whether it\u2019s in finance, trading, or insurance. It\u2019s a really hard problem when you\u2019re talking about incorporating new data sets into risk assessment \u2014 especially when applying these technologies to an industry like insurance, which, like health care, has lots of privacy issues and data trapped within large bureaucracies. At the same time, these old fossilized companies are just now starting to open up and figure out how to best interact with the startup community in order to leverage new technologies. This is another area that I find incredibly exciting.<\/p>\n<p>The third area I\u2019m passionate about is reshaping manufacturing and making it more efficient. There has been a trend towards manufacturing moving back onshore. A stronger manufacturing sector could be a bridge to recreating a vibrant middle class in the US. I think technology can help hasten this beneficial trend.&#8221;<\/p>\n<p><strong>8\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/cperlich\/\" target=\"_blank\" rel=\"noopener noreferrer\">CLAUDIA PERLICH<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>CHIEF SCIENTIST AT DSTILLERY<\/em><\/strong><\/p>\n<p>\u201cI think, ultimately, learning how to do data science is like learning to ski. You have to do it. You can only listen to so many videos and watch it happen. At the end of the day, you have to get on your damn skis and go down that hill. You will crash a few times on the way and that is fine. That is the learning experience you need. 
I actually much prefer to ask interviewees about things that did not go well rather than what did work, because that tells me what they learned in the process.<\/p>\n<p>Whenever people come to me and ask, \u201cWhat should I do?\u201d I say, \u201cYeah, sure, take online courses on machine learning techniques. There is no doubt that this is useful. You clearly have to be able to program, at least somewhat. You do not have to be a Java programmer, but you must get something done somehow. I do not care how.\u201d<\/p>\n<p>Ultimately, whether it is volunteering at DataKind to spend your time at NGOs to help them, or going to the Kaggle website and participating in some of their data mining competitions \u2014 just get your hands and feet wet. Especially on Kaggle, read the discussion forums of what other people tell you about the problem, because that is where you learn what people do, what worked for them, and what did not work for them. So anything that gets you actually involved in doing something with data, even if you are not being paid for it, is a great thing.<\/p>\n<p>Remember, you have to ski down that hill. There is no way around it. You cannot learn any other way. So volunteer your time, get your hands dirty in any which way you can think, and if you have a chance to do internships \u2014 perfect. Otherwise, there are many opportunities where you can just get started. 
So just do it.\u201d<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 650px; height: 396px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5a6caf3f085229bac5de86e4\/1517072195441\/pic2.jpg?format=750w\" alt=\"experfy-blog\" \/><\/p>\n<p><strong>9\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/jonathan-lenaghan-64b0b42\/\" target=\"_blank\" rel=\"noopener noreferrer\">JONATHAN LENAGHAN<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>CHIEF SCIENTIST AND SENIOR VICE PRESIDENT OF PRODUCT DEVELOPMENT AT PLACEIQ<\/em><\/strong><\/p>\n<p>&#8220;First and foremost, it is very important to be self-critical: always question your assumptions and be paranoid about your outputs. That is the easy part. In terms of skills that people should have if they really want to succeed in the data science field, it is essential to have good software engineering skills. So even though we may hire people who come in with very little programming experience, we work very hard to instill in them very quickly the importance of engineering, engineering practices, and a lot of good agile programming practices. This is helpful to them and us, as these can all be applied almost one-to-one to data science right now.<\/p>\n<p>If you look at dev ops right now, they have things such as continuous integration, continuous build, automated testing, and test harnesses \u2014 all of which map very well from the dev ops world to the data ops (a phrase I stole from Red Monk) world very easily. I think this is a very powerful notion. It is important to have testing frameworks for all of your data, so that if you make a code change, you can go back and test all of your data. Having an engineering mindset is essential to moving with high velocity in the data science world. 
Reading\u00a0<em><a href=\"https:\/\/www.amazon.com\/Code-Complete-Practical-Handbook-Construction\/dp\/0735619670\" target=\"_blank\" rel=\"noopener noreferrer\">Code Complete<\/a>\u00a0<\/em>and\u00a0<a href=\"https:\/\/www.amazon.com\/Pragmatic-Programmer-Journeyman-Master\/dp\/020161622X\" target=\"_blank\" rel=\"noopener noreferrer\"><em>The Pragmatic Programmer<\/em><\/a>\u00a0is going to get you much further than reading machine learning books \u2014 although you do, of course, have to read the machine learning books, too.\u201d<\/p>\n<p><strong>10\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/anna-smith-338996b\/\" target=\"_blank\" rel=\"noopener noreferrer\">ANNA SMITH<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>SENIOR DATA ENGINEER AT SPOTIFY, EX-ANALYTICS ENGINEER AT RENT THE RUNWAY<\/em><\/strong><\/p>\n<p>\u201cIf someone is just starting out in data science, the most important thing to understand is that it\u2019s okay to ask people questions. I also think humility is very important. You\u2019ve got to make sure that you\u2019re not tied up in what you\u2019re doing. You can always make changes and start over. Being able to scrap code, I think, is really hard when you\u2019re starting out, but the most important thing is to just do something.<\/p>\n<p>Even if you don\u2019t have a job in data science, you can still explore data sets in your downtime and can come up with questions to ask the data. In my personal time, I\u2019ve played around with Reddit data. I asked myself, \u201cWhat can I explore about Reddit with the tools that I have or don\u2019t have?\u201d This is great because once you\u2019ve started, you can see how other people have approached the same problem. Just use your gut and start reading other people\u2019s articles and be like, \u201cI can use this technique in my approach.\u201d Start out very slowly and move slowly. 
I tried reading a lot when I started, but I think that\u2019s not as helpful until you\u2019ve actually played around with code and with data to understand how it actually works, how it moves. When people present it in books, it\u2019s all nice and pretty. In real life, it\u2019s really not.<\/p>\n<p>I think trying a lot of different things is also very important. I don\u2019t think I\u2019d ever thought that I would be here. I also have no idea where I\u2019ll be in five years. But maybe that\u2019s how I learn, by doing a bit of everything across many different disciplines to try to understand what fits me best.\u201d<\/p>\n<p><strong>11\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/andre-karpistsenko\/\" target=\"_blank\" rel=\"noopener noreferrer\">ANDRE KARPISTSENKO<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>DATA SCIENCE LEAD AT TAXIFY, CO-FOUNDER AND RESEARCH LEAD AT PLANETOS<\/em><\/strong><\/p>\n<p>\u201cThough somewhat generic advice, I believe you should trust yourself and follow your passion. I think it\u2019s easy to get distracted by the news in the media and the expectations presented by the media and choose a direction that you didn\u2019t want to go. So when it comes to data science, you should look at it as a starting point for your career. Having this background will be beneficial in anything you do. Having an ability to create software and the ability to work with statistics will enable you to make smarter decisions in any field you choose. For example, we can read about how an athlete\u2019s performance is improved through data, like someone becoming the gold medalist in the long jump because they optimized and practiced the angle at which they should jump. This is all led by a data-driven approach to sports.<\/p>\n<p>If I were to go into more specific technical advice, then it depends on the ambitions of the person who is receiving the advice. 
If the person wants to create new methods and tools, then that advice would be very different. You need to persist and keep going in your direction, and you will succeed. But if your intent is to be diverse and flexible in many situations, then you want to have a big toolbox of different methods.<\/p>\n<p>I think the best advice given to me was given by a Stanford professor whose course I attended a while ago. He recommended having a T-shaped profile of competence but with a small second competence next to the core competence, so that you have an alternative route in life if you need it or want it. In addition to the vertical stem of single-field expertise, he recommended that you have the horizontal bar of backgrounds broad enough so that you can work with many different people in many different situations. So while you are in a university, building a T shape with another small competence in it is probably the best thing to do.<\/p>\n<p>Maybe the most important thing is to surround yourself with people greater than you are and to learn from them. That\u2019s the best advice. If you\u2019re in a university, that\u2019s the best environment to see how diverse the capabilities of people are. If you manage to work with the best people, then you will succeed at anything.\u201d<\/p>\n<p><strong>12\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/amy-heineike-b877062\/\" target=\"_blank\" rel=\"noopener noreferrer\">AMY HEINEIKE<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>VICE PRESIDENT OF TECHNOLOGY AT PRIMERAI, EX-DIRECTOR OF MATHEMATICS AT QUID<\/em><\/strong><\/p>\n<p>\u201cI think perhaps they would need to start by looking at themselves and figuring out what it is they really care about. What is it they want to do? Right now, data science is a bit of a hot topic, and so I think there are a lot of people who think that if they can have the \u201cdata science\u201d label, then magic, happiness, and money will come to them. 
So I really suggest figuring out what bits of data science you actually care about. That is the first question you should ask yourself. And then you want to figure out how to get good at that. You also want to start thinking about what kinds of jobs are out there that really play to what you are interested in.<\/p>\n<p>One strategy is to go really deep into one part of what you need to know. We have people on our team who have done PhDs in natural language processing or who got PhDs in physics, where they\u2019ve used a lot of different analytical methods. So you can go really deep into an area and then find people for whom that kind of problem is important or similar problems that you can use the same kind of thinking to solve. So that\u2019s one approach.<\/p>\n<p>Another approach is to just try stuff out. There are a lot of data sets out there. If you\u2019re in one job and you\u2019re trying to change jobs, try to think whether there\u2019s data you could use in your current role that you could go and get and crunch in interesting ways. Find an excuse to get to try something out and see if that\u2019s really what you want to do. Or just from home there\u2019s open data you can pull. Just poke around and see what you can find and then start playing with that. I think that\u2019s a great way to start. There are a lot of different roles that are going under the name \u201cdata science\u201d right now, and there are also a lot of roles that are probably what you would think of as data science but don\u2019t have a label yet because people aren\u2019t necessarily using it. 
Think about what it is that you really want.\u201d<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 650px; height: 435px;\" src=\"https:\/\/static1.squarespace.com\/static\/59d9b2749f8dce3ebe4e676d\/t\/5a6caf749140b70ed920e31c\/1517072258096\/pic3.jpg?format=750w\" alt=\"experfy-blog\" \/><\/p>\n<p><strong>13\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/victorwhu\/\" target=\"_blank\" rel=\"noopener noreferrer\">VICTOR HU<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>HEAD OF DATA SCIENCE AT QBE INSURANCE, EX-CHIEF DATA SCIENTIST AT NEXT BIG SOUND<\/em><\/strong><\/p>\n<p>\u201cFirst is that you definitely have to tell a story. At the end of the day, what you are doing is really digging into the fundamentals of how a system or an organization or an industry works. But for it to be useful and understandable to people, you have to tell a story.<\/p>\n<p>Being able to write about what you do and being able to speak about your work is very critical. Also worth understanding is that you should maybe worry less about what algorithm you are using. More data or better data beats a better algorithm, so if you can set up a way for you to analyze and get a lot of good, clean, useful data \u2014 great!\u201d<\/p>\n<p><strong>14\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/kira-radinsky-6b68601\/\" target=\"_blank\" rel=\"noopener noreferrer\">KIRA RADINSKY<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>CHIEF SCIENTIST AND DIRECTOR OF DATA SCIENCE AT EBAY, EX-CTO AND CO-FOUNDER OF SALESPREDICT<\/em><\/strong><\/p>\n<p>\u201cFind a problem you\u2019re excited about. For me, every time I started something new, it\u2019s really boring to just study without having a problem I\u2019m trying to solve. Start reading material and as soon as you can, start working with it and your problem. You\u2019ll start to see problems as you go. 
This will lead you to other learning resources, whether they are books, papers, or people. So spend time with the problem and people, and you\u2019ll be fine.<\/p>\n<p>Understand the basics really deeply. Understand some basic data structures and computer science. Understand the basis of the tools you use and understand the math behind them, not just how to use them. Understand the inputs and the outputs and what is actually going on inside, because otherwise you won\u2019t know when to apply it. Also, it depends on the problem you\u2019re tackling. There are many different tools for so many different problems. You\u2019ve got to know what each tool can do and you\u2019ve got to know the problem that you\u2019re doing really well to know which tools and techniques to apply.\u201d<\/p>\n<p><strong>15\u00a0<\/strong><strong>\u2014\u00a0<a href=\"http:\/\/ericjonas.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">ERIC JONAS<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>POSTDOC AT UC BERKELEY EECS, EX-CHIEF PREDICTIVE SCIENTIST AT SALESFORCE<\/em><\/strong><\/p>\n<p>\u201cThey should understand probability theory forwards and backwards. I\u2019m at the point now where everything else I learn, I then map back into probability theory. It\u2019s great because it provides this amazing, deep, rich basis set along which I can project everything else out there. There\u2019s a book by E. T. Jaynes called\u00a0<a href=\"https:\/\/www.amazon.com\/Probability-Theory-Science-T-Jaynes\/dp\/0521592712\" target=\"_blank\" rel=\"noopener noreferrer\">Probability Theory: The Logic of Science<\/a>, and it\u2019s our bible. We really buy it in some sense. The reason I like the probabilistic generative approach is you have these two orthogonal axes \u2014 the modeling axis and the inference axis. Which basically translates into how do I express my problem and how do I compute the probability of my hypothesis given the data? 
The nice thing I like from this Bayesian perspective is that you can engineer along each of these axes independently. Of course, they\u2019re not perfectly independent, but they can be close enough to independent that you can treat them that way.<\/p>\n<p>When I look at things like deep learning or any kind of LASSO-based linear regression systems, which is so much of what counts as machine learning these days, they\u2019re engineering along either one axis or the other. They\u2019ve kind of collapsed that down. Using these LASSO-based techniques as an engineer, it becomes very hard for me to think about: \u201cIf I change this parameter slightly, what does that really mean?\u201d Linear regression as a model has a very clear linear additive Gaussian model baked into it. Well, what if I want things to look different? Suddenly all of these regularized least squares things fall apart. The inference technology just doesn\u2019t even accept that as a thing you\u2019d want to do.\u201d<\/p>\n<p><strong>16\u00a0<\/strong><strong>\u2014\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/jakeporway\/\" target=\"_blank\" rel=\"noopener noreferrer\">JAKE PORWAY<\/a><\/strong><strong>,<\/strong>\u00a0<strong><em>FOUNDER AND EXECUTIVE DIRECTOR OF DATAKIND<\/em><\/strong><\/p>\n<p>\u201cI think a strong statistical background is a prerequisite, because you need to know what you\u2019re doing, and understand the guts of the model you build. Additionally, my statistics program also taught a lot about ethics, which is something that we think a lot about at DataKind. You always want to think about how your work is going to be applied. You can give anybody an algorithm. You can give someone a model for using stop-and-frisk data, where the police are going to make arrests, but why and to what end? It\u2019s really like building any new technology. 
You\u2019ve got to think about the risks as well as the benefits and really weigh that because you are responsible for what you create.<\/p>\n<p>No matter where you come from, as long as you understand the tools that you\u2019re using to draw conclusions, that is the best thing you can do. We are all scientists now, and I\u2019m not just talking about designing products. We are all drawing conclusions about the world we live in. That\u2019s what statistics is \u2014 collecting data to prove a hypothesis or to create a model of the way the world works. If you just trust the results of that model blindly, that\u2019s dangerous because that\u2019s your interpretation of the world, and as flawed as it is, your understanding is how flawed the result is going to be.<\/p>\n<p>In short, learn statistics and be thoughtful.\u201d<\/p>\n<p><a href=\"https:\/\/www.goodreads.com\/book\/show\/22945255-data-scientists-at-work\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Data Scientists at Work<\/strong><\/a>\u00a0displays how some of the world\u2019s top data scientists work across a dizzyingly wide variety of industries and applications \u2014 each leveraging her own blend of domain expertise, statistics, and computer science to create tremendous value and impact.<\/p>\n<p>Data is being generated exponentially and those who can understand that data and extract value from it are needed now more than ever. The hard-earned lessons and joy about data and models from these thoughtful practitioners would be tremendously useful if you aspire to join the next generation of data scientists.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ready to learn Data Science? Browse courses\u00a0like\u00a0Data Science Training and Certification developed by industry thought leaders and Experfy in Harvard Innovation Lab. Why is data science sexy? 
It has something to do with so many new applications and entire new industries come into being from the judicious use of copious amounts of data. Examples include<\/p>\n","protected":false},"author":86,"featured_media":2892,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[187],"tags":[94],"ppma_author":[1842],"class_list":["post-1299","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":1842,"user_id":86,"is_guest":0,"slug":"james-le","display_name":"James Le","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","author_category":"","user_url":"","last_name":"Le","first_name":"James","job_title":"","description":"James Le is a Software Developer with experiences in Product Management and Data Analytics. He played a pivotal role in the operation of a start-up organization at Denison University."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1299"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1299\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2892"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1299"},{"taxonomy":"post_tag","embeddable":true
,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1299"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}