{"id":1572,"date":"2019-03-13T03:27:25","date_gmt":"2019-03-13T03:27:25","guid":{"rendered":"http:\/\/kusuaks7\/?p=1177"},"modified":"2023-08-23T14:43:24","modified_gmt":"2023-08-23T14:43:24","slug":"data-science-in-the-real-world","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/data-science-in-the-real-world\/","title":{"rendered":"Data Science in the Real World"},"content":{"rendered":"<section>\n<blockquote>\n<h4 id=\"fd45\">Read about the actual work of a Data Scientist. Spoiler alert: It\u2019s nothing like\u00a0Kaggle.<\/h4>\n<\/blockquote>\n<p>The online world of resources that help students and enthusiasts prepare for work as a Data Scientist is vast. There is a plethora of ways to access data and information. One could think that creating value from Data Science is as easy as spinning up a Jupyter Notebook and changing a few lines of code. All you have to do is take a few online courses, and it\u2019s all rainbows and unicorns.<\/p>\n<figure id=\"4cf2\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" style=\"width: 640px; height: 421px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*0nR-pjD7PTQA1G1GblzUfw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*0nR-pjD7PTQA1G1GblzUfw.jpeg\" \/><\/figure>\n<p id=\"871e\" style=\"text-align: center;\"><span style=\"font-size: 11px;\">Photo by\u00a0Boudewijn Huysmans\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/search\/photos\/unicorn?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/search\/photos\/unicorn?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Unsplash<\/a><\/span><\/p>\n<p>This post aims to shed light on the opportunities as well as the unconventional challenges you might encounter when working as a Data Scientist. 
You will walk through a real-world use-case and get a more realistic view of the job of a Data Scientist. Spoiler alert: it\u2019s not all rainbows and unicorns.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*kfh4SPJ0Gl31-AVt4TwdSQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*kfh4SPJ0Gl31-AVt4TwdSQ.png\" \/><\/p>\n<\/section>\n<section>\n<h3 id=\"1be0\">The Problem<\/h3>\n<p id=\"1634\">Every Data Science project starts with a problem you aim to solve. It\u2019s important to keep this in mind. Too often, Data Scientists run around looking for problems to solve with Machine Learning. It should be the other way around.<\/p>\n<blockquote id=\"4324\"><p>First comes the problem, second comes the Data\u00a0Science.<\/p><\/blockquote>\n<p id=\"83a8\">Our use-case starts with a radical change in the legal landscape. The introduction of the European Union (EU) General Data Protection Regulation\u00a0<a href=\"https:\/\/eugdpr.org\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">(GDPR)<\/a>\u00a0in 2018 affected more industries than just\u00a0<a href=\"https:\/\/www.wired.com\/story\/how-gdpr-affects-you\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.wired.com\/story\/how-gdpr-affects-you\/\" data->online marketing<\/a>. GDPR aims to strengthen the privacy rights of individuals in the EU. The regulation is widely hailed by privacy advocates and equally disliked within the industry.<\/p>\n<figure id=\"0a2a\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*kpSJz2g-B9qPk1d1zb94Ww.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*kpSJz2g-B9qPk1d1zb94Ww.jpeg\" \/><\/figure>\n<p id=\"78a4\">Companies generally struggle to interpret how GDPR will be applied in specific use-cases because there aren\u2019t any precedent-setting rulings yet. 
In essence, GDPR requires companies to give individuals the right to request and delete their data. In addition, companies should only collect the data needed for a specific, pre-determined use-case. GDPR prohibits unnecessary hoarding of personal data. The legislation introduces considerable uncertainty since the actual limits of enforcement remain to be explored.<\/p>\n<figure id=\"ee60\"><canvas width=\"50\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*ZYAaXgJYGnz3Uv2qH-VgPg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*ZYAaXgJYGnz3Uv2qH-VgPg.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\">This is you, the wandering Data Science unicorn. Photo by\u00a0<a href=\"https:\/\/unsplash.com\/photos\/rHVYk9BbRKk?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/photos\/rHVYk9BbRKk?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Andrea Tummons<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/search\/photos\/unicorn?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/search\/photos\/unicorn?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Unsplash<\/a><\/span><\/p>\n<p id=\"8cd1\">Now imagine a self-driving car roaming the streets in the EU. Such cars often record their environment with cameras. By definition, faces and license plates count as personal data and need to be protected. How are car manufacturers supposed to drive around without unintentionally collecting the faces and license plates of individuals? Some would say it\u2019s almost impossible. Here we have identified an issue that is relevant to our partners. We also believe that Machine Learning can provide a solution. 
Let\u2019s develop the use-case.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"8ce7\">The Use-Case<\/h3>\n<p id=\"c464\">Automotive companies need to collect real-world data without infringing on GDPR-protected personal data rights. There are many ways to solve this: only drive in areas without humans or cars, only collect data at night, rely entirely on simulated data, etc. None of these solutions is ideal. Data-driven function development requires real-world data, without constraints.<\/p>\n<p id=\"2cdf\">We could detect faces and license plates and anonymize them. Technically speaking, this would be pseudo-anonymization, but for brevity we will stick to anonymization in this article.<\/p>\n<figure id=\"250c\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*fqgC5s73JMCgzjrakpqoEw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*fqgC5s73JMCgzjrakpqoEw.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\">He is anonymized.<\/span><\/p>\n<p id=\"6a35\">You might notice that we haven\u2019t even spoken of using Machine Learning yet! How we solve this problem depends entirely on finding the best method, which doesn\u2019t necessarily need to be Machine Learning-driven.<\/p>\n<p id=\"264b\">We understand that there is a need to anonymize individuals in images and video to protect their privacy. After conducting some research, we can show that Deep Learning is the state-of-the-art approach to accurately detecting objects in images. Let\u2019s define the bounds of the project next.<\/p>\n<p id=\"8b70\">The main goal is to focus on the anonymization of human faces recorded by a car\u2019s exterior cameras. First, we need to detect faces in an image. Second, we will replace the face with a mask. There are other ways to replace the face, e.g. 
with a synthetic face, but we won\u2019t get into it in this post.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"f014\">Defining the\u00a0Goal<\/h3>\n<p id=\"1aae\">A Machine Learning product is of little value if it stands on its own. Very often, you will integrate your model into an existing pipeline or build a pipeline around the product. The current go-to engineering approach is to build\u00a0<a href=\"https:\/\/blog.algorithmia.com\/deploying-machine-learning-at-scale\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/blog.algorithmia.com\/deploying-machine-learning-at-scale\/\" data->microservices<\/a>. Each microservice handles only the isolated task it is designed for. They are easily integrated into existing architectures. Standard tools for this are\u00a0<a href=\"https:\/\/towardsdatascience.com\/designing-a-machine-learning-model-and-deploying-it-using-flask-on-heroku-9558ce6bde7b\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Python Flask<\/a>\u00a0and\u00a0<a href=\"https:\/\/towardsdatascience.com\/docker-for-data-scientists-5732501f0ba4\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Docker containers<\/a>. This is what we want.<\/p>\n<p id=\"1c5f\">To formalize our approach, we choose to use Objectives and Key Results (OKRs). 
We learned in this\u00a0<a href=\"https:\/\/towardsdatascience.com\/the-power-of-goal-setting-for-your-data-science-project-9338bf475abd\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">post<\/a>\u00a0about the benefits of using OKRs to steer our Data Science project, so we come up with the following goals:<\/p>\n<figure id=\"b4eb\"><canvas width=\"75\" height=\"31\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*lrscWj6x2i3nvrhiduPqqA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*lrscWj6x2i3nvrhiduPqqA.png\" \/><\/figure>\n<p id=\"efbc\">These OKRs are ambitious stretch-goals for the full-blown face anonymization project. Later in this post, we will see that the scope of the project is limited to a prototype and thus the OKRs should change as well.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"a104\">Creating a\u00a0Project<\/h3>\n<p id=\"19ee\">No matter where you work as a Data Scientist, you always work together with other stakeholders. Before we can get started with our work, we need to clear the first hurdle and create a project pitch to convince our partners. We aim to get a Data Scientist assigned to this project for some time to prototype a solution.<\/p>\n<p id=\"dfe1\">Management is well aware of the data privacy issue. After all, they are responsible for ensuring that the company adheres to the legal requirements. It also makes intuitive sense to anonymize the content of images! In a small pitch, we use the above-defined OKRs together with a\u00a0<a href=\"https:\/\/towardsdatascience.com\/storytelling-for-data-scientists-317c2723aa31\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">persuasive story<\/a>\u00a0to convince management to sponsor a prototype solution as a project. 
If it\u2019s promising enough, we will look for more partners and take the project to the next level.<\/p>\n<figure id=\"426c\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*PflLUH-l_1gXzycOfdiLhQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*PflLUH-l_1gXzycOfdiLhQ.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\">Photo by\u00a0<a href=\"https:\/\/unsplash.com\/photos\/WkJPu3rEeJE?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/photos\/WkJPu3rEeJE?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Steve Johnson<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Unsplash<\/a><\/span><\/p>\n<p id=\"99e7\">Congratulations! We got ourselves a Machine Learning project. Now, let the fun begin.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"6277\">The Work<\/h3>\n<p id=\"a6e5\">Deep Learning Guru Andrew Ng\u00a0<a href=\"https:\/\/towardsdatascience.com\/structuring-your-machine-learning-project-course-summary-in-1-picture-and-22-nuggets-of-wisdom-95b051a6c9dd\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">recommends<\/a>\u00a0coming up with a working model as quickly as possible and then iterating on the idea until the goal is met. 
Andrew recommends first experimenting with existing pre-trained models before adjusting them to fit our specific use-case.<\/p>\n<figure id=\"1509\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Pr3piETfWr0mPNLN4P3B1A.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*Pr3piETfWr0mPNLN4P3B1A.png\" \/><\/figure>\n<p id=\"c0d0\">If we look at our OKRs, we realize we need to take three steps: research available face detection models, gather data, and compare the performance of different models. Let\u2019s start with the first part.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"6ead\">Researching Available Models<\/h3>\n<p id=\"d14b\">Face detection has been a vital part of computer vision research for decades. Our assumption is that finding a good model to detect faces should be easy. Open-source packages like OpenCV offer built-in\u00a0<a href=\"https:\/\/www.pyimagesearch.com\/2018\/02\/26\/face-detection-with-opencv-and-deep-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.pyimagesearch.com\/2018\/02\/26\/face-detection-with-opencv-and-deep-learning\/\" data->face detection models<\/a>\u00a0from the get-go.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*dLiJWKoAhhROlpiv22YYWw.gif\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*dLiJWKoAhhROlpiv22YYWw.gif\" \/><\/p>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px; text-align: center;\">Hi there,\u00a0<\/span><a style=\"font-size: 11px; text-align: center;\" href=\"https:\/\/www.pyimagesearch.com\/2018\/02\/26\/face-detection-with-opencv-and-deep-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.pyimagesearch.com\/2018\/02\/26\/face-detection-with-opencv-and-deep-learning\/\" data->Adrian<\/a><span style=\"font-size: 11px; text-align: center;\">! 
Check out\u00a0<\/span><a style=\"font-size: 11px; text-align: center;\" href=\"https:\/\/www.pyimagesearch.com\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/www.pyimagesearch.com\/\" data->PyImageSearch<\/a><span style=\"font-size: 11px; text-align: center;\">\u00a0for more great OpenCV tutorials.<\/span><\/p>\n<p id=\"86ad\">The downside, however, is that many face detection models focus on identifying faces that are close to the camera. In our automotive context, it\u2019s too late if we recognize faces only when they are right in front of the camera!<\/p>\n<figure id=\"21f3\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*7l4BAe_Xn0av987NBmstxw.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*7l4BAe_Xn0av987NBmstxw.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\"><a href=\"https:\/\/www.dailystar.co.uk\/news\/latest-news\/518178\/shes-mad-at-me-right-now-guy-hangs-to-front-of-moving-car-American-businessman\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Too late if you recognize the face just\u00a0now!<\/a><\/span><\/p>\n<p id=\"be1f\">Additionally, cars on the road will record faces with occlusions like hats and sunglasses, in dim lighting and in different poses. Thus, we should focus our research on models that satisfy these needs.<\/p>\n<p id=\"b6fc\">We analyze recent research papers on state-of-the-art face detection models and get clues from these papers about other existing models. 
One particular model catches our eye: the Tiny Faces model!<\/p>\n<figure id=\"8760\"><canvas width=\"75\" height=\"37\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*42XawV8dauouG23jNlD7gw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*42XawV8dauouG23jNlD7gw.png\" \/><\/figure>\n<p id=\"a170\">Apparently, the model is trained on a dataset that includes faces in different poses and lighting conditions, with occlusions, and with a focus on small, distant faces. It looks like a very good fit for our use-case.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"0a89\">Licensing: The Sleeping\u00a0Beast<\/h3>\n<p id=\"e8d8\">Researchers have it easy: released datasets and models are generally licensed for scientific purposes. Spare-time Kagglers also don\u2019t have to consider licenses, as they just try out models for personal use. However, this changes when you work for a profit-generating company. Suddenly, many datasets or models are taboo!<\/p>\n<p id=\"e8cb\">As a tip, if you read a model license and it says \u201cnot for commercial purposes\u201d, the model is pretty much out of reach for you. You can\u2019t even test it out for an internal prototype. Let\u2019s forget about this model and research some more industry-friendly pre-trained models.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"4400\">Gathering the\u00a0Data<\/h3>\n<p id=\"8d18\">After we\u2019ve identified suitable models, it\u2019s time to let them loose on real-world data. Since we\u2019re working in the automotive industry, we can be sure to have access to vast amounts of clean and labeled data!<\/p>\n<p id=\"9b5a\">Not so fast, rookie. If you\u2019re not working in a start-up, chances are high that your company is divided into different brands and subsidiaries with many subdivisions whose organization changes frequently. 
Additionally, GDPR makes it more difficult to share data between departments for different use-cases. As you can imagine, finding the right dataset is like finding a needle in a haystack!<\/p>\n<figure id=\"7ef8\"><canvas width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*tbFLMErVzKvHw_7Isz0qFg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*tbFLMErVzKvHw_7Isz0qFg.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\">Go ahead, find the needle. Photo by\u00a0<a href=\"https:\/\/unsplash.com\/photos\/9Mq_Q-4gs-w?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/photos\/9Mq_Q-4gs-w?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Lucas Gallone<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Unsplash<\/a><\/span><\/p>\n<p id=\"9a6f\">So we don\u2019t really have the data we need. This might seem outrageous to the reader, but not having the right data is one of the most common reasons why Data Science\u00a0<a href=\"https:\/\/towardsdatascience.com\/why-data-science-succeeds-or-fails-c24edd2d2f9\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">projects fail<\/a>, regardless of the company you work for.<\/p>\n<p id=\"e941\">There is a great public dataset called\u00a0<a href=\"http:\/\/mmlab.ie.cuhk.edu.hk\/projects\/WIDERFace\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"http:\/\/mmlab.ie.cuhk.edu.hk\/projects\/WIDERFace\/\" data->WIDER Face<\/a>. Unfortunately, the licensing beast strikes again here. 
Google published its\u00a0<a href=\"https:\/\/storage.googleapis.com\/openimages\/web\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/storage.googleapis.com\/openimages\/web\/index.html\" data->Open Images<\/a>\u00a0dataset, which contains many labeled human faces and is free for commercial use. The images are still not the real-world data that we need our model to perform well on.<\/p>\n<p id=\"64e7\">Thus, the only way we can proceed is to collect our own dataset. Luckily, we have the equipment to gather the kind of real-world data we want to do well on. Let\u2019s go for a spin and collect some data on pedestrians in a controlled environment.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"4672\">Apply the\u00a0Models<\/h3>\n<p id=\"c143\">Now that we have collected some data, it\u2019s time to try out the models. We have spent most of the time researching models and collecting data, so we don\u2019t have enough time to label the dataset. How can we escape this misery? We help ourselves with approximations like comparing the detected face count and good ol\u2019 intuition about which model performs better.<\/p>\n<p id=\"812f\">This should do for an intermediate result though. The performance shows a winner, so we collect another, more realistic dataset and create a showcase video as our result. We try out a few more things, like hyperparameter tuning to reduce false positives and improve the showcase, but the clock is ticking for this project. We containerize the code and present our results.<\/p>\n<figure id=\"8c6d\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*1xtRL_3pj-VwyWCD4jaeag.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*1xtRL_3pj-VwyWCD4jaeag.jpeg\" \/><\/figure>\n<p id=\"c2f9\">We\u2019re done with our project! We show that face detection can be used to protect personal data under GDPR in the automotive industry. 
Next, we have to convince other stakeholders and partners to sponsor the full project for which we defined the OKRs above. The next steps could include license plate detection, hyperparameter tuning, preparing a proper dataset, collecting more data, etc. Shortly after the project, the great folks at understand.ai open-sourced their\u00a0<a href=\"https:\/\/github.com\/understand-ai\/anonymizer\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/github.com\/understand-ai\/anonymizer\" data->anonymization code<\/a>, so we should definitely give this a try as well.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"b65a\">The Conclusion<\/h3>\n<p id=\"5bca\">As you can see, the actual work on this picture-perfect use-case is messy. Data is not always available. Be careful with licenses. Funding for your project might be limited. Priorities and circumstances change. You have to stay flexible and work within the given time limits, even if you don\u2019t like it.<\/p>\n<figure id=\"731f\"><canvas width=\"75\" height=\"46\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*OGSBNjaI4C0BXA_wxFHHFg.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/640\/1*OGSBNjaI4C0BXA_wxFHHFg.jpeg\" \/><\/figure>\n<p style=\"text-align: center;\"><span style=\"font-size: 11px;\">Photo by\u00a0<a href=\"https:\/\/unsplash.com\/photos\/g2pdILavGCY?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/unsplash.com\/photos\/g2pdILavGCY?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Todd Cravens<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/search\/photos\/rainbow?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener noreferrer\" 
data-href=\"https:\/\/unsplash.com\/search\/photos\/rainbow?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" data->Unsplash<\/a><\/span><\/p>\n<p id=\"16e0\">With this post, I hope I have shed some light on the real-world work of a Data Scientist through a tiny project. The challenges are surely not always the same, but I imagine them to be similar across companies. Now you are prepared for the fact that the Data Science world is not all rainbows and unicorns.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"c4f5\">Key Take-Aways<\/h3>\n<p id=\"3ffb\">As a real-world Data Scientist, you should be aware of the following challenges:<\/p>\n<ul>\n<li id=\"10af\">You need to convince management and stakeholders to sponsor your new project<\/li>\n<li id=\"7be4\">Check for the right licensing when incorporating existing models or datasets<\/li>\n<li id=\"e88f\">Most of the work you\u2019re doing is research and data preparation<\/li>\n<li id=\"8263\">You need to stay within the pre-defined time scope of the project<\/li>\n<\/ul>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Every Data Science project starts with a problem you aim to solve. It&rsquo;s important to keep this in mind. Too often, Data Scientists run around looking to solve problems with Machine Learning. It should be the other way around. As a real-world Data Scientist, you should be aware of the following challenges. You need to convince management and stakeholders to sponsor your new project. Check for the right licensing when incorporating existing models or datasets. 
Most of the work you&rsquo;re doing is research and data preparation.<\/p>\n","protected":false},"author":344,"featured_media":4129,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[2067],"class_list":["post-1572","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":2067,"user_id":344,"is_guest":0,"slug":"jan-zawadzki","display_name":"Jan Zawadzki","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Zawadzki","first_name":"Jan","job_title":"","description":"Jan Zawadzki is a Data Scientist at Volkswagen Group Services with 4 years of global experience in machine learning and management consulting."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1572","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/344"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1572"}],"version-history":[{"count":2,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1572\/revisions"}],"predecessor-version":[{"id":28250,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1572\/revisions\/28250"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/4129"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1572"},{"taxonomy":"post_tag","embeddable":t
rue,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1572"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}