I have loved being a data scientist. The job is challenging. The job market is great. I can't imagine any other job that would provide more career fulfillment for me now. This is my ideal career.
Across the way, I see people that I care about trying to break into this field, wondering what to study and what skills they need, and feeling clueless about how to get that first real data science job in this market. This article is for them.
Becoming a data scientist from scratch
Many experts will disagree with me and say it is not possible to do this. They will say that there is no realistic pathway for someone to go from 0 to a serious data science job in such a short amount of time, especially without going through a paid program. They will point to 4-8 years of graduate education as a necessary step to get someone to this point. I disagree.
To understand the mechanics of this, I will try it on myself with a hypothetical situation. I am going to delete all of my work war wounds and scars, lessons learned, network connections, academic credentials, work experience, and knowledge.
Experience And Credentials = 0
My resume has suddenly gone from this (left) to this (right)
Network Benefit = 0
In addition to losing any experience credibility, my help network will vanish. My network of recruiters, data experts, conference organizers, publishers, followers, and more importantly decision makers who could hire me... I am now a network of 1.
Data Science Powers = 0
My programming and math skills have regressed to those of a new high school graduate. I have no memory of any of my past projects, lessons learned, or language familiarity.
Before I pull the trigger on this reset I get to write myself a letter, which I will have as a guide.
You have a TON of work cut out for you during the next 6 months to land that $100K+ data science job. Stacks of books to read, video tutorials to watch, plenty of uncomfortable networking, and public speaking on subjects you know nothing about. Scared? You should be. Success? Absolutely, but you need to listen to everything I am about to tell you. First you need to go and buy a laptop that you enjoy typing on, with a trackpad you can tolerate. A refurbished or used MacBook Air might be a fine starting point. You will be spending 3-5 hours per day on this thing, so make it count.
Starting now you will spend a minimum of 3 hours per day doing Python programming tutorials (*EDITED: spin up tutorials with R, Julia, Java, and Octave so you are familiar, but I'd still say focus on being a Python expert). Start by searching YouTube for "Google Python Class Day 1 Part 1", then go through the whole series and others. Keep searching for Python tutorials and lessons online that match your current understanding. Once you feel more comfortable with Python, start looking for tutorials on numpy, sklearn, pandas, and matplotlib. I would also order "Python Machine Learning" by Sebastian Raschka on Amazon. You need to understand every page of this book during the next three months, and show that by literally marking and writing on each page in red ink.
Look up the local meetups for data science and Python. Plan on going to ALL of these meetups and local conferences during the next 6 months. When you go, introduce yourself to new people and connect with them on LinkedIn and Twitter afterward. If there are speakers or organizers, make sure you connect with them and work to form meaningful connections. They will have the most influence on your future. During the next 3 months you will be presenting at these meetups. Set up a LinkedIn profile and connect with local people at these meetups. Once you have more than 20 local connections, start connecting to every single data scientist and data science recruiter you can find on LinkedIn until you have over 500 connections. As you build momentum and network size, future connection requests will be accepted more easily. Your tag line under your name on LinkedIn will be: "Data Science Enthusiast". As you cross milestones, publish what you have learned on your blog and encourage comments and feedback. As you become more sophisticated, the value of your posts will grow.
Statistics / Math:
For someone who is truly entry level I like the dummies books:
Work through that. Make sure you understand the basic concepts of mean, median, mode, std, skew, and histograms. Now start searching for online tutorials on entry level statistics, or statistics for engineers. I would also start searching for basic tutorials on linear algebra, especially anything using numpy with python.
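To make the basic concepts above concrete, here is a minimal sketch that computes them with nothing but the Python standard library. The sample data, the function names describe and histogram, and the bin width are all made up for illustration. The skew shown is the simple population skewness (the average cubed z-score), which is only one of several conventions.

```python
import statistics
from collections import Counter


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    # Population skewness: the average cubed z-score.
    skew = sum(((x - mean) / std) ** 3 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": std,
        "skew": skew,
    }


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((x // bin_width) * bin_width for x in values)
    return dict(sorted(counts.items()))


# Hypothetical sample with one large outlier (20).
sample = [2, 3, 3, 4, 5, 5, 5, 6, 7, 20]
stats = describe(sample)
bins = histogram(sample, bin_width=5)

for name, value in stats.items():
    print(f"{name}: {value:.3f}")
for start, count in bins.items():
    print(f"[{start}, {start + 5}): {'#' * count}")
```

Notice how the single outlier pulls the mean above the median and makes the skew positive. Once this feels comfortable, try reproducing the same numbers with numpy and plotting the histogram with matplotlib.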
Your biggest issue right now is that your foundation looks like Swiss cheese: you have holes everywhere. You need to find these gaps and fill them as quickly as possible. One option is to go to http://scikit-learn.org/stable/ and look at the topics there, such as supervised learning, unsupervised learning, and clustering. Work through all of these topics and understand the algorithms listed under them. If you can't understand one of them (e.g. k-means or logistic regression), look it up online and see if you can find a YouTube tutorial. There are some great ones out there. If you are still stuck, save those questions for your meetup, where you can ask someone directly what they would recommend. The Python ML book mentioned earlier should also help.
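One way to check that you really understand an algorithm like k-means is to write a tiny version yourself before reaching for a library. The sketch below is a bare-bones, one-dimensional take on the usual assign-then-update loop (often called Lloyd's algorithm). The function name kmeans_1d, the data points, and the starting centers are all hypothetical, and this is not how scikit-learn implements it.

```python
def kmeans_1d(points, centers, iterations=10):
    """Run a simple k-means loop on 1-D points from the given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its old center.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # nothing moved, so the algorithm has converged
        centers = new_centers
    return centers, clusters


# Made-up data with two obvious groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

After this makes sense, compare it with the KMeans class in sklearn.cluster, which handles multiple dimensions and smarter initialization for you.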
Finding A Friend/Mentor
If your drive, passion, and motivation are maxed out, people can detect that. If you are going to meetups and showing a clear improvement in your abilities based on your efforts, it will be easier to engage a mentor. Nothing is worse than someone who wants to become a data scientist but lacks the drive, passion, or focus to do so. They remind me of limp handshakes and overcooked spaghetti noodles. A mentor or friend in this space can help direct you on filling in your gaps, and they can also inspire topics for you to present at the meetups.
I'm running out of time for this letter, so I am going to cram a final push in here. Start going through the UCI data repository and make sure you can understand Python machine learning examples that run on some of these datasets (MNIST, Boston Housing, Iris flower, Titanic deaths, etc.). Once you feel comfortable there, go to Kaggle and make sure you understand some of the solutions and approaches. Ask your mentor for help if you get stuck on a concept. Start using Jupyter notebooks (jupyter.org) and matplotlib for your testing and development. I would also install Docker, learn how to launch Jupyter notebooks from it, and install missing packages through apt-get or pip. I would scan odesk.com or elance.com for ANY machine learning related jobs that I think I can do. I don't care if I get paid $1/hr, I need that experience on my blank resume. As I start doing more jobs and feeling more confident, that pay will gradually go up.
So much to learn... really so much, but if you hack at least 3 hours per day on Python programming and 2 hours on machine learning and stats, you can start filling in those holes. Deep learning is a very exciting topic that I would include in your data toolkit. This might be weird, but I know my brain wiring is the same, so I would watch my own hour-long deep learning introduction video on YouTube: https://www.youtube.com/watch?v=lr3ZZAyHgsM. If possible, I would try to find a free internship, or even ask my mentor if I could shadow them and just look over their shoulder while they hack on a project. I would keep the books flowing as well, and as I have bandwidth I would purchase any and all recommended books on Python and data science. Even books on the same topic can take a different approach and open your eyes to new perspectives and understanding.
To land the first job I will need to show historical consulting where I am billing at the $100+/hr rate. I also need to focus heavily on my local network to the point where I have referrals referencing me based on interactions or local presentations. I would also ask several local peers about their opinion on why I would or would not be able to secure a data science job at the pay range I want.