Building a Natural Language Processing Roadmap
Hello again! I hope your journey Towards NLP is going well. Whether that’s the case or you feel like you are struggling, I thought I would share this really helpful roadmap I created for myself when I first felt like I was trying to move in every direction. Sometimes it is hard to prioritize or to see which direction you should be going in, so hopefully this will help you. This is a complete Natural Language Processing Roadmap, but it is also so much more! I divided it into three sections: a Data Science Roadmap (this first part you are reading), a Deep Learning Roadmap, and finally, a Natural Language Processing Roadmap. We will talk about each section in a different article.
By clicking on a topic box, you will be redirected to the corresponding part of the article, where you will find links to different courses and resources. I will also link other blog articles where I go into more detail on each topic. At the end of our journey, you will also find more useful tips to improve your NLP and Data Science skills, so stick around, and as always, enjoy!
Step 0 – Find a Mentor
I go into much more detail about where and why you should find a mentor in this article. While this is not a necessary step (that’s why you’ll find it dotted in the roadmap), it is incredibly helpful. Especially if you are starting from scratch, it is nice to have someone to talk to.
Step 1 – Learn Computer Science Fundamentals
Here is the deal. You don’t really need to learn about computer science basics. You can definitely get by and become a very good Data Scientist or NLP Engineer without knowing about transistors, Ada Lovelace, Charles Babbage, Web protocols, binary counting, etc.
However, it is a great base to start from. Having a general idea of how your computer works and how the different pieces fit together really helps when you are trying to understand some obscure bug, or when you are reading documentation and academic papers and need to figure out why some design or architectural choices were made. Of course, you don’t need to go into too much detail if you are not interested, but dipping your toe in the water won’t hurt. And frankly, having at least a high-level idea of how computers work nowadays is absolutely a must.
Here are a few really nice resources to introduce you to computer science:
- CS50: Introduction to Computer Science | Harvard University
- Crash Course Computer Science
- Data Structures and Algorithms
- Introduction to Computer Science and Programming Using Python (edX — MIT)
The first two resources are a really great place to start learning more about Computer Science. They are easy to follow but really informative at the same time (which isn’t an easy feat to accomplish!).
A Few Words about My Favorite Courses
Harvard’s CS50 has more of a hands-on, programming-oriented approach. You will cover all the basics as you carry out small projects and exercises. This course is an excellent resource to start programming while understanding what is going on under the hood of your program. The Computer Science concepts discussed in the course are a true must, and mastering them early on will make programming so much easier.
Crash Course Computer Science is a truly amazing resource. I personally believe it should be used in schools to teach small kids about Computer Science. This is a YouTube series of 40 videos of about 10 minutes each, which makes it perfect if you have to fit your learning time into an already packed schedule. It is fun and easy to follow, and it makes even harder concepts easy to understand. The host, Carrie Anne Philbin, does a remarkable job of explaining the different topics. And since each topic rarely takes up more than one video, you are sure to get a really well-rounded view of the field! This course is perfect if you are starting from zero, if you want to make sure you have covered all your basics, or simply if you are curious about Computer Science.
A Little Plus
You’ll learn bits and pieces about how to use Git and GitHub in any of the different courses you will follow, but I believe these are fundamental tools to master even if you only plan on programming small personal projects. They will make your life so much easier! So here are a couple of courses you can have a look at:
Having a basic understanding of what the Command Line is, and how it works (and how it can work for you!) is really important. Here is a nice introduction with some really helpful explanations:
Step 2 – Master a Programming Language
I won’t go into too much detail about the pros and the cons and the whys and the whos of Python. In this article I wrote a while back, you can read more about them, and about why some people do not believe Python is a good first programming language. I personally believe (and I am not the only one!) that Python is a great language to start learning. It is easy to read, intuitive, elegant, sleek, and flexible. It is undoubtedly the programming language of Scientific Research, Data Science, Machine Learning and NLP. And of course, as you undertake different projects you will be faced with different challenges, and you might need to use other programming languages. That is how we learn and grow our skill set.
But for the time being, we just want to start, and Python is the perfect place. So here you will find a list of resources to learn Programming with Python. If you want more information about this topic, don’t forget to check out this article. In the article you will find more details about each course, and, you guessed it, a complete Roadmap to learn Python!
Step 3 – Learn how to Play with Data
As you can see in the roadmap, Data is going to be a pretty big part of your journey. After all, this branch of Computer Science is called Data Science. And as someone once said (sorry, I don’t remember who said it specifically), the Data comes before the Science. And I’ll go even further and say, there is no Data Science without Data. In Machine Learning especially, Data is the starting point. You need to find it, collect it, store it, organize it, clean it, analyze it, test it, augment it, and much more.
In the roadmap, I put three branches concerning Data:
- Data Collecting and Cleaning
- Data Visualization
- Data Engineering
In the list above, I put them in order of importance.
No matter what your goal is, if you want to dabble in Data Science, ML and/or NLP, you are going to have to learn how to collect and clean data. You might get on without knowing SQL and database management in the beginning, since there are so many datasets ready to be downloaded with a couple of lines of code. But further down the line, you are (hopefully!) going to want to create your own dataset to start experimenting on personal/professional projects. So the sooner you start playing with SQL, Pandas and NumPy, the better. Here are a few interesting resources for you to develop your Data collecting and cleaning skills:
A Roadmap Beyond Data Science
Data engineers develop ETL (extract, transform, load) pipelines, automate file system chores, and tune database processes for high performance using Shell (CLI), SQL, and Python/Scala.
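To make the idea of an ETL pipeline a bit more concrete, here is a minimal sketch using only Python's standard library (`csv` and `sqlite3`). Everything in it is made up for illustration: the sample data, the `repos` table, and the function names `extract`, `transform`, `load` and `run_pipeline` are my own assumptions, not part of any particular tool. A real pipeline would read from files or APIs and write to a proper database, but the three stages stay the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing,
# one missing value and one duplicated row.
RAW_CSV = """id,language,stars
1, Python ,120
2,scala,
3,SQL,45
3,SQL,45
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and lowercase names, default missing stars to 0,
    and keep only the first row seen for each id."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        row_id = int(row["id"])
        if row_id in seen_ids:
            continue
        seen_ids.add(row_id)
        language = row["language"].strip().lower()
        stars = int(row["stars"]) if row["stars"].strip() else 0
        cleaned.append((row_id, language, stars))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS repos "
        "(id INTEGER PRIMARY KEY, language TEXT, stars INTEGER)"
    )
    connection.executemany("INSERT INTO repos VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(text, connection):
    """Run the three stages, then query the result back with plain SQL."""
    load(transform(extract(text)), connection)
    return connection.execute(
        "SELECT language, stars FROM repos ORDER BY stars DESC"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
    try:
        for language, stars in run_pipeline(RAW_CSV, conn):
            print(language, stars)
    finally:
        conn.close()
```

Keeping each stage in its own small function is a design choice, not a requirement: it makes it easy to test the cleaning logic on its own, and to swap the in-memory SQLite database for another target later on.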
Another important skill is the ability to deploy these data structures, which requires knowledge of cloud service providers such as Amazon Web Services, Google Cloud Platform, Microsoft Azure, and others. Google Cloud offers a couple of useful resources:
Info Point
Click here to find more info about the roadmap down below.
How to Read the Roadmap
- in black, you will find broader topics
- in light blue, you will find skills to master
- the little tabs with a settings/cog icon represent suggested languages, libraries, tools, etc.
Step 4 – Master Essential Statistics
Here is a great little specialization you can follow that will explain all the concepts you need:
Step 5 – Master Machine Learning
- Machine Learning with Python (freeCodeCamp)
- Machine Learning Crash Course (Google)
- Advanced Machine Learning Specialization
- CS50’s Introduction to Artificial Intelligence with Python
I hope you enjoyed the journey so far! Click on the Deep Learning and Natural Language Processing buttons to access parts two and three of our journey! You will find two more complete roadmaps: a complete Deep Learning roadmap and a complete Natural Language Processing roadmap. Make sure you let me know if there is anything I missed or should add!
As always, have fun, and see you in an 8-bit!