Learning Resources and Roadmap for Data Scientists
The following roadmap will help you enter the data science field. All the resources mentioned here are available for free. Data science has multiple prerequisites: mathematics, programming, and machine learning theory. These fields are highly interconnected: in some cases the mathematics is easier to understand once you know the implementation, and vice versa. If you get stuck on a topic and don’t understand it fully, move on to the next topic, or study the same topic from another angle (e.g. the math versus the theory). Think of this learning as cyclical: go over things once just to grasp the concepts, then revisit them to understand them fully.
Free Resources for Beginners
1. Programming
- Online interactive Python course:
Learn Python – Free Interactive Python Tutorial
- Corey Schafer (YouTube):
Python Tutorials
2. Calculus
3. Linear Algebra
4. Probability and Statistics
Khan Academy Statistics and Probability
Main Topics
1. Machine Learning
- Stanford lecture with Andrew Ng on Machine Learning.
Lecture 1 – Welcome | Stanford CS229: Machine Learning (Autumn 2018)
- Introduction to Statistical Learning (must read)
https://www.statlearning.com/s/ISLRSeventhPrinting.pdf
- Stanford course on machine learning, which includes R code.
In-depth introduction to machine learning in 15 hours of expert videos
2. Useful Libraries and YouTube Videos
- numpy
Working with vectors, numbers, matrices, tensors
Keith Galli (NumPy):
Complete Python NumPy Tutorial (Creating Arrays, Indexing, Math, Statistics, Reshaping)
- pandas
To read csv files and manipulate tabular data
Keith Galli (Pandas):
Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)
- sklearn
For classical ML solutions, most algorithms can be found here
Keith Galli (Sklearn):
Real-World Python Machine Learning Tutorial w/ Scikit Learn (sklearn basics, NLP, classifiers, etc)
- Matplotlib and seaborn
For visualizing tabular data
Derek Banas:
Seaborn Tutorial 2021
- os
To create folders, get folder/file names, make usable path strings
Corey Schafer:
Python Tutorial: OS Module – Use Underlying Operating System Functionality
- glob
To get a list of files/folders within a folder. If you find yourself using os.listdir or os.walk, consider using glob instead
PythonHumanities:
Python and the Glob Function Easy Tutorial
- shutil
To move and copy files around
shutil — High-level file operations — Python 3.9.6 documentation
- cv2
To read images and visualize them, and a lot more computer vision operations
OpenCV Course – Full Tutorial with Python
- PIL
To read images and visualize them, and detect corrupted images
Corey Schafer:
Python Tutorial: Image Manipulation with Pillow
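To give a feel for how the file-handling libraries in this list (os, glob, shutil) fit together, here is a minimal sketch using only the standard library. The helper names `collect_csv_files` and `backup_files`, the folder layout, and the file names are made up for illustration. It works inside a temporary folder, so nothing is left on disk afterwards.

```python
import glob
import os
import shutil
import tempfile


def collect_csv_files(root_dir):
    """Return a sorted list of every .csv file under root_dir, at any depth."""
    # "**" together with recursive=True lets glob descend into subfolders,
    # which replaces a hand-written os.walk loop.
    pattern = os.path.join(root_dir, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))


def backup_files(paths, backup_dir):
    """Copy each file into backup_dir and return the paths of the copies."""
    os.makedirs(backup_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return [shutil.copy(path, backup_dir) for path in paths]


# Demo on a throwaway folder (hypothetical layout: data/raw/*.csv).
with tempfile.TemporaryDirectory() as workdir:
    raw_dir = os.path.join(workdir, "data", "raw")
    os.makedirs(raw_dir)
    for name in ("a.csv", "b.csv", "notes.txt"):
        with open(os.path.join(raw_dir, name), "w") as handle:
            handle.write("placeholder\n")

    found = collect_csv_files(workdir)
    copied = backup_files(found, os.path.join(workdir, "backup"))
    print([os.path.basename(path) for path in found])
    print(len(copied), "files copied")
```

Note that this sketch copies everything into one flat backup folder, so two files with the same name in different subfolders would overwrite each other. For real projects you would probably want to keep the subfolder structure.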
3. Deep Learning
You need to develop an intuition for how neural networks work and how they are implemented. This section lists courses and materials that help with both.
Deep Learning Book
- Coursera Andrew Ng
This course is a must-watch. It covers a lot of the underlying concepts and mathematical tools you need to understand what deep learning is. You can watch it for free on Coursera.
Deep Learning by deeplearning.ai
I recommend watching the first 4 courses of this specialization.
- PyTorch framework:
The official tutorials are really good if you already have a good understanding of deep learning theory: https://pytorch.org/tutorials/
I also found two YouTube tutorials that cover a lot of topics:
PyTorch for Deep Learning – Full Course / Tutorial (watch until the 8th hour)
PyTorch Tutorial 01 – Installation (watch until tutorial 17)
- Krish Naik (YouTube)
Complete Road Map To Prepare For Deep Learning
- 3Blue1Brown
This is an awesome mathematics channel that visualizes various concepts. I highly recommend it.
Neural Network: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
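As a small taste of "how neural networks work and are implemented", here is a rough sketch of a single sigmoid neuron trained with gradient descent in plain Python. It is not taken from any of the courses above. The function names, the learning rate, the number of epochs, and the toy OR dataset are all assumptions chosen for illustration, and a real project would use a framework such as PyTorch instead.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit one sigmoid neuron with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(samples, labels):
            # Forward pass: weighted sum of the inputs, then the activation.
            z = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = sigmoid(z)
            # Backward pass: with cross-entropy loss, the gradient of the
            # loss with respect to z is simply (prediction - target).
            error = prediction - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 if the neuron's output is at least 0.5, else 0."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy dataset (an assumption for this demo): the logical OR function.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]

weights, bias = train_neuron(samples, labels)
print([predict(weights, bias, features) for features in samples])
```

A full network stacks many such neurons in layers and lets the framework compute the gradients automatically, but the loop is the same: forward pass, measure the error, nudge the weights.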