Learning Resources and Roadmap for Data Scientists

Learning Data Science

The following roadmap will help you enter the data science field, and all the resources mentioned here are available for free. Data science has multiple prerequisites: mathematics, programming, and machine learning theory. These fields are highly interconnected. In some cases, knowing the implementation helps you understand the underlying mathematics, and vice versa. If you get stuck on a topic and don't fully understand it, move on to the next topic, or study the same topic from another angle (e.g., its mathematics instead of its theory). Think of this learning as cyclical: go over things once to grasp the concepts, then revisit them until you understand them fully.

Free Resources for Beginners

1. Programming

2. Calculus

Calculus 1 | Math

Essence of calculus

3. Linear Algebra

Khan Academy Linear Algebra

Essence of linear algebra

4. Probability and Statistics

Khan Academy Statistics and Probability

Main Topics

1. Machine Learning

  • Stanford lecture with Andrew Ng on Machine Learning.

Lecture 1 – Welcome | Stanford CS229: Machine Learning (Autumn 2018)

  • Introduction to Statistical Learning (must read)

https://www.statlearning.com/s/ISLRSeventhPrinting.pdf

  • Stanford machine learning course, which includes R code.

 In-depth introduction to machine learning in 15 hours of expert videos
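To connect the theory in these courses with practice, the sketch below shows a typical classical ML workflow in Python: build a table with pandas, split it into training and test sets, fit a model with scikit-learn, and measure accuracy on held-out data. It is a minimal illustration under stated assumptions: the synthetic dataset, the column names `x1`, `x2`, and `label`, and the helper functions `make_toy_data` and `train_and_evaluate` are hypothetical, and logistic regression is just one convenient choice of classifier. NumPy, pandas, and scikit-learn are all introduced in the next section.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_toy_data(n_samples: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a hypothetical table with two numeric features and a binary label."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n_samples)
    x2 = rng.normal(size=n_samples)
    # The label is 1 when a noisy linear combination of the features is positive.
    noise = rng.normal(scale=0.3, size=n_samples)
    label = (x1 + 0.5 * x2 + noise > 0).astype(int)
    return pd.DataFrame({"x1": x1, "x2": x2, "label": label})


def train_and_evaluate(df: pd.DataFrame, seed: int = 0) -> float:
    """Fit a classifier on 75% of the rows and return accuracy on the other 25%."""
    X = df[["x1", "x2"]].to_numpy()
    y = df["label"].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    df = make_toy_data()
    # pandas groupby: mean feature values for each class.
    print(df.groupby("label").mean())
    print(f"Test accuracy: {train_and_evaluate(df):.2f}")
```

The test set is held out because accuracy measured on the training data overstates how well a model generalizes, a theme Introduction to Statistical Learning returns to repeatedly.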

2. Useful Libraries and YouTube Videos

  1. numpy
    Working with vectors, numbers, matrices, tensors
    Keith Galli (Numpy):
    Complete Python NumPy Tutorial (Creating Arrays, Indexing, Math, Statistics, Reshaping)
  2. pandas
    To read CSV files and manipulate tabular data
    Keith Galli (Pandas):
    Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)
  3. sklearn
    For classical ML; most standard algorithms are implemented here
    Keith Galli (Sklearn):
    Real-World Python Machine Learning Tutorial w/ Scikit Learn (sklearn basics, NLP, classifiers, etc)
  4. Matplotlib and seaborn
    For visualizing tabular data
    Derek Banas:
    Seaborn Tutorial 2021
  5. os
    To create folders, get folder/file names, make usable path strings
    Corey Schafer:
    Python Tutorial: OS Module – Use Underlying Operating System Functionality
  6. glob
    To get a list of files/folders within a folder. If you find yourself using os.listdir or os.walk, consider using glob instead
    PythonHumanities:
    Python and the Glob Function Easy Tutorial
  7. shutil
    To move and copy files around
    shutil — High-level file operations — Python 3.9.6 documentation
  8. cv2
    To read and display images, plus many other computer vision operations
    OpenCV Course – Full Tutorial with Python
  9. PIL
    To read and display images, and to detect corrupted images
    Corey Schafer:
    Python Tutorial: Image Manipulation with Pillow
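As an illustration of the glob advice in the list, the sketch below finds every `.jpg` file in a nested folder twice, once with `os.walk` and once with a single recursive `glob` pattern, and then uses `os` and `shutil` to copy the results into one flat folder. The folder layout (`raw/cats`, `raw/dogs`), the file names, and the helper functions are hypothetical; the files are placeholder text files written to a temporary directory, not real images.

```python
from __future__ import annotations

import glob
import os
import shutil
import tempfile


def build_example_tree(root: str) -> None:
    """Create a hypothetical layout root/raw/<class>/ with placeholder files."""
    for class_name in ("cats", "dogs"):
        class_dir = os.path.join(root, "raw", class_name)
        os.makedirs(class_dir, exist_ok=True)  # create nested folders in one call
        for i in range(3):
            with open(os.path.join(class_dir, f"img_{i}.jpg"), "w") as f:
                f.write("placeholder")  # not a real image
        # A non-.jpg file that both search functions should skip.
        with open(os.path.join(class_dir, "notes.txt"), "w") as f:
            f.write("not an image")


def find_jpgs_with_walk(root: str) -> list[str]:
    """os.walk version: visit every folder and filter file names by hand."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".jpg"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def find_jpgs_with_glob(root: str) -> list[str]:
    """glob version: one recursive pattern replaces the nested loops."""
    pattern = os.path.join(root, "**", "*.jpg")
    return sorted(glob.glob(pattern, recursive=True))


def copy_into_flat_folder(paths: list[str], dest: str) -> list[str]:
    """Copy files into dest, prefixing each name with its parent folder name."""
    os.makedirs(dest, exist_ok=True)
    copied = []
    for path in paths:
        parent = os.path.basename(os.path.dirname(path))  # e.g. "cats"
        target = os.path.join(dest, f"{parent}_{os.path.basename(path)}")
        shutil.copy2(path, target)  # copy2 also preserves file metadata
        copied.append(target)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example_tree(root)
        jpgs = find_jpgs_with_glob(root)
        print("same result:", jpgs == find_jpgs_with_walk(root))
        flat = copy_into_flat_folder(jpgs, os.path.join(root, "flat"))
        print(sorted(os.path.basename(p) for p in flat))
```

Note that `**` only matches nested folders when `recursive=True` is passed; without it, `**` behaves like a single `*`. To move files instead of copying them, `shutil.move(src, dst)` takes the same two arguments.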

3. Deep Learning

You need to develop an intuition for how neural networks work and how they are implemented. This section lists courses and materials to help with that.
Deep Learning Book

  • Coursera Andrew Ng

This course is a must-watch: it covers many of the underlying concepts and mathematical tools you need to understand deep learning. You can watch it for free on Coursera.

Deep Learning by deeplearning.ai

I recommend watching the first 4 courses of this specialization.

  • PyTorch Framework:

The official tutorials are very good if you already have a solid understanding of deep learning theory: https://pytorch.org/tutorials/

I found two YouTube videos that cover many topics:

PyTorch for Deep Learning – Full Course / Tutorial (watch until the 8th hour)

PyTorch Tutorial 01 – Installation (watch up to tutorial 17)

  • Krish Naik (YouTube)

Complete Road Map To Prepare For Deep Learning

  • 3Blue1Brown

This is an excellent mathematics channel that visualizes many concepts, and I highly recommend it.
Neural Network: https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
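To connect the neural network videos with code, here is a minimal sketch of a two-layer network trained with gradient descent and backpropagation, written in plain NumPy. The XOR dataset, the layer sizes, the learning rate, and the helper names (`init_params`, `loss_and_grads`, `train`, `predict`) are hypothetical choices for illustration, not taken from any of the courses above.

```python
import numpy as np


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> dict:
    """Random weights for a hypothetical 2-layer net with one sigmoid output."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def loss_and_grads(params: dict, X: np.ndarray, y: np.ndarray):
    """Forward pass, binary cross-entropy loss, and backpropagated gradients."""
    # Forward pass: tanh hidden layer, then a sigmoid output probability.
    h = np.tanh(X @ params["W1"] + params["b1"])  # shape (N, H)
    logits = h @ params["W2"] + params["b2"]  # shape (N, 1)
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # Backward pass (chain rule). For sigmoid + cross-entropy, dL/dlogits = (p - y) / N.
    d_logits = (p - y) / X.shape[0]
    d_h = d_logits @ params["W2"].T
    d_pre = d_h * (1.0 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grads = {
        "W2": h.T @ d_logits,
        "b2": d_logits.sum(axis=0),
        "W1": X.T @ d_pre,
        "b1": d_pre.sum(axis=0),
    }
    return loss, grads


def train(X, y, n_hidden=8, lr=1.0, epochs=3000, seed=0):
    """Plain full-batch gradient descent; returns parameters and loss history."""
    params = init_params(X.shape[1], n_hidden, seed)
    history = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(params, X, y)
        for name in params:
            params[name] -= lr * grads[name]  # step against the gradient
        history.append(loss)
    return params, history


def predict(params, X):
    """Class 1 when the output probability exceeds 0.5 (i.e. the logit is positive)."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"] > 0).astype(int)


if __name__ == "__main__":
    # XOR: not linearly separable, so the hidden layer is needed.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    params, history = train(X, y)
    print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
    print("predictions:", predict(params, X).ravel())
```

When writing a backward pass by hand, it is worth comparing its gradients against finite differences (nudge one weight, recompute the loss); frameworks such as PyTorch compute these gradients automatically with autograd.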