Data Science Roadmap

This is a practical, step-by-step roadmap for going from zero to an employable data scientist in 12–18 months (full-time) or 18–24 months (part-time). Focus on skills that pay, portfolio projects, and real-world impact.

Phase 0: Mindset & Setup

Task	Resources
Install Python, VS Code, Git	Anaconda
Create GitHub + LinkedIn	Clean profile photo + headline
Join communities	Reddit r/datascience, Discord (DataTalks.Club), LinkedIn groups

Phase 1: Foundations

Goal: Speak the language of data

Topic	Resources
Python Basics	Automate the Boring Stuff (Ch 1–6)
Pandas & NumPy	10 Minutes to Pandas (official)
Data Cleaning	Kaggle "Pandas" course (free)
SQL	Mode Analytics SQL Tutorial OR LeetCode SQL 50
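
To get a feel for the kind of SQL these tutorials drill (filtering, aggregating, GROUP BY), here is a minimal sketch using Python's built-in sqlite3 module. The passengers table and its rows are made up for illustration; they are not taken from any of the courses above.

```python
import sqlite3

# Hypothetical passenger rows: (name, travel class, fare, survived flag).
rows = [
    ("Ann", 1, 80.0, 1),
    ("Bob", 3, 7.5, 0),
    ("Cara", 1, 60.0, 1),
    ("Dev", 3, 8.0, 1),
    ("Eli", 2, 20.0, 0),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE passengers (name TEXT, pclass INTEGER, fare REAL, survived INTEGER)"
)
conn.executemany("INSERT INTO passengers VALUES (?, ?, ?, ?)", rows)

# Count passengers and average the 0/1 survived flag per travel class.
query = """
    SELECT pclass,
           COUNT(*)      AS n_passengers,
           AVG(survived) AS survival_rate
    FROM passengers
    GROUP BY pclass
    ORDER BY pclass
"""
summary = conn.execute(query).fetchall()
conn.close()

for pclass, n, rate in summary:
    print(f"class {pclass}: {n} passengers, survival rate {rate:.2f}")
```

Averaging a 0/1 column is a common trick for getting a rate per group, and it shows up often in interview-style SQL questions.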

Mini-Project:

Clean + analyze a Kaggle dataset (e.g., Titanic) → GitHub repo with README.md
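
A minimal sketch of what "clean + analyze" could look like with pandas. The inline sample is made up and only mimics the column names of the Kaggle Titanic data; with the real download you would read the CSV file (assumed here to be called train.csv) instead. Median imputation is just one simple choice, not the only reasonable one.

```python
import io

import pandas as pd

# Tiny made-up sample shaped like the Titanic data; with the real file
# you would call pd.read_csv("train.csv") instead (file name assumed).
raw_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare\n"
    "1,0,3,male,22,7.25\n"
    "2,1,1,female,38,71.28\n"
    "3,1,3,female,,7.92\n"
    "4,1,1,female,35,53.10\n"
    "4,1,1,female,35,53.10\n"
    "5,0,3,male,,8.05\n"
)
df = pd.read_csv(raw_csv)

# 1. Clean: drop exact duplicate rows.
df = df.drop_duplicates()

# 2. Clean: fill missing ages with the median age.
df["Age"] = df["Age"].fillna(df["Age"].median())

# 3. Analyze: survival rate by sex.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

In the README, explain each cleaning decision (why the median, why duplicates were dropped) so a reviewer can follow your reasoning.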

Phase 2: Statistics & Math

Goal: Don’t just run models — understand them

Topic	Resources
Descriptive & Inferential Stats	StatQuest (YouTube)
Probability (Bayes, distributions)	Khan Academy
Hypothesis Testing (p-values, A/B)	Practical Statistics for Data Scientists (book)
Linear Algebra (vectors, matrices)	3Blue1Brown Essence of Linear Algebra
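
To make the A/B testing row concrete, here is a small sketch of a two-sided, two-proportion z-test using only the standard library. The function name and the conversion counts are invented for illustration, and the normal approximation assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a |z| at least this large if the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up experiment: 200 of 2000 users convert on A, 250 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value says the observed gap would be unlikely if both variants truly converted at the same rate; it does not say how large or how valuable the effect is.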

Practice:

Solve 20 problems on DataCamp or StrataScratch

Phase 3: Data Visualization

Goal: Tell stories with data

Tool	Learn
Matplotlib/Seaborn	Python Plotting for Exploratory Analysis
Tableau Public	Build 3 dashboards
Power BI	(Optional for BI roles)

Project:

World Happiness Report → Interactive dashboard (Tableau Public)

Phase 4: Machine Learning Core

Goal: Build & evaluate models

Topic	Resources
Scikit-learn pipeline	Kaggle "Intermediate ML" course
Regression (Linear, Logistic)	Andrew Ng’s ML Course (free audit)
Classification (Trees, SVM, KNN)	Hands-On ML (Aurélien Géron) Ch 2–6
Model Evaluation (AUC, F1, confusion matrix)	StatQuest
Cross-validation & Hyperparameter tuning	GridSearchCV / Optuna

Projects (Pick 2):
1. House Prices → Feature engineering + XGBoost
2. Customer Churn → Logistic regression + SHAP explanations
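
Before starting either project, it may help to see the Phase 4 topics (pipeline, cross-validation, hyperparameter tuning, AUC) in one place. The sketch below uses synthetic data from make_classification as a stand-in for a real churn table, and the grid of C values is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn table: 500 rows, 10 numeric features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled using
# only its own training portion (no leakage from the validation fold).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Small, arbitrary grid over the regularization strength C.
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# score() reuses the roc_auc scorer on the refit best pipeline.
test_auc = search.score(X_test, y_test)
print("best C:", search.best_params_["model__C"])
print(f"cross-validated AUC: {search.best_score_:.3f}, test AUC: {test_auc:.3f}")
```

The held-out test set is touched only once, at the end. Tuning against it would make the reported AUC optimistic.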

Phase 5: Advanced ML & MLOps

Goal: Production-ready models

Topic	Tools/Resources
XGBoost / LightGBM	Kaggle competitions
Feature Engineering	Feature-engine library
NLP Basics	HuggingFace "NLP Course" (free)
Time Series	Store Item Demand Forecasting (Kaggle)
Docker	"Docker for Data Science" (YouTube)
MLflow / DVC	Track experiments
FastAPI	Deploy model as API

Capstone Project:

End-to-end ML system:
data → clean → model → API → Streamlit dashboard
Example: Credit Card Fraud Detection with imbalance handling (SMOTE) + API
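
To illustrate the imbalance-handling step, here is a dependency-free sketch of naive random oversampling. Note that this is not SMOTE: SMOTE synthesizes new minority points by interpolating between neighbors (the imbalanced-learn package provides it as imblearn.over_sampling.SMOTE), while this stand-in only duplicates existing rows. The function name and the transaction data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Made-up transactions: (amount, hour of day); label 1 = fraud.
X_train = [(12.0, 9), (30.5, 14), (8.2, 11), (950.0, 3), (22.1, 16), (15.0, 10)]
y_train = [0, 0, 0, 1, 0, 0]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))
```

Whichever resampling method you use, apply it to the training split only. Resampling before the train/test split leaks copies of test rows into training and inflates your metrics.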

Phase 6: Big Data & Cloud

Optional but high-paying

Skill	Platform
PySpark	Databricks Community Edition
AWS/GCP	Free tier (S3, EC2, SageMaker)
dbt (data build tool)	For analytics engineering

Project:

Process 1M+ rows with PySpark → store in S3 → query with Athena

Phase 7: Job Prep & Portfolio

Goal: Get hired

Portfolio (3 Projects)

Type	Example
Predictive	House Price Prediction (Kaggle top 20%)
NLP	Sentiment Analysis on Twitter (HuggingFace)
End-to-End	Fraud Detection API + Dashboard

Host: GitHub + Streamlit/Gradio + LinkedIn posts

Resume

Quantify: “Improved AUC from 0.72 → 0.89”
Keywords: Pandas, Scikit-learn, SQL, AWS, A/B testing

Interview Prep

Type	Resource
SQL	LeetCode (Top 50)
Python	HackerRank Data Science
Case Studies	"Cracking the Data Science Interview"
Behavioral	STAR method

Weekly Schedule (Full-Time)

Day	Focus
Mon–Wed	Learn + code (4h/day)
Thu	Project work
Fri	LeetCode / SQL practice (work through the SQL 50 list)
Sat	Portfolio + write blog
Sun	Rest / review

Salary Expectations (2025)

Role	USA	India	Remote
Junior DS	$95K–$130K	₹12–20 LPA	$70K–$100K
Mid-Level	$130K–$180K	₹20–35 LPA	$100K–$140K

Pro Tips

Contribute to open source (e.g., scikit-learn bugs)
Write 1 LinkedIn post/week about your project
Apply to 10 jobs/week after Phase 5
Get 1 mentor (via ADPList.org)

Free Resources Summary

Topic	Link
Python	Python.org
Kaggle Courses	kaggle.com/learn
StatQuest	YouTube
HuggingFace	huggingface.co/course
Streamlit	streamlit.io

Start today: Open Kaggle Titanic, download data, and run pd.read_csv().

“The best time to start was yesterday. The next best time is now.”

Save this roadmap. Share with a friend. Tag me when you land your first DS job!

Last updated: Nov 09, 2025
