Data Science Roadmap
This is a practical, step-by-step roadmap to go from zero to employable Data Scientist in 12–18 months (full-time) or 18–24 months (part-time). Focus on skills that pay, portfolio projects, and real-world impact.
Phase 0: Mindset & Setup
| Task | Resources |
|---|---|
| Install Python, VS Code, Git | Anaconda |
| Create GitHub + LinkedIn | Clean profile photo + headline |
| Join communities | Reddit r/datascience, Discord (DataTalks.Club), LinkedIn groups |
Phase 1: Foundations
Goal: Speak the language of data
| Topic | Resources |
|---|---|
| Python Basics | Automate the Boring Stuff (Ch 1–6) |
| Pandas & NumPy | 10 Minutes to Pandas (official) |
| Data Cleaning | Kaggle "Pandas" course (free) |
| SQL | Mode Analytics SQL Tutorial OR LeetCode SQL 50 |
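To make the SQL row concrete, here is a minimal sketch using Python's built-in `sqlite3` module, so nothing needs installing. The `passengers` table, the function name, and the sample rows are made up for illustration. The pattern itself (GROUP BY with aggregates) is the kind of query the tutorials above drill.

```python
import sqlite3


def survival_rate_by_class(rows):
    """Load (pclass, survived) rows into an in-memory table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE passengers (pclass INTEGER, survived INTEGER)")
        conn.executemany("INSERT INTO passengers VALUES (?, ?)", rows)
        query = """
            SELECT pclass,
                   COUNT(*)      AS n,
                   AVG(survived) AS survival_rate
            FROM passengers
            GROUP BY pclass
            ORDER BY pclass
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical sample rows, not real Titanic data.
sample_rows = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (3, 0), (3, 0), (3, 1)]
for pclass, n, rate in survival_rate_by_class(sample_rows):
    print(f"class {pclass}: n={n}, survival rate={rate:.2f}")
```

Once this feels comfortable, the same query shape carries over to the joins and window functions that show up in interview question sets.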
Mini-Project:
Clean + analyze a Kaggle dataset (e.g., Titanic) → GitHub repo with README.md
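A first pass at the mini-project might look like the sketch below. It assumes pandas is installed. The inline CSV is a tiny made-up stand-in that borrows the Titanic column names, and `clean_titanic` is a hypothetical helper, not part of any course.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle file; only the column names mirror Titanic.
RAW_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare
1,0,3,male,22,7.25
2,1,1,female,38,71.28
3,1,3,female,,7.92
4,1,1,female,35,53.10
5,0,3,male,,8.05
5,0,3,male,,8.05
"""


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and fill missing ages with the median age."""
    cleaned = df.drop_duplicates().copy()
    cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())
    return cleaned


df = clean_titanic(pd.read_csv(io.StringIO(RAW_CSV)))

# Survival rate by sex: a typical first question to ask of this dataset.
print(df.groupby("Sex")["Survived"].mean())
```

With the real download, replace `io.StringIO(RAW_CSV)` with the path to the CSV file. Median imputation is only one reasonable choice; whichever you pick, explain why in the README.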
Phase 2: Statistics & Math
Goal: Don’t just run models — understand them
| Topic | Resources |
|---|---|
| Descriptive & Inferential Stats | StatQuest (YouTube) |
| Probability (Bayes, distributions) | Khan Academy |
| Hypothesis Testing (p-values, A/B) | Practical Statistics for Data Scientists (book) |
| Linear Algebra (vectors, matrices) | 3Blue1Brown Essence of Linear Algebra |
Practice:
Solve 20 problems on DataCamp or StrataScratch
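To see what a p-value in an A/B test actually computes, here is a small sketch of a two-proportion z-test using only the standard library. The function name and the conversion counts are invented for illustration. The test relies on the normal approximation, which is only reasonable for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, so pool them to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up numbers: 120 of 1000 users convert on A, 150 of 1000 on B.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Working through this by hand once makes library output (for example from SciPy or statsmodels) much easier to interpret later.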
Phase 3: Data Visualization
Goal: Tell stories with data
| Tool | Learn |
|---|---|
| Matplotlib/Seaborn | Python Plotting for Exploratory Analysis |
| Tableau Public | Build 3 dashboards |
| Power BI | (Optional for BI roles) |
Project:
World Happiness Report → Interactive dashboard (Tableau Public)
Phase 4: Machine Learning Core
Goal: Build & evaluate models
| Topic | Resources |
|---|---|
| Scikit-learn pipeline | Kaggle "Intermediate ML" course |
| Regression (Linear, Logistic) | Andrew Ng’s ML Course (free audit) |
| Classification (Trees, SVM, KNN) | Hands-On ML (Aurélien Géron) Ch 2–6 |
| Model Evaluation (AUC, F1, confusion matrix) | StatQuest |
| Cross-validation & Hyperparameter tuning | GridSearchCV / Optuna |
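The evaluation metrics in the table are easier to remember once you have computed them by hand. The sketch below builds a binary confusion matrix and derives precision, recall, and F1 from it in plain Python. The labels are made up, and in practice you would use the equivalents in `sklearn.metrics`.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Derive precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```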
Projects (Pick 2):
1. House Prices → Feature eng + XGBoost
2. Customer Churn → Logistic + SHAP explanations
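For the churn project, one possible skeleton is a scikit-learn pipeline tuned with cross-validated grid search. This sketch assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for a real churn table. The grid of `C` values is arbitrary, and the SHAP step is left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data standing in for a churn dataset (about 20% positives).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("best C:", search.best_params_["model__C"])
print(f"held-out AUC: {test_auc:.3f}")
```

Keeping preprocessing inside the `Pipeline` is the design choice that matters here: it prevents information from the validation folds leaking into the scaler.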
Phase 5: Advanced ML & MLOps
Goal: Production-ready models
| Topic | Tools/Resources |
|---|---|
| XGBoost / LightGBM | Kaggle competitions |
| Feature Engineering | Feature-engine library |
| NLP Basics | HuggingFace "NLP Course" (free) |
| Time Series | Store Item Demand Forecasting (Kaggle) |
| Docker | "Docker for Data Science" (YouTube) |
| MLflow / DVC | Track experiments |
| FastAPI | Deploy model as API |
Capstone Project:
End-to-end ML system:
data → clean → model → API → Streamlit dashboard
Example: Credit Card Fraud Detection with imbalance handling (SMOTE) + API
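SMOTE itself lives in the imbalanced-learn package and creates synthetic minority samples. The sketch below swaps in plain random oversampling, a simpler technique, to show the basic idea of rebalancing a fraud dataset with only the standard library. The function name, the transactions, and the labels are invented for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up transactions as (amount, hour of day); label 1 means fraud.
rows = [(12.0, 9), (8.5, 14), (230.0, 3), (15.0, 11), (9.9, 16), (410.0, 2)]
labels = [0, 0, 1, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Whichever resampling method you use, apply it to the training split only, after the train/test split, so that copies of the same fraud case never end up on both sides of the evaluation.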
Phase 6: Big Data & Cloud
Optional but high-paying
| Skill | Platform |
|---|---|
| PySpark | Databricks Community Edition |
| AWS/GCP | Free tier (S3, EC2, SageMaker) |
| dbt (data build tool) | For analytics engineering |
Project:
Process 1M+ rows with PySpark → store in S3 → query with Athena
Phase 7: Job Prep & Portfolio
Goal: Get hired
Portfolio (3 Projects)
| Type | Example |
|---|---|
| Predictive | House Price Prediction (Kaggle top 20%) |
| NLP | Sentiment Analysis on Twitter (HuggingFace) |
| End-to-End | Fraud Detection API + Dashboard |
Host: GitHub + Streamlit/Gradio + LinkedIn posts
Resume
- Quantify: “Improved AUC from 0.72 → 0.89”
- Keywords: Pandas, Scikit-learn, SQL, AWS, A/B testing
Interview Prep
| Type | Resource |
|---|---|
| SQL | LeetCode (Top 50) |
| Python | HackerRank Data Science |
| Case Studies | "Cracking the Data Science Interview" |
| Behavioral | STAR method |
Weekly Schedule (Full-Time)
| Day | Focus |
|---|---|
| Mon–Wed | Learn + code (4h) |
| Thu | Project work |
| Fri | LeetCode / SQL practice (work through the 50-problem set) |
| Sat | Portfolio + write blog |
| Sun | Rest / review |
Salary Expectations (2025)
| Role | USA | India | Remote |
|---|---|---|---|
| Junior DS | $95K–$130K | ₹12–20 LPA | $70K–$100K |
| Mid-Level | $130K–$180K | ₹20–35 LPA | $100K–$140K |
Pro Tips
- Contribute to open source (e.g., scikit-learn bugs)
- Write 1 LinkedIn post/week about your project
- Apply to 10 jobs/week after Phase 5
- Get 1 mentor (via ADPList.org)
Free Resources Summary
| Topic | Link |
|---|---|
| Python | Python.org |
| Kaggle Courses | kaggle.com/learn |
| StatQuest | YouTube |
| HuggingFace | huggingface.co/course |
| Streamlit | streamlit.io |
Start today: Open Kaggle Titanic, download data, and run pd.read_csv().
“The best time to start was yesterday. The next best time is now.”
Save this roadmap. Share with a friend. Tag me when you land your first DS job!