MongoDB Aggregation Pipelines – Step-by-Step Explanation
The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.
MongoDB Aggregation Pipelines – Step-by-Step Explanation
MongoDB Aggregation Pipelines – Step-by-Step Explanation
MongoDB Aggregation Pipelines – Step-by-Step Explanation
The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.
Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.
Why Use Aggregation Pipelines?
| Use Case | Example |
|---|---|
| Data summarization | Count users by country |
| Reporting | Monthly sales totals |
| Data cleaning | Remove duplicates, normalize fields |
| Real-time analytics | Top 10 products by revenue |
| ETL (Extract-Transform-Load) | Prepare data for dashboards |
Core Concept: The Pipeline
db.collection.aggregate([
{ $stage1 },
{ $stage2 },
{ $stage3 },
...
])
- Each
{ $stage }is a stage in the pipeline. - Documents flow from left to right.
- Output of one stage becomes input to the next.
- Final result is returned as an array of documents.
Key Aggregation Stages
Here are the most commonly used stages:
| Stage | Purpose | Syntax |
|---|---|---|
$match |
Filter documents | Like find() |
$project |
Select or reshape fields | Include, exclude, compute |
$group |
Group by field(s), aggregate | Like SQL GROUP BY |
$sort |
Sort results | 1 = asc, -1 = desc |
$limit / $skip |
Pagination | Limit number of results |
$unwind |
Deconstruct arrays | Turn array elements into separate docs |
$lookup |
Join with another collection | Like SQL JOIN |
$addFields |
Add new fields | Compute values |
$count |
Count documents | Returns total |
$out / $merge |
Write results to new collection | Export pipeline output |
Step-by-Step Examples
Let’s use a sample collection: orders
{
_id: 1,
customer: "Alice",
items: ["laptop", "mouse"],
total: 1200,
status: "completed",
country: "USA"
},
{
_id: 2,
customer: "Bob",
items: ["phone"],
total: 800,
status: "pending",
country: "Canada"
}
1. $match – Filter Documents
{ $match: { status: "completed" } }
Only completed orders pass through.
2. $project – Reshape Output
{
$project: {
customer: 1,
total: 1,
itemCount: { $size: "$items" }
}
}
Result:
{ customer: "Alice", total: 1200, itemCount: 2 }
3. $group – Group & Aggregate
Goal: Total sales per country
{
$group: {
_id: "$country", // Group by country
totalSales: { $sum: "$total" },
orderCount: { $sum: 1 },
avgOrder: { $avg: "$total" }
}
}
Result:
{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }
Accumulators in
$group:
-$sum,$avg,$min,$max,$push,$addToSet,$first,$last
4. $sort – Order Results
{ $sort: { totalSales: -1 } }
Sort by total sales descending.
5. $limit – Top N Results
{ $limit: 3 }
Only top 3 countries.
6. $unwind – Expand Arrays
{ $unwind: "$items" }
Turns:
{ items: ["laptop", "mouse"] }
Into two documents:
{ items: "laptop" }, { items: "mouse" }
Useful for analyzing individual array elements.
7. $lookup – Join Collections
Assume another collection: customers
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "name",
as: "customerInfo"
}
}
Adds customer details (e.g., email, phone) from another collection.
8. $addFields – Compute New Fields
{
$addFields: {
tax: { $multiply: ["$total", 0.08] },
totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
}
}
Full Example: Top Customers by Spending
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$customer",
totalSpent: { $sum: "$total" },
orders: { $sum: 1 }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 5 },
{ $project: {
customer: "$_id",
totalSpent: 1,
orders: 1,
_id: 0
}}
])
Output:
{ customer: "Alice", totalSpent: 1200, orders: 1 }
Advanced: Conditional Logic
Use $cond, $switch, $ifNull
{
$addFields: {
statusLabel: {
$switch: {
branches: [
{ case: { $eq: ["$status", "completed"] }, then: "Done" },
{ case: { $eq: ["$status", "pending"] }, then: "Waiting" }
],
default: "Unknown"
}
}
}
}
Performance Tips
- Put
$matchearly – filter as soon as possible. - Use indexes on fields in
$matchand$sort. - Project only needed fields early to reduce memory.
- Avoid large
$unwindon big arrays. - Use
allowDiskUse: truefor large datasets:
db.collection.aggregate(pipeline, { allowDiskUse: true })
Tools to Visualize Pipelines
- MongoDB Compass – Drag-and-drop pipeline builder
- MongoDB Atlas – Visual pipeline editor
- mongosh – Test in shell
Summary: Pipeline Flow
[Raw Docs]
↓
[$match] → filter
↓
[$project/$addFields] → reshape
↓
[$unwind] → expand arrays
↓
[$group] → aggregate
↓
[$sort] → order
↓
[$limit] → paginate
↓
[Result]
Official Docs & Resources
Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking output with .pretty().
Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!
MongoDB Aggregation Pipelines – Step-by-Step Explanation
The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.
MongoDB Aggregation Pipelines – Step-by-Step Explanation
MongoDB Aggregation Pipelines – Step-by-Step Explanation
MongoDB Aggregation Pipelines – Step-by-Step Explanation
The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.
Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.
Why Use Aggregation Pipelines?
| Use Case | Example |
|---|---|
| Data summarization | Count users by country |
| Reporting | Monthly sales totals |
| Data cleaning | Remove duplicates, normalize fields |
| Real-time analytics | Top 10 products by revenue |
| ETL (Extract-Transform-Load) | Prepare data for dashboards |
Core Concept: The Pipeline
db.collection.aggregate([
{ $stage1 },
{ $stage2 },
{ $stage3 },
...
])
- Each
{ $stage }is a stage in the pipeline. - Documents flow from left to right.
- Output of one stage becomes input to the next.
- Final result is returned as an array of documents.
Key Aggregation Stages
Here are the most commonly used stages:
| Stage | Purpose | Syntax |
|---|---|---|
$match |
Filter documents | Like find() |
$project |
Select or reshape fields | Include, exclude, compute |
$group |
Group by field(s), aggregate | Like SQL GROUP BY |
$sort |
Sort results | 1 = asc, -1 = desc |
$limit / $skip |
Pagination | Limit number of results |
$unwind |
Deconstruct arrays | Turn array elements into separate docs |
$lookup |
Join with another collection | Like SQL JOIN |
$addFields |
Add new fields | Compute values |
$count |
Count documents | Returns total |
$out / $merge |
Write results to new collection | Export pipeline output |
Step-by-Step Examples
Let’s use a sample collection: orders
{
_id: 1,
customer: "Alice",
items: ["laptop", "mouse"],
total: 1200,
status: "completed",
country: "USA"
},
{
_id: 2,
customer: "Bob",
items: ["phone"],
total: 800,
status: "pending",
country: "Canada"
}
1. $match – Filter Documents
{ $match: { status: "completed" } }
Only completed orders pass through.
2. $project – Reshape Output
{
$project: {
customer: 1,
total: 1,
itemCount: { $size: "$items" }
}
}
Result:
{ customer: "Alice", total: 1200, itemCount: 2 }
3. $group – Group & Aggregate
Goal: Total sales per country
{
$group: {
_id: "$country", // Group by country
totalSales: { $sum: "$total" },
orderCount: { $sum: 1 },
avgOrder: { $avg: "$total" }
}
}
Result:
{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }
Accumulators in
$group:
-$sum,$avg,$min,$max,$push,$addToSet,$first,$last
4. $sort – Order Results
{ $sort: { totalSales: -1 } }
Sort by total sales descending.
5. $limit – Top N Results
{ $limit: 3 }
Only top 3 countries.
6. $unwind – Expand Arrays
{ $unwind: "$items" }
Turns:
{ items: ["laptop", "mouse"] }
Into two documents:
{ items: "laptop" }, { items: "mouse" }
Useful for analyzing individual array elements.
7. $lookup – Join Collections
Assume another collection: customers
{
$lookup: {
from: "customers",
localField: "customer",
foreignField: "name",
as: "customerInfo"
}
}
Adds customer details (e.g., email, phone) from another collection.
8. $addFields – Compute New Fields
{
$addFields: {
tax: { $multiply: ["$total", 0.08] },
totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
}
}
Full Example: Top Customers by Spending
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$customer",
totalSpent: { $sum: "$total" },
orders: { $sum: 1 }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 5 },
{ $project: {
customer: "$_id",
totalSpent: 1,
orders: 1,
_id: 0
}}
])
Output:
{ customer: "Alice", totalSpent: 1200, orders: 1 }
Advanced: Conditional Logic
Use $cond, $switch, $ifNull
{
$addFields: {
statusLabel: {
$switch: {
branches: [
{ case: { $eq: ["$status", "completed"] }, then: "Done" },
{ case: { $eq: ["$status", "pending"] }, then: "Waiting" }
],
default: "Unknown"
}
}
}
}
Performance Tips
- Put
$matchearly – filter as soon as possible. - Use indexes on fields in
$matchand$sort. - Project only needed fields early to reduce memory.
- Avoid large
$unwindon big arrays. - Use
allowDiskUse: truefor large datasets:
db.collection.aggregate(pipeline, { allowDiskUse: true })
Tools to Visualize Pipelines
- MongoDB Compass – Drag-and-drop pipeline builder
- MongoDB Atlas – Visual pipeline editor
- mongosh – Test in shell
Summary: Pipeline Flow
[Raw Docs]
↓
[$match] → filter
↓
[$project/$addFields] → reshape
↓
[$unwind] → expand arrays
↓
[$group] → aggregate
↓
[$sort] → order
↓
[$limit] → paginate
↓
[Result]
Official Docs & Resources
Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking output with .pretty().
Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!