MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.

MongoDB Aggregation Pipelines – Step-by-Step Explanation

MongoDB Aggregation Pipelines – Step-by-Step Explanation

MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.

Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.


Why Use Aggregation Pipelines?

Use Case Example
Data summarization Count users by country
Reporting Monthly sales totals
Data cleaning Remove duplicates, normalize fields
Real-time analytics Top 10 products by revenue
ETL (Extract-Transform-Load) Prepare data for dashboards

Core Concept: The Pipeline

db.collection.aggregate([
  { $stage1 },
  { $stage2 },
  { $stage3 },
  ...
])
  • Each { $stage } is a stage in the pipeline.
  • Documents flow from left to right.
  • Output of one stage becomes input to the next.
  • Final result is returned as an array of documents.

Key Aggregation Stages

Here are the most commonly used stages:

Stage Purpose Syntax
$match Filter documents Like find()
$project Select or reshape fields Include, exclude, compute
$group Group by field(s), aggregate Like SQL GROUP BY
$sort Sort results 1 = asc, -1 = desc
$limit / $skip Pagination Limit number of results
$unwind Deconstruct arrays Turn array elements into separate docs
$lookup Join with another collection Like SQL JOIN
$addFields Add new fields Compute values
$count Count documents Returns total
$out / $merge Write results to new collection Export pipeline output

Step-by-Step Examples

Let’s use a sample collection: orders

{
  _id: 1,
  customer: "Alice",
  items: ["laptop", "mouse"],
  total: 1200,
  status: "completed",
  country: "USA"
},
{
  _id: 2,
  customer: "Bob",
  items: ["phone"],
  total: 800,
  status: "pending",
  country: "Canada"
}

1. $match – Filter Documents

{ $match: { status: "completed" } }

Only completed orders pass through.


2. $project – Reshape Output

{
  $project: {
    customer: 1,
    total: 1,
    itemCount: { $size: "$items" }
  }
}

Result:

{ customer: "Alice", total: 1200, itemCount: 2 }

3. $group – Group & Aggregate

Goal: Total sales per country

{
  $group: {
    _id: "$country",           // Group by country
    totalSales: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrder: { $avg: "$total" }
  }
}

Result:

{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }

Accumulators in $group:
- $sum, $avg, $min, $max, $push, $addToSet, $first, $last


4. $sort – Order Results

{ $sort: { totalSales: -1 } }

Sort by total sales descending.


5. $limit – Top N Results

{ $limit: 3 }

Only top 3 countries.


6. $unwind – Expand Arrays

{ $unwind: "$items" }

Turns:

{ items: ["laptop", "mouse"] }

Into two documents:

{ items: "laptop" }, { items: "mouse" }

Useful for analyzing individual array elements.


7. $lookup – Join Collections

Assume another collection: customers

{
  $lookup: {
    from: "customers",
    localField: "customer",
    foreignField: "name",
    as: "customerInfo"
  }
}

Adds customer details (e.g., email, phone) from another collection.


8. $addFields – Compute New Fields

{
  $addFields: {
    tax: { $multiply: ["$total", 0.08] },
    totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
  }
}

Full Example: Top Customers by Spending

db.orders.aggregate([
  { $match: { status: "completed" } },

  { $group: {
      _id: "$customer",
      totalSpent: { $sum: "$total" },
      orders: { $sum: 1 }
  }},

  { $sort: { totalSpent: -1 } },

  { $limit: 5 },

  { $project: {
      customer: "$_id",
      totalSpent: 1,
      orders: 1,
      _id: 0
  }}
])

Output:

{ customer: "Alice", totalSpent: 1200, orders: 1 }

Advanced: Conditional Logic

Use $cond, $switch, $ifNull

{
  $addFields: {
    statusLabel: {
      $switch: {
        branches: [
          { case: { $eq: ["$status", "completed"] }, then: "Done" },
          { case: { $eq: ["$status", "pending"] }, then: "Waiting" }
        ],
        default: "Unknown"
      }
    }
  }
}

Performance Tips

  1. Put $match early – filter as soon as possible.
  2. Use indexes on fields in $match and $sort.
  3. Project only needed fields early to reduce memory.
  4. Avoid large $unwind on big arrays.
  5. Use allowDiskUse: true for large datasets:
db.collection.aggregate(pipeline, { allowDiskUse: true })

Tools to Visualize Pipelines

  • MongoDB Compass – Drag-and-drop pipeline builder
  • MongoDB Atlas – Visual pipeline editor
  • mongosh – Test in shell

Summary: Pipeline Flow

[Raw Docs]
     
[$match] → filter
     
[$project/$addFields] → reshape
     
[$unwind] → expand arrays
     
[$group] → aggregate
     
[$sort] → order
     
[$limit] → paginate
     
[Result]

Official Docs & Resources


Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking output with .pretty().

Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!

Last updated: Nov 12, 2025

MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.

MongoDB Aggregation Pipelines – Step-by-Step Explanation

MongoDB Aggregation Pipelines – Step-by-Step Explanation

MongoDB Aggregation Pipelines – Step-by-Step Explanation

The MongoDB Aggregation Pipeline is a powerful framework for processing and transforming data within MongoDB. It allows you to perform complex data analysis, filtering, grouping, reshaping, and calculations — all in the database, without pulling raw data into your application.

Think of it as a conveyor belt where documents flow through a series of stages, and each stage transforms the data before passing it to the next.


Why Use Aggregation Pipelines?

Use Case Example
Data summarization Count users by country
Reporting Monthly sales totals
Data cleaning Remove duplicates, normalize fields
Real-time analytics Top 10 products by revenue
ETL (Extract-Transform-Load) Prepare data for dashboards

Core Concept: The Pipeline

db.collection.aggregate([
  { $stage1 },
  { $stage2 },
  { $stage3 },
  ...
])
  • Each { $stage } is a stage in the pipeline.
  • Documents flow from left to right.
  • Output of one stage becomes input to the next.
  • Final result is returned as an array of documents.

Key Aggregation Stages

Here are the most commonly used stages:

Stage Purpose Syntax
$match Filter documents Like find()
$project Select or reshape fields Include, exclude, compute
$group Group by field(s), aggregate Like SQL GROUP BY
$sort Sort results 1 = asc, -1 = desc
$limit / $skip Pagination Limit number of results
$unwind Deconstruct arrays Turn array elements into separate docs
$lookup Join with another collection Like SQL JOIN
$addFields Add new fields Compute values
$count Count documents Returns total
$out / $merge Write results to new collection Export pipeline output

Step-by-Step Examples

Let’s use a sample collection: orders

{
  _id: 1,
  customer: "Alice",
  items: ["laptop", "mouse"],
  total: 1200,
  status: "completed",
  country: "USA"
},
{
  _id: 2,
  customer: "Bob",
  items: ["phone"],
  total: 800,
  status: "pending",
  country: "Canada"
}

1. $match – Filter Documents

{ $match: { status: "completed" } }

Only completed orders pass through.


2. $project – Reshape Output

{
  $project: {
    customer: 1,
    total: 1,
    itemCount: { $size: "$items" }
  }
}

Result:

{ customer: "Alice", total: 1200, itemCount: 2 }

3. $group – Group & Aggregate

Goal: Total sales per country

{
  $group: {
    _id: "$country",           // Group by country
    totalSales: { $sum: "$total" },
    orderCount: { $sum: 1 },
    avgOrder: { $avg: "$total" }
  }
}

Result:

{ _id: "USA", totalSales: 1200, orderCount: 1, avgOrder: 1200 }
{ _id: "Canada", totalSales: 800, orderCount: 1, avgOrder: 800 }

Accumulators in $group:
- $sum, $avg, $min, $max, $push, $addToSet, $first, $last


4. $sort – Order Results

{ $sort: { totalSales: -1 } }

Sort by total sales descending.


5. $limit – Top N Results

{ $limit: 3 }

Only top 3 countries.


6. $unwind – Expand Arrays

{ $unwind: "$items" }

Turns:

{ items: ["laptop", "mouse"] }

Into two documents:

{ items: "laptop" }, { items: "mouse" }

Useful for analyzing individual array elements.


7. $lookup – Join Collections

Assume another collection: customers

{
  $lookup: {
    from: "customers",
    localField: "customer",
    foreignField: "name",
    as: "customerInfo"
  }
}

Adds customer details (e.g., email, phone) from another collection.


8. $addFields – Compute New Fields

{
  $addFields: {
    tax: { $multiply: ["$total", 0.08] },
    totalWithTax: { $add: ["$total", { $multiply: ["$total", 0.08] }] }
  }
}

Full Example: Top Customers by Spending

db.orders.aggregate([
  { $match: { status: "completed" } },

  { $group: {
      _id: "$customer",
      totalSpent: { $sum: "$total" },
      orders: { $sum: 1 }
  }},

  { $sort: { totalSpent: -1 } },

  { $limit: 5 },

  { $project: {
      customer: "$_id",
      totalSpent: 1,
      orders: 1,
      _id: 0
  }}
])

Output:

{ customer: "Alice", totalSpent: 1200, orders: 1 }

Advanced: Conditional Logic

Use $cond, $switch, $ifNull

{
  $addFields: {
    statusLabel: {
      $switch: {
        branches: [
          { case: { $eq: ["$status", "completed"] }, then: "Done" },
          { case: { $eq: ["$status", "pending"] }, then: "Waiting" }
        ],
        default: "Unknown"
      }
    }
  }
}

Performance Tips

  1. Put $match early – filter as soon as possible.
  2. Use indexes on fields in $match and $sort.
  3. Project only needed fields early to reduce memory.
  4. Avoid large $unwind on big arrays.
  5. Use allowDiskUse: true for large datasets:
db.collection.aggregate(pipeline, { allowDiskUse: true })

Tools to Visualize Pipelines

  • MongoDB Compass – Drag-and-drop pipeline builder
  • MongoDB Atlas – Visual pipeline editor
  • mongosh – Test in shell

Summary: Pipeline Flow

[Raw Docs]
     
[$match] → filter
     
[$project/$addFields] → reshape
     
[$unwind] → expand arrays
     
[$group] → aggregate
     
[$sort] → order
     
[$limit] → paginate
     
[Result]

Official Docs & Resources


Practice Tip: Start with small datasets. Build pipelines step by step in mongosh, adding one stage at a time and checking output with .pretty().

Let me know if you want a real-world example (e.g., e-commerce dashboard, log analysis, user analytics)!

Last updated: Nov 12, 2025