Pagination with MongoDB using Aggregation Pipelines

Friday, March 10th 2023

Here is a commonly used pattern for building aggregation pipelines for paginated queries in MongoDB using the JavaScript npm client:

const MongoClient = require('mongodb').MongoClient; const PAGE_SIZE = 10; async function getPaginatedResults(page) { const client = await MongoClient.connect('mongodb://localhost:27017'); const db = client.db('mydb'); const collection = db.collection('mycollection'); const pipeline = [ { $sort: { _id: 1 } }, { $skip: (page - 1) * PAGE_SIZE }, { $limit: PAGE_SIZE } ]; const results = await collection.aggregate(pipeline).toArray(); const total = await collection.countDocuments(); client.close(); return { results, total }; } async function test() { const page = 1; const { results, total } = await getPaginatedResults(page); console.log(`Showing page ${page} of ${Math.ceil(total / PAGE_SIZE)}:`); console.log(results); } test();

In this pattern, we define a constant PAGE_SIZE to determine the number of results per page. We then define an async function getPaginatedResults that takes a page number as input and returns an object containing the results for that page and the total number of documents in the collection.

Inside the getPaginatedResults function, we connect to the MongoDB database and retrieve the mycollection collection. We then define an aggregation pipeline that sorts the documents by _id, skips the documents for the previous pages, and limits the results to PAGE_SIZE documents for the current page. We execute the pipeline using the aggregate() method and convert the results to an array using the toArray() method. We also count the total number of documents in the collection using the countDocuments() method.

Finally, we close the database connection and return an object containing the results for the current page and the total number of documents in the collection.

We can then call the getPaginatedResults function with a page number and display the results in the console using the console.log() method.


Dealing with large data sets

When dealing with very large data sets, it's important to design your aggregation pipeline to take advantage of the parallel processing capabilities of MongoDB. One way to do this is to use the $facet operator to split the pipeline into multiple sub-pipelines that can be executed in parallel.

Here's an example that demonstrates this pattern:

const MongoClient = require('mongodb').MongoClient; const PAGE_SIZE = 1000; async function getPaginatedResults(page) { const client = await MongoClient.connect('mongodb://localhost:27017'); const db = client.db('mydb'); const collection = db.collection('mycollection'); const total = await collection.countDocuments(); const pipeline = [ { $sort: { _id: 1 } }, { $facet: { data: [ { $skip: (page - 1) * PAGE_SIZE }, { $limit: PAGE_SIZE } ], metadata: [ { $group: { _id: null, count: { $sum: 1 } } }, { $project: { _id: 0, count: 1 } } ] } } ]; const [result] = await collection.aggregate(pipeline).toArray(); client.close(); return { results: result.data, total: total, pageCount: Math.ceil(result.metadata[0].count / PAGE_SIZE) }; } async function test() { const page = 1; const { results, total, pageCount } = await getPaginatedResults(page); console.log(`Showing page ${page} of ${pageCount}:`); console.log(results); console.log(`Total documents: ${total}`); } test();

In this example, we start by defining a constant PAGE_SIZE to determine the number of results per page. We then define an async function getPaginatedResults that takes a page number as input and returns an object containing the results for that page, the total number of documents in the collection, and the total number of pages.

Inside the getPaginatedResults function, we connect to the MongoDB database and retrieve the mycollection collection. We first count the total number of documents in the collection using the countDocuments() method.

We then define an aggregation pipeline that sorts the documents by _id and uses the $facet operator to split the pipeline into two sub-pipelines. The data sub-pipeline skips the documents for the previous pages and limits the results to PAGE_SIZE documents for the current page. The metadata sub-pipeline counts the total number of documents in the pipeline using the $group and $project stages.

We execute the pipeline using the aggregate() method and convert the results to an array using the toArray() method. We destructure the first element of the resulting array into a variable named result.

Finally, we close the database connection and return an object containing the results for the current page, the total number of documents in the collection, and the total number of pages.

We can then call the getPaginatedResults function with a page number and display the results in the console using the console.log() method.


Adding a search term and return selected fields only

To add a search term to the pipeline, we can simply add a $match stage to filter the documents based on a search term.

Here's an updated example that includes a search term and adds a $project stage to the pipeline to include only the title and description fields:

const MongoClient = require('mongodb').MongoClient; const PAGE_SIZE = 1000; async function getPaginatedResults(page, searchTerm) { const client = await MongoClient.connect('mongodb://localhost:27017'); const db = client.db('mydb'); const collection = db.collection('mycollection'); const total = await collection.countDocuments(); const pipeline = [ { $match: { $text: { $search: searchTerm } } }, { $sort: { _id: 1 } }, { $project: { _id: 0, title: 1, description: 1 } }, { $facet: { data: [ { $skip: (page - 1) * PAGE_SIZE }, { $limit: PAGE_SIZE } ], metadata: [ { $group: { _id: null, count: { $sum: 1 } } }, { $project: { _id: 0, count: 1 } } ] } } ]; const [result] = await collection.aggregate(pipeline).toArray(); client.close(); return { results: result.data, total: total, pageCount: Math.ceil(result.metadata[0].count / PAGE_SIZE) }; } async function test() { const page = 1; const searchTerm = 'example'; const { results, total, pageCount } = await getPaginatedResults(page, searchTerm); console.log(`Showing page ${page} of ${pageCount} for search term "${searchTerm}":`); console.log(results); console.log(`Total documents: ${total}`); } test();

In this example, we add a searchTerm parameter to the getPaginatedResults function.

The $project stage includes the title and description fields and excludes the _id field using the 1 and 0 values, respectively.

We then add a $match stage to the pipeline that filters the documents based on the $text operator and the $search operator, using the searchTerm variable.

We then execute the rest of the pipeline as before and return the results as an object.

When we call the getPaginatedResults function, we pass in a search term as the second parameter. We then display the search term in the console along with the page number and the results.