MongoDB's flexible document model is one of its greatest strengths, but that flexibility can be a double-edged sword. Without careful schema design, you can end up with documents that are too large, queries that are too slow, and a database that doesn't scale. In this guide, I'll share the schema design principles that will help you build efficient, scalable MongoDB applications.
Design for Your Queries First
The most important principle of MongoDB schema design is to design for your queries, not for your data. In a relational database, you normalize your data first and then figure out how to query it. In MongoDB, you start with the queries your application needs and design your schema to support them efficiently.
This means you need to understand your application's access patterns before you design your schema. What queries will be run most often? What data needs to be returned together? What fields will be filtered and sorted on? The answers to these questions should drive your schema design decisions.
A good MongoDB schema often trades normalization for query efficiency. It's okay to duplicate data across documents if it means you can serve a query with a single read instead of multiple reads and joins.
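As a rough illustration of that trade-off, the sketch below builds an order document that copies the customer's name onto the order, so an order list can be rendered from a single read. The helper name buildOrderDocument and the fields customerName and placedAt are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: copy the fields an "order list" query needs onto the order itself.
function buildOrderDocument(user, items) {
  const total = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  return {
    userId: user._id,         // reference kept for lookups and later updates
    customerName: user.name,  // duplicated so the list view needs no second read
    items,
    total,
    placedAt: new Date(),
  };
}

const order = buildOrderDocument(
  { _id: "u1", name: "Alice", email: "alice@example.com" },
  [{ sku: "A-1", price: 50, quantity: 3 }]
);
// In mongosh or a driver you would then store it, e.g. db.orders.insertOne(order)
```

The cost is that a name change has to be propagated to every copy, so this kind of duplication suits data that is read often and changed rarely.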
Embed or Reference: Making the Right Choice
The most common schema design decision in MongoDB is whether to embed related data or use references. The right choice depends on how the data is accessed and how it changes.
When to Embed
Embed related data when it is read together with the parent document and does not change independently. For example, a blog post and its comments are a good candidate for embedding if you always display comments with the post and comments are not accessed separately.
// Embedded comments - good for read-together data
{
  _id: ObjectId("..."),
  title: "My Blog Post",
  content: "...",
  comments: [
    { author: "Alice", text: "Great post!", createdAt: ISODate("...") },
    { author: "Bob", text: "Thanks for sharing", createdAt: ISODate("...") }
  ]
}
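Embedding is efficient because the post and its comments live in one document: a single query returns everything, and an update to that document is atomic. The catch is that an array that grows without bound will eventually make the document too large, so embed only when the number of embedded items stays small.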
When to Reference
Use references when the related data is large, accessed independently, or updated frequently. For example, a user's order history is better stored as a separate collection because orders are accessed independently and can grow very large.
// References - good for independent data
// users collection
{
  _id: ObjectId("..."),
  name: "Alice",
  email: "alice@example.com"
}
// orders collection
{
  _id: ObjectId("..."),
  userId: ObjectId("..."), // Reference to user
  total: 150.00,
  items: [...]
}
Referencing keeps documents small and allows you to access related data independently. The trade-off is that assembling the full picture takes a second query or a $lookup stage instead of a single read.
Index with Purpose
Indexes are the most powerful tool for improving query performance in MongoDB, but they come with costs. Each index slows down writes and takes up disk space. The key is to create indexes that support your most important queries without over-indexing.
Create Indexes on Frequently Queried Fields
Create indexes on fields used in filters, sorts, and joins. Use compound indexes when queries filter on multiple fields. The order of fields in a compound index matters: put fields matched by equality first, then fields used for sorting, then fields filtered by range (such as $gt or $lt).
// Create a compound index for queries that filter by status and sort by date
db.orders.createIndex({ status: 1, createdAt: -1 })
This index supports queries that filter by status and sort by createdAt. The database can use the index to find the relevant orders quickly and return them in the correct order without an additional sort operation.
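The field-ordering guideline can be sketched as a small helper. MongoDB's documentation describes it as the ESR rule (equality, sort, range); the helper name buildCompoundIndexKeys and the range field total are assumptions made for this example, not part of any MongoDB API.

```javascript
// Hypothetical helper: order index keys as equality fields, then sort fields, then range fields.
function buildCompoundIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) keys[field] = direction;
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // skip fields already placed earlier
  }
  return keys;
}

const keys = buildCompoundIndexKeys({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["total"],
});
// keys is { status: 1, createdAt: -1, total: 1 }
// In mongosh: db.orders.createIndex(keys)
```

An index built this way would suit a query such as "shipped orders over a given total, newest first". Treat the ordering as a starting point and confirm it with explain() on your own data.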
Use Partial Indexes
Use partial indexes for queries that only apply to a subset of data. If you frequently query for active users, create a partial index that only includes active users:
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: 'active' } }
)
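Partial indexes are smaller than full indexes, and cheaper to maintain on writes, because they only index documents that match the filter. Keep in mind that MongoDB uses a partial index only when the query's own filter guarantees a match with the partial filter expression. A query on email alone will not use this index; it must also include status: 'active'.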
Verify Index Usage
Use the explain() method to verify that your queries are using indexes. Look for "IXSCAN" in the explain output, which indicates an index scan. "COLLSCAN" means a collection scan, which is slow and should be avoided for frequently run queries.
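To make that check repeatable, here is a minimal sketch that walks the winning plan in explain() output and reports whether any stage is a collection scan. It assumes the plan shape of an unsharded deployment (queryPlanner.winningPlan with nested inputStage or inputStages); the function names and the sample object are hypothetical.

```javascript
// Collect every stage name found in a plan tree from explain() output.
function collectStages(plan, stages = []) {
  if (!plan || typeof plan !== "object") return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.queryPlan) collectStages(plan.queryPlan, stages); // some versions nest the plan here
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// True when the winning plan contains a full collection scan.
function usesCollectionScan(explainOutput) {
  return collectStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Hand-written sample shaped like explain() output, for illustration only
const sample = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "status_1_createdAt_-1" },
    },
  },
};
console.log(usesCollectionScan(sample)); // false
```

In mongosh you could pass it the result of something like db.orders.find({ status: "shipped" }).sort({ createdAt: -1 }).explain().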
Keep Documents Manageable
MongoDB documents have a 16MB size limit, but you should aim for much smaller documents in practice. Very large documents hurt performance because they take longer to read from disk, occupy more memory in the cache, and cost more to send over the network, even when a query needs only a few of their fields.
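One way to find out whether any documents are getting large is to measure them. The sketch below builds an aggregation pipeline around the $bsonSize operator, which is available in MongoDB 4.4 and later; the helper name largestDocumentsPipeline, the output field sizeBytes, and the posts collection in the usage comment are assumptions for illustration.

```javascript
// Hypothetical helper: pipeline that lists the n largest documents in a collection.
function largestDocumentsPipeline(n) {
  return [
    { $project: { sizeBytes: { $bsonSize: "$$ROOT" } } }, // size of the whole document in bytes
    { $sort: { sizeBytes: -1 } },                         // largest first
    { $limit: n },
  ];
}

const pipeline = largestDocumentsPipeline(5);
// In mongosh: db.posts.aggregate(pipeline)
```

This reads every document in the collection, so it is better suited to an occasional check than to a hot path.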
Split Large Documents
Split documents when one record grows too large or when access patterns differ. For example, if a blog post has thousands of comments, store the comments in a separate collection instead of embedding them in the post document.
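A minimal sketch of that split, assuming the embedded post shape shown earlier: each comment becomes its own document carrying a postId reference. The helper name splitComments and the commentCount field are hypothetical additions for this example.

```javascript
// Hypothetical helper: turn a post with embedded comments into one post document
// plus separate comment documents that reference it.
function splitComments(post) {
  const { comments = [], ...postWithoutComments } = post;
  const commentDocs = comments.map((comment) => ({ ...comment, postId: post._id }));
  return {
    post: { ...postWithoutComments, commentCount: comments.length },
    comments: commentDocs,
  };
}

const result = splitComments({
  _id: "p1",
  title: "My Blog Post",
  comments: [{ author: "Alice", text: "Great post!" }],
});
// In mongosh, result.comments would go to a separate collection, for example:
//   db.comments.insertMany(result.comments)
//   db.comments.createIndex({ postId: 1, createdAt: -1 })
```

With an index on postId, a page of comments for one post can still be fetched with a single indexed query.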
Use Projections
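Use projections to return only the fields you need. This reduces the amount of data transferred from the database and improves query performance. In production code, prefer an explicit projection whenever you need only part of a document. Note that _id is returned by default unless you exclude it with _id: 0.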
// Only return the fields you need
db.users.find(
  { status: 'active' },
  { name: 1, email: 1, createdAt: 1 }
)
Handle Relationships Carefully
MongoDB has no SQL-style JOIN clause in its query language, but you can model relationships with manual references, DBRefs, or the aggregation framework's $lookup stage, which performs a left outer join on the server.
Manual References
For simple relationships, manual references are the most efficient approach. Store the referenced document's _id in the parent document and look it up in your application code when needed.
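Looking references up in application code is cheapest when you batch them. The sketch below shows one possible approach, assuming the users and orders shapes shown earlier: build a single $in filter for all referenced users, then stitch the results together in memory. The function names are hypothetical.

```javascript
// Step 1: build one filter for all referenced users instead of one query per order.
function userFilterFor(orders) {
  const ids = new Map();
  for (const order of orders) ids.set(String(order.userId), order.userId); // de-duplicate
  return { _id: { $in: [...ids.values()] } };
}

// Step 2: attach each fetched user to the orders that reference it.
function attachUsers(orders, users) {
  const byId = new Map(users.map((user) => [String(user._id), user]));
  return orders.map((order) => ({
    ...order,
    user: byId.get(String(order.userId)) ?? null, // null when the user was not found
  }));
}

// In mongosh the two steps would be wired together roughly like this:
//   const orders = db.orders.find({ status: "shipped" }).toArray();
//   const users = db.users.find(userFilterFor(orders)).toArray();
//   const joined = attachUsers(orders, users);
```

This keeps the number of round trips at two regardless of how many orders are returned.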
Aggregation Framework
For more complex relationships, use the aggregation framework. The $lookup stage performs a left outer join between collections, similar to a SQL LEFT OUTER JOIN. Use it sparingly: it can be slow on large collections, especially when the foreign field is not indexed or when earlier stages do not narrow the input first.
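As a sketch of what such a join can look like, the pipeline below matches orders first and only then looks up the user for each one. It assumes the orders and users collections shown earlier and a status field on orders; the helper name ordersWithUserPipeline is hypothetical.

```javascript
// Hypothetical helper: orders with a given status, each joined to its user document.
function ordersWithUserPipeline(status) {
  return [
    { $match: { status } }, // narrow the input before joining
    {
      $lookup: {
        from: "users",        // collection to join
        localField: "userId", // field on the order
        foreignField: "_id",  // field on the user
        as: "user",           // output array field
      },
    },
    { $unwind: "$user" }, // one user per order; orders with no matching user are dropped
  ];
}

const pipeline = ordersWithUserPipeline("shipped");
// In mongosh: db.orders.aggregate(pipeline)
```

Because the join here is on _id, the lookup side is always indexed. For any other foreign field, an index on that field is worth checking first.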
Schema Validation
MongoDB's flexible schema is useful during development, but in production, you should validate your documents to prevent data quality issues. Use schema validation to enforce document structure at the database level.
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "name"],
      properties: {
        email: {
          bsonType: "string",
          // The backslash is doubled so the regex receives \. (a literal dot)
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        name: {
          bsonType: "string",
          minLength: 2,
          maxLength: 100
        }
      }
    }
  }
})
Schema validation catches data quality issues early and ensures that your application code can rely on the structure of the data it reads from the database.
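The createCollection example applies to a new collection. For a collection that already holds data, one possible approach is the collMod command with a gentler rollout: validationLevel "moderate" skips updates to existing documents that are already invalid, and validationAction "warn" logs violations instead of rejecting writes. The helper name buildValidatorCommand is hypothetical, and the settings chosen are an assumption about what a cautious rollout might look like.

```javascript
// Hypothetical helper: build a collMod command that adds a validator to an existing collection.
function buildValidatorCommand(collection, jsonSchema) {
  return {
    collMod: collection,
    validator: { $jsonSchema: jsonSchema },
    validationLevel: "moderate", // do not validate updates to already-invalid documents
    validationAction: "warn",    // log violations rather than rejecting the write
  };
}

const command = buildValidatorCommand("users", {
  bsonType: "object",
  required: ["email", "name"],
});
// In mongosh: db.runCommand(command)
```

Once the logs show no more violations, the same command can be re-run with validationAction set to "error".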
Frequently Asked Questions
Should I embed or reference?
It depends on your access patterns. Embed when data is read together and doesn't change independently. Reference when data is large, accessed independently, or updated frequently.
How big should my documents be?
There is no official target, but as a rule of thumb, aim for documents under 100KB. While MongoDB supports up to 16MB, smaller documents are faster to read and use less memory.
How many indexes is too many?
There's no magic number, but each index slows down writes. Monitor your query performance and create indexes only for queries that are frequently run and need optimization.
Should I use MongoDB for relational data?
MongoDB can handle relational data, but it's not ideal for complex relationships with many joins. If your data is highly relational, consider using a relational database instead.
How do I handle schema migrations?
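MongoDB has no built-in migration tooling, and because it does not force existing documents to match a new structure, old and new shapes can sit side by side in the same collection. Use a migration tool like migrate-mongo or write custom scripts to update your documents when the schema changes.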
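A custom script usually comes down to a small, idempotent transform applied to each document. The sketch below is one way to write such a transform; the schemaVersion field, the default status value, and the function name upgradeUser are all assumptions made for illustration.

```javascript
const CURRENT_VERSION = 2;

// Hypothetical transform: bring a user document up to version 2 of the schema.
// Documents with no schemaVersion are treated as version 1.
function upgradeUser(doc) {
  if ((doc.schemaVersion ?? 1) >= CURRENT_VERSION) return doc; // already migrated
  return {
    ...doc,
    status: doc.status ?? "active", // version 2 requires a status
    schemaVersion: CURRENT_VERSION,
  };
}

// In mongosh, a custom script could apply it like this:
//   db.users.find({ schemaVersion: { $ne: 2 } }).forEach((doc) =>
//     db.users.replaceOne({ _id: doc._id }, upgradeUser(doc))
//   );
```

Because the transform leaves already-migrated documents untouched, the script can be re-run safely if it is interrupted.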
The Bottom Line
Good MongoDB schema design balances flexibility, performance, and maintainability. Design for your queries first, choose between embedding and referencing based on access patterns, index with purpose, keep documents manageable, and validate your schemas in production. These principles will help you build MongoDB applications that perform well and scale gracefully.
Remember: MongoDB's flexibility pays off only when it is paired with deliberate design.