Document | Ops: medium

MongoDB

MongoDB is a distributed document database that stores data as flexible BSON (Binary JSON) documents, allowing nested structures and dynamic schemas without predefined table definitions. Its native sharding, replica sets, and aggregation pipeline make it a popular choice for applications with evolving data models and high availability requirements. MongoDB Atlas provides a fully managed service with global clusters, serverless instances, and integrated search via Atlas Search (Lucene-based).

Strengths

  • Flexible schema allows rapid iteration without costly ALTER TABLE migrations
  • Built-in horizontal sharding with automatic chunk balancing across shards
  • Rich aggregation pipeline supports multi-stage data transformations, lookups, and faceted search
  • Multi-document ACID transactions with snapshot isolation (on replica sets since 4.0, across shards since 4.2)
  • Change streams enable real-time, event-driven architectures without polling
  • Atlas Search integrates Lucene-based full-text search directly into the aggregation pipeline
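To make the aggregation pipeline listed among the strengths more concrete, here is a minimal sketch of a pipeline written as plain Python data. The `status`, `customer_id`, and `amount` fields are hypothetical, chosen only for illustration. With PyMongo, a list like this would be passed to `collection.aggregate(pipeline)`.

```python
# Hypothetical pipeline: total order amount per customer for shipped orders.
# The field names are assumptions made for this illustration.
pipeline = [
    {"$match": {"status": "shipped"}},                                   # filter documents
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},  # aggregate per customer
    {"$sort": {"total": -1}},                                            # largest totals first
    {"$limit": 5},                                                       # keep the top five
]

# Each stage is a single-key document; the key names the stage operator.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$sort', '$limit']
```

Stages run in order and each one consumes the output of the previous stage, so placing `$match` first reduces the number of documents the later stages have to process.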

Weaknesses

  • Concurrent writes to the same document are serialized by document-level concurrency control, which can bottleneck hot-key workloads
  • Joins ($lookup) are expensive and lack the optimizer sophistication of relational join planning
  • WiredTiger storage engine cache sizing requires careful tuning to avoid page eviction storms
  • Shard key selection is critical and costly to change after deployment (resharding a collection is possible since 5.0 but is a heavyweight operation); poor choices cause hot spots
  • Storage overhead from BSON encoding, which repeats field names in every document, can exceed relational equivalents by 2-3x

Ideal Workloads

  • Content management systems with heterogeneous document structures
  • IoT platforms ingesting semi-structured sensor data with varying schemas per device type
  • E-commerce product catalogs where each product category has different attributes
  • Real-time analytics dashboards using the aggregation pipeline
  • Mobile and serverless backends leveraging MongoDB Realm/Atlas Device Sync
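The product catalog case above can be sketched with two documents of different shape that could sit in the same collection. All field names and values here are hypothetical and chosen only to illustrate the idea.

```python
# Two hypothetical product documents that could live in one collection.
products = [
    {"_id": 1, "category": "book", "title": "Example Title",
     "author": "A. Writer", "pages": 320},
    {"_id": 2, "category": "laptop", "title": "Example Laptop",
     "specs": {"ram_gb": 16, "cpu": "8-core"}, "ports": ["usb-c", "hdmi"]},
]

# Fields shared by every product versus fields specific to one category.
shared = set.intersection(*(set(p) for p in products))
specific = {p["category"]: sorted(set(p) - shared) for p in products}

print(sorted(shared))  # ['_id', 'category', 'title']
print(specific)        # {'book': ['author', 'pages'], 'laptop': ['ports', 'specs']}
```

In a relational design the category-specific fields would typically need separate tables or sparse nullable columns. Here each document simply carries the attributes it needs.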

Scaling Model

Horizontally scalable via automatic sharding. Data is distributed across shards using a shard key (range-based or hashed). Mongos query routers direct queries to the correct shard(s). Each shard is a replica set for HA. Adding shards increases write and storage capacity. Config servers store cluster metadata and manage chunk migrations.
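The difference between range-based and hashed shard keys can be illustrated with a small simulation. This is a simplified sketch, not MongoDB's actual routing: the shard count, range boundaries, and function names are assumptions, and the modulo step stands in for MongoDB's real scheme, which splits the hashed value space into chunk ranges.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed cluster size for this sketch

# Range-based: each shard owns a contiguous key range, split at these bounds.
# shard 0: key < 1000, shard 1: 1000-1999, shard 2: 2000-2999, shard 3: >= 3000
RANGE_BOUNDS = [1000, 2000, 3000]

def range_shard(key: int) -> int:
    """Pick the shard whose key range contains the key."""
    return bisect.bisect_right(RANGE_BOUNDS, key)

def hashed_shard(key: int) -> int:
    """Stand-in for hashed sharding: hash the key, then map it to a shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Monotonically increasing keys, such as a counter or a timestamp.
new_keys = range(3000, 3100)
range_hits = {range_shard(k) for k in new_keys}
hashed_hits = {hashed_shard(k) for k in new_keys}

print(range_hits)        # {3}: every new write lands on the last shard
print(len(hashed_hits))  # number of distinct shards receiving writes
```

With a monotonically increasing key, range-based sharding sends all new writes to one shard, while hashing spreads them across shards at the cost of turning range queries on that key into scatter-gather operations.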

Consistency Model

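Configurable per operation via read and write concerns. Since MongoDB 5.0 the implicit default write concern is w:majority for most replica set configurations (deployments with arbiters may still default to w:1), which waits until a majority of voting members have acknowledged the write. Earlier releases defaulted to w:1, which acknowledges writes on the primary only and can lose them on failover. Read concern 'majority' returns only data acknowledged by a majority of members, while 'linearizable' provides the strongest guarantee. Causally consistent sessions provide read-your-own-writes semantics when combined with majority read and write concerns.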
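The arithmetic behind a majority acknowledgment can be sketched as follows. This is a simplified model that assumes every voting member is data-bearing (no arbiters); the function names are hypothetical and are not part of any MongoDB driver.

```python
def majority(voting_members: int) -> int:
    """Smallest member count that is more than half of the replica set."""
    return voting_members // 2 + 1

def is_acknowledged(acks: int, w, voting_members: int) -> bool:
    """True once enough members have confirmed the write for write concern w."""
    required = majority(voting_members) if w == "majority" else int(w)
    return acks >= required

print(majority(3), majority(5))            # 2 3
print(is_acknowledged(1, 1, 3))            # True: the primary alone satisfies w:1
print(is_acknowledged(1, "majority", 3))   # False: one ack is not a majority of 3
print(is_acknowledged(2, "majority", 3))   # True
```

A write acknowledged by a majority survives the loss of any minority of members, which is why w:majority is the setting associated with durability across failover.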

When to Use

  • Your data model is hierarchical or varies significantly between records (polymorphic documents)
  • You need horizontal scaling with built-in sharding and don't want to manage it yourself
  • Your application benefits from schema flexibility and rapid prototyping
  • You want change streams for event-driven architectures or CDC (Change Data Capture)
  • You need a single system that handles documents, search, and time-series data
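For the change stream use case in the list above, a consumer might look roughly like the sketch below. It assumes `collection` behaves like a PyMongo `Collection`, whose `watch()` method returns a change stream that can be used as a context manager and iterated. The `handle_inserts` function, its `on_insert` callback, and the `max_events` parameter are hypothetical names introduced for illustration. Change streams require a replica set or sharded cluster.

```python
def handle_inserts(collection, on_insert, max_events=None):
    """Consume insert events from a change stream and pass each new document on.

    `collection` is expected to behave like a PyMongo Collection.
    Returns the number of events handled.
    """
    # Server-side filter so only insert events are delivered to the client.
    only_inserts = [{"$match": {"operationType": "insert"}}]
    seen = 0
    with collection.watch(only_inserts) as stream:
        for change in stream:
            # Insert events carry the new document under "fullDocument".
            on_insert(change["fullDocument"])
            seen += 1
            if max_events is not None and seen >= max_events:
                break
    return seen
```

Against a real deployment this could be called as `handle_inserts(db.orders, print)`, where `db.orders` is a placeholder collection. The stream blocks while waiting for new events, so the loop would normally run in its own worker or process.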

When Not to Use

  • Your data is highly relational with many cross-collection references requiring joins
  • You need complex multi-table transactions with sophisticated constraint enforcement
  • Your workload is analytical with heavy aggregations over terabytes of data (consider a columnar store)
  • You need strong consistency guarantees without per-operation configuration overhead
  • Storage cost is a primary concern and your data has a predictable, flat schema

Source: editorial — Based on MongoDB 7.x documentation and distributed systems design patterns
