Search · Ops: high

Elasticsearch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene, designed for full-text search, log analytics, and real-time data exploration. It stores data as JSON documents in shards distributed across a cluster. Each shard is a self-contained Lucene index that uses inverted indexes for text, BKD trees for numeric fields, and doc values for aggregations. Elasticsearch powers search at organizations such as Wikipedia, GitHub, and Netflix, and is the 'E' in the ELK/Elastic Stack for observability.

Strengths

  • Best-in-class full-text search with BM25 scoring, analyzers, fuzzy matching, and query-time boosting
  • Near-real-time indexing (typically 1-second refresh interval) with segment-based immutable index architecture
  • Powerful aggregation framework for analytics: terms, histograms, percentiles, and nested bucket aggregations
  • Horizontally scalable with automatic shard allocation, rebalancing, and replica management
  • Rich query DSL supporting bool queries, function scores, nested objects, and geo queries
  • Integrated with Kibana for visualization and the Elastic Stack for logs, metrics, and APM
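To make the BM25 scoring mentioned above more concrete, here is a minimal sketch of how a single query term might be scored against a single document. It follows the commonly documented BM25 form with k1 = 1.2 and b = 0.75; the function name and parameters are illustrative, and real Elasticsearch scores also depend on boosts, per-shard statistics, and how field lengths are encoded, so treat this as an approximation rather than the engine's exact arithmetic.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Approximate BM25 score of one term in one document.

    freq        -- occurrences of the term in this document's field
    doc_len     -- number of terms in this document's field
    avg_doc_len -- average field length across the index
    doc_count   -- number of documents that have the field
    doc_freq    -- number of documents containing the term
    """
    # Inverse document frequency: rarer terms contribute more.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: repeated matches add less and less.
    return idf * freq / (freq + norm)


rare = bm25_term_score(freq=1, doc_len=100, avg_doc_len=100,
                       doc_count=1000, doc_freq=10)
common = bm25_term_score(freq=1, doc_len=100, avg_doc_len=100,
                         doc_count=1000, doc_freq=500)
print(rare > common)
```

The comparison at the end illustrates the intuition behind relevance ranking: a match on a rare term is worth more than a match on a term that appears in half the index.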

Weaknesses

  • Not a primary data store; lacks ACID transactions, and search visibility is eventually consistent, governed by the refresh interval
  • JVM heap management is complex; large heaps cause long GC pauses, while small heaps limit fielddata and caching
  • The primary shard count chosen at index creation is difficult to change; over-sharding causes cluster instability
  • Write amplification from Lucene segment merges increases I/O and can impact search latency
  • Deep pagination is expensive (from/size paging is capped at 10,000 hits by default); large result sets require search_after or the scroll API
  • Mapping updates cannot change the type of an existing field; changing a type requires reindexing
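The search_after approach to deep pagination can be sketched as a loop that feeds the sort values of the last hit on one page into the request for the next. The example below only builds request bodies and walks a response shape modeled on Elasticsearch's (`hits.hits[].sort`); `make_fake_search` is a hypothetical in-memory stand-in for a real search call, and the field names `timestamp` and `id` are assumptions chosen for illustration.

```python
def build_page_request(query, sort, size, search_after=None):
    """Build one page request; search_after is omitted on the first page."""
    body = {"query": query, "sort": sort, "size": size}
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(search, query, sort, size=1000):
    """Yield every hit, page by page, without using from/size offsets."""
    cursor = None
    while True:
        response = search(build_page_request(query, sort, size, cursor))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the next page's cursor.
        cursor = hits[-1]["sort"]


def make_fake_search(docs):
    """Hypothetical stand-in for a cluster: sorts by (timestamp, id)."""
    ordered = sorted(docs, key=lambda d: (d["timestamp"], d["id"]))

    def search(body):
        after = body.get("search_after")
        remaining = [d for d in ordered
                     if after is None or [d["timestamp"], d["id"]] > after]
        page = remaining[:body["size"]]
        return {"hits": {"hits": [
            {"_source": d, "sort": [d["timestamp"], d["id"]]} for d in page
        ]}}

    return search


docs = [{"id": i, "timestamp": i // 2} for i in range(7)]
sort = [{"timestamp": "asc"}, {"id": "asc"}]  # id acts as a tiebreaker
search = make_fake_search(docs)
ids = [hit["_source"]["id"]
       for hit in iterate_hits(search, {"match_all": {}}, sort, size=3)]
print(ids)
```

The sort includes a unique tiebreaker field so that documents sharing a timestamp are neither skipped nor repeated across pages. Against a real cluster, this pattern is typically combined with a point in time so the result set stays stable while paging.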

Ideal Workloads

  • Full-text search for e-commerce product catalogs, content platforms, and knowledge bases
  • Log aggregation and observability with the ELK Stack (Elasticsearch, Logstash, Kibana)
  • Security analytics and SIEM platforms processing millions of events per second
  • Autocomplete and typeahead suggestions using edge n-gram tokenizers and completion suggesters
  • Geospatial search for location-based services with geo_point and geo_shape queries
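The idea behind edge n-gram autocomplete can be illustrated without a cluster: each word is expanded into its leading prefixes at index time, so a partially typed word becomes an exact lookup at query time. The following is a toy sketch in plain Python, not Elasticsearch's tokenizer; the function names and the `min_gram`/`max_gram` defaults used here are assumptions for the example.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading prefixes of a token, shortest first."""
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(titles, min_gram=2, max_gram=10):
    """Map every edge n-gram of every word to the titles containing it."""
    index = defaultdict(set)
    for title in titles:
        for word in title.split():
            for gram in edge_ngrams(word, min_gram, max_gram):
                index[gram].add(title)
    return index


def suggest(index, typed):
    """Look up the typed prefix as an exact key; no scanning required."""
    return sorted(index.get(typed.lower(), set()))


index = build_prefix_index(["Elastic Stack", "Elasticsearch Guide", "Kibana"])
print(edge_ngrams("Search", 2, 4))
print(suggest(index, "ela"))
```

The trade-off is the one the technique implies: the index grows because every prefix is stored, in exchange for cheap lookups while the user types.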

Scaling Model

Elasticsearch scales horizontally by adding data nodes to the cluster. Each index is divided into primary shards, whose count is set at index creation, with a configurable number of replicas. The elected master node's allocation algorithm distributes shards across nodes. Index Lifecycle Management (ILM) automates rollover, shrink, and delete operations for time-series data. Cross-cluster search queries multiple clusters without moving data.
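One way to see why the primary shard count is fixed at creation is to look at hash-based routing: a document's shard is derived from a hash of its routing key modulo the number of primary shards, so changing that number would send most documents to a different shard. The sketch below is a simplification under stated assumptions: it uses `zlib.crc32` as a stand-in hash (Elasticsearch uses Murmur3 and an additional routing factor), and the function names are illustrative.

```python
import zlib


def route_to_shard(routing_key, num_primary_shards):
    """Pick a shard from a hash of the routing key (simplified)."""
    # crc32 is a stand-in here; it is NOT the hash Elasticsearch uses.
    digest = zlib.crc32(routing_key.encode("utf-8"))
    return digest % num_primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes with the shard count."""
    moved = sum(
        1 for doc_id in doc_ids
        if route_to_shard(doc_id, old_shards) != route_to_shard(doc_id, new_shards)
    )
    return moved / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(1000)]
print(moved_fraction(doc_ids, 5, 6))
```

With this kind of routing, going from 5 to 6 shards relocates the large majority of documents, which is why resizing is handled through dedicated split, shrink, or reindex operations rather than a simple setting change.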

Consistency Model

Search is near-real-time: a newly indexed document becomes searchable at the next refresh, which by default runs every 1 second and is configurable. Writes are durable once the translog is fsynced. Replication is synchronous for write operations: all in-sync replicas must acknowledge a write before it is confirmed. Search results may still lag the most recent writes until the next refresh. There are no multi-document transactions; each document indexing operation is atomic on its own.
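The gap between indexing and searchability can be modeled with a toy shard that keeps newly indexed documents in a buffer until a refresh makes them visible to search. This is a deliberately simplified sketch, not how Lucene segments or the translog are implemented; the class and method names are hypothetical. The `get` method reflects the fact that fetching a document by ID is real-time by default, unlike search.

```python
class ToyShard:
    """Toy model of near-real-time visibility on a single shard."""

    def __init__(self):
        self._buffer = {}      # indexed, but not yet visible to search
        self._searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        """Accept a write; it is not searchable until the next refresh."""
        self._buffer[doc_id] = doc

    def refresh(self):
        """Make all buffered documents visible to search."""
        self._searchable.update(self._buffer)
        self._buffer.clear()

    def search(self, predicate):
        """Search sees only documents that have been refreshed."""
        return [doc for doc in self._searchable.values() if predicate(doc)]

    def get(self, doc_id):
        """Get-by-ID sees the latest write, refreshed or not."""
        if doc_id in self._buffer:
            return self._buffer[doc_id]
        return self._searchable.get(doc_id)


shard = ToyShard()
shard.index("1", {"msg": "hello"})
before = shard.search(lambda doc: True)   # write not visible yet
shard.refresh()
after = shard.search(lambda doc: True)    # visible after refresh
print(len(before), len(after))
```

In practice, applications that must read their own writes through search either wait for a refresh on the write request or tolerate the short delay.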

When to Use

  • You need full-text search with relevance scoring, faceting, and highlighting
  • You are building a log analytics or observability platform processing high-volume event data
  • You need real-time aggregations and visualizations over semi-structured data
  • Your application requires autocomplete, fuzzy matching, or synonym-aware search
  • You need to search across multiple data types (text, numeric, geo, nested) in a single query
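As an illustration of searching across several data types in a single query, the sketch below builds a bool query body that combines a fuzzy full-text match, a numeric range filter, and a geo-distance filter. The query clauses (`bool`, `match`, `range`, `geo_distance`) are part of the query DSL; the field names `title`, `price`, and `location` and the helper function are assumptions for the example, and the mapping would need `location` to be a geo_point field.

```python
import json


def build_product_query(text, max_price, lat, lon, radius_km):
    """Build a search body mixing text, numeric, and geo clauses."""
    return {
        "query": {
            "bool": {
                # Scored clause: contributes to relevance ranking.
                "must": [
                    {"match": {"title": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Filter clauses: narrow results without affecting the score.
                "filter": [
                    {"range": {"price": {"lte": max_price}}},
                    {"geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        }
    }


body = build_product_query("wireless headphones", 150, 52.52, 13.405, 10)
print(json.dumps(body, indent=2))
```

Placing the numeric and geo conditions in `filter` rather than `must` keeps them out of relevance scoring and lets them be cached, which is a common design choice for yes/no conditions.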

When Not to Use

  • You need a primary data store with ACID transactions and strong consistency
  • Your workload is primarily OLTP with frequent updates to existing records
  • You need relational joins across entities (consider a relational database with full-text search)
  • Your data volume is small enough that a simpler solution (PostgreSQL full-text search) would suffice
  • You cannot afford the operational overhead of managing JVM tuning, shard strategies, and cluster health

Source: editorial — Based on Elasticsearch 8.x documentation and Elastic Stack production deployment patterns
