Elasticsearch
Elasticsearch is a distributed search and analytics engine built on Apache Lucene, designed for full-text search, log analytics, and real-time data exploration. It stores data as JSON documents across shards distributed over a cluster, with each shard being a self-contained Lucene index that supports inverted indexes, BKD trees for numerics, and doc values for aggregations. Elasticsearch powers search at companies like Wikipedia, GitHub, and Netflix, and serves as the 'E' in the ELK/Elastic Stack for observability.
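As a rough illustration of the inverted index mentioned above, the following toy sketch maps each term to the documents that contain it and answers an AND query by intersecting posting lists. It is not how Lucene is implemented: real indexes add analysis, compression, scoring, and on-disk segments. The function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for a real analyzer.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all_terms(index, query):
    """Return IDs of documents that contain every term in the query."""
    term_sets = [set(index.get(term, [])) for term in query.lower().split()]
    if not term_sets:
        return []
    return sorted(set.intersection(*term_sets))


# Hypothetical sample documents keyed by document ID.
docs = {
    1: "distributed search engine",
    2: "search and analytics engine",
    3: "distributed log analytics",
}
index = build_inverted_index(docs)
print(index["search"])                                   # [1, 2]
print(search_all_terms(index, "distributed analytics"))  # [3]
```

Looking up a term is a dictionary access rather than a scan over every document, which is the property that makes full-text lookups fast at scale.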
Strengths
Weaknesses
Ideal Workloads
- Full-text search for e-commerce product catalogs, content platforms, and knowledge bases
- Log aggregation and observability with the ELK Stack (Elasticsearch, Logstash, Kibana)
- Security analytics and SIEM platforms processing millions of events per second
- Autocomplete and typeahead suggestions using edge n-gram tokenizers and completion suggesters
- Geospatial search for location-based services with geo_point and geo_shape queries
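To make the autocomplete workload above more concrete, here is a small sketch of what edge n-gram tokenization produces and how those grams can serve prefix lookups. The `min_gram` and `max_gram` parameter names mirror the settings of the `edge_ngram` tokenizer; everything else (function names, sample titles) is hypothetical, and the lookup table is only a stand-in for an analyzed index.

```python
def edge_ngrams(token, min_gram=2, max_gram=5):
    """Return the prefixes of a token with lengths from min_gram to max_gram."""
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(titles, min_gram=2, max_gram=5):
    """Index each title under the edge n-grams of every word it contains."""
    index = {}
    for title in titles:
        for word in title.split():
            for gram in edge_ngrams(word, min_gram, max_gram):
                index.setdefault(gram, set()).add(title)
    return index


print(edge_ngrams("search"))  # ['se', 'sea', 'sear', 'searc']

# Hypothetical titles used only for this example.
titles = ["Elastic Stack", "Elasticsearch Guide", "Logstash Pipelines"]
prefix_index = build_prefix_index(titles)
suggestions = sorted(prefix_index.get("ela", set()))
print(suggestions)  # ['Elastic Stack', 'Elasticsearch Guide']
```

The trade-off is index size: the work of prefix matching moves to index time, so each word is stored several times in exchange for cheap lookups as the user types.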
Scaling Model
Horizontally scalable by adding data nodes to the cluster. Each index is divided into a fixed number of primary shards, set at index creation and changeable afterward only through a split, shrink, or reindex operation, plus a replica count that can be adjusted at any time. The elected master node decides which node hosts each shard and rebalances shards as nodes join or leave. Index Lifecycle Management (ILM) automates rollover, shrink, and delete operations for time-series data. Cross-cluster search enables querying across multiple clusters without data movement.
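One way to see why the primary shard count is set at index creation is to sketch how a document could be routed to a shard: hash a routing value (by default the document ID) and take it modulo the number of primary shards. This is a simplified model, not Elasticsearch's exact routing code. `zlib.crc32` is a stand-in hash chosen for the example, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Choose a shard from a routing value (simplified; crc32 is a stand-in hash)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs used as routing values.
doc_ids = [f"doc-{i}" for i in range(1000)]

# The same ID always lands on the same shard while the shard count is unchanged.
placement_5 = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# With a different shard count, many IDs would map to a different shard.
placement_6 = {doc_id: pick_shard(doc_id, 6) for doc_id in doc_ids}
moved = sum(1 for doc_id in doc_ids if placement_5[doc_id] != placement_6[doc_id])
print(f"{moved} of {len(doc_ids)} documents would change shards")
```

Because lookups by ID rely on the same calculation, changing the divisor would send requests to shards that do not hold the document, which is why growing an index means rewriting data rather than editing a setting.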
Consistency Model
Search is near-real-time: newly indexed documents become searchable only after the next refresh, which runs on a configurable interval (default 1 second), so search results may lag the most recent writes. Writes are durable once the translog is fsynced, which by default happens before a request is acknowledged. Replication is synchronous for write operations: the primary acknowledges a write only after all in-sync replicas have acknowledged it. There are no multi-document transactions; each single-document operation is atomic on its own.
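The gap between an acknowledged write and its visibility in search can be modeled with a toy shard that keeps new documents in a buffer until a refresh. This is a deliberately simplified sketch. `ToyShard` and its attributes are invented for illustration and do not correspond to Elasticsearch classes; a real refresh opens a new Lucene segment rather than copying a dictionary.

```python
class ToyShard:
    """Toy model of one shard: writes are logged at once, searchable after refresh."""

    def __init__(self):
        self.translog = []    # record of acknowledged writes (durability)
        self.buffer = {}      # indexed but not yet visible to search
        self.searchable = {}  # visible to search after a refresh

    def index(self, doc_id, doc):
        # The write is logged and buffered, but not yet searchable.
        self.translog.append((doc_id, doc))
        self.buffer[doc_id] = doc

    def refresh(self):
        # Make everything buffered so far visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, predicate):
        return sorted(d for d, doc in self.searchable.items() if predicate(doc))


def is_error(doc):
    return doc["level"] == "error"


shard = ToyShard()
shard.index("a", {"level": "error"})
before_refresh = shard.search(is_error)
shard.refresh()
after_refresh = shard.search(is_error)
print(before_refresh)  # []
print(after_refresh)   # ['a']
```

In this model the write is already recorded in `translog` when the first search returns nothing, which mirrors the distinction between durability and searchability.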
When to Use
- You need full-text search with relevance scoring, faceting, and highlighting
- You are building a log analytics or observability platform processing high-volume event data
- You need real-time aggregations and visualizations over semi-structured data
- Your application requires autocomplete, fuzzy matching, or synonym-aware search
- You need to search across multiple data types (text, numeric, geo, nested) in a single query
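As an example of combining several data types in one request, the sketch below builds a search body that pairs a fuzzy full-text match with a numeric range filter and a geo-distance filter, and asks for highlighting. The query clauses (`bool`, `match`, `range`, `geo_distance`, `highlight`) are standard Query DSL, but the field names `title`, `price`, and `location` are assumptions, as is a mapping in which `location` is a `geo_point`. The helper only builds the JSON and sends nothing to a cluster.

```python
import json


def build_search_body(text, max_price, lat, lon, distance="5km"):
    """Build a search request body mixing text, numeric, and geo conditions."""
    return {
        "query": {
            "bool": {
                # Scored clause: fuzzy full-text match on a hypothetical title field.
                "must": [
                    {"match": {"title": {"query": text, "fuzziness": "AUTO"}}}
                ],
                # Non-scoring filters: numeric range and distance from a point.
                "filter": [
                    {"range": {"price": {"lte": max_price}}},
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                ],
            }
        },
        # Return highlighted fragments from the matched field.
        "highlight": {"fields": {"title": {}}},
    }


body = build_search_body("wireles headphones", 100, 40.7, -74.0)
print(json.dumps(body, indent=2))
```

Placing the numeric and geo conditions under `filter` keeps them out of relevance scoring, so only the text match influences ranking.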
When Not to Use
- You need a primary data store with ACID transactions and strong consistency
- Your workload is primarily OLTP with frequent updates to existing records
- You need relational joins across entities (consider a relational database with full-text search)
- Your data volume is small enough that a simpler solution (PostgreSQL full-text search) would suffice
- You cannot afford the operational overhead of managing JVM tuning, shard strategies, and cluster health
Source: editorial — Based on Elasticsearch 8.x documentation and Elastic Stack production deployment patterns