Database Indexing
A database index is a data structure (typically a B-tree or hash table) that provides fast lookup of rows by specific column values, trading storage space and write overhead for dramatically faster read queries.
Without an index, a database must scan every row in a table to find matching records (full table scan) — O(n). An index creates a sorted data structure that allows O(log n) lookups, similar to a book's index. The most common index type is a B-tree (balanced tree), which supports equality lookups, range queries, and sorting. Indexes dramatically speed up reads but slow down writes (every INSERT/UPDATE must also update the index) and consume additional storage. Choosing the right indexes is one of the most impactful performance optimizations in any application.
Tradeoffs
Strengths
- Dramatic read performance: Turns O(n) full table scans into O(log n) lookups — often 1000–100,000x faster.
- Enables sorting: Indexes pre-sort data, eliminating expensive in-memory sorts.
- Constraint enforcement: Unique indexes enforce data integrity at the database level.
- Covering indexes: Can answer queries entirely from the index, avoiding heap access entirely.
- Partial indexes: Index only relevant subsets of data, saving space and improving performance.
Weaknesses
- Write overhead: Every INSERT, UPDATE, and DELETE must update all relevant indexes. Tables with many indexes have significantly slower writes.
- Storage cost: Indexes consume additional disk space — sometimes 50–100% of the base table size.
- Maintenance: Indexes can become bloated and fragmented, requiring periodic maintenance (VACUUM, REINDEX).
- Complexity: Choosing optimal indexes requires understanding query patterns, selectivity, and the query planner.
- Over-indexing risk: Too many indexes waste storage, slow writes, and increase backup size.
- Update-heavy tables: Frequently updated columns in an index cause high index churn.
Likely Follow-Up Questions
- How do you decide which columns to index?
- What is the difference between a clustered and a non-clustered index?
- How do composite indexes work with the leftmost prefix rule?
- What is index bloat and how do you address it?
- When would you use a GIN index vs. a B-tree index?
- How do you use EXPLAIN ANALYZE to optimize queries?
Related Concepts
Source: editorial — Synthesized from PostgreSQL and MySQL documentation, Use The Index Luke (by Markus Winand), and database performance engineering best practices.