Post

Index Trees - One Pager

Index Trees - One Pager

Overview

Index structures are fundamental to database performance, determining how efficiently systems can handle reads, writes, and range queries. The choice of index tree depends on your workload characteristics, concurrency requirements, and whether data lives primarily in memory or on disk.

For in-memory workloads, structures like skiplists and AVL/RB trees provide fast lookups with simple implementations. Skiplists are particularly attractive for concurrent scenarios due to their lock-free friendliness, making them ideal for Redis’s sorted sets and LSM-tree memtables.

For disk-based OLTP systems, the B+ tree remains dominant (MySQL, PostgreSQL) thanks to its shallow structure that minimizes disk I/O and excellent cache locality. However, it struggles with write-heavy workloads due to random I/O patterns and the need for latches under concurrency.

For high-concurrency systems, the Bw-tree offers lock-free operation through delta records and mapping tables, making it ideal for modern in-memory databases like Azure Cosmos DB and SQL Server’s Hekaton engine, especially on SSDs.

For write-intensive workloads, LSM-trees excel by converting random writes into sequential I/O through an append-only design with background compaction. This makes them the go-to choice for distributed databases (Cassandra, DynamoDB) and storage engines (RocksDB), though they trade off read performance and latency predictability.

For multidimensional queries, BKD-trees (used in Lucene/Elasticsearch) efficiently handle range and geospatial searches through space partitioning, though they’re optimized for bulk indexing rather than transactional updates.

The key insight: there is no universal best index structure. B+ trees optimize for read-heavy OLTP, LSM-trees for write-heavy distributed systems, Bw-trees for lock-free concurrency, and BKD-trees for spatial queries. Understanding these trade-offs is essential for system design interviews and production architecture decisions.


Comparison Table

StructureBest ForUsed InStrengthsWeaknesses 
SkiplistIn-memory sorted sets, range scansRedis, LevelDB/RocksDB (memtable)Simple, lock-free friendly, fast range scansNo strict worst-case bounds, more memory overhead 
AVL / RB TreeIn-memory ordered mapsC++ std::map, Java TreeMap, LinuxDeterministic O(log n) ops, fast lookups (AVL), fewer rotations (RB)Poor cache locality, slower under concurrency, outperformed by B+ in scale 
B+ TreeDisk-based point & range queriesMySQL, PostgreSQL, Oracle, NTFSVery fast range scans, shallow tree (few I/Os), cache-efficientRandom writes costly, fragmentation, needs latching 
Bw-TreeHigh-concurrency, in-memory + disk indexingAzure Cosmos DB, SQL Server (Hekaton)Lock-free, SSD-friendly, fast reads/writes with delta recordsComplex to implement, delta chain overhead, needs GC/consolidation 
LSM-TreeWrite-heavy storage enginesCassandra, RocksDB, HBase, ScyllaDB, DynamoDBHigh write throughput, sequential I/O, good compression, Bloom filtersRead/compaction amplification, less predictable latency 
BKD-TreeMultidimensional numeric & geo queriesLucene, Elasticsearch, CrateDBExcellent for range & geo queries, compact, SIMD-friendly leaf blocksNot for OLTP, bulk-write oriented, less dynamic 
This post is licensed under CC BY 4.0 by the author.