- Stoarge optimized for transactional workloads vs optimized for analytics
- Log: an append-only sequence of records
Index
An index is an additional structure that is derived from the primary data.
- Speed up the reads
- Slows down the writes, index needs to be updated when data is written
Hash indexes
Keep an in-memory hash map where every key is mapped to a byte offset (location) in the data file - the location at which the value can be found
- Example: Bitcask (the default storage enginein Riak) requires all keys fit in the available RAM
- Good for: situations where the value for each key is updated frequently
- i.e. key: URL of a cat video, value: number of times it has been played
Compaction: throwing away duplicate keys in the log, keeping only the most recent update for each key
Why this design
- Appending and segment mergeing are sequential write operations, which are generally much faster than random writes
- Concurrency and crash recovery are much simpler if segment files are appendonly or immutable
- Merging avoids data files getting fragmented over time
Limitations
- The hash table must fit in memory, couldn’t handle large amount of keys
- Range queries are not efficient