0
0
Data Structures Theoryknowledge~10 mins

LSM trees in write-heavy systems in Data Structures Theory - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - LSM trees in write-heavy systems
Write request arrives
Write to in-memory structure (MemTable)
MemTable full?
NoWait for next write
Yes
Flush MemTable to disk as SSTable
Merge/Compaction of SSTables
Optimized disk storage for reads
Next write request
Writes first go to fast memory, then flush to disk when full, followed by merging files to keep reads efficient.
Execution Sample
Data Structures Theory
write(data)
  memtable.add(data)
  if memtable.is_full():
    flush_to_disk(memtable)
    compact_sstables()
This shows how data is written to memory, flushed to disk when full, and then compacted.
Analysis Table
StepActionMemTable StateDisk SSTablesNotes
1Write data1data1emptyMemTable stores data1
2Write data2data1, data2emptyMemTable stores data2
3MemTable full checkdata1, data2emptyMemTable not full yet
4Write data3data1, data2, data3emptyMemTable stores data3
5MemTable full checkdata1, data2, data3emptyMemTable full now
6Flush MemTableemptySSTable1(data1, data2, data3)Data moved to disk
7Write data4data4SSTable1(data1, data2, data3)New MemTable starts
8Write data5data4, data5SSTable1(data1, data2, data3)MemTable stores data5
9MemTable full checkdata4, data5SSTable1(data1, data2, data3)MemTable not full
10Write data6data4, data5, data6SSTable1(data1, data2, data3)MemTable stores data6
11MemTable full checkdata4, data5, data6SSTable1(data1, data2, data3)MemTable full again
12Flush MemTableemptySSTable1(data1, data2, data3), SSTable2(data4, data5, data6)Second flush to disk
13CompactionemptySSTable_merged(data1..data6)Merge SSTables for efficiency
14Wait for next writeemptySSTable_merged(data1..data6)Ready for new writes
💡 Process repeats as new writes come in, keeping memory fast and disk organized.
State Tracker
VariableStartAfter Step 1After Step 5After Step 6After Step 12After Step 13Final
MemTableemptydata1data1, data2, data3emptyemptyemptyempty
Disk SSTablesemptyemptyemptySSTable1(data1, data2, data3)SSTable1(data1, data2, data3), SSTable2(data4, data5, data6)SSTable_merged(data1..data6)SSTable_merged(data1..data6)
Key Insights - 3 Insights
Why do writes go first to memory (MemTable) instead of directly to disk?
Writing to memory is much faster than disk. The execution_table shows steps 1-5 where data accumulates in MemTable before flushing, improving write speed.
What happens when the MemTable becomes full?
As seen in steps 5 and 11, once MemTable is full, it is flushed to disk as an SSTable. This prevents memory overflow and organizes data on disk.
Why is compaction needed after multiple SSTables are created?
Compaction merges multiple SSTables into one (step 13) to reduce disk reads and improve read performance by keeping data organized.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 6. What is the state of the MemTable?
AContains data4
BEmpty
CContains data1, data2, data3
DContains data4, data5, data6
💡 Hint
Check the 'MemTable State' column at step 6 in the execution_table.
At which step does the first flush of MemTable to disk happen?
AStep 6
BStep 5
CStep 12
DStep 13
💡 Hint
Look for the action 'Flush MemTable' in the execution_table.
If the MemTable size limit was larger, how would the execution_table change?
ACompaction would not be needed
BFlush steps would happen earlier with less data
CFlush steps would happen later with more data in MemTable
DMemTable would never flush
💡 Hint
Consider how MemTable fullness triggers flush in steps 5 and 11.
Concept Snapshot
LSM trees handle many writes by first storing data in fast memory (MemTable).
When full, MemTable is flushed to disk as sorted files (SSTables).
Multiple SSTables are merged (compacted) to keep reads efficient.
This design optimizes write speed and manages disk storage well.
Full Transcript
LSM trees are designed for systems with many writes. When data is written, it first goes into an in-memory structure called MemTable. This is fast and efficient. When the MemTable fills up, it is saved to disk as a sorted file called an SSTable. Over time, many SSTables accumulate, so the system merges them in a process called compaction. This keeps the disk organized and makes reading data faster. The process then repeats for new writes, balancing fast memory writes and efficient disk storage.