Write-Ahead Logging


Write-Ahead Logging (WAL) is a technique that preserves data integrity across system failures and improves the performance of storage systems. Most database storage systems use it.

Imagine a naive implementation of a storage system where every single write operation goes directly to disk. At scale, performance will be poor. Instead, in-memory data structures (memtables) can be used to speed up read and write operations. When they reach a certain size in memory, they are serialized to disk. But what if the system goes down? In-memory data that has not yet been flushed to disk is lost immediately. That is the problem WAL exists to solve.
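
To make the problem concrete, here is a minimal sketch of a memtable that is flushed once it grows past a threshold. The class name `Memtable`, the `max_entries` parameter (the threshold is counted in entries rather than bytes, for simplicity) and the JSON segment files are assumptions made for illustration, not part of any particular storage engine.

```python
import json
import os
import tempfile


class Memtable:
    """In-memory key-value buffer that is serialized to disk when it fills up."""

    def __init__(self, directory, max_entries=3):
        self.directory = directory
        self.max_entries = max_entries  # hypothetical threshold, in entries
        self.data = {}
        self.flushed_files = 0

    def put(self, key, value):
        self.data[key] = value
        if len(self.data) >= self.max_entries:
            self.flush()

    def flush(self):
        # Serialize the whole buffer to a new segment file and force it to disk.
        path = os.path.join(self.directory, f"segment-{self.flushed_files}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        self.flushed_files += 1
        self.data = {}


def demo():
    with tempfile.TemporaryDirectory() as directory:
        table = Memtable(directory, max_entries=3)
        for i in range(4):
            table.put(f"k{i}", i)
        # The first three entries reached disk; the fourth lives only in
        # memory and would be lost if the process died at this point.
        return table.flushed_files, dict(table.data)
```

In this sketch, `demo()` ends with one flushed segment and one entry (`k3`) that exists only in memory, which is exactly the data a crash would destroy.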

WAL is an append-only log that registers all the commands executed in a storage system. A command is any operation that mutates the state of the system, i.e. a write operation.

With the WAL technique, every command is appended to the log before the memtables are updated. This way, when the system restarts after a failure, all the commands are read back from the log and the system's current state is re-created from them and loaded into the memtables, similar to the event-sourcing pattern.
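
The append-then-apply order and the replay on restart can be sketched as follows. This is a simplified illustration under stated assumptions: the `Store` class, the JSON-lines log format and the `put`/`delete` command shapes are all hypothetical, and the sketch does not handle a partially written final record.

```python
import json
import os
import tempfile


class Store:
    """Key-value store that logs every command before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memtable = {}
        self._recover()  # rebuild in-memory state from the log, if any
        self.wal = open(wal_path, "a", encoding="utf-8")

    def _recover(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "put":
            self.memtable[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.memtable.pop(command["key"], None)

    def execute(self, command):
        # 1. Append the command to the log and force it to disk.
        self.wal.write(json.dumps(command) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # 2. Only then update the in-memory state.
        self._apply(command)

    def close(self):
        self.wal.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "wal.log")
        store = Store(path)
        store.execute({"op": "put", "key": "a", "value": 1})
        store.execute({"op": "put", "key": "b", "value": 2})
        store.execute({"op": "delete", "key": "a"})
        store.close()  # stand-in for a crash: in-memory state is discarded
        recovered = Store(path)  # a fresh instance replays the log
        state = dict(recovered.memtable)
        recovered.close()
        return state
```

Here the second `Store` instance starts with an empty memtable and ends up with the same state as the first one, purely by replaying the logged commands in order.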

As you might have noticed, in both cases (always-to-disk and memtables + WAL) a disk write is needed for every command. So how does WAL improve performance? Part of the answer is that a log append is a sequential write, which is usually much cheaper than updating data in place at scattered locations on disk. The other part is that WAL writes can be made asynchronously. However, this can be dangerous: you need to ensure that the log entry has actually been appended to the log before treating the command as durable. Otherwise, a crash can leave the data inconsistent. Duplicated entries must also be handled, either by skipping them when the WAL is read or by making the command logic idempotent. Which one to use is an implementation decision.
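
One possible way to skip duplicates while reading the WAL is to give each entry a sequence number and ignore any entry whose number has already been applied. The sketch below assumes a hypothetical entry shape with a `seq` field that increases along the log, and uses a deliberately non-idempotent `incr` command to show why duplicates matter.

```python
def replay(entries):
    """Rebuild state from log entries, applying each sequence number at most once."""
    state = {}
    last_applied = 0
    for entry in entries:
        if entry["seq"] <= last_applied:
            continue  # duplicate of an entry that was already applied
        if entry["op"] == "incr":
            state[entry["key"]] = state.get(entry["key"], 0) + entry["amount"]
        last_applied = entry["seq"]
    return state


# Hypothetical log in which entry 2 was written twice, e.g. after a retry.
log = [
    {"seq": 1, "op": "incr", "key": "hits", "amount": 1},
    {"seq": 2, "op": "incr", "key": "hits", "amount": 5},
    {"seq": 2, "op": "incr", "key": "hits", "amount": 5},
    {"seq": 3, "op": "incr", "key": "hits", "amount": 1},
]
```

With this log, `replay(log)` counts the duplicated entry once and yields 7 hits instead of 12. The alternative mentioned above is to design commands so that applying them twice is harmless: a command such as "set `hits` to 7" is naturally idempotent, whereas "increment `hits` by 5" is not.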