What does it mean for a file system to be truly crash-safe? Ideally, it means that users who experience a computer crash while performing an operation would have an un-corrupted file system upon restarting. This is hard to achieve, because a power failure may happen at any time, and furthermore, several different operations are ongoing during the crash.
Even a single-core file system requires multiple verification steps to prove its correctness. To achieve scalability over many cores, a verification system is needed that can verify concurrent systems and address as many aspects of crash safety as the verified single-core systems.
Taking an existing verification system called Iris that already supports concurrency reasoning, we have developed a new framework called Perennial that extends the Iris concurrency framework with three techniques:
- Recovery leases: As a way to logically transfer exclusive access to durable resources to threads during recovery, we split capabilities for durable resources into a “master” copy that persists across crashes, and a temporary “lease” for a specific version number.
- Recovery helping: We introduce a capability representing the ongoing action of the crashed thread, which allows recovery to “help” a crashed thread and complete its operation.
- Versioned memory: We attach an execution version number to in-memory pointers and other capabilities for volatile resources, so that only capabilities for the current version are considered valid.
Furthermore, Perennial provides Goose, a subset of Go, that helps with application development and deployment as well as with reasoning about systems. In addition, we have used Perennial to verify a mail server written in Goose called Mailboat. The Goose approach with Perennial helped to improve the overall performance of Mailboat while staying crash-safe.
Perennial is the first framework that verifies concurrent, crash-safe systems with machine-checked proofs. Using the Perennial toolkit and techniques, we are working to continue to realistically scale the verification of concurrent, multi-core file systems.