The T-1000, a prototype system of a thousand realistic processors spread across an ensemble of interconnected FPGAs, seeks to demonstrate the scalability of timestamp-based cache coherence protocols on distributed shared memory systems.

The traditional method of keeping caches coherent on a shared memory system is a directory that tracks every sharer of each cache line, which incurs a space cost linear in the number of sharers. The Tardis cache coherence protocol shrinks this cost to logarithmic in the number of sharers, paving the way for thousand-core cache-coherent systems. To demonstrate the versatility of this protocol, we seek to build the T-1000, an ensemble of interconnected field-programmable gate arrays (FPGAs) configured to contain, in aggregate, over a thousand RISC-V processors. To maximize bandwidth, we will connect these FPGAs in a three-dimensional coherent mesh. With the T-1000, we can explore new parallel processing algorithms, operating systems with thousands of active processes, and datacenter-scale computing within a single address space.
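
The linear-versus-logarithmic space claim can be illustrated with a rough per-cache-line storage model. The sketch below is only an approximation under stated assumptions: the function names are hypothetical, the directory is modeled as a full-map bit vector with one bit per core, and the Tardis-style entry is modeled as an owner ID plus two timestamps whose width (64 bits here, an assumed value) does not grow with the core count.

```python
def full_map_directory_bits(num_cores: int) -> int:
    """Sharer-tracking bits per cache line for a full-map directory:
    one presence bit per core, so the cost grows linearly."""
    return num_cores


def owner_id_bits(num_cores: int) -> int:
    """Bits needed to name a single core, i.e. ceil(log2(num_cores))."""
    return max(1, (num_cores - 1).bit_length())


def tardis_style_bits(num_cores: int, timestamp_bits: int = 64) -> int:
    """Simplified model of a timestamp-based entry: one owner ID plus a
    write timestamp and a read timestamp. The timestamp width is an
    assumption of this sketch and is independent of the core count."""
    return owner_id_bits(num_cores) + 2 * timestamp_bits


if __name__ == "__main__":
    # Compare how the two schemes scale as the core count grows.
    for cores in (64, 256, 1024, 4096):
        print(cores, full_map_directory_bits(cores), tardis_style_bits(cores))
```

In this model only the owner ID grows with the number of cores, and it grows by one bit each time the core count doubles, whereas the bit vector doubles in size.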