Automatic Parallelization of Software Network Functions

Speaker

Francisco Pereira
Instituto Superior Técnico (University of Lisbon) / CMU

Host

Baltasar Dinis
MIT CSAIL
Software network functions (NFs) offer flexibility and ease of deployment, but at the cost of making high performance harder to achieve. The traditional way to increase NF performance is to distribute traffic across multiple CPU cores, but this poses a significant challenge: how to parallelize an NF without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates an enhanced parallel version that carefully configures the NIC's Receive Side Scaling (RSS) mechanism to distribute traffic across cores while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture, in which each core operates independently without shared-memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism optimized for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale up linearly until bottlenecked by PCIe when using small packets, or by the 100 Gbps line rate with typical Internet traffic. Maestro further outperforms modern hardware-based transactional memory mechanisms, even for challenging parallel-unfriendly workloads.
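To make the shared-nothing idea in the abstract concrete, here is a minimal sketch of flow-based sharding. It is an illustration only, not Maestro's generated code: real RSS hashes packet header fields in NIC hardware, whereas this sketch substitutes a software CRC32 over a direction-independent flow key. The names `NUM_CORES`, `flow_key`, `core_for_packet`, and `process` are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_CORES = 4  # hypothetical core count, chosen for illustration


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-independent flow identifier: endpoints are sorted so that
    a packet and its reply yield the same key."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (a, b, proto)


def core_for_packet(src_ip, src_port, dst_ip, dst_port, proto):
    """Pick a core from the flow key. CRC32 stands in for the NIC's RSS
    hash here (an assumption made for this sketch)."""
    key = repr(flow_key(src_ip, src_port, dst_ip, dst_port, proto)).encode()
    return zlib.crc32(key) % NUM_CORES


# One private table per core: no table is ever touched by two cores,
# so no locks are needed (the shared-nothing case).
per_core_flows = [defaultdict(int) for _ in range(NUM_CORES)]


def process(pkt):
    """Count the packet in the table owned by the core that receives it."""
    core = core_for_packet(*pkt)
    per_core_flows[core][flow_key(*pkt)] += 1
    return core


forward = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
reverse = ("10.0.0.2", 80, "10.0.0.1", 1234, "tcp")
core_fwd = process(forward)
core_rev = process(reverse)
```

Because both directions of a flow map to the same core, all state for that flow lives in a single core's table. When an NF's state cannot be partitioned this way, the abstract's fallback applies: shared state protected by read-write locks.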
This work was supported by the European Union (ACES project, 101093126), INESC-ID (via UIDB/50021/2020), and the SALAD-Nets CMU Portugal/FCT project (2022.15622.CMU).

==============

Francisco Pereira is a PhD student at Instituto Superior Técnico, University of Lisbon, advised by Prof. Luis Pedrosa and Prof. Fernando Ramos. He is also affiliated with CMU, where he is advised by Prof. Justine Sherry. His interests lie at the intersection of systems and networking, ranging from software network functions to programmable network accelerators such as programmable switches, FPGAs, and SmartNICs. Francisco's work focuses on automating improvements for network functions, from automatic parallelization to program partitioning across heterogeneous devices.