Organizations that run their data services through cloud providers must typically provision a cluster of compute nodes of a particular size to run queries, but because many analytical workloads are unpredictable and ad hoc, choosing the right provisioning is difficult. Recently, though, cloud functions have been introduced to address this provisioning issue and move complex workloads into the cloud.
Prior research has explored some workloads, such as video encoding, on cloud function platforms, but that work also revealed limitations of this pattern for communication-heavy and more complex workloads. We have built a system called Starling that addresses a number of these problems in a cost-effective and performance-efficient way.
Starling manages resources by mapping tasks to function invocations that grow and shrink as needed, so users pay only for what each query consumes. It stores intermediate results in a format designed to lower cost under a pay-by-request pricing model. In addition, the system mitigates the unpredictability of individual workers that causes stragglers to run longer than their peers, and it optimizes queries for latency by balancing the number of invocations at each stage. We have implemented the Starling query execution engine efficiently on Amazon AWS.
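One common way to mitigate stragglers (the paper does not prescribe this exact mechanism, so treat this as an illustrative sketch) is speculative execution: launch duplicate invocations of a task and keep whichever copy finishes first. The names `invoke_task` and `run_with_speculation` below are hypothetical, and the simulated latencies are stand-ins for real cloud-function variability:

```python
import concurrent.futures
import random
import time

def invoke_task(task_id: int, attempt: int) -> tuple[int, int]:
    """Simulate one cloud-function invocation; runtime varies per worker."""
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for unpredictable latency
    return task_id, attempt

def run_with_speculation(task_id: int, copies: int = 2) -> tuple[int, int]:
    """Launch duplicate invocations and keep whichever finishes first,
    so a single straggler cannot delay the whole stage."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=copies) as pool:
        futures = [pool.submit(invoke_task, task_id, a) for a in range(copies)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

result = run_with_speculation(7)  # first copy to finish wins
```

The trade-off mirrors the one in the text: duplicate invocations cost more under pay-by-request pricing, but they bound the damage any single slow worker can do to stage latency.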
This design opens up new possibilities and challenges because of its rapid scaling and fine granularity, often involving cost and performance trade-offs over resources. At query planning time, for example, Starling lets users try several separate plans and pick whichever performs better at the moment the query runs, as well as change strategies at runtime or run multiple strategies in parallel. Users can make cost and performance trade-offs in a way that would be difficult with a fixed set of hardware. Traditional systems let users make such decisions only at a macro level, by provisioning hardware appropriately; Starling lets users decide at a much finer granularity, even second by second. These decision-making tools enable users to save both time and money while gaining substantial performance.
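The per-stage cost/latency trade-off can be made concrete with a toy model (all numbers and function names here are illustrative assumptions, not Starling's actual cost model): splitting work across more invocations shrinks latency, but each invocation adds pay-by-request cost and fixed overhead, so a user can pick the invocation count that minimizes latency within a cost budget:

```python
def stage_latency(work_units: float, n_workers: int,
                  per_invocation_overhead: float = 0.5) -> float:
    """Latency shrinks as work is split, but each worker pays fixed startup overhead."""
    return work_units / n_workers + per_invocation_overhead

def stage_cost(n_workers: int, price_per_invocation: float = 0.01) -> float:
    """In a pay-by-request model, cost grows linearly with invocation count."""
    return n_workers * price_per_invocation

def pick_workers(work_units: float, budget: float) -> int:
    """Choose the invocation count that minimizes latency within a cost budget."""
    candidates = [n for n in range(1, 1001) if stage_cost(n) <= budget]
    return min(candidates, key=lambda n: stage_latency(work_units, n))

# With a $1.00 budget at $0.01/invocation, up to 100 workers are affordable,
# and in this simple model more workers always means lower latency.
best = pick_workers(work_units=100.0, budget=1.0)  # → 100
```

A fixed cluster forces this choice once, at provisioning time; with cloud functions the same calculation can be redone per stage, per query, which is the fine-grained decision-making the text describes.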
We show through this research that running queries over big analytics workloads need not be prohibitively expensive; it can instead be both faster and cheaper, enabling many more people to run complex and interesting queries over substantial volumes of data.