This event has been cancelled

Joana Trindade: Systems and Techniques for Efficient Real-World Graph Analytics

Speaker

Joana M. F. da Trindade
MIT CSAIL

Host

Sam Madden
Abstract: Graphs are a natural way to model real-world entities and relationships between them, ranging from social networks and biological datasets to cloud computing infrastructure data lineage graphs. Queries over these large graphs often involve expensive sub-graph traversals and complex analytical computations. Furthermore, real-world graphs often encode relationships that evolve through time. These relationships are commonly modeled as a temporal graph, i.e., one in which edges are associated to time attributes (such as start time and duration), denoting their validity during a certain interval in time. Because of their ability to capture not only relationships, but also their evolution over time, temporal graphs are becoming increasingly relevant to a host of societal problems.

These real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, but this insight has remained mostly unexplored by existing graph engines for graph query optimization purposes. In addition, most temporal graph processing systems remain inefficient as they rely on distributed processing even for graphs that fit well within a commodity server's available storage. To this end, we present two distinct systems tailored for optimizing large-scale real-world graph processing. The first one, Kaskade, targets the challenges of efficient query evaluation by leveraging structural properties of graphs and queries to infer materialized graph views. Materialized view inference is especially useful in this setting as it significantly speeds up query evaluation by rewriting input queries in terms of these views, guided by a cost model that estimates the benefit of such views based on input graph and query characteristics. The second system, Kairos, introduces selective indexing, a technique that chooses a subset of target vertices to index based on characteristics of the underlying temporal graphs and input queries. This system further employs a highly-specialized parallel data structure aimed at in-memory storage and fast retrieval of temporal edges. Finally, Kairos is built upon Ligra, the de facto benchmark system in shared-memory parallel graph processing, offering similar advantages and a familiar API to application programmers. Both systems offer speedups of up to 50-60x when compared with alternative baselines, and introduce novel classes of query optimization techniques aimed at efficient real-world graph analytics.

Zoom link: https://mit.zoom.us/j/99926376681