Fair and Scalable Algorithms on Massive Graphs

Thumbnail Image


Publication or External Link





The impact of massive data processing on our daily lives is becoming increasingly more clear. With this, we must guarantee that vital, data-driven decision-making systems mitigate the potential negative impacts of the technologies themselves and scale to the massive datasets we have access to today. We explore both of these facets by studying fairness and scalability for algorithms on large graphs.

In Part I, we focus on on fair hierarchical clustering. Our first work on this topic [Ahmadian et al., 2020b] initiates the study of fair hierarchical clustering by extending Chierichetti et al.’s [Chierichetti et al., 2017] notion of representationally proportional flat clustering to the hierarchical setting. From there, we introduce the first approximation algorithms for three well studied hierarchical clustering optimization problems in the fair context: cost [Dasgupta, 2016], revenue [Moseley and Wang, 2017], and value [Cohen-Addad et al., 2018]. Our initial work studies all three fair optimization problems, and our follow-up works [Knittel et al., 2023a, Knittel et al., 2023b] dive deeper into the notoriously difficult cost optimization problem.

Regarding scalability, we leverage the Massively Parallel Computation (MPC) model, as well as its recent successor Adaptive Massively Parallel Computation (AMPC), to develop efficient graph algorithms for big data. MPC, discussed in Part II, has been one of the most practically useful models for massively parallel algorithms in the past decade, influencing a number of major frameworks including MapReduce, Hadoop, Spark, and Flume. In this model, we present our work on edge coloring [Behnezhad et al., 2019b], hierarchical clustering [Hajiaghayi and Knittel, 2020], and tree embeddings [Ahanchi et al., 2023].

AMPC improves upon the MPC model by adding access to a distributed hash table while still remaining highly implementable in practice. This allows it to overcome some shortcomings proposed in MPC literature, notably, the 1vs2Cycle Conjecture (i.e., that differentiating between a single cycle and two cycles is difficult in MPC). In Part III, we introduce a highly efficient and general tool for executing tree contractions in AMPC [Hajiaghayi et al., 2022b] and additionally exhibit the power of AMPC on minimum cut problems [Hajiaghayi et al., 2022a].