Combinatorial Optimization for Barter Exchange and Tumor Phylogeny Inference
Files
(RESTRICTED ACCESS)
Publication or External Link
Date
Authors
Advisor
Citation
DRUM DOI
Abstract
Combinatorial Optimization is a field with rich history encompassing problems that are fundamental to Computer Science with important applications. Despite the many decades-long maturity of Combinatorial Optimization, new important problems continue to arise in response to emerging technologies and datasets. Additionally, even the slightest variations in problem statements can render previous approaches unworkable. In this doctoral thesis we will focus on two such problems. First, we explore a centralized barter exchange problem, motivated by the rise of online marketplaces and social media. Second, we explore the problem of tumor phylogeny inference, motivated by recent advances in single-cell sequencing technologies and the continued demands of ever-growing datasets.
First, we study a centralized barter exchange model where agents enter with a set of items they have available to trade, and a set of items they want. Additionally, every item has some fixed positive value. For example, in an exchange between trading card collectors, agents looking to broaden their collections can enter aiming to exchange cards they have repeats of for cards that they do not yet have. The clearinghouse, i.e., the party facilitating the centralized barter exchange, is tasked with the goal of outputting a reallocation that maximizes the welfare of agents subject to no agent being unhappy (an unhappy agent gives away more value than they receive). We study two subproblems: Barter Exchange with Symmetric Valuations (BSV) and Barter Exchange with Asymmetric Valuations (BAV). BSV and BAV are, respectively, the settings when agents agree on item values and when agents may have their own individual item valuations. Both BSV and BAV are NP-hard. On the flip side, we design randomized algorithms for both of these problems that guarantee, in expectation, maximum welfare and that every agent receives at least as much value as they give away. Additionally, for BAV, with probability 1, every agent receives at least as much value as they give away, minus the value of two items. In the case of BSV, the guarantee is strengthened to, with probability 1, each agent receiving at least as much value as they give away minus the value of a single item. Finally, in BAV, agents can improve their utility by misreporting their valuations. We design a mechanism that is incentive compatible (i.e., an agent maximizes their utility by reporting their true valuation).
Second, we study tumor phylogeny reconstruction from single-cell sequencing (SCS) data. Recent advances in SCS allow for more a fine-grained look into tumor evolutionary history. Accurate understanding of such tumor phylogenies (evolutionary trees) can inform effective cancer treatments. However, SCS data is invariably noisy and this noise must be dealt with to produce a \textit{perfect phylogeny} (PP), which can then easily be mapped to an evolutionary tree using the well-known result of Gusfield, 1991. In particular, we want to find the most likely PP (i.e., the PP requiring the least number of corrections in the data). This is the Most Likely PP Reconstruction problem and it will be our focus. We draw connections between the Most Likely PP Reconstruction problem and the fundamental combinatorial optimization problem of Vertex Cover to build a branch-and-bound algorithm that provably returns the most likely PP. We benchmark our branch-and-bound algorithm against previous approaches on synthetically generated SCS data. In these experiments, our algorithm finds the most likely PP up to 300 times faster.
A common thread in our studied problems of Barter Exchange and Most Likely PP Reconstruction, is the Combinatorial Optimization paradigm of formulating problems as Linear Programs (LPs) to obtain a fractional solution in polynomial time.Then this fractional solution is then carefully rounded to an integral solution, in polynomial-time, while providing theoretical guarantees over the rounded solution. Among these results, we introduce a novel refinement of the randomized dependent rounding algorithm of Gandhi et al., 2006.