Computer Science Theses and Dissertations

Permanent URI for this collection: http://hdl.handle.net/1903/2756

Now showing 1 - 7 of 7
  • Item
    Fair and Scalable Algorithms on Massive Graphs
    (2023) Knittel, Marina; Hajiaghayi, MohammadTaghi; Dickerson, John; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    The impact of massive data processing on our daily lives is becoming increasingly clear. With this, we must guarantee that vital, data-driven decision-making systems mitigate the potential negative impacts of the technologies themselves and scale to the massive datasets we have access to today. We explore both of these facets by studying fairness and scalability for algorithms on large graphs. In Part I, we focus on fair hierarchical clustering. Our first work on this topic [Ahmadian et al., 2020b] initiates the study of fair hierarchical clustering by extending Chierichetti et al.’s [Chierichetti et al., 2017] notion of representationally proportional flat clustering to the hierarchical setting. From there, we introduce the first approximation algorithms for three well-studied hierarchical clustering optimization problems in the fair context: cost [Dasgupta, 2016], revenue [Moseley and Wang, 2017], and value [Cohen-Addad et al., 2018]. Our initial work studies all three fair optimization problems, and our follow-up works [Knittel et al., 2023a, Knittel et al., 2023b] dive deeper into the notoriously difficult cost optimization problem. Regarding scalability, we leverage the Massively Parallel Computation (MPC) model, as well as its recent successor, Adaptive Massively Parallel Computation (AMPC), to develop efficient graph algorithms for big data. MPC, discussed in Part II, has been one of the most practically useful models for massively parallel algorithms in the past decade, influencing a number of major frameworks including MapReduce, Hadoop, Spark, and Flume. In this model, we present our work on edge coloring [Behnezhad et al., 2019b], hierarchical clustering [Hajiaghayi and Knittel, 2020], and tree embeddings [Ahanchi et al., 2023]. AMPC improves upon the MPC model by adding access to a distributed hash table while remaining highly implementable in practice. This allows it to overcome some shortcomings identified in the MPC literature, notably the 1vs2Cycle Conjecture (i.e., that differentiating between a single cycle and two cycles is difficult in MPC). In Part III, we introduce a highly efficient and general tool for executing tree contractions in AMPC [Hajiaghayi et al., 2022b] and additionally exhibit the power of AMPC on minimum cut problems [Hajiaghayi et al., 2022a].
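To make the cost objective of Dasgupta [2016] mentioned above concrete: each edge pays its weight times the number of leaves under the tree node where its endpoints first split. A minimal sketch, assuming a binary hierarchy encoded as nested tuples (the function name and input format are illustrative, not from the dissertation):

```python
def dasgupta_cost(tree, w):
    """Dasgupta's hierarchical clustering cost.

    tree: binary hierarchy as nested 2-tuples; leaves are vertex labels.
    w: dict mapping edges (i, j) to nonnegative weights.
    Each edge (i, j) pays w[i, j] times the number of leaves in the
    subtree rooted at the lowest common ancestor of i and j.
    """
    def rec(node):
        if not isinstance(node, tuple):          # leaf
            return {node}, 0.0
        left_leaves, left_cost = rec(node[0])
        right_leaves, right_cost = rec(node[1])
        leaves = left_leaves | right_leaves
        cost = left_cost + right_cost
        for (i, j), wij in w.items():
            # edges whose endpoints split exactly at this node are charged here
            if (i in left_leaves and j in right_leaves) or \
               (i in right_leaves and j in left_leaves):
                cost += wij * len(leaves)
        return leaves, cost
    return rec(tree)[1]

# triangle on {a, b, c}: (a, b) splits at a 2-leaf node, the other
# two edges split at the 3-leaf root, so the cost is 2 + 3 + 3 = 8
tree = (("a", "b"), "c")
weights = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
```

Low-cost trees separate heavy edges only deep in the hierarchy, where subtrees (and hence the leaf-count multiplier) are small.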
  • Item
    Social Aspects of Algorithms: Fairness, Diversity, and Resilience to Strategic Behavior
    (2021) Ahmadi, Saba; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    With algorithms becoming ubiquitous in our society, it is important to ensure that they are compatible with our social values. In this thesis, we study some of the social aspects of algorithms, including fairness, diversity, and resilience to the strategic behavior of individuals. A lack of diversity can contribute to discrimination against marginalized groups. Inspired by this issue, in the first part of this thesis, we study a notion of diversity in bipartite matching problems. Bipartite matching, in which agents on one side of a market are matched to one or more agents or items on the other side, is a classical model used in myriad application areas such as healthcare, advertising, education, and general resource allocation. In particular, we consider an application of matchings in which a firm wants to hire, i.e., match, workers for a number of teams. Each team has a demand that needs to be satisfied, and each worker has multiple features (e.g., country of origin, gender). We ask how to assign workers to teams efficiently, i.e., with a low-cost matching, while forming teams that are diverse with respect to all the features. Inspired by previous work, we balance whole-match diversity and economic efficiency by optimizing a supermodular function over the matching. In particular, we show that when the number of features is given as part of the input, this problem is NP-hard, and we design a pseudo-polynomial time algorithm to solve it. Next, we focus on fairness in optimization problems. In particular, we study two notions of fairness in an optimization problem called correlation clustering. In correlation clustering, we are given an edge-weighted graph in which each edge, in addition to a weight, has a positive or negative label. The goal is to obtain a clustering of the vertices into an arbitrary number of clusters that minimizes disagreements, defined as the total weight of negative edges trapped inside a cluster plus the sum of weights of positive edges between different clusters. In the first fairness notion, assuming each node has a color, i.e., feature, our aim is to generate clusters with minimum disagreements in which the distribution of colors in each cluster matches the global distribution. Next, we switch our attention to a min-max notion of fairness in correlation clustering, with a cluster-wise objective that minimizes the maximum number of disagreements of any single cluster; here the goal is to respect the quality of each cluster. We focus on designing approximation algorithms for both of these notions. In the last part of this thesis, we consider the vulnerability of algorithms to manipulation and gaming. We study how to learn a linear classifier in the presence of strategic agents who wish to be classified as positive and can modify their positions by a limited amount, so that the classifier observes not an agent's true position but the position where the agent pretends to be. We design algorithms with a bounded number of mistakes for several variations of this problem.
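The disagreements objective described above is easy to state in code: a negative edge inside a cluster or a positive edge across clusters contributes its weight. A small sketch following that definition (the edge encoding is illustrative, not from the thesis):

```python
def disagreements(edges, cluster_of):
    """edges: list of (u, v, weight, label) with label in {+1, -1}.
    cluster_of: dict mapping each vertex to its cluster id.
    Returns the total weight of negative edges trapped inside a cluster
    plus the weight of positive edges between different clusters."""
    total = 0.0
    for u, v, w, label in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        if (label < 0 and same_cluster) or (label > 0 and not same_cluster):
            total += w
    return total

edges = [("a", "b", 1.0, +1), ("a", "c", 1.0, -1), ("b", "c", 2.0, +1)]
clusters = {"a": 0, "b": 0, "c": 1}
# only the positive edge (b, c) crosses clusters, so the objective is 2.0
```

The min-max variant mentioned above would instead track a per-cluster total and report the maximum over clusters.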
  • Item
    Online Decision Making via Prophet Setting
    (2017) Ehsani Banafati, Soheil; Hajiaghayi, MohammadTaghi; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In the study of online problems, it is often assumed that there exists an adversary who acts against the algorithm and generates the most challenging input for it. This worst-case assumption, combined with the complete uncertainty about future events in the traditional online setting, sometimes leads to super-constant approximation impossibilities. In this dissertation, we go beyond this worst-case analysis of problems by taking advantage of stochastic modeling. Inspired by the prophet inequality problem, we introduce the prophet setting for online problems, in which the probability distributions of the future inputs are available. This modeling not only accounts for the availability of statistical data in the design of mechanisms but also yields significantly more efficient algorithms. To illustrate the improvements achieved by this setting, we study online problems within the contexts of auctions and networks. We begin our study by analyzing a fundamental online problem in optimal stopping theory, namely the prophet inequality, in the special cases of iid and large markets and the general cases of matroids and combinatorial auctions, and discuss its applications in mechanism design. The stochastic model introduced by this problem has received a lot of attention recently in modeling other real-life scenarios, such as online advertisement, because of the growing ability to fit distributions for user demands. We apply this model to network design problems with a wide range of applications from social networks to power grids and communication networks. In this dissertation, we give efficient algorithms for fundamental network design problems in the prophet setting and present a general framework that demonstrates how to develop algorithms for other problems in this setting.
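As background for the prophet setting above: in the classic single-item prophet inequality, accepting the first value that meets a threshold of half the expected maximum guarantees, in expectation, at least half of the prophet's (offline maximum) reward. A Monte Carlo sketch under assumed i.i.d. uniform distributions (all names and the choice of distribution are illustrative):

```python
import random

def expected_max(samplers, trials, rng):
    """Estimate E[max of one draw from each sampler] by simulation."""
    return sum(max(s(rng) for s in samplers) for _ in range(trials)) / trials

def threshold_rule_value(samplers, threshold, trials, rng):
    """Average reward of: inspect values in order, accept the first
    one that is at least `threshold`, earn 0 if none qualifies."""
    total = 0.0
    for _ in range(trials):
        for s in samplers:
            x = s(rng)
            if x >= threshold:
                total += x
                break
    return total / trials

rng = random.Random(0)
samplers = [lambda r: r.uniform(0.0, 1.0)] * 3   # three i.i.d. U[0, 1] values
emax = expected_max(samplers, 20000, rng)        # close to 3/4 for three U[0, 1]
reward = threshold_rule_value(samplers, emax / 2, 20000, rng)
# theory guarantees reward >= emax / 2 in expectation
```

The guarantee of one half is tight for this threshold rule; the matroid and combinatorial-auction generalizations studied in the dissertation require substantially more machinery.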
  • Item
    Combinatorial Algorithms for the Active Time and Busy Time Problems
    (2016) Kumar, Saurabh; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    In this thesis, we consider the problem of scheduling jobs in such a way that we minimize the energy consumption of the machines they are scheduled on. Job scheduling itself has a long and rich history in computer science, from both theoretical and applied perspectives. Many different objectives to optimize have been considered, such as weighted completion time and penalties for missed deadlines. However, traditional objective functions such as these do not capture or model the energy consumption of the machines these jobs run on. Energy consumption is an important facet of job scheduling to consider not only because of its relationship with the financial costs of scheduling (such as those related to cooling and the cost of powering the machines) but also due to its impact on the environment. This is especially true in the context of data centers as more and more processing is pushed to the cloud. We study two problems related to these issues: the active time problem and the busy time problem. First, we give a purely combinatorial algorithm for the active time problem which matches its best-known approximation ratio (the existing algorithm is based on a rather involved LP rounding scheme). Second, we describe a local search based heuristic for the problem and present an experimental evaluation of these algorithms on artificially generated data. Finally, we describe two very simple algorithms which match the current best upper bounds for the busy time problem when all job lengths are equal.
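To make the busy time objective concrete: a machine may run several jobs in parallel but consumes energy whenever at least one of its jobs is running, so the objective sums, per machine, the length of the union of its job intervals. A small illustrative computation (the input format is an assumption, not from the thesis):

```python
def busy_time(assignment):
    """Total busy time of a schedule.

    assignment: dict mapping each machine to a nonempty list of
    (start, end) job intervals placed on it. A machine is powered
    whenever at least one of its jobs runs, so its contribution is
    the length of the union of its intervals; the objective sums
    this over all machines.
    """
    total = 0
    for jobs in assignment.values():
        jobs = sorted(jobs)
        cur_start, cur_end = jobs[0]
        for start, end in jobs[1:]:
            if start > cur_end:                  # gap: close the busy interval
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:                                # overlap: extend it
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total

schedule = {"m1": [(0, 2), (1, 3), (5, 6)], "m2": [(0, 1)]}
# m1 is busy on [0, 3] and [5, 6] (length 4); m2 on [0, 1] (length 1)
```

The algorithmic difficulty lies in choosing the assignment (subject to a per-machine parallelism bound) so that this union length is small; evaluating it, as above, is straightforward.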
  • Item
    Allocation Algorithms for Networks with Scarce Resources
    (2015) Sarpatwar, Kanthi Kiran; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Many fundamental algorithmic techniques have roots in applications to computer networks. We consider several problems that crop up in wireless ad hoc networks, sensor networks, P2P networks, and cluster networks. The common challenge here is to deal with certain bottleneck resources that are crucial for the performance of the underlying system. Broadly, we deal with the following issues. Data: The primary goal in resource replication problems is to replicate data objects on server nodes with limited storage capacities, so that the latency of client nodes needing these objects is minimized. Previous work in this area is heuristic and without guarantees. We develop tight (or nearly tight) approximation algorithms for several problems, including basic resource replication, where clients need all objects and each server can store at most one object; subset resource replication, where clients require different subsets of objects and servers have limited, non-uniform capacities; and related variants. Computational resources: To facilitate packing of jobs needing disparate amounts of computational resources in cluster networks, an important auxiliary problem to solve is that of container selection. The idea is to select a limited number of ``containers'' that represent a given pool of jobs while minimizing ``wastage'' of resources. Subsequently, containers representing jobs can be packed instead of the jobs themselves. We study this problem in two settings: continuous, where there are no additional restrictions on the chosen containers, and discrete, where we must choose containers from a given set. We show that the continuous variant is NP-hard and admits a polynomial time approximation scheme. Contrastingly, the discrete variant is shown to be NP-hard to approximate. Therefore, we seek bi-approximation algorithms for this case. Energy resources: Wireless ad hoc networks contain nodes with limited battery life, and it is crucial to design energy efficient algorithms. We obtain approximation guarantees that are tight up to constant factors for partial and budgeted versions of the connected dominating set problem, which is regarded as a good model for a virtual backbone of a wireless ad hoc network. Further, we discuss approximation algorithms for some problems involving target monitoring in sensor networks and message propagation in radio networks. We end with a discussion of future work.
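For intuition about the connected dominating set model above: a vertex set S is a valid virtual backbone when every vertex is in S or adjacent to S, and S induces a connected subgraph. A hypothetical checker sketch (names and graph encoding are illustrative):

```python
from collections import deque

def is_connected_dominating_set(adj, candidate):
    """adj: dict mapping each vertex to its set of neighbors.
    candidate: iterable of vertices proposed as the backbone."""
    S = set(candidate)
    if not S:
        return False
    # domination: every vertex outside S has a neighbor in S
    if any(v not in S and not (adj[v] & S) for v in adj):
        return False
    # connectivity of the subgraph induced by S, via BFS
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & S:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == S

# path a - b - c - d: {b, c} is a connected dominating set, {a, d} is not
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

The optimization versions studied in the thesis ask for a small (or budget-constrained) set passing exactly this kind of check.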
  • Item
    Dynamic Data Structures for Geometric Search and Retrieval
    (2013) Park, Eunhui; Mount, David M.; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Data structures for geometric search and retrieval are of significant interest in areas such as computational geometry, computer graphics, and computer vision. The focus of this dissertation is on the design of efficient dynamic data structures for use in approximate retrieval problems in multidimensional Euclidean space. A number of data structures have been proposed for storing multidimensional point sets. We focus on quadtree-based structures, which possess a number of desirable properties and have been shown to be useful in solving a wide variety of query problems, especially when approximation is involved. First, we introduce two dynamic quadtree-based data structures for storing a set of points in space, called the quadtreap and the splay quadtree. The quadtreap is a randomized variant of a quadtree that supports insertion and deletion and has logarithmic height with high probability. The splay quadtree is also a quadtree variant, but this data structure is self-adjusting, that is, it rebalances itself depending on the access pattern. It supports efficient insertion and deletion in the amortized sense. We also study how to dynamically maintain an important geometric structure, called the well-separated pair decomposition (WSPD). A set of n points defines O(n^2) pairs. Callahan and Kosaraju introduced the WSPD as a concise method for approximating the set of all pairs by using only O(n) subsets of pairs that are spatially well-separated from each other. We present the first output-sensitive algorithm for maintaining a WSPD for a dynamic point set, that is, one in which the running time depends on the actual number of newly created (or newly destroyed) pairs. Finally, we consider maintaining a data structure for points in motion. Motion is often presented incrementally in discrete time steps, and future motion may not be predictable from the past; this is called the black-box model. We present efficient online algorithms for maintaining two important geometric data structures for moving points in the black-box model, namely nets and net trees. We establish the efficiency of these online algorithms through a competitive analysis.
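For intuition about the well-separation condition underlying the WSPD: two point sets are s-well-separated if each fits in a ball of radius r and the balls are at least s·r apart. A rough sketch using bounding-box enclosing balls (a conservative, illustrative test, not the actual Callahan-Kosaraju construction):

```python
import math

def well_separated(A, B, s):
    """Conservative s-well-separation test for two point sets in R^d,
    using crude enclosing balls derived from axis-aligned bounding boxes."""
    def enclosing_ball(P):
        d = len(P[0])
        lo = [min(p[i] for p in P) for i in range(d)]
        hi = [max(p[i] for p in P) for i in range(d)]
        center = [(l + h) / 2 for l, h in zip(lo, hi)]
        radius = math.dist(lo, hi) / 2           # half the box diagonal
        return center, radius

    (ca, ra), (cb, rb) = enclosing_ball(A), enclosing_ball(B)
    r = max(ra, rb)
    # gap between the two balls must be at least s times the larger radius
    return math.dist(ca, cb) - ra - rb >= s * r

# two tight clusters far apart are well-separated; interleaved ones are not
far_apart = well_separated([(0, 0), (1, 0)], [(10, 0), (11, 0)], 2)
interleaved = well_separated([(0, 0), (4, 0)], [(1, 0), (5, 0)], 2)
```

A WSPD covers every pair of distinct points by some well-separated set pair, which is what lets O(n) pairs stand in for all O(n^2) point pairs.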
  • Item
    Primal-Dual Algorithms for Combinatorial Optimization Problems
    (2007-08-10) Mestre, Julian Diego; Khuller, Samir; Computer Science; Digital Repository at the University of Maryland; University of Maryland (College Park, Md.)
    Combinatorial optimization problems such as routing, scheduling, covering, and packing problems abound in everyday life. At a very high level, a combinatorial optimization problem amounts to finding a solution with minimum or maximum cost among a large number of feasible solutions. An algorithm for a given optimization problem is said to be exact if it always returns an optimal solution and is said to be efficient if it runs in time polynomial in the size of its input. The theory of NP-completeness suggests that exact and efficient algorithms are unlikely to exist for the class of NP-hard problems. Unfortunately, a large number of natural and interesting combinatorial optimization problems are NP-hard. One way to cope with NP-hardness is to relax the optimality requirement and instead look for solutions that are provably close to the optimum. This is the main idea behind approximation algorithms. An algorithm is said to be an r-approximation if it always returns a solution whose cost is at most a factor r away from the optimal cost. Arguably, one of the most important techniques in the design of combinatorial algorithms is the primal-dual schema, in which the cost of the primal solution is compared to the cost of a dual solution. In this dissertation we study the primal-dual schema in the design of approximation algorithms for a number of covering and scheduling problems.
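A textbook instance of the primal-dual schema described above (standard unweighted vertex cover, not a result of this dissertation): for each uncovered edge, raise its dual variable until the endpoint constraints become tight, then add both endpoints to the cover. Since the picked edges are pairwise disjoint, any cover, including the optimum, must contain one endpoint per picked edge, which bounds the cost at twice the dual value:

```python
def primal_dual_vertex_cover(edges):
    """Primal-dual 2-approximation for unweighted vertex cover.

    Scanning edges in order, each still-uncovered edge has its dual
    variable raised to 1, making both endpoint constraints tight, so
    both endpoints enter the cover. The picked edges form a matching,
    hence |cover| = 2 * (#picked edges) <= 2 * OPT by weak duality.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# path a - b - c - d: the algorithm picks edges (a, b) and (c, d),
# returning a cover of size 4, while the optimum {b, c} has size 2
path = [("a", "b"), ("b", "c"), ("c", "d")]
cover = primal_dual_vertex_cover(path)
```

The same comparison of primal cost against an explicitly maintained dual solution is the engine behind the covering and scheduling results the abstract refers to, though the dual programs there are more intricate.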