On Algorithms, Fairness, and Incentives

Loading...
Thumbnail Image

Publication or External Link

Date

2023

Citation

Abstract

Much of decision making is now rendered at least partly through algorithms which were originally designed to optimize an objective such as accuracy or revenue while mostly ignoring the possible unfairness or harm that could be caused. As a result several case studies have demonstrated empirically that deployed algorithmic decision making systems do in fact violate standard notions of fairness. This has made the issue of fairness an important consideration in algorithm design. In this thesis we will consider fair variants of fundamental and important problems in machine learning and operations research.

We start by considering clustering where we focus on a common group (demographic) fairness notion and address important variants of it: (I) We start with the frequent case where group memberships are imperfectly known. Based on stochastic optimization we propose probabilistic fair clustering which is a generalization of fair clustering that handles the case of unknown group memberships. We derive approximation algorithms for this setting and empirically test their performance. (II) As largely known in fair algorithms achieving fairness mostly comes at the expense of degrading the value of the optimization objective. In fact, in the case of fair clustering the degradation (price of fairness) can be unbounded. To handle this, we propose fair clustering under a bounded cost where we define a measure of unfairness and minimize this measure subject to a pre-set upper bound on the clustering objective. We derive lower bounds on the approximation ratio and give approximation algorithms as well. (III) We consider the downstream effects of clustering where the center (prototype) of each cluster is examined and each cluster is assigned a label of a specific quality based on its prototype. These labels are shared over a collection of clusters and as a result traditional fair clustering is too restrictive. We therefore propose fair labeled clustering where proportional demographic representation is to be preserved over the labels instead of the clusters and derive algorithms for it. (IV) Motivated by the fact that at least seven different fairness notions have been introduced so far in fair clustering, we take a step towards understanding how these notions relate to one another. Specifically, we consider two group representation-based fairness notions and show that an approximation algorithm for one can be used to satisfy both notions simultaneously at a bounded degradation to the approximation ratio. We further show how these two notions are incompatible with a collection of distance-based fairness notions in clustering.

We then move to another problem, namely online bipartite matching which in its most common form involves three interacting entities: two sides (buyers and sellers) to be matched and the platform operator. Unlike the existing literature we derive online algorithms with competitive ratio guarantees for the operator’s revenue as well as fairness guarantees for the two sides to be matched, thus providing utility guarantees for all sides of the market.

Finally, we consider a problem where the incentives of the individuals and organizations involved is a major consideration. Specifically, we consider the problem of redistricting and gerrymandering. Inspired by the Kemeny rule for rank aggregation, we introduce a simple and interpretable family of distances over redistricting maps and define the medoid map which mirrors the Kemeny ranking. Interestingly, we show that a by-product of our framework is that it can detect some gerrymandered instances. Specifically, the 2011 and 2016 enacted maps of North Carolina and the 2011 enacted map of Pennsylvania (all considered to be gerrymandered) are all shown to be at least in the 99th distance percentile in comparison to an ensemble of redistricting maps. This gives a significant advantage in gerrymandering detection since the previous methods relied on election outcomes whereas our method is purely distance-based.

Notes

Rights