Can Querying for Bias Leak Protected Attributes? Achieving Privacy With Smooth Sensitivity

Existing regulations often prohibit model developers from accessing protected attributes (gender, race, etc.) during training. This leads to scenarios where fairness assessments might need to be done on populations without knowing their memberships in protected groups. In such scenarios, institutions often adopt a separation between the model developers (who train their models with no access to the protected attributes) and a compliance team (who may have access to the entire dataset solely for auditing purposes). However, the model developers might be allowed to test their models for disparity by querying the compliance team for group fairness metrics. In this paper, we first demonstrate that simply querying for fairness metrics, such as statistical parity and equalized odds, can leak the protected attributes of individuals to the model developers. We demonstrate that there always exist strategies by which the model developers can identify the protected attribute of a targeted individual in the test dataset from just a single query. Furthermore, we show that one can reconstruct the protected attributes of all the individuals using techniques from compressed sensing when N_k ≪ n (n is the size of the test dataset and N_k is the size of the smallest group therein). Our results pose an interesting debate in algorithmic fairness: Should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or, does it constitute a violation of regulations and privacy if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this supposed violation of regulations and privacy, we also propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query function, outperforming naive techniques such as the Laplace mechanism. We also include experimental results on the Adult dataset and a synthetic dataset (over a broad range of parameters).


INTRODUCTION
The ethical goal of algorithmic fairness [6,62] is closely tied to the legal frameworks of both anti-discrimination and privacy. For instance, Title VII of the Civil Rights Act of 1964 [6] introduces two different notions of unfairness, namely, disparate impact [60] and disparate treatment [76], which are often at odds with each other [6]. It is widely believed that a machine learning model can avoid violating disparate treatment (and privacy concerns) if the model does not explicitly use the protected attributes [6]. However, it has been demonstrated that even if no protected attributes are explicitly used during training, a model might still be held liable for disparate impact, due to proxies of the protected attributes among other attributes in the dataset [18]. In fact, existing literature on algorithmic fairness demonstrates that leveraging protected attributes during training can essentially prevent disparate impact, e.g., by minimizing a fairness metric as a regularizer with the loss function during training [38,45]. Thus, mitigating disparate impact often seems to be at odds with disparate treatment [51] (and privacy), depending on whether the protected attribute is being explicitly used or not.
One potential resolution (still debated [47]) is to use the protected attributes only during training to mitigate disparate treatment but not after deployment. Nonetheless, the use of protected attributes during model training remains a source of active debate and contention for various applications [51,53]. On one hand, using protected attributes during training enables one to actively audit and account for biases, as well as understand how specific groups of people are affected. On the other hand, these protected attributes can also be used maliciously, e.g., to exacerbate discrimination [59]. An interesting example arises where the protected attributes can even be used to "mask" discrimination [23,28,29,46], e.g., an expensive housing ad is shown only to high-income White individuals and low-income Black individuals but not to low-income White individuals and high-income Black individuals (assuming an equal proportion of all four sub-groups) [29]. The decision is clearly discriminatory against high-income Black individuals, for whom the ad is relevant and yet who do not get to see it. This discrimination is masked since the decision might still satisfy statistical independence between the two races.
In several applications, e.g., in finance, anti-discrimination and privacy regulations adopt a stance that completely prohibits the use of protected attributes during training. In the finance domain, institutions cannot ask about an individual's race for credit decisioning, while at the same time having to prove that their decisions are non-discriminatory [16]. The Apple Card credit card was recently accused of discriminatory credit decisioning since women received lower credit limits than equally qualified men, despite gender not being used explicitly during training [70].
Fairness assessment of these models is extremely challenging when the protected attributes are unavailable. To address this, institutions often adopt a separation between the model developers and the compliance team [16]. The compliance team is responsible for ensuring that methods do not violate anti-discrimination and privacy laws. As a result, the compliance team has access to the entire dataset, including the attributes protected by law (e.g., race, gender) [15,16]. Only a subset of the data fields is visible to the model developers who train these models. The compliance team determines which attributes the model developers are allowed to see and use to train their models. Clearly, the model developers would not have access to the protected attributes. For fairness assessment, the model developers may however query the compliance team for certain group fairness metrics, e.g., statistical parity, equalized odds, etc. The model developers can then choose which model to deploy or discard based on the query responses.
In this paper, we first demonstrate that simply querying for fairness metrics (bias) can also leak the protected attributes of targeted individuals. Furthermore, we demonstrate that there exist strategies by which the model developers can always identify the protected attributes of all the individuals in the test dataset. We collectively refer to these strategies as Attribute-Reveal. Our finding poses an interesting debate in the policy aspects of fairness and privacy: Should querying for fairness metrics be viewed as a neutral-valued solution to ensure compliance with regulations? Or, does it constitute a violation of regulations and privacy, particularly if the number of queries answered is enough for the model developers to identify the protected attributes of specific individuals? To address this supposed violation of regulations and privacy, we also propose Attribute-Conceal, a novel differentially-private technique to answer queries without leaking the protected attributes.
To summarize, our main contributions are as follows:

1. Demonstrate that querying for bias can leak protected attributes: We first demonstrate that querying for fairness metrics, e.g., statistical parity or equalized odds, can indeed leak the protected attributes of individuals. In Theorem 1, we provide the general criterion for reconstructing the protected attributes of all the individuals in the test dataset by querying for the statistical parity of several models (the problem reduces to a linear system of equations).

2. Leverage compressed sensing to reconstruct protected attributes with fewer queries: Building on our initial result (Theorem 1), we then demonstrate how protected attributes of individuals can be leaked using a much smaller number of statistical parity queries, provided the size of one group is much smaller than the other (see Theorem 2). Our findings also extend to the absolute value of statistical parity (see Section 3.3), as well as other fairness metrics, e.g., equalized odds, or their absolute values. We collectively refer to these proposed reconstruction strategies as Attribute-Reveal.

3. Propose Attribute-Conceal, a novel technique that achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query function: To avoid leaking protected attributes, we propose Attribute-Conceal, a technique that answers fairness queries in an ε-differentially-private manner (see Section 4.2). Since calibrating noise to global sensitivity (e.g., using the Laplace mechanism) can often hurt the utility of the answered query (because the noise becomes too high), we employ the smooth sensitivity framework [58], which adds dataset-specific additive noise to achieve differential privacy (see Theorem 6 in Section 4.2).

4. Experiments: To complement our theoretical results, we also provide experimental results on the Adult dataset [27] and perform simulations on synthetic data for a broad range of parameters. We demonstrate how Attribute-Reveal reconstructs the protected attributes and how Attribute-Conceal prevents the reconstruction. We also compare Attribute-Conceal to naive differential privacy techniques such as the Laplace mechanism.
Related Works: Algorithmic fairness is an active area of research [3, 7, 16, 19, 20, 23, 28-30, 32, 35, 37, 39, 45, 46, 48-50, 57, 65, 66, 68, 71, 74] that is receiving increasing attention. Many of these techniques assume that the protected attributes are available during training, which is not always allowed in practice. In several applications, such as credit or loan decisioning [15,16], the use of protected attributes during training is restricted by law. Our work is closely related to a body of work that addresses fairness without access to protected attributes [16,19,37,63,68], often using proxies to estimate the protected attributes. Our work lies in an area that relies on trusted third parties who have access to the protected data necessary for improving fairness. For instance, [40,47,67] assume that the model has access to the protected attributes in an encrypted form via secure multi-party computation. Later, [42] noted that secure multi-party computation does not prevent the protected attributes from leaking, and employed differential privacy [33] to learn fair models. [24] uses a fully homomorphic encryption scheme, allowing model developers to train models and test them for bias without revealing the protected attributes. More recent works [2,4,22,72] provide schemes that allow the release of protected attributes privately for learning non-discriminatory predictors. [43] proposes a differentially private mechanism to measure differences in performance across groups while protecting the privacy of group membership in a federated setting.
Our work instead addresses a novel problem statement: Can querying for fairness metrics leak protected attributes, and if so, how can we leverage smooth sensitivity to prevent this leakage? Our problem setup also differs from existing works on attribute inference attacks [21,36,41,52], where the focus is on learning protected information in the training data from model outputs using supervised learning (we do not use group membership labels from past data). An interesting related work [73] studies query-based auditing algorithms to estimate the statistical parity of ML models. Another related work is [61], which focuses on the issue of fairwashing, where manipulation techniques are utilized to mask unfairness when presenting the model's explanations to an auditor.

PROBLEM SETUP

Preliminaries
We consider a test dataset S = (X, Y, A) of n individuals, where X denotes the input features, Y the true labels, and A the (binary) protected attributes, with a_j = 1 indicating that the j-th individual belongs to the advantaged group and a_j = 0 the disadvantaged group. We let n_1 and n_0 denote the sizes of the advantaged and disadvantaged groups, respectively. Note that n_1 + n_0 = n, the size of the test dataset.

Review of Relevant Group Fairness Metrics:
Definition 2.1 (Statistical Parity Gap (Δ_SP)). The statistical parity gap Δ_SP is defined as the difference in expected outcome between the advantaged and disadvantaged groups, i.e.,
Δ_SP = E[h(x) | a = 1] − E[h(x) | a = 0].

Definition 2.2 (Equal Opportunity Gap (Δ_EO)). The equal opportunity gap Δ_EO is defined as:
Δ_EO = Pr[h(x) = 1 | y = 1, a = 1] − Pr[h(x) = 1 | y = 1, a = 0],
where h(x) denotes the model's prediction and y the true label. We also denote the absolute values of the statistical parity and equal opportunity gaps as |Δ_SP| and |Δ_EO|, respectively.

Remark 1. We note that although we only define statistical parity and equal opportunity, our techniques can be extended to several other group fairness measures, such as equalized odds, predictive parity, etc., as well as their absolute values. We further discuss this in Remark 5.
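For concreteness, the following minimal Python sketch (with our own illustrative variable names, assuming numpy arrays of predictions, true labels, and protected attributes) computes both gaps exactly as defined above.

```python
import numpy as np

def statistical_parity_gap(y_pred, a):
    """E[h(x) | a = 1] - E[h(x) | a = 0]: advantaged minus disadvantaged mean outcome."""
    return y_pred[a == 1].mean() - y_pred[a == 0].mean()

def equal_opportunity_gap(y_pred, y_true, a):
    """Same difference, restricted to individuals whose true label is y = 1."""
    pos = y_true == 1
    return y_pred[pos & (a == 1)].mean() - y_pred[pos & (a == 0)].mean()

# toy usage
y_pred = np.array([1, 0, 1, 1, 0, 0])
y_true = np.array([1, 1, 1, 0, 0, 1])
a      = np.array([1, 1, 0, 0, 1, 0])
print(statistical_parity_gap(y_pred, a), equal_opportunity_gap(y_pred, y_true, a))
```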

Problem Statement
Institutions often adopt a separation between the model developers and the compliance team to ensure anti-discrimination and privacy laws are met. The model developers do not have access to protected attributes and therefore cannot use them for training. The compliance team, however, has access to the entire dataset, but only for auditing purposes. In our setting, the model developers train k different classifiers h_i(·) with i ∈ [k] on the training dataset. For the fairness assessment of these models before deployment, the model developers are allowed to test for algorithmic bias by querying the compliance team for certain fairness metrics on the test dataset S = (X, Y, A). Note that the classifier h_i(·) is a function only of the input x_j and not of the protected attribute a_j. The fairness metrics that the model developers can query for include the statistical parity gap and the equal opportunity gap, as well as their absolute values (see system model in Figure 1). The main questions that we ask in this work are: Is this technique of querying for fairness metrics effective in keeping the protected attributes hidden from the model developers? Or, in general, does querying for fairness leak protected attributes? And if so, how can one answer queries without leaking protected attributes?
Remark 2. The approach outlined in this paper can be extended to a scenario with an institution (that trains a model without access to protected attributes) and an external fairness auditing team, e.g., as in [73]. The auditing team has access to the entire data for the purpose of evaluating the bias of the models and is responsible for informing the model developers on whether their deployed model passes fairness tests based on some fairness metric. In this setting, our concern is whether auditing for fairness can compromise privacy and leak the protected attributes to the institution.

ATTRIBUTE-REVEAL: QUERYING FOR BIAS LEAKS PROTECTED ATTRIBUTES

Demonstrating Leakage From Querying
Here, we show that simply querying for fairness metrics such as statistical parity can reveal the protected attribute of any targeted individual to the model developers. In fact, there exist strategies (that we collectively refer to as Attribute-Reveal) that can reveal the protected attributes of all the individuals in the test dataset. We begin with a simple toy example.
Example 1 (Single Query). The model developers train only one binary classifier h_1(·) and query the compliance team for the statistical parity gap. Suppose the model developers want to find the protected attribute of the first individual. They can choose a classifier that accepts only the first individual, i.e., h_1(x_1) = 1 and h_1(x_j) = 0 for j = 2, 3, ..., n. Observe that the statistical parity gap of this model reveals the protected attribute of the first individual:
Δ_1 = 1/n_1 if a_1 = 1, and Δ_1 = −1/n_0 if a_1 = 0.
Thus, a positive statistical parity gap Δ_1 would give away that the individual belongs to the advantaged group, whereas a negative gap indicates that the individual belongs to the disadvantaged group. The query also reveals the sizes of these groups, n_1 and n_0.
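A minimal numerical sketch of this single-query attack (illustrative names; the compliance team's answer is simulated by computing the gap directly):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
a = rng.integers(0, 2, size=n)   # protected attributes, held only by the compliance team
a[1], a[2] = 1, 0                # make sure both groups are non-empty
y_pred = np.zeros(n)
y_pred[0] = 1.0                  # classifier that accepts only the first individual

gap = y_pred[a == 1].mean() - y_pred[a == 0].mean()   # the answered SP-gap query
# gap = +1/n_1 if a[0] = 1 and -1/n_0 if a[0] = 0, so its sign leaks a[0]
assert int(gap > 0) == a[0]
```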
We note that such a model might seem contrived; it might also have low accuracy. Thus, in Example 2, we demonstrate a more realistic scenario that can occur more commonly in practice: two models of comparable accuracy can be used to reveal the protected attribute of a targeted individual.
Example 2 (Double Query With Realistic Models). Consider two models h_1(·) and h_2(·). The model h_1(·) is trained by the model developers (to maximize accuracy) and accepts several individuals in the dataset. The model developers also use a second classifier, h_2(·), that provides the same prediction as h_1(·) except for one targeted individual, i.e., h_2(x_1) = 1 − h_1(x_1) and h_2(x_j) = h_1(x_j) for j = 2, 3, ..., n. Notice that h_1(·) and h_2(·) differ only in the first prediction, so their accuracies are almost identical. If one queries for the statistical parity of these two models, one can identify the protected attribute of the first individual: when h_1(x_1) = 1 (so h_2(x_1) = 0),
Δ_1 − Δ_2 = 1/n_1 if a_1 = 1, and Δ_1 − Δ_2 = −1/n_0 if a_1 = 0,
so the sign of the difference reveals a_1 (the case h_1(x_1) = 0 is symmetric). A similar approach can be adopted to reveal the protected attributes of all the individuals in the test dataset. Our next result provides the general criterion for reconstructing the protected attributes of all the individuals in the test dataset using statistical parity queries (see Appendix A for proof).
Theorem 1 (Reveal From Linear System of Equations). Let q = [Δ_1 Δ_2 ... Δ_k]^T be a vector of statistical parity gap queries for k models, let h_i(x_j) denote the i-th model's prediction for the j-th individual, and let H be a k × n matrix whose i-th row contains the binary or logistic predictions of the i-th model. If rank(H) = n, then the protected attributes A = (a_1, a_2, ..., a_j, ..., a_n) of the entire dataset can be identified by solving a linear system of equations:
Δ_i = Σ_{j=1}^{n} h_i(x_j) v_j  for i ∈ [k].
This can also be expressed as
q = H v,   (4)
where v is the unknown vector with elements taking values v_j = 1/n_1 if a_j = 1 and v_j = −1/n_0 if a_j = 0.

Strictly speaking, one needs k = n − 1 queries since the last individual can be identified using the group sizes n_1 and n_0. However, one may encounter numerical errors when solving for n_1 from 1/n_1 or n_0 from 1/n_0. This can happen when n_1, n_0 ≫ 1 and n_0 ≈ n_1.
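As a sanity check of Theorem 1, the following sketch (with illustrative variable names) builds a random full-rank prediction matrix, computes the exact statistical parity gap queries, and recovers every protected attribute by solving q = Hv.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
a = rng.integers(0, 2, size=n)
a[:2] = [0, 1]                                     # ensure both groups are non-empty
n1, n0 = a.sum(), n - a.sum()
v_true = np.where(a == 1, 1.0 / n1, -1.0 / n0)     # the unknown vector v of Theorem 1

H = rng.integers(0, 2, size=(n, n)).astype(float)  # k = n models' binary predictions
while np.linalg.matrix_rank(H) < n:                # re-draw until rank(H) = n
    H = rng.integers(0, 2, size=(n, n)).astype(float)

q = H @ v_true                     # the k answered statistical parity gap queries
v_hat = np.linalg.solve(H, q)      # developers solve the linear system q = H v
a_hat = (v_hat > 0).astype(int)    # the sign of v_j reveals a_j
assert np.array_equal(a_hat, a)
```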
Algorithm 1 provides a more realistic strategy by which model developers can choose practical models with comparable accuracy to reveal the protected attributes of all the individuals. Essentially, one base model h_0(·) could be trained, and several similar models h_j(·) could be chosen so that the prediction h_j(x_i) is flipped only when i = j. The developers query for the statistical parity gap of each model, store the answers in q, and create the matrix H of model predictions. The accuracy of these models would remain comparable to the base model h_0(·) since they differ in only one prediction.
We note that if the size of the test dataset is large, it may not be desirable to have as many models as the size of the dataset. This motivates our next question: Is it possible to obtain the protected attributes of individuals in the dataset with fewer models and queries?

Leaking Protected Attributes with Fewer Queries using Compressed Sensing (CS)

In this section, we demonstrate that compressed sensing (CS) techniques can be used to obtain the protected attributes of individuals using a significantly smaller number of queries (k ≪ n). First, we provide a brief background on compressed sensing in Section 3.2.1.
Readers already familiar with this topic may skip this subsection.

Brief Background on Compressed Sensing. The goal [26,55] is to recover a vector x ∈ R^n from a set of linear measurements y = Φx, where y is an m × 1 measurement vector and Φ ∈ R^{m×n} is the sensing matrix. CS relies on the sparsity of x. A vector x is k-sparse if it has only k non-zero entries (typically k ≪ n). The number of measurements m is typically much smaller than n, making this an under-determined system of equations with many solutions for x. CS focuses on finding the sparsest solution for x. This can be expressed as an optimization problem:
min_x ∥x∥_0  subject to  y = Φx.   (5)
Here, ∥x∥_0 is the number of nonzero entries of x. By minimizing an ℓ_1 norm instead, this problem can be relaxed into a convex optimization problem which can be solved using linear programming or other CS algorithms, e.g., Orthogonal Matching Pursuit [56]. The ℓ_1-norm ∥x∥_1 is the absolute sum of all entries of x. We refer the reader to an excellent survey [14] for more information on CS.
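As a small illustration of this background (not part of the paper's pipeline), the sketch below recovers a sparse vector from a few Gaussian measurements using scikit-learn's Orthogonal Matching Pursuit; all sizes are toy values chosen by us.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5                      # ambient dimension, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)    # k-sparse signal
Phi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))           # Gaussian sensing matrix
y = Phi @ x                               # m << n linear measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
x_hat = omp.fit(Phi, y).coef_
print("recovery error:", np.linalg.norm(x_hat - x))
```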
For accurate recovery of a k-sparse vector x from the measurements y, the sensing matrix Φ has to satisfy a necessary and sufficient condition called the Restricted Isometry Property (RIP) [10].

Definition 3.1 (k-Restricted Isometry Property). A matrix Φ ∈ R^{m×n} satisfies the Restricted Isometry Property of order k if for all k-sparse vectors x ∈ R^n, and for some constant δ_k ∈ (0, 1), we have
(1 − δ_k) ∥x∥_2^2 ≤ ∥Φx∥_2^2 ≤ (1 + δ_k) ∥x∥_2^2.

For a matrix Φ that satisfies the RIP condition of order 2k with δ_{2k} < √2 − 1 (see [11]), the vector x can be reconstructed from y and Φ by solving (5). Random matrices satisfy RIP of any order k with high probability provided that m = O(k log(n/k)) [5,12]. Therefore, provided x is sufficiently sparse, a small number of measurements m suffices to ensure a high-quality reconstruction of x. It is also known that any CS algorithm will require at least m = Ω(k log(n/k)) measurements for reconstruction [17].
In general, designing or checking whether a sensing matrix satisfies the RIP condition is computationally difficult. RIP only gives a condition on whether a matrix can be used as a sensing matrix but does not necessarily say how to design one in practice. There are certain random matrices that are known to satisfy the RIP condition with high probability [13,55]. The most common is the Gaussian matrix, i.e., Φ ∈ R^{m×n} consists of independent samples from a zero-mean Gaussian distribution with variance 1/m [13]. The random binary matrix is another well-studied sensing matrix that is known to satisfy the RIP condition [75].

Reveal from Compressed Sensing.
Theorem 2 (Reveal From Compressed Sensing). Assume that n_0 ≪ n_1, i.e., the size of the disadvantaged group in the dataset is much smaller than that of the advantaged group, and that H ∈ R^{k×n} is a random matrix strongly concentrated around its mean. Then, the protected attribute vector A = (a_1, a_2, ..., a_j, ..., a_n) of the entire test dataset can be obtained using k = O(n_0 log(n/n_0)) statistical-parity-gap queries.
Proof. To prove Theorem 2, we convert (4) into a compressed sensing problem (we refer the reader to Section 3.2.1 for a background on compressed sensing). Recall from the proof of Theorem 1 that the reconstruction of the protected attributes reduces to solving the linear system of equations in (4), i.e., q = H v with H ∈ R^{k×n}.

Compressed sensing allows the number of queries k to be much less than n by exploiting the sparsity of one group in the dataset.

We write v = r − s, where r is the constant vector with entries r_j = 1/n_1, and s has entries s_j = 0 if a_j = 1 and s_j = 1/n_1 + 1/n_0 if a_j = 0. We then have q = H(r − s), so that Hr − q = Hs. Now, let z = Hr − q. Then we have
z = H s.   (7)
By simple manipulations, we converted (4) into a standard compressed sensing problem (7), where z is the measurement vector, H is the sensing matrix, and s is a sparse vector with n_0 non-zero entries. From Theorem 1.2 in [11], the unknown vector s can be recovered if H satisfies RIP (see Definition 3.1) of order 2n_0 with constant δ_{2n_0} < √2 − 1. Furthermore, Theorem 8 in Appendix B shows that a random matrix that is strongly concentrated around its expectation satisfies RIP of order 2n_0 with high probability provided n_0 ≤ c k/log(n/n_0) (which implies 2n_0 ≤ c′ k/log(n/(2n_0))) for constants c, c′ > 0. Therefore k = O(n_0 log(n/n_0)) statistical parity gap queries suffice to successfully reconstruct the protected attribute vector A of the entire test dataset. □

Remark 4. For the model developers to use the CS technique, they might also need to know n_1 and n_0. These can be found by querying with a model that accepts only one individual and first checking the sign of Δ_1. If Δ_1 > 0, then Δ_1 = 1/n_1 and a_1 = 1, which gives n_1; we can also obtain n_0 = n − n_1. Alternatively, if Δ_1 < 0, then Δ_1 = −1/n_0 (and a_1 = 0), which gives n_0; we can also obtain n_1 = n − n_0.
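Under the notation above, a sketch of the compressed-sensing recovery step might look as follows. The interface is our own, and the paper does not prescribe a specific solver; we use scikit-learn's Orthogonal Matching Pursuit as one possible choice.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def cs_attribute_reveal(H, q, n1, n0):
    """H: (k x n) matrix of model predictions, q: the k answered SP-gap queries,
    n1/n0: group sizes (obtainable as in Remark 4). Returns recovered attributes."""
    n = H.shape[1]
    r = np.full(n, 1.0 / n1)
    z = H @ r - q                          # z = H s, where s has only n0 nonzero entries
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n0, fit_intercept=False)
    s_hat = omp.fit(H, z).coef_
    # s_j is near 0 for the advantaged group and near 1/n1 + 1/n0 otherwise
    thr = 0.5 * (1.0 / n1 + 1.0 / n0)
    return (s_hat < thr).astype(int)       # recovered a_j (1 = advantaged)
```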
To effectively apply CS, the vector s must be sparse, meaning the size of one group (advantaged or disadvantaged) in the dataset must be significantly smaller than the other group. The sparsity requirement does not have a specific strict threshold. However, the smaller the minority group, the better CS performs with k = O(n_0 log(n/n_0)) models. As the size of the minority group increases, more models are needed.
The sensing matrix H should satisfy the Restricted Isometry Property (RIP) for CS to work (see Definition 3.1 in Section 3.2.1). Random binary matrices are well known to satisfy this property with high probability [75]. Therefore, choosing models that predict {0, 1} randomly would work. However, this might lead to unrealistic models that have low accuracy (since it essentially means that model developers are choosing models with random predictions). Gaussian noise is also proven to satisfy the RIP condition, but it may result in model predictions that lie outside the range [0, 1], making the models unrealistic. Even if we clip the values to lie in [0, 1], the Gaussian noise may lead to a large variation in accuracy from the base model.
Thus, in Lemma 1, we show that a sensing matrix whose entries are independently sampled from a uniform distribution with variance 1/m will satisfy the RIP property needed for CS. In practice, this motivates us to use small bounded noise so that the output values deviate as little as possible (Algorithm 2).

Lemma 1. Let Φ ∈ R^{m×n} be a random sensing matrix whose entries are drawn i.i.d. from a Uniform(−√(3/m), √(3/m)) distribution. Then the matrix Φ satisfies the Restricted Isometry Property (RIP) of order k ≤ c_1 m/log(n/k) with probability at least 1 − 2e^{−c_2 m}, for some constants c_1, c_2 > 0.
See proof in Appendix B. This motivates Algorithm 2, a novel and realistic strategy by which model developers can choose practical models with comparable accuracy to reveal the protected attributes of all the individuals. Essentially, one base model h_0(·) could be trained. For the other k models, small noise sampled from a uniform distribution is added to each output of the base model. If an output value goes outside [0, 1], the value is clipped back into [0, 1].
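A short sketch of this model-generation step (the noise level 0.1 below matches the choice used in our Adult experiments; the function name and interface are ours):

```python
import numpy as np

def algorithm2_models(base_scores, k, noise_level=0.1, seed=0):
    """Perturb the base model's scores with small uniform noise to obtain k models of
    comparable accuracy whose score matrix also serves as a CS sensing matrix."""
    rng = np.random.default_rng(seed)
    H = base_scores + rng.uniform(-noise_level, noise_level,
                                  size=(k, len(base_scores)))
    return np.clip(H, 0.0, 1.0)   # keep the perturbed scores inside [0, 1]
```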

Extension to Absolute Statistical Parity Gap
With absolute statistical parity, it is still possible to partition individuals into two groups, but no longer possible to determine with certainty which group represents the advantaged or disadvantaged population. Being able to partition individuals in the test dataset based on their protected attributes is still a privacy infringement, largely because the advantaged and disadvantaged groups can often be told apart easily once the partition is known. The partition sizes can also be used to determine which group is which: in many cases, the disadvantaged group is known to be significantly smaller than the advantaged group.
Theorem 3. Given k = n absolute-statistical-parity-gap queries, there exists a strategy that partitions the individuals in the test dataset into two different groups based on their protected attributes.
Proof. We discuss such a strategy in the proof. Let us use P and Q to denote the two partitions of the dataset, i.e., each individual belongs to either P or Q. Let n_P and n_Q denote the sizes of the P and Q partitions, respectively. Note that n_P + n_Q = n.

First, obtain n_P and n_Q. This can be done by querying a model that accepts only one individual: the query returns |Δ_1| = 1/n_P or 1/n_Q, revealing the sizes of the partitions. Now, consider two cases.

Case 1: n_P ≠ n_Q. If the sizes of the two groups are not equal, query a model h_1(·) that accepts only the first individual x_1. Assume that this individual belongs to the P partition; then |Δ_1| = 1/n_P. Then, query a second model h_2(·) that accepts only the second individual: |Δ_2| = 1/n_P if the second individual is also in P, and |Δ_2| = 1/n_Q otherwise. Continue this procedure for every individual until everyone is classified into P or Q.

Case 2: n_P = n_Q. If the sizes of the two groups are the same, it is not possible to differentiate between 1/n_P and 1/n_Q. Hence, a slightly different approach is taken. First, query a model h_1(·) that accepts only x_1 and assume that x_1 belongs to P, so that |Δ_1| = 1/n_P. Next, query a second model h_2(·) that accepts only x_1 and x_2. The partition of x_2 can be obtained from the query |Δ_2|: it equals 2/n_P if x_2 lies in the same partition as x_1, and 0 otherwise. In general, to obtain the partition of the j-th individual, select a model that accepts only x_1 and x_j. To partition the whole dataset using this technique, the model developers would need at most k = n models and queries. □

Remark 5. Our results extend to other fairness metrics, such as equalized odds, equal opportunity, and predictive rate parity. However, when querying for measures like equal opportunity, the model developers can only identify the protected attributes of individuals with true label y = 1. Since equal opportunity conditions on y = 1, one does not get any information about individuals with y = 0.
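To make Case 1 concrete, here is a minimal sketch (our own names; `query_abs_sp` stands in for the compliance team's answer to an absolute statistical parity query): when the two partition sizes differ, each one-individual query already reveals the size, and hence the partition, of that individual.

```python
def partition_from_abs_sp(query_abs_sp, n):
    """Case 1 of Theorem 3 (unequal partition sizes): querying a model that accepts
    only individual j returns |SP| = 1 / (size of j's partition)."""
    sizes = []
    for j in range(n):
        preds = [1 if i == j else 0 for i in range(n)]
        sizes.append(round(1.0 / query_abs_sp(preds)))   # partition size for individual j
    small = min(sizes)
    return ["smaller-group" if s == small else "larger-group" for s in sizes]
```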

DIFFERENTIALLY-PRIVATE APPROACHES TO BIAS ASSESSMENTS
In this section, we discuss approaches to prevent the problem of leaking protected attributes. The main goal is to answer fairness queries as accurately as possible but without leaking the protected attributes of any individual in the test dataset. This motivates us to leverage differential privacy [31,33].
The notion of ε-differential privacy was introduced in [31,33]. The definition of differential privacy used in this work focuses on keeping the protected attributes private. Because the model developers already have access to a portion of the test dataset (X, Y), we define neighboring datasets as datasets that differ only in one individual's protected attribute. For A, A′ ∈ {0, 1}^n, S = (X, Y, A) and S′ = (X, Y, A′) are neighboring if ∥A − A′∥_1 = 1. Let D denote the universe of all possible datasets.

Definition 4.1 ((ε, δ)-Differential privacy). Consider any two test datasets S = (X, Y, A) and S′ = (X, Y, A′), where A and A′ differ in the protected attribute of one individual. We say that a randomized mechanism M is (ε, δ)-differentially private if, for all neighboring S, S′ and all O ⊆ Range(M), we have:
Pr[M(S) ∈ O] ≤ e^ε Pr[M(S′) ∈ O] + δ,
where the randomness is over the choices made by M and ε > 0 is the privacy budget parameter.
A smaller ε introduces greater noise, resulting in enhanced privacy but reduced output accuracy. On the other hand, a larger ε incorporates less noise, leading to weaker privacy guarantees but increased output accuracy. Here, δ is the probability of information being accidentally leaked. If δ = 0, M is ε-differentially private. A popular mechanism that achieves ε-differential privacy is the Laplace mechanism [34]. The Gaussian mechanism achieves (ε, δ)-DP for numeric queries (details in [34, Theorem A.1]).

Laplace Mechanism for Answering Bias Queries Using Global Sensitivity
We first introduce the definition of global sensitivity for a set of bias queries, e.g., SP queries for a set of k models.
Definition 4.2 (ℓ_1-Global sensitivity [34]). The ℓ_1-sensitivity of a query function f over all neighboring S, S′ ∈ D is:
Δ_GS f = max_{neighboring S, S′ ∈ D} ∥f(S) − f(S′)∥_1.

A naive differentially private technique the compliance team could employ is the Laplace mechanism.

Laplace Mechanism: Given a query function f : D → R^k, the Laplace mechanism releases queries as M(S) = f(S) + (η_1, ..., η_k), where the η_i are i.i.d. Laplace(Δ_GS f / ε) random variables.

Note that the released values may fall outside the range of the queried metric (e.g., [−1, 1] for the statistical parity gap, or [0, 1] for absolute value metrics). In general, these mechanisms do not automatically deal with bounding constraints. Some choose to ignore them and release the raw outputs of the mechanisms, since this still satisfies DP's privacy and accuracy guarantees. In our case, the probability of out-of-bounds values is often small unless ε is chosen to be very small. If one insists on having bounded outputs, there are recent approaches [54], such as the truncated and boundary-inflated truncation approaches. Other approaches map out-of-bounds outputs to the boundaries of the metric.
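A minimal sketch of this naive baseline (the actual global sensitivity value for the k SP-gap queries is derived in the appendix; here it is simply an input parameter, and all names are ours):

```python
import numpy as np

def laplace_mechanism(queries, global_sensitivity, epsilon, seed=None):
    """Release k bias queries with i.i.d. Laplace noise of scale
    (global l1-sensitivity) / epsilon added to each coordinate."""
    rng = np.random.default_rng(seed)
    scale = global_sensitivity / epsilon
    return np.asarray(queries) + rng.laplace(0.0, scale, size=len(queries))
```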

Attribute-Conceal: Our Proposed Technique Using Smooth Sensitivity
We have focused on adding noise to the query calibrated to its global sensitivity. However, this might be excessive in many cases; that is, the frameworks would add so much noise that the output would be meaningless. Since we are interested in a particular test dataset S, we define the local sensitivity of a query function f at the test dataset S in ℓ_1 as:
Δ_LS f(S) = max_{S′ : S′ neighboring S} ∥f(S) − f(S′)∥_1.
The challenge of calibrating noise to the local sensitivity is that it might itself leak information about the test dataset and is therefore not sufficient to guarantee DP [58]. To address this, we investigate the idea of smooth sensitivity introduced in [58]. This is an intermediate notion between local and global sensitivity that allows dataset-specific additive noise to be added to achieve DP.

Definition 4.3 (Smooth sensitivity [58]). For β > 0, the β-smooth sensitivity of f is
S_{f,β}(S) = max_{S̄ ∈ D} ( Δ_LS f(S̄) · e^{−β d(S, S̄)} ),
where d(S, S̄) denotes the number of entries in which the protected attribute vectors A and Ā disagree.
Proof. We first find the β-smooth sensitivity of the statistical parity gap queries. Using the maximum local sensitivity at distance t from Lemma 6 (Appendix D), the smooth sensitivity follows from its definition (Definition 4.3), which is step (a) in the derivation. To achieve pure differential privacy, noise is then introduced following a Cauchy distribution scaled by this smooth sensitivity. This motivates Algorithm 3 (Attribute-Conceal), a differentially private technique to answer statistical parity gap queries based on Theorem 6.
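A sketch of Attribute-Conceal along these lines is given below. The smooth-sensitivity computation follows Lemma 6 (Appendix D); the admissibility constants used to scale the Cauchy noise follow the general framework of [58] and are assumptions of this sketch rather than the paper's exact calibration. All function names are ours.

```python
import numpy as np

def smooth_sensitivity_sp(n0, n1, k, beta):
    """beta-smooth sensitivity of k SP-gap queries for group sizes n0 <= n1, using the
    distance-t local sensitivity bound k/(n1 + t + 1) + k/(n0 - t) from Lemma 6."""
    best = 0.0
    for t in range(n0):                               # flip t disadvantaged individuals
        ls_t = k / (n1 + t + 1) + k / (n0 - t)        # max local sensitivity at distance t
        best = max(best, np.exp(-beta * t) * ls_t)
    return best

def attribute_conceal(sp_queries, n0, n1, epsilon, gamma=2.0, seed=None):
    """Answer SP-gap queries with heavy-tailed (Cauchy) noise scaled by the smooth
    sensitivity; alpha/beta below are assumed admissibility constants per [58]."""
    rng = np.random.default_rng(seed)
    k = len(sp_queries)
    alpha = beta = epsilon / (2.0 * gamma)
    s_smooth = smooth_sensitivity_sp(n0, n1, k, beta)
    noise = rng.standard_cauchy(size=k)
    return np.asarray(sp_queries) + (s_smooth / alpha) * noise
```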
We also note that the behavior of the Cauchy distribution can sometimes be unusual, as it does not have an expected value and has heavy tails that decay polynomially, compared to the exponential decay observed in the Laplace and Gaussian distributions. In Theorem 7, we therefore also provide a relaxed (ε, δ)-differentially-private mechanism that instead introduces noise sampled from a k-dimensional Laplace distribution, again scaled by the smooth sensitivity. The smooth sensitivity is expressed in terms of n_0 and n_1, the sizes of the disadvantaged and advantaged groups in dataset S, with n_0 ≤ n_1 and n_0 + n_1 = n.

EXPERIMENTS
We include experimental results on the Adult dataset (see Table 1 and Figure 2) and simulations on a synthetic dataset (see Figure 3, Table 2, and Table 3). For the Adult dataset, the protected attribute is race (assumed binary). We restrict ourselves to only White and Black, with the latter being relatively sparse (10.4%). We first demonstrate how querying using Attribute-Reveal can leak the protected attributes. Then, we show that Attribute-Conceal effectively prevents this leakage (also outperforming the naive Laplace mechanism).
Our performance metrics of interest are (1) the average error in answering the statistical parity query (Avg. SP Err); and (2) the accuracy of correctly recovering (essentially leaking) the protected attribute, balanced across both races (Leakage, formally defined in Definition 5.1). To observe the tradeoff between Avg. SP Err (query error) and Leakage over a broader range of parameters (privacy parameter ε, sparsity n_0, and test size n), we also perform simulations on a synthetic dataset. We provide additional experimental results on the German Credit dataset [27] in Appendix E.
Definition 5.1 (Leakage (%)). Let C_A be the number of individuals in the advantaged group whose protected attribute was correctly predicted and C_D be the number of individuals in the disadvantaged group whose protected attribute was correctly predicted. The leakage is defined as:
Leakage = (1/2) (C_A / n_1 + C_D / n_0) × 100%.
The leakage is the balanced accuracy of recovery. This is used to deal with imbalanced data, i.e., when one target class appears much more often than the other.
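A small sketch of this metric (assuming numpy arrays of true and recovered attributes, with both groups non-empty; names are ours):

```python
import numpy as np

def leakage(a_true, a_hat):
    """Balanced recovery accuracy in percent: average of per-group recovery rates."""
    adv = a_true == 1
    rate_adv = (a_hat[adv] == 1).mean()    # fraction of advantaged recovered (C_A / n_1)
    rate_dis = (a_hat[~adv] == 0).mean()   # fraction of disadvantaged recovered (C_D / n_0)
    return 50.0 * (rate_adv + rate_dis)
```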

Experiments with Adult Dataset
The Adult dataset has 14 attributes for 48,842 individuals. The classification task is to predict whether an individual's income is more or less than 50K [1]. The feature "race" is chosen as the protected attribute. This feature is excluded from training and only used for statistical parity evaluation. We restrict ourselves to only White and Black (binary), with the latter being relatively sparse (10.4%). We compare Attribute-Conceal with a naive differential privacy technique, the Laplace mechanism. We experiment with different test sizes and show our results in Figure 2 and Table 1.
Given an input, our base model h_0(·) outputs a probability value between 0 and 1. For the other k models, we add small noise sampled from a Uniform(−0.1, 0.1) distribution to each output of the base model.
We observe that the accuracy of the other models is quite close to the original. We created 40 models from the base model: they had a mean accuracy of 86.23% and a standard deviation of 0.2583.

Interestingly, our experiments demonstrate that with the uniform noise, we can still recover the protected attributes with far fewer models than the full-rank case. As shown in Figure 2, we are able to recover all the protected attributes using k = 40 models. Notice that this is roughly O(n_0 log(n/n_0)).

Our recovery of protected attributes is based on the values of the s vector in Algorithm 2. Ideally, s_j should be 0 if a_j = 1 and 1/n_1 + 1/n_0 if a_j = 0. In our practical implementation, the compressed sensing solution is not always exact but is still good enough to infer the protected attribute. Due to this, we use a threshold between 0 and 1/n_1 + 1/n_0 to identify the protected attribute.

Experiments with Synthetic Dataset
We perform simulations on synthetic data to observe the trade-off between Avg. SP Err and Leakage over a broader range of parameters (privacy parameter ε, sparsity n_0, and test size n). In Figure 3, we show this trade-off with Attribute-Conceal for test sizes n = 100, 1000, and 10000. Each point represents an ε ∈ [10, 500] averaged over 50 runs. We show results for varying sparsity n_0 and k = O(n_0 log(n/n_0)). Table 2 and Table 3 provide additional experimental results highlighting the Avg. SP Err and Leakage for test sizes n = 1000 and n = 10000 for different sparsity n_0, number of models k, and privacy parameter ε. A clear trend observed is that Attribute-Conceal results in a significantly lower Avg. SP Err compared to the Laplace mechanism, for a similar level of protected attribute leakage.

CONCLUSION AND FUTURE WORK
This work highlights a major concern with fairness assessments in scenarios where protected attributes such as gender or race cannot be accessed during model training. Showing that simply querying for fairness metrics can leak sensitive information to model developers raises important questions about the ethical implications of these assessments. As a remedy, we also propose a novel technique, Attribute-Conceal, which achieves differential privacy by calibrating noise to the smooth sensitivity of our bias query.

The results of this study have important implications for regulations and privacy in the field of algorithmic fairness and provide a new approach to protect the sensitive information of individuals in fairness assessments. This also provides a potential resolution to the continuing debate about whether protected attributes should be used in training. Future research could look into expanding the framework to include other fairness metrics or incorporating these techniques into training or post-processing to directly reduce bias without leaking protected attributes.
Our current approach assumes that both the model developers and the compliance (or auditing) team work with the same test set. However, this might not hold true in every context: the compliance/auditing team may choose to use a different test set. Note, though, that a different test set may not adequately represent the true training distribution, which could potentially affect generalization.
We note that while our focus is on leakage from bias queries, future work could also look into inferring the protected attributes from the other available features using alternate techniques [8,9]. For example, if one has prior knowledge that a feature such as hours-worked-per-week is strongly correlated with gender, one might be able to infer gender with reasonable accuracy from that feature alone. However, it remains debatable whether such indirect inference of protected attributes from correlated features would legally constitute a violation of disparate treatment (or privacy). On the other hand, asking the compliance team for bias assessments actually accesses the protected attributes through queries; we therefore make a distinction between leaking and inferring protected attributes. An interesting scenario would arise if one exploits a synergy of both bias queries and inference mechanisms to obtain even more accurate predictions of protected attributes than either provides individually, and whether such techniques would constitute a violation of anti-discrimination and privacy regulations.

A APPENDIX TO SECTION 3.1

A.1 Proof of Theorem 1
Theorem 1 (Reveal From Linear System of Equations). Let q = [Δ_1 Δ_2 ... Δ_k]^T be a vector of statistical parity gap queries for k models, let h_i(x_j) denote the i-th model's prediction for the j-th individual, and let H be a k × n matrix whose i-th row contains the binary or logistic predictions of the i-th model. If rank(H) = n, then the protected attributes A = (a_1, a_2, ..., a_j, ..., a_n) of the entire dataset can be identified by solving a linear system of equations, which can be expressed as q = H v (equation (4)), where v is the unknown vector with elements v_j = 1/n_1 if a_j = 1 and v_j = −1/n_0 if a_j = 0.

According to Theorem 1, if there are as many linearly independent model predictions h_i(·), i ∈ [k], as individuals in the dataset, then with their corresponding statistical parity queries q = [Δ_1 Δ_2 ... Δ_k]^T, one can always get the protected attributes of all individuals in the dataset by solving for v in (4). The protected attribute a_j of the j-th individual is revealed by the value of the entry v_j of the vector v. If we assume the worst-case scenario, where the model developers can directly choose the output predictions of these models, then all they have to do is use k = n models and have each model's output predictions be linearly independent, making rank(H) = n. As an example, suppose there are k = n models, and the i-th model accepts only the i-th individual and rejects everyone else.

B APPENDIX TO SECTION 3.2

B.1 Background: Relevant Results in Compressed Sensing
To prove Lemma 1, we will use some results from the existing literature on RIP for compressed sensing (restated in Theorem 8, Theorem 9, and Lemma 2).

Definition B.1 ([5]). Given a random matrix Φ ∈ R^{m×n} and x ∈ R^n, we say that Φ is strongly concentrated around its expectation if
Pr( | ∥Φx∥_2^2 − ∥x∥_2^2 | ≥ ε ∥x∥_2^2 ) ≤ 2 e^{−m c_0(ε)},
where E(∥Φx∥_2^2) = ∥x∥_2^2 and c_0(ε) > 0 depends only on ε.

Theorem 8 ([5], Theorem 5.2). Suppose that m, n ∈ N and δ ∈ (0, 1) are given. If Φ ∈ R^{m×n} is a random matrix that is strongly concentrated around its expectation, then there exist c_1, c_2 > 0 depending only on δ such that the RIP holds for Φ with the prescribed δ and order k obeying k ≤ c_1 m/log(n/k), with probability at least 1 − 2e^{−c_2 m}.
Next, we demonstrate that a specific class of random variables, called subgaussian random variables, satisfies the requirements of Theorem 8.

Definition B.2 (Subgaussian [64]). A random variable X is said to be b-subgaussian, i.e., X ∼ Sub(b^2), if there exists some b > 0 such that for every t ∈ R,
E[exp(tX)] ≤ exp(b^2 t^2 / 2).

Theorem 9 ([25], Lemma 6.1). Suppose that Φ ∈ R^{m×n} is a random matrix with i.i.d. entries from a subgaussian distribution with variance 1/m. Then for all x ∈ R^n, we have
Pr( | ∥Φx∥_2^2 − ∥x∥_2^2 | ≥ ε ∥x∥_2^2 ) ≤ 2 e^{−c m ε^2}
for some c > 0 and ε ∈ (0, 1).

Now, we demonstrate that a uniform distribution satisfies the requirements of Theorem 8 by first showing that it is subgaussian.

Lemma 2. Suppose the random variable X is uniformly distributed over the interval [−b, b] for some fixed b > 0. Then X is b-subgaussian.
Proof. For a uniform distribution over [−b, b],
E[e^{tX}] = sinh(bt)/(bt) = Σ_{n≥0} (bt)^{2n}/(2n+1)! ≤ Σ_{n≥0} (b^2 t^2 / 2)^n / n! = e^{b^2 t^2 / 2},
where the inequality follows from the fact that (2n+1)! ≥ n! 2^n. □
Proof (of Lemma 1). We show that a random sensing matrix H ∈ R^{k×n} whose entries are drawn i.i.d. from the Uniform(−√(3/k), √(3/k)) distribution is strongly concentrated around its expectation. According to Lemma 2, the uniform distribution is a subgaussian distribution. From Theorem 9, matrices with i.i.d. entries from subgaussian distributions with variance 1/k are strongly concentrated around their expectations. Finally, by Theorem 8, the matrix H, which is strongly concentrated around its expectation, satisfies the Restricted Isometry Property with probability at least 1 − 2e^{−c_2 k}. □

Proof. We assume without loss of generality that n_0 ≤ n_1 and that there is at least one individual in the disadvantaged group, i.e., n_0 ≥ 1, in all possible datasets. A neighboring dataset is one in which the protected attribute differs for only one individual. Consider these two cases.

Case 1: An individual in the disadvantaged group, for whom the model predicted an unfavorable outcome, differs in the neighboring dataset in its protected attribute, i.e., a_j = 0 with h(x_j) = 0 becomes a′_j = 1 with h(x_j) = 0. Here, (a) holds since the size of the disadvantaged group decreased by one and that of the advantaged group increased by one, while the corresponding numerators were not incremented because h(x_j) = 0. Next, (b) holds since one numerator is upper bounded by n_1, while the other is upper bounded by n_0 − 1 since h(x_j) = 0. Notice that this is the local sensitivity expression for the reference dataset. Hence, (c) is a maximization over all possible datasets to find the global sensitivity. The maximum occurs when n_0 = 2 and n_1 = n − 2, meaning that this dataset and the neighboring dataset with n_0 = 1 and n_1 = n − 1 have the largest sensitivity.

Case 2: An individual in the disadvantaged group, for whom the model predicted a favorable outcome, differs in the neighboring dataset in its protected attribute, i.e., a_j = 0 with h(x_j) = 1 becomes a′_j = 1 with h(x_j) = 1. The steps follow arguments similar to Case 1. □

Proof. We assume without loss of generality that n_0 ≤ n_1 and that there is at least one individual in the disadvantaged group, i.e., n_0 ≥ 1, in all possible datasets. A neighboring dataset is one in which the protected attribute differs for one individual. Consider these two cases.

Case 1: Here, (a) holds since the protected attribute of the individual was flipped from a_j = 0 to a′_j = 1, so n_0 decreased by one and n_1 increased by one, while the numerators were not incremented since h(x_j) = 0. To obtain the upper bound in (b), one term is lower bounded by 0, while the other is upper bounded by n_0 − 1 since h(x_j) = 0. Notice that this is the local sensitivity expression for the reference dataset. Hence, (c) is a maximization over all possible datasets to find the global sensitivity; the maximum occurs when n_0 = 2.

Case 2: An individual in the disadvantaged group, for whom the model predicted a favorable outcome, differs in the neighboring dataset in its protected attribute, i.e., a_j = 0 with h(x_j) = 1 becomes a′_j = 1 with h(x_j) = 1. The steps follow arguments similar to Case 1. □

Definition ([58], Admissible noise distribution). A probability distribution on R^m, given by a density function h, is (α, β)-admissible (with respect to ℓ_1) if, for α = α(ε, δ) and β = β(ε, δ), the Sliding Property and the Dilation Property hold for all Δ ∈ R^m and λ ∈ R satisfying ∥Δ∥_1 ≤ α and |λ| ≤ β, and for all measurable subsets U ⊆ R^m. Next, we show how an admissible noise distribution helps achieve differential privacy.

D.2 Proofs of Theorems 6 and 7
The following two results will help prove Theorems 6 and 7 in the paper.

Lemma 4 ([58], Lemma 2.7). For any γ > 1, the one-dimensional distribution with density proportional to 1/(1 + |z|^γ) is admissible with δ = 0; moreover, the m-dimensional product of independent copies of this distribution is also admissible, with parameters determined by the appropriate quantile of the random variable ∥η∥_1.

For Theorem 7, we use Lemma 5 to show that the Laplace distribution is also an admissible noise distribution and hence can be used to achieve (ε, δ)-differential privacy via Lemma 3.

D.3 Additional Lemmas aiding proof of Theorem 6
Lemma 6. Let dataset S ∈ D contain two groups with sizes n_0(S) and n_1(S), such that n_0(S) ≤ n_1(S). The maximum local sensitivity at distance t can be expressed as follows:
Δ_LS^{(t)}(S) = k/(n_1(S) + t + 1) + k/(n_0(S) − t).
For one query, the maximum change between S and any neighboring dataset S′ is upper bounded by 1/(n_1 + 1) + 1/n_0. Therefore, with k queries, we have the local sensitivity Δ_LS(S) = k/(n_1(S) + 1) + k/n_0(S). Now, we find the maximum local sensitivity over datasets that differ from S in t entries (the maximum local sensitivity at distance t). The last equality (a) holds since the maximum when considering datasets at distance t is attained when the protected attributes of t individuals in the disadvantaged group are moved to the advantaged group. □

Remark 7. Similarly, when n_0(S) ≥ n_1(S), the maximum local sensitivity at distance t can be expressed analogously, with the roles of n_0 and n_1 exchanged.

E ADDITIONAL EXPERIMENTS ON THE GERMAN CREDIT DATASET

Here, our goal is to ensure that the protected attribute of no targeted individual is leaked with certainty. We are therefore interested in privatizing the protected attributes so that recovery performance is no better than random guessing, i.e., about 50% recovery accuracy.

To demonstrate the effectiveness of Attribute-Conceal in protecting the protected attribute, we use the mechanism in Theorem 6 to achieve ε-DP (see Algorithm 3). Figure 4(a) shows the number of protected attributes recovered using CS when noise is added to the statistical parity queries. Unlike the noiseless case, recovery is no longer possible even with more models. In Figure 4(b) and (d), we plot the recovered s vector and the answered statistical parity gap queries with ε = 1000. We experiment with different values of ε to study the trade-off between average statistical parity query error and recovery performance. Our results are summarized in Table 4. By increasing ε, there is a decrease in the average SP query error without a significant increase in the number of protected attributes recovered. We show this for k = 800 and k = 1000 models. The model developers generally do not have control over the size of the advantaged or disadvantaged group. This dataset has a 70:30 male-to-female ratio. Despite the fact that n_0 is not substantially smaller than n_1, and our sensing matrix is uniformly distributed, full recovery of the protected attributes required relatively few models.

Figure 1: Illustrates an institutional separation between model developers and the compliance team for ensuring fair and privacy-compliant machine learning models. Model developers train classifiers on the training dataset, but without access to protected attributes. The compliance team has access to the entire dataset for auditing purposes and is queried by model developers for fairness metrics.

Figure 2: Experimental results on the Adult dataset for test size n = 100: (a) Leakage as a function of the number of queries in the noisy (ε = 100) and noiseless (ε = ∞) cases. Attribute-Conceal prevents Leakage even as the number of queries increases. Note that random guessing achieves a Leakage of about 50%, meaning no individual's protected attribute is recovered with certainty. (b) Answered SP queries for k = 40 achieve a low Avg. SP Err of 7.41 × 10^−4. (c) The s vector reconstructed under Attribute-Conceal differs substantially from the original (the s vector reveals an individual's protected attribute; see equation (7)). The trained models have an average accuracy of 86.23% and a standard deviation of 0.2583.

Figure 3: Experimental results on synthetic data for test sizes n = 100, 1000, and 10000: Avg. SP Err and Leakage trade-off with Attribute-Conceal. Each point represents an ε ∈ [10, 500] averaged over 50 runs. Results for varying sparsity n_0 and k = O(n_0 log(n/n_0)).

Table 2: Experimental results on synthetic data: Avg. SP Err and Leakage for test dataset sizes n = 1000 and n = 10000 for Attribute-Conceal, varying test dataset sparsity and number of models k, with ε = 100.

Figure 4: (a) Number of recovered protected attributes as a function of the number of models, with and without Attribute-Conceal (ε = 1000). Perfect recovery can be achieved using 800 or more models in the noiseless case. (b) The original and answered statistical parity (SP) queries for k = 800 models under ε = 1000, with an average error of 4.78 × 10^−4. (c) Recovered s vector used to infer the protected attribute a_j for k = 800; the original and recovered s vectors overlap. (d) The original and reconstructed s vector with Attribute-Conceal, ε = 1000. (The first 100 individuals and models are shown for clarity.)

Table 1: Detailed experimental results on the Adult dataset for test sizes n = 100 and n = 1000: Attribute-Conceal (Ours) has a much lower Avg. SP Err (query error) than the Laplace mechanism (Lap.) for the same privacy parameter ε (and similar Leakage).

The maximum ℓ_1 difference between any two neighboring datasets is upper bounded by 1/2. Therefore, with k queries, we have the global sensitivity Δ_GS|SP| = k/2.

Table 4: Avg. SP Err (×10^−3) and number of recovered protected attributes for different values of ε, for k = 800 and k = 1000 models.