Description
We're going to use this to attack the Uber anonymization system. I'm not sure what queries that system allows, but @rbh-93 is working on it, so he can answer questions about that or give you access to an implementation.
In our attack, we want to make a query that has exactly one user in the answer with some reasonable probability. In the attack, we find out if that is the case or not. If it is the case, then we make a singling-out claim for that user. If not, then we don't make a claim.
The first step is to find sets of column values or value ranges that have a good chance of identifying a single user. If you know the number of distinct users associated with any given column value, and you know the number of users in the table, then prob_user1 = col_val_users1/total_users is the probability that any given user has that column value. Then you want to find cases where:
total_users * prob_user1 * prob_user2 * ... = 1 (roughly)
In other words, the expected number of users with column/value 1 and column/value 2 and ... is one. This treats the column values as independent of each other, so it is only an estimate.
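The search for such combinations can be sketched as below. This is only an illustration, not code from the repo: the function names, the `value_counts` layout, and the 0.5 to 1.5 acceptance window for "roughly 1" are all assumptions, and the estimate relies on the columns being independent.

```python
from itertools import combinations, product
from math import prod


def expected_users(total_users, counts):
    """Expected number of users matching every condition, assuming the
    column values are statistically independent."""
    return total_users * prod(c / total_users for c in counts)


def find_candidates(total_users, value_counts, max_cols=3, low=0.5, high=1.5):
    """Return (conditions, expected) pairs whose expected user count is near 1.

    value_counts maps column -> {value: distinct user count} (hypothetical
    layout; fill it from the per-column count queries on the raw database).
    """
    candidates = []
    for k in range(1, max_cols + 1):
        for cols in combinations(sorted(value_counts), k):
            per_col = [list(value_counts[c].items()) for c in cols]
            # Try every combination of one value per chosen column.
            for combo in product(*per_col):
                counts = [count for _, count in combo]
                exp = expected_users(total_users, counts)
                if low <= exp <= high:
                    conditions = tuple((c, v) for c, (v, _) in zip(cols, combo))
                    candidates.append((conditions, exp))
    return candidates
```

For example, with 1000 users, a value held by 100 users and another held by 10 users give an expected count of 1000 * 0.1 * 0.01 = 1.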
You can learn the total users with:
select count(distinct uid)
from table
To learn the per-value user counts for any given column, you can run this query on the raw database:
select column, count(distinct uid)
from table
group by 1
order by 2 desc
limit 200
Use the askExplore() call on the raw database (rawDb) to do these.
Once you have a set of columns and values where this is the case, you can make a query like this:
select count(distinct uid)
from table
where col1 = val1 and col2 = val2 and ...
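If you generate these queries from a list of candidate conditions, a small helper along these lines may be handy. It is a sketch only: `build_count_query` and `sql_literal` are hypothetical names, the quoting is minimal, and the `uid` column name is taken from the queries above.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    # Double any single quotes inside string values.
    return "'" + str(value).replace("'", "''") + "'"


def build_count_query(table, conditions, uid="uid"):
    """Build the count(distinct uid) query for a list of (column, value) pairs."""
    if not conditions:
        raise ValueError("need at least one (column, value) condition")
    where = " and ".join(f"{col} = {sql_literal(val)}" for col, val in conditions)
    return f"select count(distinct {uid}) from {table} where {where}"
```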
For the Uber system, each time you repeat the query, you get a new noise value with mean zero. So the average of X answers gets closer to the true answer as X grows.
Repeat the query X times and average the answers. We predict that the true answer is 1 if the averaged answer is between 0.5 and 1.5.
For these queries, use the askAttack() call, so that the system records them as attack queries. Once you have a guess, use the askClaim() call to record the guess. You can see examples of how these are used for other attacks in code/attacks.
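The averaging and decision step could look roughly like this. It is a sketch under assumptions: `ask` stands in for whatever wraps the askAttack() call and returns one noisy count, and `make_noisy_counter` is a made-up simulator that uses Gaussian noise only for illustration (the actual noise distribution of the Uber system is not stated here).

```python
import random
from statistics import mean


def averaged_answer(ask, num_queries):
    """Call ask() num_queries times and return the mean of the noisy answers."""
    return mean(ask() for _ in range(num_queries))


def predicts_single_user(avg, low=0.5, high=1.5):
    """Predict a true count of 1 when the averaged answer is in (low, high)."""
    return low < avg < high


def make_noisy_counter(true_count, noise_sd, seed=0):
    """Hypothetical stand-in for the anonymized system: true count plus
    fresh zero-mean noise on every call (Gaussian is an assumption)."""
    rng = random.Random(seed)
    return lambda: true_count + rng.gauss(0.0, noise_sd)


if __name__ == "__main__":
    ask = make_noisy_counter(true_count=1, noise_sd=2.0)
    avg = averaged_answer(ask, num_queries=2000)
    if predicts_single_user(avg):
        print("make a singling-out claim")
    else:
        print("no claim")
```

The larger the noise, the larger X needs to be before the average reliably falls inside the 0.5 to 1.5 window.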