This repository was archived by the owner on Mar 16, 2023. It is now read-only.

Systems of equations attack to recover visited websites? #112

@lknik

I'm building on this issue by @johnwilander.

In https://github.com/WICG/floc/issues/99, it is stated that "FLoC is not useful for tracking." I don't think that's accurate.

As far as I know, the user's cohort will not be partitioned per first party site so multiple sites can observe the cohort ID in sync as it changes week after week. A hash of the cohorts seen so far will likely get more and more unique as the weeks go by.

Websites or tracker scripts on websites can expose arrays of the cohorts they've seen to help all trackers identify the user, like this:

// Week label -> cohort ID observed on website A
let cohortCollectionForWebsiteA = {
  "week01_2022" : "0666",
  "week03_2022" : "A566",
  "week04_2022" : "2111",
  "week05_2022" : "1171",
  "week07_2022" : "749B",
}

// Week label -> cohort ID observed on website B
let cohortCollectionForWebsiteB = {
  "week01_2022" : "0666",
  "week02_2022" : "0030",
  "week05_2022" : "1171",
  "week06_2022" : "7311",
  "week07_2022" : "749B",
}

Trackers send these to a server for matching across websites. In the example above, the matching yields the intersection [ "week01_2022", "week05_2022", "week07_2022" ].
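The server-side matching step could look roughly like the sketch below. It reuses the two example collections as plain objects; the function name `matchingWeeks` is hypothetical, and the sketch only illustrates the intersection idea, not how any real tracker works.

```javascript
// Example data from above: week label -> cohort ID observed on each website.
const cohortCollectionForWebsiteA = {
  week01_2022: "0666",
  week03_2022: "A566",
  week04_2022: "2111",
  week05_2022: "1171",
  week07_2022: "749B",
};

const cohortCollectionForWebsiteB = {
  week01_2022: "0666",
  week02_2022: "0030",
  week05_2022: "1171",
  week06_2022: "7311",
  week07_2022: "749B",
};

// Hypothetical helper: return the weeks in which both collections
// recorded the same cohort ID.
function matchingWeeks(a, b) {
  return Object.keys(a)
    .filter((week) => a[week] === b[week])
    .sort();
}

const sharedWeeks = matchingWeeks(
  cohortCollectionForWebsiteA,
  cohortCollectionForWebsiteB
);
console.log(sharedWeeks);
// → [ 'week01_2022', 'week05_2022', 'week07_2022' ]
```

The more weeks two collections agree on, the more likely it is that they belong to the same browser.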

Just a slight thought.

What if we consider a slightly different threat/exploitation scenario (unless it is simply a flavour of the remarks quoted above, which is why I retain them here), linked to the risks I already pointed to? Specifically, reversing the cohort ID to obtain the websites actually visited.

So the idea to hypothetically improve such a reversal could follow this reasoning: we know that the set of sites visited in week_i (i = the given week of the year) corresponds to a specific ID_i. The computation of FLoC is based on its input (website addresses). So I wonder if it would be possible to mathematically construct a system of equations of the form:

FLoC(sites_1) = ID_1,

...,

FLoC(sites_i) = ID_i

And then use the properties of SimHash to obtain the visited websites, or at least improve the inference of the visited websites. Note that I did not work on an analytical solution, so I do not know under what circumstances such a system of equations would be solvable. It would be interesting to consider it in the threat model, though, so I leave the proof as an exercise to the proponents. Thank you.
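To make the idea a bit more concrete, here is a brute-force toy rather than an analytical solution. Everything in it is an assumption made for illustration: `toySimHash` is a simplified 8-bit SimHash over an FNV-1a string hash and stands in for the real FLoC computation, the domain list is made up, and the attacker is assumed to know a small universe of candidate sites. For each week, the sketch enumerates every subset of that universe and keeps those that satisfy that week's equation.

```javascript
// Toy 32-bit FNV-1a string hash (a stand-in chosen for this sketch).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const BITS = 8; // toy cohort ID width, far smaller than anything realistic

// Toy SimHash: every site votes +1 or -1 on each output bit;
// a bit is set when its vote total is positive.
function toySimHash(sites) {
  const votes = new Array(BITS).fill(0);
  for (const site of sites) {
    const h = fnv1a(site);
    for (let b = 0; b < BITS; b++) {
      votes[b] += ((h >>> b) & 1) === 1 ? 1 : -1;
    }
  }
  let id = 0;
  for (let b = 0; b < BITS; b++) {
    if (votes[b] > 0) id |= 1 << b;
  }
  return id;
}

// Enumerate all non-empty subsets of the candidate universe and keep
// those whose toy hash equals the observed ID, i.e. the solutions of
// toySimHash(sites_i) = ID_i for one week.
function candidateHistories(universe, observedId) {
  const matches = [];
  for (let mask = 1; mask < 1 << universe.length; mask++) {
    const subset = universe.filter((_, i) => ((mask >> i) & 1) === 1);
    if (toySimHash(subset) === observedId) matches.push(subset);
  }
  return matches;
}

// Hypothetical universe of sites and two weeks of hypothetical browsing.
const universe = [
  "news.example", "shop.example", "health.example",
  "bank.example", "forum.example", "travel.example",
];
const weeklyHistories = [
  ["news.example", "shop.example", "health.example"],
  ["news.example", "bank.example", "travel.example"],
];

// The attacker only sees the IDs, one equation per week.
const observedIds = weeklyHistories.map((sites) => toySimHash(sites));
const candidatesPerWeek = observedIds.map((id) =>
  candidateHistories(universe, id)
);

candidatesPerWeek.forEach((candidates, i) => {
  const total = 2 ** universe.length - 1;
  console.log(`week ${i + 1}: ${candidates.length} of ${total} candidate histories remain`);
});
```

In this toy, each weekly equation only narrows the candidate set for that week. Whether the equations can be combined into something stronger depends on extra assumptions, for example that a user's history overlaps from week to week, and that is exactly the open question raised above.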
