Skip to content

race condition in the locking code of the certificate storage #557

@amwolff

Description

@amwolff

The algorithm that we implement is buggy (the implementation is buggy, not necessarily the pseudocode there; it was just taken too literally). It uses if-metageneration with metageneration numbers everywhere, which are meaningless without generation numbers.

So here's a race condition that I can think of now:

  1. There's an expired lock
  2. linksharing 1 tries to lock
  3. Another linksharing 2 tries to lock
  4. linksharing 1 deletes the expired lock because it's expired
  5. linksharing 1 recreates the lock immediately after that
  6. linksharing 2 just checked that the deleted lock was expired
  7. linksharing 2 erroneously deletes the new lock thinking it deletes the expired lock because it only checks metageneration numbers
  8. linksharing 2 recreates the lock
  9. Now both linksharing 1 & 2 have a shared lock they individually think is theirs

Chances of this happening are thin (maybe something in the logs to check if this ever happened?). Even if this happens 1 in 10 times, it will/should work anyway. But I think we should probably do something for testing to ensure there are no other race conditions there.

Most of the implementations don't even have locking. For us, the biggest issue would be a deadlock or starvation, but I don't think these can happen in this implementation.

I believe this wouldn't happen if we used the official Google library for interacting with GCS for certificate storage. We should probably drop the custom client and do just this; adjusting the algorithm will be necessary this way.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions