Prevent persistent storage directory collisions for similiar repository URLs#72
Prevent persistent storage directory collisions for similiar repository URLs#72phenaproxima wants to merge 3 commits intomainfrom
Conversation
| // names easily distinguishable. | ||
| $hashedUrl = hash('sha256', $url); | ||
|
|
||
| return $escapedUrl . '-' . substr($hashedUrl, 0, 8); |
There was a problem hiding this comment.
what is the purpose of only using the first 8 chars? Presumably we are using sha256 because we want a long hash that is very unlikely to conflict? doesn't using only part of it defeat that purpose. Why use 1 of the hash_algos() that just produces a shorter string. I don't think we worried about attacks where people try take advantage of the possible matches, correect?
There was a problem hiding this comment.
My feeling is that we want just enough of the hash to virtually guarantee uniqueness, but to keep the final generated directory name understandable by humans. Maybe crc32b would suffice? (I'd certainly prefer to remove the string manipulation if possible.)
I'll wait for @davidstrauss to make the final call here.
There was a problem hiding this comment.
I feel like I'll regret suggesting we can truncate the SHA-256 output substantially. It's the kind of decision that I'd want to do deliberately if we do it at all, and I'm not certain how to compute the bytes of collision prevention we need here. On the other hand, I know it's safe to use the mangled name + SHA2, as the SHA2 alone would be defensibly unique.
If we're worried about an unwieldy file name, there are other serializations of SHA-256 output that have greater density than hex. So, maybe get the hash in binary and base64_encode() it? The output should be about 34-40 characters.
Finally, I'd consider just rawurlencode() and no hash at all. That escaping is RFC-compliant to the point where you shouldn't see the implementation's output drift. Percent encoding isn't particularly obnoxious to use for files/shell purposes, as the percent character isn't as special in that context as other escaping/quoting options.
You're welcome to run forward with any of these options (full hex SHA256, base64 SHA256, or robust escaping) without waiting on further input from me. :-)
Addresses #24 by implementing this approach: