fix: handle unpadded base32/base64 digests and improve defaults#91
fix: handle unpadded base32/base64 digests and improve defaults#91
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses WARC digest interoperability by accepting unpadded Base32/Base64 digests during decoding and updating default digest generation to avoid padding characters (per current WARC community guidance), while also switching defaults to stronger hashing.
Changes:
- Strip Base32 padding on encoding and accept missing padding on Base32/Base64 decoding.
- Update default digest algorithm to
sha256and default encoding behavior to “spec-recommended per algorithm” (SHA-1 → Base32, others → Base16). - Refresh/extend tests to reflect the new digest defaults and unpadded Base32 support.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
digest.go |
Implements no-padding Base32 encoding, tolerant Base32/Base64 decoding, and algorithm-based default encoding selection when unset. |
options.go |
Changes default digest algorithm to sha256 and default digest encoding to “unknown” (resolved later). |
digest_test.go |
Updates expected Base32 outputs (no padding) and adds tests for unpadded Base32 behavior. |
record_test.go |
Updates expected revisit-record digests to match new default digest algorithm/encoding. |
unmarshaler_test.go |
Updates expected computed digest headers when missing digests are auto-added under new defaults. |
Comments suppressed due to low confidence (1)
digest_test.go:172
- The new Base64 unpadding support in
DigestEncoding.decode(and the updateddetectEncodinglogic for no-padding Base64) isn't covered by tests here. Adding avalid Base64 unpaddedcase (and ideally adetectEncodingcase for an unpadded Base64 digest length) would prevent regressions in the new behavior.
{
name: "valid Base32 unpadded",
encoding: Base32,
input: "BT7V3XGGA4OVPYPVNTNJUMTXGY",
wantError: false,
},
{
name: "valid Base64",
encoding: Base64,
input: "CY9rzUYh03PK3k6DJie09g==",
wantError: false,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5559f29 to
346b9ee
Compare
346b9ee to
ffde1b2
Compare
88474c4 to
5147141
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f14a522 to
eb307de
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
eb307de to
f2394d3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
digest_test.go:125
- Typo in test case names: "lovercase" should be "lowercase".
{"lovercase base32 encoding", "Some content", "sha1:t4ng5t3u5h43dlss5dvvqhkcbzr6qrj2", true},
{"lovercase base64 encoding", "Some content", "sha1:nxpuz3tp+bguuujrwb1cdmporto=", false},
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- accept unpadded Base32/Base64 digests when decoding/validating - use no-padding Base32/Base64 encoding for digest output - normalize and resolve digest defaults in default option construction - default missing algorithm fallback to sha256 - update and expand tests for encoding detection/validation and options
f2394d3 to
3fb3e77
Compare
Fixes #90