Skip to content

fix: handle unpadded base32/base64 digests and improve defaults#91

Merged
maeb merged 1 commit intomasterfrom
fix/digest-encoding
Mar 24, 2026
Merged

fix: handle unpadded base32/base64 digests and improve defaults#91
maeb merged 1 commit intomasterfrom
fix/digest-encoding

Conversation

@maeb
Copy link
Copy Markdown
Member

@maeb maeb commented Mar 24, 2026

Fixes #90

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses WARC digest interoperability by accepting unpadded Base32/Base64 digests during decoding and updating default digest generation to avoid padding characters (per current WARC community guidance), while also switching defaults to stronger hashing.

Changes:

  • Strip Base32 padding on encoding and accept missing padding on Base32/Base64 decoding.
  • Update default digest algorithm to sha256 and default encoding behavior to “spec-recommended per algorithm” (SHA-1 → Base32, others → Base16).
  • Refresh/extend tests to reflect the new digest defaults and unpadded Base32 support.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
digest.go Implements no-padding Base32 encoding, tolerant Base32/Base64 decoding, and algorithm-based default encoding selection when unset.
options.go Changes default digest algorithm to sha256 and default digest encoding to “unknown” (resolved later).
digest_test.go Updates expected Base32 outputs (no padding) and adds tests for unpadded Base32 behavior.
record_test.go Updates expected revisit-record digests to match new default digest algorithm/encoding.
unmarshaler_test.go Updates expected computed digest headers when missing digests are auto-added under new defaults.
Comments suppressed due to low confidence (1)

digest_test.go:172

  • The new Base64 unpadding support in DigestEncoding.decode (and the updated detectEncoding logic for no-padding Base64) isn't covered by tests here. Adding a valid Base64 unpadded case (and ideally a detectEncoding case for an unpadded Base64 digest length) would prevent regressions in the new behavior.
		{
			name:      "valid Base32 unpadded",
			encoding:  Base32,
			input:     "BT7V3XGGA4OVPYPVNTNJUMTXGY",
			wantError: false,
		},
		{
			name:      "valid Base64",
			encoding:  Base64,
			input:     "CY9rzUYh03PK3k6DJie09g==",
			wantError: false,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

This comment was marked as outdated.

@maeb maeb force-pushed the fix/digest-encoding branch 2 times, most recently from 5559f29 to 346b9ee Compare March 24, 2026 18:22
@maeb maeb requested a review from Copilot March 24, 2026 18:23

This comment was marked as outdated.

This comment was marked as outdated.

@maeb maeb force-pushed the fix/digest-encoding branch 2 times, most recently from 88474c4 to 5147141 Compare March 24, 2026 19:37
@maeb maeb requested a review from Copilot March 24, 2026 19:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@maeb maeb force-pushed the fix/digest-encoding branch 2 times, most recently from f14a522 to eb307de Compare March 24, 2026 20:11
@maeb maeb requested a review from Copilot March 24, 2026 20:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

digest_test.go:125

  • Typo in test case names: "lovercase" should be "lowercase".
		{"lovercase base32 encoding", "Some content", "sha1:t4ng5t3u5h43dlss5dvvqhkcbzr6qrj2", true},
		{"lovercase base64 encoding", "Some content", "sha1:nxpuz3tp+bguuujrwb1cdmporto=", false},

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- accept unpadded Base32/Base64 digests when decoding/validating
- use no-padding Base32/Base64 encoding for digest output
- normalize and resolve digest defaults in default option construction
- default missing algorithm fallback to sha256
- update and expand tests for encoding detection/validation and options
@maeb maeb force-pushed the fix/digest-encoding branch from f2394d3 to 3fb3e77 Compare March 24, 2026 20:45
@maeb maeb merged commit 0db7d67 into master Mar 24, 2026
2 checks passed
@maeb maeb deleted the fix/digest-encoding branch March 24, 2026 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Base32 encoded sha256 digest without padding breaks warc validation

2 participants