Fix: Encode b1 from raw bytes to avoid byte corruption #75

Cloxl · 2025-12-12T04:54:14Z

Cherry-picked from PR #74. Only includes the b1 encoding fix commit.


Summary by Sourcery

Encode fingerprint metadata directly from raw bytes instead of JSON strings to prevent Base64 corruption in b1 generation.

Bug Fixes:

Fix b1 fingerprint generation by encoding the underlying byte array directly rather than its JSON string representation, avoiding byte corruption.
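For illustration, the sketch below compares the two payloads using only the Python standard library. The list `b` is a hypothetical stand-in for the byte values that `generate_b1` builds upstream. The real fingerprint contents are not shown in this PR.

```python
import base64
import json

# Hypothetical stand-in for the byte values generate_b1 builds from the fingerprint.
b = [72, 200, 13, 255]

# Old path: serialize the list to JSON text, then encode that text as UTF-8.
old_payload = json.dumps(b, separators=(",", ":")).encode("utf-8")

# New path: treat each integer as exactly one raw byte, as bytearray(b) does.
new_payload = bytearray(b)

print(old_payload)  # b'[72,200,13,255]'
print(new_payload)  # bytearray(b'H\xc8\r\xff')
print(base64.b64encode(old_payload).decode("ascii"))
print(base64.b64encode(new_payload).decode("ascii"))
```

The JSON path Base64-encodes the digits and punctuation of the list (15 bytes here) instead of the four byte values themselves, so it encodes different data and produces a string of a different length.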

Enhancements:

Extend the Base64 encoder interface to accept bytes and byte-like inputs in addition to strings.


sourcery-ai · 2025-12-12T04:54:26Z


Reviewer's Guide

Adjusts Base64 encoding to operate directly on raw byte sequences and updates fingerprint b1 generation to pass bytes instead of JSON text, in order to avoid byte corruption from UTF-8 string encoding.
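The guide does not spell out the exact corrupting path, so the following is only a general illustration of the hazard (not code from this project): once byte values are treated as characters, UTF-8 encoding turns every value at or above 0x80 into a two-byte sequence, whereas a Latin-1 round trip maps code points 0..255 back to the original bytes one-to-one.

```python
raw = bytes([0x41, 0xE9, 0xFF])

# Treat each byte value as a Unicode code point, as text-based handling would.
as_text = "".join(chr(v) for v in raw)

# UTF-8 expands code points >= 0x80 into two bytes, so the payload changes.
utf8_bytes = as_text.encode("utf-8")

# Latin-1 maps code points 0..255 one-to-one back to the original byte values.
latin1_bytes = as_text.encode("latin-1")

print(raw)                  # b'A\xe9\xff'
print(utf8_bytes)           # b'A\xc3\xa9\xc3\xbf'
print(latin1_bytes == raw)  # True
```

Passing the bytes straight to `base64.b64encode`, as this PR does, avoids the text round trip entirely.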

Sequence diagram for raw-byte b1 fingerprint encoding

```mermaid
sequenceDiagram
    participant FingerprintGenerator
    participant Base64Encoder
    participant base64

    FingerprintGenerator->>FingerprintGenerator: generate_b1(fp: dict)
    FingerprintGenerator->>FingerprintGenerator: build list b from fp
    FingerprintGenerator->>FingerprintGenerator: b1_bytes = bytearray(b)
    FingerprintGenerator->>Base64Encoder: encode(b1_bytes)
    activate Base64Encoder
    Base64Encoder->>Base64Encoder: data_bytes = data_to_encode
    Base64Encoder->>base64: b64encode(data_bytes)
    base64-->>Base64Encoder: standard_encoded_bytes
    Base64Encoder->>Base64Encoder: standard_encoded_string = decode("utf-8")
    Base64Encoder->>Base64Encoder: translate(translation_table) to custom alphabet
    Base64Encoder-->>FingerprintGenerator: b1 (custom alphabet Base64 string)
    deactivate Base64Encoder
    FingerprintGenerator-->>FingerprintGenerator: return b1
```

Class diagram for updated Base64Encoder and fingerprint b1 generation

```mermaid
classDiagram
    class CryptoConfig {
        <<config>>
        STANDARD_BASE64_ALPHABET: str
        CUSTOM_BASE64_ALPHABET: str
    }

    class Base64Encoder {
        -config: CryptoConfig
        -translation_table: dict
        +Base64Encoder(config: CryptoConfig)
        +encode(data_to_encode: bytes | str | Iterable~int~): str
    }

    class FingerprintGenerator {
        -_encoder: Base64Encoder
        +generate_b1(fp: dict): str
    }

    CryptoConfig <.. Base64Encoder : uses
    Base64Encoder <.. FingerprintGenerator : used by
```
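A minimal sketch of how the classes in the diagram could fit together, assuming `translation_table` is built with `str.maketrans` from the two alphabets. The custom alphabet below is a made-up rotation of the standard one; the project's real `CUSTOM_BASE64_ALPHABET` is not shown in this PR. Input handling is narrowed to `bytes | bytearray | str`, one of the options raised later in review.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


@dataclass(frozen=True)
class CryptoConfig:
    STANDARD_BASE64_ALPHABET: str = STANDARD
    # Hypothetical custom alphabet for illustration only: the standard one rotated by 3.
    CUSTOM_BASE64_ALPHABET: str = STANDARD[3:] + STANDARD[:3]


class Base64Encoder:
    def __init__(self, config: CryptoConfig) -> None:
        self.config = config
        # Dict mapping each standard character's code point to its custom counterpart.
        self.translation_table = str.maketrans(
            config.STANDARD_BASE64_ALPHABET, config.CUSTOM_BASE64_ALPHABET
        )

    def encode(self, data_to_encode: bytes | bytearray | str) -> str:
        # Only genuine text is UTF-8 encoded; bytes-like input is used as-is.
        if isinstance(data_to_encode, str):
            data_bytes = data_to_encode.encode("utf-8")
        else:
            data_bytes = bytes(data_to_encode)
        standard = base64.b64encode(data_bytes).decode("ascii")
        # Remap the standard alphabet to the custom one.
        return standard.translate(self.translation_table)


encoder = Base64Encoder(CryptoConfig())
print(encoder.encode(bytearray(b"\x00")))  # DD==
```

Because `=` appears in neither alphabet, `translate` leaves the padding unchanged.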

File-Level Changes

| Change | Details | Files |
|---|---|---|
| Broaden `Base64Encoder.encode` input types to accept raw bytes/iterables and skip UTF-8 re-encoding for byte-like inputs. | Updated the `encode` signature to accept `bytes`, `str`, or an `Iterable[int]`. Changed the internal logic to treat the input as already-bytes by default and only call `.encode('utf-8')` when the input is not a `bytearray`. Left the existing custom-alphabet remapping after standard Base64 encoding intact. | `src/xhshow/utils/encoder.py` |
| Generate the b1 fingerprint payload from raw bytes rather than JSON-serialized text to prevent corruption. | Switched b1 generation to call the encoder with `bytearray(b)` instead of `json.dumps(b, separators=(",", ":"))`. Kept the upstream logic that builds the list of byte values from the fingerprint data unchanged. | `src/xhshow/generators/fingerprint.py` |


sourcery-ai


Hey there - I've reviewed your changes and found some issues that need to be addressed.


Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The updated `encode` signature advertises support for `bytes | str | Iterable[int]`, but the implementation calls `.encode()` on any non-`bytearray` input, which fails both for `bytes` and for general iterables like `list[int]`, since neither has an `.encode()` method. Consider either narrowing the type annotation to `str | bytes | bytearray` and checking `isinstance(data_to_encode, (bytes, bytearray))`, or explicitly converting an `Iterable[int]` to `bytes`/`bytearray` before encoding.
- The current `if not isinstance(data_to_encode, bytearray)` branch treats `bytes` as text and calls `.encode('utf-8')` on it, which raises `AttributeError` instead of passing the already-encoded bytes through. Update the condition to detect `bytes`/`bytearray` (e.g. `if isinstance(data_to_encode, (bytes, bytearray))`) and only call `.encode('utf-8')` for true `str` inputs.

## Individual Comments

### Comment 1
<location> `src/xhshow/utils/encoder.py:33-36` </location>
<code_context>
         )

-    def encode(self, data_to_encode: str) -> str:
+    def encode(self, data_to_encode: bytes | str | Iterable[int]) -> str:
         """
         Encode a string using custom Base64 alphabet
</code_context>

<issue_to_address>
**issue (bug_risk):** The implementation doesn’t match the type annotation and will break for `bytes` and generic `Iterable[int]` inputs.

The body assumes all non-`bytearray` inputs have `.encode`, which breaks for `bytes` (which should be used as-is) and other `Iterable[int]` types (e.g. `list[int]`) that don’t have `.encode`. Either:
- Handle each case explicitly (`bytes`/`bytearray` → pass through; `str` → `.encode('utf-8')`; `Iterable[int]` → `bytes(...)`/`bytearray(...)`), or
- Restrict the type annotation to the actually supported input types.
</issue_to_address>
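A minimal sketch of the first option, handling each input type explicitly before Base64 encoding. `to_bytes` is a hypothetical helper name; this is not necessarily how the follow-up commit resolved the comment. Note that `str` must be checked before the generic iterable case, because a `str` is itself iterable and `bytes(some_str)` raises `TypeError` without an encoding.

```python
from __future__ import annotations

from collections.abc import Iterable


def to_bytes(data: bytes | bytearray | str | Iterable[int]) -> bytes:
    """Normalize supported inputs to bytes before Base64 encoding (hypothetical helper)."""
    if isinstance(data, (bytes, bytearray)):
        # Already raw bytes: pass through without any text re-encoding.
        return bytes(data)
    if isinstance(data, str):
        # Only genuine text is encoded as UTF-8.
        return data.encode("utf-8")
    # Any other iterable of ints, e.g. list[int]. bytes() raises ValueError for
    # values outside 0..255 and TypeError for non-integer items.
    return bytes(data)


print(to_bytes([72, 200]))  # b'H\xc8'
```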

### Comment 2
<location> `src/xhshow/utils/encoder.py:44` </location>
<code_context>
         """
-        data_bytes = data_to_encode.encode("utf-8")
+        data_bytes = data_to_encode
+        if not isinstance(data_to_encode, bytearray):
+            data_bytes = data_to_encode.encode("utf-8")
         standard_encoded_bytes = base64.b64encode(data_bytes)
</code_context>

<issue_to_address>
**issue (bug_risk):** The `isinstance` check should likely include `bytes`, not just `bytearray`.

With the current logic, passing `bytes` satisfies the `if not isinstance(data_to_encode, bytearray)` condition and calls `data_to_encode.encode('utf-8')`, which fails because `bytes` has no `.encode()` method. If `bytes` and `bytearray` should both be treated as already encoded, update the condition to something like `if not isinstance(data_to_encode, (bytes, bytearray)):` and then explicitly handle `str` and any other supported types accordingly.
</issue_to_address>
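To make the failure concrete, the sketch below contrasts the condition from the first commit with the suggested one. `original_branch` and `fixed_branch` are hypothetical wrappers around the diff lines quoted in this comment.

```python
def original_branch(data_to_encode):
    # Condition from the first commit: only bytearray skips the UTF-8 step.
    data_bytes = data_to_encode
    if not isinstance(data_to_encode, bytearray):
        data_bytes = data_to_encode.encode("utf-8")
    return data_bytes


def fixed_branch(data_to_encode):
    # Suggested condition: bytes and bytearray are both treated as already encoded.
    data_bytes = data_to_encode
    if not isinstance(data_to_encode, (bytes, bytearray)):
        data_bytes = data_to_encode.encode("utf-8")
    return data_bytes


try:
    original_branch(b"\x01")
except AttributeError as exc:
    # bytes has no .encode() method, so this branch raises.
    print("original branch failed:", exc)

print("fixed branch:", fixed_branch(b"\x01"))  # fixed branch: b'\x01'
```

The suggested condition still sends non-`str` iterables such as `list[int]` to `.encode()`, which is why Comment 1 also asks for explicit handling or a narrower annotation.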



Commit aa2f1a2: Encode b1 from raw bytes (bytearray) instead of string/text to avoid byte corruption during encoding. This restores parity with browser/VM output: the base64 result must start with I38r…. Prevents downstream requests from being rejected due to an invalid b1 value.

Cloxl mentioned this pull request on Dec 12, 2025 in Feat/fp #74 (closed).

sourcery-ai bot reviewed on Dec 12, 2025. Both review threads on src/xhshow/utils/encoder.py were resolved; one is marked outdated.

Commit 1304007: fix(encoder): improve type handling and fix type checking errors

Cloxl merged commit 89a1e54 into master on Dec 12, 2025 (4 checks passed).

Cloxl mentioned this pull request on Dec 12, 2025 in Feat: xsc、x-b3-traceid #70 (closed).
