@Cloxl Cloxl commented Dec 12, 2025

Cherry-picked from PR #74. Only includes the b1 encoding fix commit.

Summary by Sourcery

Encode fingerprint metadata directly from raw bytes instead of JSON strings to prevent Base64 corruption in b1 generation.

Bug Fixes:

  • Fix b1 fingerprint generation by encoding the underlying byte array directly rather than its JSON string representation, avoiding byte corruption.

Enhancements:

  • Extend the Base64 encoder interface to accept bytes and byte-like inputs in addition to strings.

…byte corruption during encoding.

This restores parity with browser/VM output: the base64 result must start with I38r….

Prevents downstream requests from being rejected due to an invalid b1 value.
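To illustrate why the JSON route and the raw-byte route cannot produce the same Base64 text, here is a minimal sketch using only the standard library. The byte values in `b` are made up for the example (the real list is built from the fingerprint data), and the project's custom alphabet remapping is left out.

```python
import base64
import json

# Hypothetical byte values; the real list comes from the fingerprint data.
b = [0, 127, 128, 200, 255]

# Previous behaviour: serialize the list to JSON text, then Base64 the UTF-8 text.
json_text = json.dumps(b, separators=(",", ":"))
from_json = base64.b64encode(json_text.encode("utf-8")).decode("ascii")

# Behaviour after this PR: Base64 the byte values themselves.
from_bytes = base64.b64encode(bytearray(b)).decode("ascii")

print(from_json)
print(from_bytes)

# Only the raw-byte form decodes back to the original values;
# the JSON form decodes back to the text of the list.
assert list(base64.b64decode(from_bytes)) == b
assert base64.b64decode(from_json) == json_text.encode("utf-8")
```

The JSON form encodes the characters `[`, `,`, `]` and the decimal digits, so its output depends on how the numbers are written, not on the byte values they stand for.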
sourcery-ai bot commented Dec 12, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Adjusts Base64 encoding to operate directly on raw byte sequences and updates fingerprint b1 generation to pass bytes instead of JSON text, in order to avoid byte corruption from UTF‑8 string encoding.

Sequence diagram for raw-byte b1 fingerprint encoding

sequenceDiagram
    participant FingerprintGenerator
    participant Base64Encoder
    participant base64

    FingerprintGenerator->>FingerprintGenerator: generate_b1(fp: dict)
    FingerprintGenerator->>FingerprintGenerator: build list b from fp
    FingerprintGenerator->>FingerprintGenerator: b1_bytes = bytearray(b)
    FingerprintGenerator->>Base64Encoder: encode(b1_bytes)
    activate Base64Encoder
    Base64Encoder->>Base64Encoder: data_bytes = data_to_encode
    Base64Encoder->>base64: b64encode(data_bytes)
    base64-->>Base64Encoder: standard_encoded_bytes
    Base64Encoder->>Base64Encoder: standard_encoded_string = decode("utf-8")
    Base64Encoder-->>FingerprintGenerator: b1 (custom alphabet Base64 string)
    deactivate Base64Encoder
    FingerprintGenerator-->>FingerprintGenerator: return b1

Class diagram for updated Base64Encoder and fingerprint b1 generation

classDiagram
    class CryptoConfig {
        <<config>>
        STANDARD_BASE64_ALPHABET: str
        CUSTOM_BASE64_ALPHABET: str
    }

    class Base64Encoder {
        -config: CryptoConfig
        -translation_table: dict
        +Base64Encoder(config: CryptoConfig)
        +encode(data_to_encode: bytes | str | Iterable~int~): str
    }

    class FingerprintGenerator {
        -_encoder: Base64Encoder
        +generate_b1(fp: dict): str
    }

    CryptoConfig <.. Base64Encoder : uses
    Base64Encoder <.. FingerprintGenerator : used by

File-Level Changes

Change: Broaden Base64Encoder.encode input types to accept raw bytes/iterables and skip UTF-8 re-encoding for byte-like inputs.
Details:
  • Updated encode signature to accept bytes, str, or an Iterable[int].
  • Changed internal logic to treat the input as already-bytes by default and only call .encode('utf-8') when the input is not a bytearray.
  • Left the existing custom alphabet remapping logic intact after the standard base64 encoding.
Files: src/xhshow/utils/encoder.py

Change: Generate b1 fingerprint payload from raw bytes rather than JSON-serialized text to prevent corruption.
Details:
  • Switched b1 generation to call the encoder with bytearray(b) instead of json.dumps(b, separators=(",", ":")).
  • Kept the upstream logic that builds the list of byte values from the fingerprint data unchanged.
Files: src/xhshow/generators/fingerprint.py

@Cloxl Cloxl mentioned this pull request Dec 12, 2025
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • The updated encode signature advertises support for bytes | str | Iterable[int], but the implementation calls .encode() on any non-bytearray input, which will fail for general iterables like list[int] and is odd for bytes; consider either narrowing the type annotation to str | bytes | bytearray and checking isinstance(data_to_encode, (bytes, bytearray)), or explicitly converting an Iterable[int] to bytes/bytearray before encoding.
  • The current if not isinstance(data_to_encode, bytearray) branch also catches bytes input and calls .encode('utf-8') on it, which raises AttributeError because bytes has no .encode method; update the condition to detect bytes/bytearray (e.g. if isinstance(data_to_encode, (bytes, bytearray))) and only call .encode('utf-8') for true str inputs.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The updated `encode` signature advertises support for `bytes | str | Iterable[int]`, but the implementation calls `.encode()` on any non-`bytearray` input, which will fail for general iterables like `list[int]` and is odd for `bytes`; consider either narrowing the type annotation to `str | bytes | bytearray` and checking `isinstance(data_to_encode, (bytes, bytearray))`, or explicitly converting an `Iterable[int]` to `bytes/bytearray` before encoding.
- The current `if not isinstance(data_to_encode, bytearray)` branch also catches `bytes` input and calls `.encode('utf-8')` on it, which raises `AttributeError` because `bytes` has no `.encode` method; update the condition to detect `bytes`/`bytearray` (e.g. `if isinstance(data_to_encode, (bytes, bytearray))`) and only call `.encode('utf-8')` for true `str` inputs.

## Individual Comments

### Comment 1
<location> `src/xhshow/utils/encoder.py:33-36` </location>
<code_context>
         )

-    def encode(self, data_to_encode: str) -> str:
+    def encode(self, data_to_encode: bytes | str | Iterable[int]) -> str:
         """
         Encode a string using custom Base64 alphabet
</code_context>

<issue_to_address>
**issue (bug_risk):** The implementation doesn’t match the type annotation and will break for `bytes` and generic `Iterable[int]` inputs.

The body assumes all non-`bytearray` inputs have `.encode`, which breaks for `bytes` (which should be used as-is) and other `Iterable[int]` types (e.g. `list[int]`) that don’t have `.encode`. Either:
- Handle each case explicitly (`bytes`/`bytearray` → pass through; `str` → `.encode('utf-8')`; `Iterable[int]` → `bytes(...)`/`bytearray(...)`), or
- Restrict the type annotation to the actually supported input types.
</issue_to_address>
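
The failure described in this comment can be reproduced with a reduced copy of the branch from the diff. This is a sketch, not the project's code: `encode_as_merged` and `outcome` are names invented here, and the custom alphabet remapping is omitted because it does not affect the type handling.

```python
import base64


def encode_as_merged(data):
    """Reduced copy of the branch shown in the diff, without the custom alphabet step."""
    data_bytes = data
    if not isinstance(data, bytearray):
        # Reached for str, but also for bytes and list[int], which have no .encode().
        data_bytes = data.encode("utf-8")
    return base64.b64encode(data_bytes).decode("utf-8")


def outcome(value):
    """Return the encoded text, or a description of the AttributeError raised."""
    try:
        return encode_as_merged(value)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


for sample in (bytearray(b"\x00\xff"), "text", b"\x00\xff", [0, 255]):
    print(type(sample).__name__, "->", outcome(sample))
```

Only `bytearray` and `str` inputs succeed here; `bytes` and `list[int]` both end in an `AttributeError`, even though the annotation accepts them.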

### Comment 2
<location> `src/xhshow/utils/encoder.py:44` </location>
<code_context>
         """
-        data_bytes = data_to_encode.encode("utf-8")
+        data_bytes = data_to_encode
+        if not isinstance(data_to_encode, bytearray):
+            data_bytes = data_to_encode.encode("utf-8")
         standard_encoded_bytes = base64.b64encode(data_bytes)
</code_context>

<issue_to_address>
**issue (bug_risk):** The `isinstance` check should likely include `bytes`, not just `bytearray`.

With the current logic, passing `bytes` will enter the `if` branch and call `data_to_encode.encode('utf-8')`, which will fail with an `AttributeError`. If `bytes` and `bytearray` should both be treated as already-encoded, update the condition to something like `if not isinstance(data_to_encode, (bytes, bytearray)):` and then explicitly handle `str` and any other supported types accordingly.
</issue_to_address>


@Cloxl Cloxl merged commit 89a1e54 into master Dec 12, 2025
4 checks passed
@Cloxl Cloxl mentioned this pull request Dec 12, 2025