-
Notifications
You must be signed in to change notification settings - Fork 169
Fix: Encode b1 from raw bytes to avoid byte corruption #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…byte corruption during encoding. This restores parity with browser/VM output: the base64 result must start with I38r…. Prevents downstream requests from being rejected due to an invalid b1 value.
审查者指南(在小型 PR 上折叠)审查者指南调整 Base64 编码逻辑,使其直接对原始字节序列进行操作,并在生成指纹 b1 时传递字节而非 JSON 文本,以避免 UTF‑8 字符串编码带来的字节损坏问题。 原始字节版 b1 指纹编码的时序图sequenceDiagram
participant FingerprintGenerator
participant Base64Encoder
participant base64
FingerprintGenerator->>FingerprintGenerator: generate_b1(fp: dict)
FingerprintGenerator->>FingerprintGenerator: build list b from fp
FingerprintGenerator->>FingerprintGenerator: b1_bytes = bytearray(b)
FingerprintGenerator->>Base64Encoder: encode(b1_bytes)
activate Base64Encoder
Base64Encoder->>Base64Encoder: data_bytes = data_to_encode
Base64Encoder->>base64: b64encode(data_bytes)
base64-->>Base64Encoder: standard_encoded_bytes
Base64Encoder->>Base64Encoder: standard_encoded_string = decode("utf-8")
Base64Encoder-->>FingerprintGenerator: b1 (custom alphabet Base64 string)
deactivate Base64Encoder
FingerprintGenerator-->>FingerprintGenerator: return b1
更新后的 Base64Encoder 与指纹 b1 生成的类图classDiagram
class CryptoConfig {
<<config>>
STANDARD_BASE64_ALPHABET: str
CUSTOM_BASE64_ALPHABET: str
}
class Base64Encoder {
-config: CryptoConfig
-translation_table: dict
+Base64Encoder(config: CryptoConfig)
+encode(data_to_encode: bytes | str | Iterable~int~): str
}
class FingerprintGenerator {
-_encoder: Base64Encoder
+generate_b1(fp: dict): str
}
CryptoConfig <.. Base64Encoder : uses
Base64Encoder <.. FingerprintGenerator : used by
文件级变更
技巧与命令与 Sourcery 交互
自定义你的使用体验访问你的控制面板以:
获取帮助Original review guide in EnglishReviewer's guide (collapsed on small PRs)Reviewer's GuideAdjusts Base64 encoding to operate directly on raw byte sequences and updates fingerprint b1 generation to pass bytes instead of JSON text, in order to avoid byte corruption from UTF‑8 string encoding. Sequence diagram for raw-byte b1 fingerprint encodingsequenceDiagram
participant FingerprintGenerator
participant Base64Encoder
participant base64
FingerprintGenerator->>FingerprintGenerator: generate_b1(fp: dict)
FingerprintGenerator->>FingerprintGenerator: build list b from fp
FingerprintGenerator->>FingerprintGenerator: b1_bytes = bytearray(b)
FingerprintGenerator->>Base64Encoder: encode(b1_bytes)
activate Base64Encoder
Base64Encoder->>Base64Encoder: data_bytes = data_to_encode
Base64Encoder->>base64: b64encode(data_bytes)
base64-->>Base64Encoder: standard_encoded_bytes
Base64Encoder->>Base64Encoder: standard_encoded_string = decode("utf-8")
Base64Encoder-->>FingerprintGenerator: b1 (custom alphabet Base64 string)
deactivate Base64Encoder
FingerprintGenerator-->>FingerprintGenerator: return b1
Class diagram for updated Base64Encoder and fingerprint b1 generationclassDiagram
class CryptoConfig {
<<config>>
STANDARD_BASE64_ALPHABET: str
CUSTOM_BASE64_ALPHABET: str
}
class Base64Encoder {
-config: CryptoConfig
-translation_table: dict
+Base64Encoder(config: CryptoConfig)
+encode(data_to_encode: bytes | str | Iterable~int~): str
}
class FingerprintGenerator {
-_encoder: Base64Encoder
+generate_b1(fp: dict): str
}
CryptoConfig <.. Base64Encoder : uses
Base64Encoder <.. FingerprintGenerator : used by
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - 我已经查看了你的更改,发现有一些需要解决的问题。
- 更新后的
encode方法签名声明支持bytes | str | Iterable[int],但实现中会对所有非bytearray的输入调用.encode(),这在处理像list[int]这样的通用可迭代对象时会失败,对bytes来说也比较奇怪;建议要么将类型标注收窄为str | bytes | bytearray并检查isinstance(data_to_encode, (bytes, bytearray)),要么在编码前显式地把Iterable[int]转为bytes/bytearray。 - 目前的
if not isinstance(data_to_encode, bytearray)分支会把bytes当作文本重新编码,这可能会破坏已经编码好的字节序列;请更新条件以检测bytes/bytearray(例如if isinstance(data_to_encode, (bytes, bytearray))),并且只对真正的str输入调用.encode('utf-8')。
供 AI 代理使用的提示(Prompt)
Please address the comments from this code review:
## Overall Comments
- The updated `encode` signature advertises support for `bytes | str | Iterable[int]`, but the implementation calls `.encode()` on any non-`bytearray` input, which will fail for general iterables like `list[int]` and is odd for `bytes`; consider either narrowing the type annotation to `str | bytes | bytearray` and checking `isinstance(data_to_encode, (bytes, bytearray))`, or explicitly converting an `Iterable[int]` to `bytes/bytearray` before encoding.
- The current `if not isinstance(data_to_encode, bytearray)` branch will treat `bytes` as text and re-encode them, which can corrupt already-encoded byte sequences; update the condition to detect `bytes`/`bytearray` (e.g. `if isinstance(data_to_encode, (bytes, bytearray))`) and only call `.encode('utf-8')` for true `str` inputs.
## Individual Comments
### Comment 1
<location> `src/xhshow/utils/encoder.py:33-36` </location>
<code_context>
)
- def encode(self, data_to_encode: str) -> str:
+ def encode(self, data_to_encode: bytes | str | Iterable[int]) -> str:
"""
Encode a string using custom Base64 alphabet
</code_context>
<issue_to_address>
**issue (bug_risk):** The implementation doesn’t match the type annotation and will break for `bytes` and generic `Iterable[int]` inputs.
The body assumes all non-`bytearray` inputs have `.encode`, which breaks for `bytes` (which should be used as-is) and other `Iterable[int]` types (e.g. `list[int]`) that don’t have `.encode`. Either:
- Handle each case explicitly (`bytes`/`bytearray` → pass through; `str` → `.encode('utf-8')`; `Iterable[int]` → `bytes(...)`/`bytearray(...)`), or
- Restrict the type annotation to the actually supported input types.
</issue_to_address>
### Comment 2
<location> `src/xhshow/utils/encoder.py:44` </location>
<code_context>
"""
- data_bytes = data_to_encode.encode("utf-8")
+ data_bytes = data_to_encode
+ if not isinstance(data_to_encode, bytearray):
+ data_bytes = data_to_encode.encode("utf-8")
standard_encoded_bytes = base64.b64encode(data_bytes)
</code_context>
<issue_to_address>
**issue (bug_risk):** The `isinstance` check should likely include `bytes`, not just `bytearray`.
With the current logic, passing `bytes` will hit the `else` branch and call `data_to_encode.encode('utf-8')`, which will fail. If `bytes` and `bytearray` should both be treated as already-encoded, update the condition to something like `if not isinstance(data_to_encode, (bytes, bytearray)):` and then explicitly handle `str` and any other supported types accordingly.
</issue_to_address>帮我变得更有用!请对每条评论点 👍 或 👎,我会根据你的反馈改进后续的评审。
Original comment in English
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- The updated
encodesignature advertises support forbytes | str | Iterable[int], but the implementation calls.encode()on any non-bytearrayinput, which will fail for general iterables likelist[int]and is odd forbytes; consider either narrowing the type annotation tostr | bytes | bytearrayand checkingisinstance(data_to_encode, (bytes, bytearray)), or explicitly converting anIterable[int]tobytes/bytearraybefore encoding. - The current
if not isinstance(data_to_encode, bytearray)branch will treatbytesas text and re-encode them, which can corrupt already-encoded byte sequences; update the condition to detectbytes/bytearray(e.g.if isinstance(data_to_encode, (bytes, bytearray))) and only call.encode('utf-8')for truestrinputs.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The updated `encode` signature advertises support for `bytes | str | Iterable[int]`, but the implementation calls `.encode()` on any non-`bytearray` input, which will fail for general iterables like `list[int]` and is odd for `bytes`; consider either narrowing the type annotation to `str | bytes | bytearray` and checking `isinstance(data_to_encode, (bytes, bytearray))`, or explicitly converting an `Iterable[int]` to `bytes/bytearray` before encoding.
- The current `if not isinstance(data_to_encode, bytearray)` branch will treat `bytes` as text and re-encode them, which can corrupt already-encoded byte sequences; update the condition to detect `bytes`/`bytearray` (e.g. `if isinstance(data_to_encode, (bytes, bytearray))`) and only call `.encode('utf-8')` for true `str` inputs.
## Individual Comments
### Comment 1
<location> `src/xhshow/utils/encoder.py:33-36` </location>
<code_context>
)
- def encode(self, data_to_encode: str) -> str:
+ def encode(self, data_to_encode: bytes | str | Iterable[int]) -> str:
"""
Encode a string using custom Base64 alphabet
</code_context>
<issue_to_address>
**issue (bug_risk):** The implementation doesn’t match the type annotation and will break for `bytes` and generic `Iterable[int]` inputs.
The body assumes all non-`bytearray` inputs have `.encode`, which breaks for `bytes` (which should be used as-is) and other `Iterable[int]` types (e.g. `list[int]`) that don’t have `.encode`. Either:
- Handle each case explicitly (`bytes`/`bytearray` → pass through; `str` → `.encode('utf-8')`; `Iterable[int]` → `bytes(...)`/`bytearray(...)`), or
- Restrict the type annotation to the actually supported input types.
</issue_to_address>
### Comment 2
<location> `src/xhshow/utils/encoder.py:44` </location>
<code_context>
"""
- data_bytes = data_to_encode.encode("utf-8")
+ data_bytes = data_to_encode
+ if not isinstance(data_to_encode, bytearray):
+ data_bytes = data_to_encode.encode("utf-8")
standard_encoded_bytes = base64.b64encode(data_bytes)
</code_context>
<issue_to_address>
**issue (bug_risk):** The `isinstance` check should likely include `bytes`, not just `bytearray`.
With the current logic, passing `bytes` will hit the `else` branch and call `data_to_encode.encode('utf-8')`, which will fail. If `bytes` and `bytearray` should both be treated as already-encoded, update the condition to something like `if not isinstance(data_to_encode, (bytes, bytearray)):` and then explicitly handle `str` and any other supported types accordingly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Cherry-picked from PR #74. Only includes the b1 encoding fix commit.
Summary by Sourcery
直接从原始字节而不是 JSON 字符串对指纹元数据进行编码,以防止在生成 b1 时发生 Base64 损坏。
Bug 修复:
增强功能:
Original summary in English
Summary by Sourcery
Encode fingerprint metadata directly from raw bytes instead of JSON strings to prevent Base64 corruption in b1 generation.
Bug Fixes:
Enhancements: