fix(provider): handle UTF-8 BOM in JSON and CSV parsing by oritwoen · Pull Request #24 · oritwoen/vusi

oritwoen · 2026-03-11T14:15:48Z

Summary

I hit this while validating fixture ingestion from exported files. detect_format already tolerated UTF-8 BOM, but actual parsing still used raw content, so BOM-prefixed input could be detected correctly and then fail in serde_json or CSV deserialization.

Changes

Strip UTF-8 BOM at the start of parse_signatures before dispatching to JSON/CSV parsers
Add a regression test for BOM-prefixed JSON input
Add a regression test for BOM-prefixed CSV input
Add direct detect_format tests for BOM-prefixed JSON and CSV input

Testing

cargo test --all-features
cargo clippy --all-features -- -D warnings
cargo build --release

Related Issues

Closes #25

codeant-ai · 2026-03-11T14:15:53Z

CodeAnt AI is reviewing your PR.

coderabbitai · 2026-03-11T14:16:10Z

📝 Walkthrough

Walkthrough

parse_signatures now strips UTF-8 BOM prefixes before format detection to prevent JSON/CSV parsing failures. Two new tests validate BOM handling for both formats while existing behavior for non-BOM inputs remains unchanged.

Changes

Cohort / File(s)	Summary
BOM handling in signature parsing `src/provider.rs`	Added BOM stripping logic to `parse_signatures` before format detection. Includes test coverage for JSON and CSV inputs with UTF-8 BOM prefixes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

A tiny mark at the start of the line,
Breaks JSON parsing—oh what a sign!
Strip it away with a whisper so clean,
Now signatures flow where BOM's never been. 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title accurately describes the main change: fixing UTF-8 BOM handling in JSON and CSV parsing within the provider module.
Description check	✅ Passed	PR description covers all required template sections with clear details about the BOM handling fix, specific changes made, and comprehensive testing evidence.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/provider-bom-parsing

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch fix/provider-bom-parsing

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

augmentcode · 2026-03-11T14:16:35Z

🤖 Augment PR Summary

Summary: Ensures signature parsing correctly handles UTF-8 BOM-prefixed inputs.
Changes: Strips a leading BOM before format detection/deserialization and adds regression tests for BOM-prefixed JSON and CSV fixtures.

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

codeant-ai · 2026-03-11T14:17:10Z

Sequence Diagram

This PR updates the signature loading flow to strip a UTF-8 BOM before format detection and parsing so that JSON and CSV inputs with a BOM are parsed correctly.

sequenceDiagram
    participant Caller
    participant Provider

    Caller->>Provider: load signatures from input
    Provider->>Provider: Read file or stdin into string
    Provider->>Provider: Strip UTF-8 BOM prefix if present
    Provider->>Provider: Detect format as JSON or CSV

    alt JSON format
        Provider->>Provider: Parse JSON and normalize inputs
    else CSV format
        Provider->>Provider: Parse CSV and normalize inputs
    end

    Provider-->>Caller: Return parsed signatures list

Generated by CodeAnt AI

cubic-dev-ai

No issues found across 1 file

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Auto-approved: Trivial fix to handle UTF-8 BOM in input parsing with regression tests.}

Architecture diagram

sequenceDiagram
    participant Client
    participant P as Provider (parse_signatures)
    participant D as detect_format()
    participant Parser as JSON/CSV Parsers

    Note over Client,Parser: Signature Ingestion Flow

    Client->>P: parse_signatures(raw_content)
    
    P->>P: NEW: strip_prefix(BOM)
    Note right of P: Removes \u{FEFF} if present at start
    
    P->>D: detect_format(stripped_content)
    D-->>P: returns Format (Json | Csv)

    alt Format::Json
        P->>Parser: CHANGED: parse_json(stripped_content)
        Parser-->>P: Vec<Signature>
    else Format::Csv
        P->>Parser: CHANGED: parse_csv(stripped_content)
        Parser-->>P: Vec<Signature>
    end

    alt Success
        P-->>Client: Ok(signatures)
    else Parse Error (e.g. Malformed)
        P-->>Client: Err(Error)
    end

codeant-ai · 2026-03-11T14:18:34Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review Behavior consistency for CSV reader CSV parsing uses `csv::Reader::from_reader(content.as_bytes())` after BOM removal. Ensure BOM removal always occurs before any direct CSV parsing path (including callers that may call `parse_csv` directly) to avoid headers containing a leading BOM. Consider documenting or centralizing this expectation. Test gap: format detection with BOM New tests assert that `parse_signatures` accepts BOM-prefixed inputs, but there is no direct assertion that `detect_format` correctly classifies BOM-prefixed JSON/CSV inputs. Add tests for `detect_format` with BOM input to prevent regressions. Redundant BOM handling The BOM is now stripped in `parse_signatures`, but `detect_format` also strips the BOM when inspecting content. This duplication is harmless but confusing and may lead to subtle inconsistencies. Consider centralizing BOM removal so detection and parsing share a single behavior.

codeant-ai · 2026-03-11T14:18:37Z

CodeAnt AI finished reviewing your PR.

fix(provider): strip UTF-8 BOM before parsing signatures

9af1d78

augmentcode bot reviewed Mar 11, 2026

View reviewed changes

cubic-dev-ai bot approved these changes Mar 11, 2026

View reviewed changes

coderabbitai bot approved these changes Mar 11, 2026

View reviewed changes

test(provider): cover BOM format detection paths

483c0d5

oritwoen self-assigned this Mar 11, 2026

oritwoen merged commit 4197b9b into main Mar 11, 2026
4 checks passed

oritwoen deleted the fix/provider-bom-parsing branch March 11, 2026 14:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(provider): handle UTF-8 BOM in JSON and CSV parsing#24

fix(provider): handle UTF-8 BOM in JSON and CSV parsing#24
oritwoen merged 2 commits intomainfrom
fix/provider-bom-parsing

oritwoen commented Mar 11, 2026 •

edited

Loading

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

coderabbitai bot commented Mar 11, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

augmentcode bot commented Mar 11, 2026

Uh oh!

augmentcode bot left a comment

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

oritwoen commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Related Issues

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

coderabbitai bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

augmentcode bot commented Mar 11, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

codeant-ai bot commented Mar 11, 2026

Sequence Diagram

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

codeant-ai bot commented Mar 11, 2026

Nitpicks 🔍

Uh oh!

codeant-ai bot commented Mar 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

oritwoen commented Mar 11, 2026 •

edited

Loading

coderabbitai bot commented Mar 11, 2026 •

edited

Loading