@anonymou0719

Summary

This PR syncs a commit from an upstream fork into this repo. The commit fixes locale-dependent JSON float parsing.

Original commit

  • URL: http://github.com/asg017/sqlite-vec
  • What it does: Replaces locale-sensitive strtod() usage when parsing JSON numeric tokens with a locale-independent strtod_c() implementation, ensuring JSON floats are always parsed using . as the decimal separator (per JSON spec).

Why it fits this repo’s guidelines/policies

  • Correctness & spec compliance: JSON numbers must use . as the decimal separator; parsing should not vary based on LC_NUMERIC.
  • Portability & minimal dependencies: Avoids reliance on strtod_l(), which is not available on every platform, and on switching the global locale, which is not thread-safe.
  • Low-risk change: Localized to JSON numeric parsing, with a small self-contained implementation and regression coverage.

Notes (if any adaptations):

  • Kept the parser implementation small and deterministic to match repo portability expectations; added a targeted regression test to prevent future locale regressions.

Why it’s useful here

  • Benefit: Bug fix for JSON numeric parsing under non-C locales (e.g. French/German).
  • Impact: Prevents incorrect float values / JSON parse failures in environments where comma is the decimal separator, improving user experience and reliability across locales.

Testing

  • Existing tests: ✅ uv sync --directory tests (pass)
  • Existing tests: ✅ make test-loadable python=./tests/.venv/bin/python (pass)
  • CI link (if any): N/A
  • Extra tests (if any): Added a regression test that forces a non-C numeric locale and verifies JSON float parsing remains correct.

strtod() respects the LC_NUMERIC locale, so JSON parsing fails in
non-C locales (French, German, etc.) where the comma is the decimal separator.
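The failure mode can be reproduced outside the extension with a few lines of C. The sketch below is illustrative only: the helper name consumed_under_locale() and the locale name "de_DE.UTF-8" are assumptions, not taken from the PR, and the locale must be installed for the call to have any effect. The helper reports how many characters strtod() consumes from a string under a given LC_NUMERIC setting.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the PR): returns the number of characters
 * strtod() consumes from `text` while LC_NUMERIC is set to `name`, or -1 if
 * that locale is not available. The previous locale is restored on exit.
 * Note that setlocale() changes process-wide state, so this is not
 * thread-safe. */
long consumed_under_locale(const char *name, const char *text) {
    char saved[128];
    const char *current = setlocale(LC_NUMERIC, NULL);
    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    if (setlocale(LC_NUMERIC, name) == NULL) {
        return -1; /* locale not installed on this system */
    }

    char *end = NULL;
    (void)strtod(text, &end);
    long consumed = (long)(end - text);

    setlocale(LC_NUMERIC, saved);
    return consumed;
}
```

Under the "C" locale, consumed_under_locale("C", "3.14") consumes all four characters. On a system where the German locale is installed and uses ',' as the decimal point, the same call with "de_DE.UTF-8" would typically stop after the "3", which is the truncation described above.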

Implemented custom locale-independent strtod_c() parser:
- Always uses '.' as decimal separator per JSON spec
- Handles sign, integer, fractional, and exponent parts
- No platform dependencies or thread-safety issues
- Simple and portable (~87 lines)
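The list above can be illustrated with a minimal sketch of such a parser. This is not the strtod_c() from the PR: the name strtod_c_sketch() is hypothetical, and the code makes simplifying assumptions (no leading-whitespace skipping, no inf/nan/hex forms, and no guarantee of correctly rounded results for very long digit strings). It only shows the shape of the approach: sign, integer part, a fractional part introduced by '.' only, and an optional exponent, without consulting the locale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, NOT the PR's implementation. Parses a decimal number
 * with '.' as the only accepted decimal separator. On success *endptr points
 * past the last consumed character; if no digits are found, *endptr == s. */
double strtod_c_sketch(const char *s, char **endptr) {
    assert(s != NULL);
    const char *p = s;
    int negative = 0;
    int ndigits = 0;
    int exp10 = 0; /* power of ten still to be applied to `value` */
    double value = 0.0;

    /* Optional sign. */
    if (*p == '+' || *p == '-') {
        negative = (*p == '-');
        p++;
    }

    /* Integer part. */
    while (*p >= '0' && *p <= '9') {
        value = value * 10.0 + (double)(*p - '0');
        p++;
        ndigits++;
    }

    /* Fractional part: ',' is never treated as a separator. */
    if (*p == '.') {
        const char *q = p + 1;
        while (*q >= '0' && *q <= '9') {
            value = value * 10.0 + (double)(*q - '0');
            exp10--;
            q++;
            ndigits++;
        }
        if (ndigits > 0) p = q;
    }

    if (ndigits == 0) {
        if (endptr) *endptr = (char *)s;
        return 0.0;
    }

    /* Optional exponent; consumed only if at least one digit follows. */
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        int exp_negative = 0;
        int e = 0;
        if (*q == '+' || *q == '-') {
            exp_negative = (*q == '-');
            q++;
        }
        if (*q >= '0' && *q <= '9') {
            while (*q >= '0' && *q <= '9') {
                if (e < 10000) e = e * 10 + (*q - '0'); /* avoid int overflow */
                q++;
            }
            exp10 += exp_negative ? -e : e;
            p = q;
        }
    }

    /* Apply the accumulated power of ten without calling into libm. */
    if (value != 0.0 && exp10 != 0) {
        double scale = 1.0;
        int n = exp10 < 0 ? -exp10 : exp10;
        while (n-- > 0) scale *= 10.0;
        value = exp10 < 0 ? value / scale : value * scale;
    }

    if (endptr) *endptr = (char *)p;
    return negative ? -value : value;
}
```

Because the function never reads LC_NUMERIC, an input such as "1,5" stops at the comma regardless of the process locale, and the caller can treat the leftover character as a JSON syntax error or delimiter as appropriate.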

Added test_vec0_locale_independent() to verify parsing works under
non-C locales. All tests pass (73 passed, 4 skipped).

Fixes asg017#241 and asg017#168