@kargig commented Jan 1, 2026

Summary

This PR improves backend performance by replacing the standard library json module with orjson in critical paths (large API responses and heavy serialization tasks). orjson is significantly faster and serializes datetime, date, and time objects natively (numpy arrays are also supported, but only when the OPT_SERIALIZE_NUMPY option is passed), which removes the need for manual serialization logic. This PR also fixes a critical bug in eventbridge_service.py where a variable was referenced before it was defined, and moves validation constants in schemas.py to the module level to reduce memory churn.

Changes Made

Performance Optimizations

  • Replaced json with orjson: Switched to orjson for JSON serialization/deserialization in the following key areas:
    • Newsletters Router (backend/app/routers/newsletters.py): Removed the manual recursive datetime serialization logic in get_parsed_trips. orjson now serializes the datetime values in the nested ParsedDiveTrip data directly, significantly reducing CPU overhead for large lists.
    • Users Router (backend/app/routers/users.py): Refactored list_all_users to bypass FastAPI's slower jsonable_encoder and JSONResponse. It now directly dumps Pydantic models to JSON bytes using orjson and returns a raw Response, which is much faster for large datasets.
    • Privacy Router (backend/app/routers/privacy.py): Refactored export_user_data to remove manual .isoformat() calls on datetime objects. The raw dictionary with native datetime objects is now serialized directly by orjson, simplifying the code and improving export speed.
    • Settings Router (backend/app/routers/settings.py): Updated to use orjson for storing and retrieving JSON blobs.
    • Dive Sites & Centers (backend/app/routers/dive_sites.py, backend/app/routers/diving_centers.py): Updated JSON dumping for geolocation and logging to use orjson.
  • Schema Refactoring: In backend/app/schemas.py, moved constant lists and dictionaries (e.g., ALLOWED_SOCIAL_PLATFORMS, SOCIAL_PLATFORM_DOMAINS) to the module level. They are now built once at import time instead of being re-allocated on every validation call, which reduces memory churn.
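The schema refactoring can be sketched as follows. Only the two constant names come from this PR. Their contents, the validate_social_link function, and the domain-matching rule are placeholders chosen for illustration, so the real validators in schemas.py may look quite different.

```python
from urllib.parse import urlparse

# Built once, when the module is imported. Placeholder values, not the real lists.
ALLOWED_SOCIAL_PLATFORMS = ("facebook", "instagram", "youtube")
SOCIAL_PLATFORM_DOMAINS = {
    "facebook": ("facebook.com",),
    "instagram": ("instagram.com",),
    "youtube": ("youtube.com", "youtu.be"),
}


def validate_social_link(platform: str, url: str) -> str:
    """Hypothetical validator that only reads the module-level constants.

    Before a refactor like this one, the list and dict literals would sit
    inside the function body and be rebuilt on every call.
    """
    if platform not in ALLOWED_SOCIAL_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")

    host = (urlparse(url).hostname or "").lower()
    allowed_domains = SOCIAL_PLATFORM_DOMAINS[platform]
    # Accept the bare domain or any subdomain of it (e.g. www.instagram.com).
    if not any(host == d or host.endswith("." + d) for d in allowed_domains):
        raise ValueError(f"URL does not belong to {platform}: {url}")
    return url
```

Tuples are used here rather than lists to signal that the constants are not meant to be mutated at runtime; that choice is a suggestion, not something the PR states.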

Bug Fixes

  • EventBridge Service (backend/app/services/eventbridge_service.py): Fixed a critical NameError where the targets variable was referenced in target_params but was undefined in the create_scheduled_rule method. The variable is now correctly defined before use.

Documentation

  • Updated GEMINI.md: Added a new "Performance Tuning Guidelines" section detailing the usage of orjson and best practices for memory management in the project.

Testing

  • Automated Tests:
    • Ran the full backend test suite using ./docker-test-github-actions.sh.
    • Result: 1307/1307 tests passed.
  • Specific Verification:
    • Verified orjson's behavior with a standalone script to confirm that it serializes mixed date, time, and datetime objects into ISO 8601 strings (and None values as null), matching the API's previous output format.
    • Verified that backend/tests/test_users.py and backend/tests/test_privacy.py passed, confirming that the optimized endpoints return the correct structure and data types.

Related Issues

  • None specified.

Additional Notes

  • Portability: The backend/lambda/email_processor.py and utils/import_subsurface_dives.py scripts intentionally keep using the standard json module. This ensures portability for AWS Lambda environments and standalone CLI usage, since neither then needs a compiled extension such as orjson.
  • Dependency: orjson has been added to backend/requirements.txt.

kargig added 2 commits January 1, 2026 19:53
Switch from the standard library `json` module to `orjson` for JSON
serialization and deserialization in performance-critical paths.
`orjson` is significantly faster and handles datetime objects natively,
removing the need for manual serialization helpers.

Refactor `backend/app/schemas.py` to move constant lists and
dictionaries (e.g., `ALLOWED_SOCIAL_PLATFORMS`) to the module level,
preventing re-allocation on every validation call.

Fix a critical bug in `backend/app/services/eventbridge_service.py`
where the `targets` variable was referenced but not defined.

Update `GEMINI.md` with new performance tuning guidelines regarding
JSON processing and memory management.

Refactor `list_all_users` in `backend/app/routers/users.py` to use `orjson`
for JSON serialization, bypassing the slower `jsonable_encoder` and
`JSONResponse` defaults. This improves performance for large user lists.

Refactor `export_user_data` in `backend/app/routers/privacy.py` to use
`orjson`, removing manual `.isoformat()` datetime conversions since
`orjson` handles them natively. This optimizes large data exports and
simplifies the codebase.
@kargig merged commit 73d0b0c into main Jan 1, 2026
1 check passed