perf(i18n): parallelism + OpenAPI fix #43

Open
01programs wants to merge 16 commits into main from fix/i18n-code-fence-and-nav-labels
@01programs
Collaborator

Summary

  • Increase MDX translation parallelism: 5 → 30 files concurrent
  • Fix invalid JSON in crawls.json (trailing comma)
  • Parallelize OpenAPI translation: 5 files × 7 languages
  • Single-line per-file logging for cleaner output
  • Async file writes

Log output format

[api-reference/scrapes/create.mdx] es✓ nl✓ fr✓ de✓ zh✓ it✓ ja✓
[batches.json] (75 strings) es✓ nl✓ fr✓ de✓ zh✓ it✓ ja✓
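
A minimal sketch of how a line in this format could be built. The `LangResult` shape and the `formatLogLine` name are hypothetical, and the `✗` marker for a failed language is an assumption, since the PR only shows successful output.

```typescript
// Hypothetical per-language result; the PR does not show the real types.
interface LangResult {
  lang: string;
  ok: boolean;
}

// Build one log line per file, e.g. "[batches.json] (75 strings) es✓ nl✓".
// stringCount is only passed for OpenAPI JSON files in this sketch.
// Assumption: failures are marked with "✗" (not shown in the PR).
function formatLogLine(
  file: string,
  results: LangResult[],
  stringCount?: number,
): string {
  const count = stringCount === undefined ? "" : ` (${stringCount} strings)`;
  const marks = results
    .map((r) => `${r.lang}${r.ok ? "✓" : "✗"}`)
    .join(" ");
  return `[${file}]${count} ${marks}`;
}
```

Because the whole line is assembled from already-collected results, it can be emitted with a single log call per file.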

Frederic Schulz and others added 16 commits March 17, 2026 13:44
- Add GitHub Actions workflow for auto-translating docs on push to main
- Translate to 6 languages: Spanish, Dutch, French, Chinese, Italian, Japanese
- Two-phase translation: catch-up (missing) + incremental (changed)
- Deploy translations to prod branch for Mintlify
- Zero-tolerance error policy with retry logic for transient failures
- Validate OpenAI API key before starting translations
feat(i18n): Add automated translation workflow
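
The two-phase split described above could be sketched roughly as follows. `planTranslations` and its inputs are hypothetical names, and the rule that a changed file with no translation yet goes to catch-up rather than incremental is an assumption, not something the commit states.

```typescript
interface TranslationPlan {
  catchUp: string[]; // source files with no translation yet
  incremental: string[]; // already-translated files whose source changed
}

// Split source files into the two phases.
// Assumption: a file that is both untranslated and changed is handled
// once, by the catch-up phase.
function planTranslations(
  sourceFiles: string[],
  translated: Set<string>,
  changed: Set<string>,
): TranslationPlan {
  const catchUp = sourceFiles.filter((f) => !translated.has(f));
  const incremental = sourceFiles.filter(
    (f) => translated.has(f) && changed.has(f),
  );
  return { catchUp, incremental };
}
```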
- Catch-up phase: continue on errors, failed files picked up next run
- Incremental phase: still fails on errors (user-changed files must succeed)
- Add 400 status as retryable (OpenAI transient JSON parse issues)
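
A sketch of retry logic that treats status 400 as retryable, under stated assumptions: only 400 is named in the commit message, so the other status codes, the attempt count, the backoff schedule, and the `withRetry` name are all illustrative. Errors are assumed to carry a numeric `status` property.

```typescript
// Assumption: only 400 comes from the commit message; 429 and 5xx are
// common transient codes added here for illustration.
const RETRYABLE_STATUSES = new Set([400, 429, 500, 502, 503]);

// True when the thrown value carries a numeric `status` we consider transient.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const status = (err as { status?: unknown }).status;
  return typeof status === "number" && RETRYABLE_STATUSES.has(status);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Call fn up to maxAttempts times, backing off exponentially between tries.
// Non-retryable errors and the final failure are rethrown to the caller.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === maxAttempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

In the catch-up phase the caller would catch the rethrown error and move on; in the incremental phase it would let the error fail the job.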
- setup: checkout, install deps, detect changed files
- translate-catchup: Phase 1 - translate missing pages
- translate-incremental: Phase 2 - translate changed files
- deploy: push to prod branch

Each phase shows as a separate box in the GitHub Actions UI.
Uses artifacts to pass translations between jobs.

Two critical fixes for translation quality:

1. MDX code fence issue: LLM was wrapping translations in ```mdx blocks
   - Updated prompt to explicitly forbid code fences
   - Added post-processing to strip fences if they appear
   - Added validation that output starts with frontmatter

2. Navigation labels: Sidebar group/tab names were staying in English
   - Updated update-docs-config.ts to translate labels via OpenAI
   - Added translation cache to avoid redundant API calls

Both issues affected 63% of translated files (417/660).
Re-run workflow to regenerate all translations.
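
The post-processing for issue 1 might look something like the sketch below. The function names and the exact regular expressions are assumptions; the commit only says that fences are stripped and that the output must start with frontmatter.

```typescript
// Remove a single code fence that wraps the whole document, such as a
// leading "```mdx" line and a trailing "```" line. Fences inside the
// document are left alone because the match must span the entire string.
function stripWrappingFence(output: string): string {
  const trimmed = output.trim();
  const match = trimmed.match(
    /^```(?:mdx|markdown|md)?[ \t]*\r?\n([\s\S]*?)\r?\n```$/,
  );
  return match?.[1] ?? trimmed;
}

// True when the text begins with a YAML frontmatter block delimited by "---".
function hasFrontmatter(text: string): boolean {
  return /^---\r?\n[\s\S]*?\r?\n---(\r?\n|$)/.test(text);
}

// Strip a wrapping fence, then reject output that lacks frontmatter.
function cleanTranslation(output: string): string {
  const cleaned = stripWrappingFence(output);
  if (!hasFrontmatter(cleaned)) {
    throw new Error("Translated MDX does not start with frontmatter");
  }
  return cleaned;
}
```

Throwing on missing frontmatter lets the caller treat a malformed translation like any other failed file.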
Translates OpenAPI specification files (descriptions, summaries, titles)
to all target languages while preserving technical fields.

Changes:
- Add translate-openapi.ts: Programmatically extracts translatable fields,
  batch translates them, preserves all technical fields exactly
- Update workflow: Add translate-openapi job (runs parallel to MDX translation)
- Update update-docs-config.ts: Add language-specific OpenAPI to API tabs

This ensures API field descriptions (the gray text) are translated
while property names, types, and technical values stay in English.
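
As an illustration of the extract-then-reinsert approach, here is a sketch that is not the actual `translate-openapi.ts`. The helper names and the `Extracted` shape are hypothetical, and it assumes any string value stored under `description`, `summary`, or `title` is translatable, which would also catch such keys inside example payloads.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keys whose string values are treated as human-readable text.
const TRANSLATABLE_KEYS = new Set(["description", "summary", "title"]);

interface Extracted {
  path: (string | number)[]; // location of the string inside the spec
  text: string; // original English text
}

// Walk the spec and collect every translatable string with its path.
function extractTranslatable(
  node: Json,
  path: (string | number)[] = [],
  out: Extracted[] = [],
): Extracted[] {
  if (Array.isArray(node)) {
    node.forEach((child, i) => extractTranslatable(child, [...path, i], out));
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (typeof value === "string" && TRANSLATABLE_KEYS.has(key)) {
        out.push({ path: [...path, key], text: value });
      } else {
        extractTranslatable(value, [...path, key], out);
      }
    }
  }
  return out;
}

// Write translated strings back into a deep copy; all other fields are untouched.
function applyTranslations(
  spec: Json,
  items: Extracted[],
  translated: string[],
): Json {
  const copy: Json = JSON.parse(JSON.stringify(spec));
  items.forEach((item, i) => {
    let target: any = copy;
    for (const key of item.path.slice(0, -1)) target = target[key];
    target[item.path[item.path.length - 1]] = translated[i];
  });
  return copy;
}
```

Collecting the strings first makes it possible to send them to the model as one batch per file and language, while keys such as `operationId` never leave the process.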

The explicit keepEnglish/brandTerms lists were unnecessary - GPT-4o
naturally recognizes technical terms and preserves them. The prompt
now provides structural guidance (what categories to preserve) rather
than brittle hardcoded lists.

Removed:
- glossary.json
- Glossary interface and loading code
- Explicit term lists in prompt

The prompt now describes WHAT to preserve (code blocks, inline code,
URLs, technical acronyms) rather than listing specific terms.

- PARALLEL_FILES: 5 → 30 (up to 210 concurrent OpenAI calls)
- Async file writes using fs.promises.writeFile
- Single-line per-file logging: [file.mdx] es✓ nl✓ fr✓ de✓ zh✓ it✓ ja✓
- No interleaved logs - all results collected before logging
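
One way to get this behaviour is a small worker pool over files, with all languages for a file awaited together before anything is reported. This is a sketch under assumptions: `mapWithLimit`, `translateFiles`, and the `translate` callback are hypothetical names, and the language list is taken from the log output shown in the summary.

```typescript
// Run worker over items with at most `limit` calls in flight at once.
// Results keep the input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const runners = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runners }, () => run()));
  return results;
}

const LANGS = ["es", "nl", "fr", "de", "zh", "it", "ja"];

interface FileStatus {
  file: string;
  results: { lang: string; ok: boolean }[];
}

// Translate up to `parallelFiles` files at a time; each file fans out to all
// languages, so 30 files x 7 languages gives up to 210 calls in flight.
// A file's status is only returned once every language has settled.
async function translateFiles(
  files: string[],
  translate: (file: string, lang: string) => Promise<void>,
  parallelFiles = 30,
): Promise<FileStatus[]> {
  return mapWithLimit(files, parallelFiles, async (file) => {
    const results = await Promise.all(
      LANGS.map(async (lang) => {
        try {
          await translate(file, lang);
          return { lang, ok: true };
        } catch {
          return { lang, ok: false };
        }
      }),
    );
    return { file, results };
  });
}
```

Since each `FileStatus` is complete before it is handed back, the caller can print one line per file without output from different files interleaving.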
- Fix trailing comma in openapi/crawls.json causing JSON parse error
- Parallelize OpenAPI translation: 5 files × 7 languages concurrently
- Async file writes using fs.promises.writeFile
- Single-line per-file logging: [file.json] (N strings) es✓ nl✓ fr✓...
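
For context on the trailing-comma fix: strict JSON does not allow a comma before a closing brace or bracket, so `JSON.parse` rejects such a file. A small sketch of a pre-flight check follows; `checkJson` is a hypothetical helper, and the sample strings are illustrative, not the contents of `openapi/crawls.json`.

```typescript
type JsonCheck = { ok: true } | { ok: false; message: string };

// Try to parse the text and report the parser's message on failure.
function checkJson(text: string): JsonCheck {
  try {
    JSON.parse(text);
    return { ok: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}

// A trailing comma after the last member makes the document invalid.
const broken = checkJson('{"paths": {},}');
const fixed = checkJson('{"paths": {}}');
console.log(broken.ok, fixed.ok);
```

Running a check like this over every spec file before translation starts would surface a malformed file with its name attached, instead of failing mid-run.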