feat(transcription): add offline voice transcription via Whisper #336

Open
Fadhili5 wants to merge 3 commits into fireform-core:main from Fadhili5:feat/voice-transcription
Conversation

@Fadhili5

Summary

Stacked on #335 and #332

  • Add src/transcriber.py: wraps OpenAI Whisper for fully local, offline audio transcription. The model is lazy-loaded on first use; its size is configurable via the WHISPER_MODEL env var (tiny/base/small/medium/large, default: base). Supports WAV, MP3, M4A, MP4, OGG, FLAC. No audio data leaves the machine.
  • Add POST /transcribe endpoint: accepts multipart audio file upload, returns {text, model_used, audio_filename}. Returns 415 for unsupported formats, 500 for transcription errors
  • Add api/schemas/transcribe.py: TranscribeResponse schema
  • Register /transcribe router in api/main.py
  • Add 17 tests: model size validation, whitespace stripping, missing file, unsupported format, temp file cleanup, endpoint success/error, all 6 supported formats (parametrized)
  • Add openai-whisper and python-multipart to requirements.txt

Test plan

  • python -m pytest tests/test_transcribe.py -v — all 17 tests pass
  • python -m pytest tests/ -v — full suite (19 tests) passes
  • Confirm POST /transcribe returns 415 for .txt upload
  • Confirm temp files are cleaned up after transcribe_bytes()
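The last check hinges on transcribe_bytes() deleting its temp file even when transcription raises. A stdlib-only sketch of that pattern (the signature and the injected `transcribe_path` callable are assumptions for illustration):

```python
import os
import tempfile


def transcribe_bytes(data: bytes, suffix: str, transcribe_path) -> str:
    """Write uploaded bytes to a temp file, transcribe it, always clean up.

    `transcribe_path` stands in for the real Whisper call so the cleanup
    behavior can be exercised without loading a model.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return transcribe_path(path)
    finally:
        # Runs on success and on error alike, so no temp files leak.
        if os.path.exists(path):
            os.unlink(path)
```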

Fadhili5 and others added 3 commits March 24, 2026 14:36
- Add src/schemas/incident_report.py: canonical Pydantic model covering
  all fields needed across Cal Fire FIRESCOPE, EMS, and law enforcement
  forms (identity, location, timestamps, personnel, casualties, wildfire,
  narrative, law enforcement sections)
- Add model_validator that auto-populates requires_review for any core
  field left null after extraction, so responders can spot gaps before
  PDF submission
- Add llm_schema_hint() classmethod that returns the JSON schema minus
  requires_review, used to build the structured Ollama system prompt
- Refactor LLM class: replace per-field prompt loop with a single
  structured request using Ollama format="json" and Mistral instruction
  format ([INST]...[/INST])
- LLM now returns IncidentReport via get_report() in addition to the
  existing get_data() dict accessor for backward compatibility
- Fix test_submit_form: replace stub with a working integration test
  that creates a template then mocks Controller to assert the full
  POST /forms/fill response shape
- Add src/template_mapper.py: TemplateMapper loads a YAML agency mapping
  file and resolves IncidentReport field values to PDF form field names.
  Supports optional per-field condition expressions
- Add safe AST-based condition evaluator: permits only Compare, BoolOp,
  UnaryOp, Name, Constant nodes — rejects function calls and arbitrary code
- Refactor src/filler.py: replace positional answers_list[i] with explicit
  {pdf_field_name: value} dict so values land in the correct field
  regardless of page layout
- Update src/file_manipulator.py: new _fill_with_mapper() path uses
  LLM -> IncidentReport -> TemplateMapper -> Filler; legacy positional
  path preserved for backward compatibility
- Add templates/employee_form.yaml: sample mapping for src/inputs/file.pdf
- Add pyyaml to requirements.txt

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
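An evaluator restricted to the node types listed above can be sketched as follows (function name and the exact whitelist are assumptions; operator and context nodes such as ast.Gt and ast.Load must also be permitted, since ast.walk visits them):

```python
import ast

# Whitelisted node types: comparisons, boolean logic, unary ops, names,
# and constants — no Call, Attribute, or Subscript, so function calls and
# arbitrary code are rejected before evaluation.
ALLOWED = (
    ast.Expression, ast.Compare, ast.BoolOp, ast.UnaryOp,
    ast.Name, ast.Constant,
    ast.And, ast.Or, ast.Not, ast.USub,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Load,
)


def eval_condition(expr: str, context: dict) -> bool:
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    # Safe to evaluate: builtins are stripped and only whitelisted
    # nodes remain, so names can only resolve against `context`.
    code = compile(tree, "<condition>", "eval")
    return bool(eval(code, {"__builtins__": {}}, context))
```

Under this scheme a per-field mapping condition like `casualties > 0 and not mutual_aid` evaluates against the report's field values, while anything containing a call is rejected with ValueError.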
- Add src/transcriber.py: wraps OpenAI Whisper for fully local, offline
  audio transcription. Model is lazy-loaded on first use. Size is
  configurable via WHISPER_MODEL env var (tiny/base/small/medium/large,
  default: base). Supports WAV, MP3, M4A, MP4, OGG, FLAC
- Add POST /transcribe endpoint: accepts multipart audio file upload,
  returns {text, model_used, audio_filename}. Returns 415 for unsupported
  formats, 500 for transcription errors
- Add api/schemas/transcribe.py: TranscribeResponse schema
- Register /transcribe router in api/main.py
- Add 17 tests covering: model size validation, whitespace stripping,
  missing file, unsupported format, temp file cleanup, endpoint success,
  endpoint error handling, all 6 supported formats (parametrized)
- Add openai-whisper and python-multipart to requirements.txt
