[AGENT] Harden transcription final pass and replay tooling #160
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 23d8397d30
Pull request overview
This PR hardens the transcription pipeline by adding a finalized-audio “final pass” verification step, strengthening prompt vs no-prompt voting against low-information outputs, and adding replay-oriented eval tooling to compare prompt/no-prompt behavior with extra telemetry and docs coverage.
Changes:
- Add `runTranscriptionFinalPass` to verify transcripts against finalized mixed audio, apply high-confidence edits, reject trivial replacements, and filter repeated low-information segments.
- Improve transcription vote selection by penalizing trivial/punctuation-only candidates and recording new “trivial text” telemetry.
- Extend transcription eval tooling with `--compare-no-prompt`, and update tests/config/docs/prompts to support the new pipeline.
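To make the vote-hardening change concrete, here is a minimal sketch of penalizing trivial or punctuation-only candidates during selection. The names `VoteCandidate`, `isPunctuationOnly`, `isTrivial`, `selectCandidate`, and `TRIVIAL_PENALTY` are hypothetical and chosen for illustration; the PR's actual logic lives in `src/utils/transcriptionVote.ts` and `src/utils/transcriptionText.ts` and may score candidates differently.

```typescript
// Hypothetical candidate shape; `score` stands in for whatever the real scorer produces.
interface VoteCandidate {
  label: string; // e.g. "prompt" or "no-prompt"
  text: string;
  score: number; // higher is better
}

// Built with the RegExp constructor so the Unicode property escapes do not
// depend on the TypeScript compile target.
const LETTER_OR_NUMBER = new RegExp("[\\p{L}\\p{N}]", "u");

// True when the text has content but no letter or digit in any script.
function isPunctuationOnly(text: string): boolean {
  const trimmed = text.trim();
  return trimmed.length > 0 && !LETTER_OR_NUMBER.test(trimmed);
}

// Assumption: empty and punctuation-only outputs both count as trivial.
function isTrivial(text: string): boolean {
  return text.trim().length === 0 || isPunctuationOnly(text);
}

// Assumed penalty, large enough that a trivial candidate only wins
// when every candidate is trivial.
const TRIVIAL_PENALTY = 1000;

function selectCandidate(
  candidates: VoteCandidate[],
): VoteCandidate | undefined {
  let best: VoteCandidate | undefined;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    const adjusted = isTrivial(candidate.text)
      ? candidate.score - TRIVIAL_PENALTY
      : candidate.score;
    if (adjusted > bestScore) {
      best = candidate;
      bestScore = adjusted;
    }
  }
  return best;
}
```

With this shape, a bare `"."` from the prompted pass loses to a real sentence from the no-prompt pass even when its raw score is higher.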
Reviewed changes
Copilot reviewed 37 out of 38 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/services/transcriptionPromptService.test.ts | Adds coverage for final-pass prompt formatting and Langfuse prompt name. |
| test/services/transcriptionFinalPassService.test.ts | Adds integration-style tests for chunking, boundary handling, and guardrail fallback. |
| test/services/imageCaptionService.test.ts | Updates test runtime config shape with finalPassEnabled. |
| test/services/contextService.test.ts | Updates default transcription config in tests to include finalPassEnabled. |
| test/services/autoRecordCancellationService.test.ts | Updates test runtime config shape with finalPassEnabled. |
| test/commands/endMeeting.test.ts | Mocks and asserts end-meeting flow invokes the final pass when transcription is enabled. |
| test/audio/startProcessingSnippet.test.ts | Updates test runtime config shape with finalPassEnabled. |
| src/utils/transcriptionVote.ts | Hardens vote scoring/selection using trivial-text detection and a no-prompt quality floor. |
| src/utils/transcriptionText.ts | Introduces transcript text-quality helpers (trivial, punctuation-only, low-information, near-duplicate). |
| src/utils/__tests__/transcriptionVote.test.ts | Adds tests for punctuation-only/trivial vote behavior. |
| src/utils/__tests__/transcriptionText.test.ts | Adds tests for punctuation-only detection and normal speech classification. |
| src/types/audio.ts | Adds finalPassTranscript field to AudioFileData. |
| src/services/uploadService.ts | Prefers finalPassTranscript when building transcript JSON output. |
| src/services/transcriptionService.ts | Adds trivial-text telemetry and logs candidate quality details. |
| src/services/transcriptionPromptService.ts | Adds Langfuse prompt builder for final-pass reconciliation and baseline segment formatting. |
| src/services/transcriptionFinalPassService.ts | Implements finalized-audio chunk transcription, reconciliation, edit application, and repetition filtering. |
| src/services/meetingTimelineService.ts | Prefers finalPassTranscript for live timeline segments. |
| src/services/meetingConfigService.ts | Plumbs transcription.finalPass.enabled into runtime config resolution. |
| src/services/configService.ts | Adds Langfuse prompt name config for final-pass prompt (LANGFUSE_PROMPT_TRANSCRIPTION_FINAL_PASS). |
| src/services/__tests__/transcriptionFinalPassService.test.ts | Adds focused unit tests for trivial replacement rejection and repetition filtering. |
| src/evals/transcriptionEval.ts | Adds trivial-text counts and a prompt vs no-prompt comparison mode in eval output. |
| src/constants.ts | Adds final-pass chunking, confidence, and guardrail constants. |
| src/config/types.ts | Extends runtime config typing with finalPassEnabled. |
| src/config/registry.ts | Registers transcription.finalPass.enabled config entry (global toggle). |
| src/config/keys.ts | Adds config key constant for transcription.finalPass.enabled. |
| src/commands/endMeeting.ts | Runs final transcription pass after mixed audio is built and before compiling transcript/notes. |
| src/audio.ts | Prefers finalPassTranscript when compiling full meeting transcription. |
| prompts/chronote-transcription-final-pass-chat.md | Adds a Langfuse-managed prompt for reconciliation edits JSON output. |
| docs/feature-toggles.md | Documents the new transcription.finalPass.enabled toggle and updates vote description. |
| docs/audio-transcription.md | Documents punctuation-only handling and final-pass behaviors. |
| docs/audio-transcription-plan.md | Updates plan with replay tooling and repeated-phrase cleanup. |
| apps/docs-site/docs/core-concepts/meeting-lifecycle.md | Updates meeting lifecycle docs to include finalized-audio verification stage. |
| README.md | Mentions the finalized-audio verification pass at meeting end. |
| AGENTS.md | Updates agent context for final-pass toggle and prompt override env var. |
| .opencode/skills/pr-review-recycle/SKILL.md | Updates PR review recycle checklist to include a 5-minute post-push wait. |
| .opencode/skills/pr-post-push-sop/SKILL.md | Adds a new post-push SOP skill document. |
| .github/copilot-instructions.md | Mirrors AGENTS guidance updates, including post-push SOP and final-pass notes. |
Greptile Summary
This PR hardens the transcription pipeline against two families of low-quality output: punctuation-only results (e.g. a bare
Key changes:
Confidence Score: 5/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[runTranscriptionFinalPass] --> B[buildBaselineSegments]
B --> C[processAllChunks]
C --> D{LLM reconcile batch}
D -->|edit| E[applyBatchEdits]
E --> F{replace edit?}
F -->|yes| G{isTrivialTranscriptionText?}
G -->|yes| H[rejectedTrivialEdits++\nskip edit]
G -->|no| I[updateAcceptedEdit]
F -->|no - drop| I
C --> J{acceptedEdits.size == 0?}
J -->|yes| K[applyRepeatedLowInformationFilter]
J -->|no| L{guardrail thresholds exceeded?}
L -->|yes| M[return fallback result\nskip repetition filter]
L -->|no| N[applyAcceptedEdits]
N --> K
K --> O{isLowInformationTranscriptionText?}
O -->|no| P[skip segment]
O -->|yes| Q{areLowInformationTextsNearDuplicates\nwithin 180s window?}
Q -->|yes| R[finalPassTranscript = ''\nrepetitionFilteredSegments++]
Q -->|no| S[add to speaker window\nlastKeptBySpeaker]
K --> T[return result with applied=true]
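
The repetition-filter branch of the flowchart can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Segment`, `isLowInformation`, `areNearDuplicates`, and `filterRepeatedLowInformation` are hypothetical stand-ins for `isLowInformationTranscriptionText`, `areLowInformationTextsNearDuplicates`, and `applyRepeatedLowInformationFilter`, whose real heuristics are not shown in this thread. Only the 180-second window, the per-speaker `lastKeptBySpeaker` map, and the blanking of `finalPassTranscript` come from the flowchart.

```typescript
// Hypothetical segment shape, assumed to be sorted by start time.
interface Segment {
  speaker: string;
  startSeconds: number;
  text: string;
  finalPassTranscript?: string;
}

const REPETITION_WINDOW_SECONDS = 180; // from the flowchart
const MAX_LOW_INFO_WORDS = 3; // assumed threshold for "low information"

const WORD_PATTERN = new RegExp("[\\p{L}\\p{N}]+", "gu");

// Lower-cased word tokens with punctuation stripped.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(WORD_PATTERN) ?? [];
}

// Stand-in heuristic: a short utterance of one to three words.
function isLowInformation(text: string): boolean {
  const words = tokenize(text);
  return words.length > 0 && words.length <= MAX_LOW_INFO_WORDS;
}

// Stand-in heuristic: identical after case and punctuation normalization.
function areNearDuplicates(a: string, b: string): boolean {
  return tokenize(a).join(" ") === tokenize(b).join(" ");
}

// Blanks repeated low-information segments and returns how many were filtered.
function filterRepeatedLowInformation(segments: Segment[]): number {
  const lastKeptBySpeaker = new Map<string, Segment>();
  let repetitionFilteredSegments = 0;
  for (const segment of segments) {
    if (!isLowInformation(segment.text)) continue; // normal speech is untouched
    const previous = lastKeptBySpeaker.get(segment.speaker);
    if (
      previous !== undefined &&
      segment.startSeconds - previous.startSeconds <= REPETITION_WINDOW_SECONDS &&
      areNearDuplicates(previous.text, segment.text)
    ) {
      segment.finalPassTranscript = "";
      repetitionFilteredSegments++;
    } else {
      lastKeptBySpeaker.set(segment.speaker, segment);
    }
  }
  return repetitionFilteredSegments;
}
```

In this sketch the window is measured from the last kept segment, so a filtered repeat does not extend it; whether the real filter slides the window is not stated here.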
[AGENT] Addressed the concrete bot review items locally before this push: added the missing final-pass mock fields in
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23422584614
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23422620451
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23422804157
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/utils/transcriptionVote.ts:111
`calculateCandidateQuality` uses `getTranscriptionTextQuality` (Unicode-aware) but still normalizes words via `normalizeWords`, which strips non-ASCII letters/numbers. This means repetition/unique-word heuristics won’t work for non-Latin transcripts and can skew scoring/telemetry when `wordCount` comes from non-ASCII text. Consider reusing the Unicode-aware token normalization from `transcriptionText` (or switching `normalizeWords` to `\p{L}`/`\p{N}`) so quality scoring is consistent across languages.
const textQuality = getTranscriptionTextQuality(candidate.text);
const trimmed = textQuality.trimmed;
const wordCount = textQuality.wordCount;
const normalizedWords = normalizeWords(trimmed);
const uniqueWordRatio =
normalizedWords.length === 0
? 1
: new Set(normalizedWords).size / normalizedWords.length;
const maxConsecutiveRepeats = calculateMaxConsecutiveRepeats(normalizedWords);
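
One way to act on the suggestion above is a tokenizer built on `\p{L}`/`\p{N}`. The sketch below is an assumption-laden illustration: `normalizeWordsUnicode` and `uniqueWordRatio` are hypothetical names, and the repository's real `normalizeWords` may lower-case, trim, or split differently. The ratio mirrors the formula in the quoted snippet.

```typescript
// Runs of Unicode letters and numbers. Combining marks (\p{M}) are not
// included, so scripts that rely on them may be split mid-word.
const UNICODE_WORD_PATTERN = new RegExp("[\\p{L}\\p{N}]+", "gu");

// Hypothetical Unicode-aware replacement for an ASCII-only normalizer.
function normalizeWordsUnicode(text: string): string[] {
  return text.toLowerCase().match(UNICODE_WORD_PATTERN) ?? [];
}

// Same formula as the quoted snippet: distinct words over total words,
// defaulting to 1 when there are no words.
function uniqueWordRatio(words: string[]): number {
  return words.length === 0 ? 1 : new Set(words).size / words.length;
}

const russianWords = normalizeWordsUnicode("Привет, привет мир!");
const ratio = uniqueWordRatio(russianWords); // 2 distinct words out of 3
```

An ASCII-only pattern such as `[a-z0-9]+` would return no tokens for the same input, which is the skew the review comment describes.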
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423069771
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423323835
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423501705
[AGENT] Addressed the last minor cleanup items from the latest Greptile pass: the no-space text test now has an accurate description, and the guardrail-threshold fallback in
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/evals/transcriptionEval.ts:467
`--compare-no-prompt` can be combined with dataset mode (`--dataset`/`LANGFUSE_EVAL_DATASET`), but `runLangfuseEval` currently ignores `compareNoPrompt` (it only runs a single batch per item). This makes the flag silently ineffective in dataset runs. Consider rejecting `--compare-no-prompt` when `useDataset` is true, or extending the dataset task/output schema to run and record both prompt modes.
const compareNoPrompt = hasFlag("--compare-no-prompt");
if (compareNoPrompt && provider !== "openai") {
throw new Error(
"--compare-no-prompt currently supports only --provider openai.",
);
}
const datasetFlag =
readFlagValue("--langfuse-dataset") ?? readFlagValue("--dataset");
const experimentFlag =
readFlagValue("--langfuse-experiment") ?? readFlagValue("--experiment");
const envDataset = process.env.LANGFUSE_EVAL_DATASET;
const useDataset = Boolean(datasetFlag ?? (!file && envDataset));
const langfuseDataset = datasetFlag ?? (useDataset ? envDataset : undefined);
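
The first option in the comment above, rejecting the flag in dataset mode, could look roughly like this. `EvalFlags` and `validateEvalFlags` are hypothetical names introduced for illustration, and the second error message is invented; only the provider check and its message come from the quoted snippet.

```typescript
// Hypothetical bundle of the already-parsed CLI state.
interface EvalFlags {
  compareNoPrompt: boolean;
  provider: string;
  useDataset: boolean;
}

// Fails fast instead of letting --compare-no-prompt be silently ignored.
function validateEvalFlags(flags: EvalFlags): void {
  if (!flags.compareNoPrompt) return;
  if (flags.provider !== "openai") {
    throw new Error(
      "--compare-no-prompt currently supports only --provider openai.",
    );
  }
  if (flags.useDataset) {
    // Assumed wording; the dataset path runs a single batch per item.
    throw new Error(
      "--compare-no-prompt is not supported in dataset mode.",
    );
  }
}
```

Because `useDataset` is derived after the provider check in the quoted code, a check like this would have to run once both values are known.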
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423671051
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423718230
Visual regression report: No visual diffs detected. Run: https://github.com/Chronote-gg/Chronote/actions/runs/23423938912
Summary
Testing
- `yarn test --runTestsByPath src/utils/__tests__/transcriptionText.test.ts src/utils/__tests__/transcriptionVote.test.ts src/services/__tests__/transcriptionFinalPassService.test.ts`
- `yarn lint:check src/evals/transcriptionEval.ts src/services/transcriptionFinalPassService.ts src/services/transcriptionService.ts src/utils/transcriptionVote.ts src/utils/transcriptionText.ts src/utils/__tests__/transcriptionText.test.ts src/utils/__tests__/transcriptionVote.test.ts src/services/__tests__/transcriptionFinalPassService.test.ts`
- `yarn build`
- `npx prettier --check src/evals/transcriptionEval.ts src/services/transcriptionFinalPassService.ts src/services/transcriptionService.ts src/utils/transcriptionVote.ts src/utils/transcriptionText.ts src/utils/__tests__/transcriptionText.test.ts src/utils/__tests__/transcriptionVote.test.ts src/services/__tests__/transcriptionFinalPassService.test.ts docs/audio-transcription.md docs/feature-toggles.md docs/audio-transcription-plan.md`
Notes
- `yarn markdownlint:check` is not runnable in this environment because `markdownlint-cli2` is not installed.