This is a simple repo to collect prompts, examples, benchmarks and other useful resources for LLM-based transcription and diarization.

Comparisons

To run the comparisons, run bun comparisons/compare.ts, or just look at comparisons/results.txt. The WER (word error rate) estimate uses a Myers diff to find reference points and then compares the transcripts against them. This isn't the proper way to compute WER, but the relative differences should hold. YMMV and IANA Specialist in this field :)
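
For contrast with the diff-based estimate above, here is a minimal sketch of the textbook WER calculation: word-level Levenshtein distance divided by the number of reference words. This is not the code in comparisons/compare.ts. The function names and the normalization rules (lowercasing, stripping punctuation, ASCII only) are assumptions made for illustration.

```typescript
// Textbook WER: (substitutions + deletions + insertions) / reference word count.
// Illustrative sketch only; not the repo's compare.ts implementation.

function tokenize(text: string): string[] {
  // Assumed normalization: lowercase and drop punctuation so formatting
  // differences are not counted as errors. ASCII-only for simplicity.
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  // WER is undefined for an empty reference; this sketch returns 0 or 1.
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] holds the edit distance between the reference words seen so far
  // and the first j hypothesis words.
  let prev: number[] = [];
  for (let j = 0; j <= hyp.length; j++) prev.push(j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1; // reference word missing from hypothesis
      const insertion = curr[j - 1] + 1; // extra word in hypothesis
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, wordErrorRate("the quick brown fox", "the quick brown dog") gives 0.25: one substitution over four reference words.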

Prompts

Transcription

Here's the simple prompt we're using for transcription:

Transcribe this audio file in its entirety.

This prompt significantly improves the word error rate on some audio files:

Here is audio of someone speaking. Transcribe the whole thing for me, correcting for possible mistranscriptions based on your understanding.

Likely could be made a lot better, but this is a good start.

Diarization

This is Google's suggested prompt.

Generate audio diarization, including transcriptions and speaker information for each transcription, for this interview. Organize the transcription by the time they happened.

Works really well, but it's hard to strip out the non-transcription parts or to use the output as structured data.

From @zhanghaoxxxx on Twitter:

Please transcribe the audio file into lrc format, separating each speaker’s dialogue by labeling them as Speaker 1, Speaker 2, etc. Ensure that all spoken content is clearly segmented and each speaker is identified.

This works much better. For video files, you can modify it to identify speakers by name (if the names appear in the video):

Please transcribe the video file into lrc format, separating each speaker’s dialogue by labeling them with the appropriate names from the video. Ensure that all spoken content is clearly segmented and each speaker is identified.
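
If you need the lrc output as data, a small parser is usually enough. The sketch below assumes each line looks like [mm:ss.xx] Speaker 1: text, which is the standard LRC time tag followed by a speaker label. The model's actual output may differ, so treat the regex, the LrcSegment type and the parseLrc name as hypothetical.

```typescript
// Illustrative parser for lrc-style diarization output.
// Assumed line shape: "[01:23.45] Speaker 1: Hello there"

type LrcSegment = {
  startTime: number; // seconds from the start of the audio
  speaker: string;
  text: string;
};

// Groups: minutes, seconds, optional fraction, speaker label, spoken text.
const LRC_LINE = /^\[(\d+):(\d{2})(?:\.(\d{1,3}))?\]\s*([^:]+):\s*(.*)$/;

function parseLrc(lrc: string): LrcSegment[] {
  const segments: LrcSegment[] = [];
  for (const rawLine of lrc.split(/\r?\n/)) {
    const match = LRC_LINE.exec(rawLine.trim());
    // Skip blank lines, metadata tags such as [ti:...], and any extra commentary.
    if (!match) continue;
    const minutes = parseInt(match[1], 10);
    const seconds = parseInt(match[2], 10);
    const fraction = match[3] ? parseFloat("0." + match[3]) : 0;
    segments.push({
      startTime: minutes * 60 + seconds + fraction,
      speaker: match[4].trim(),
      text: match[5].trim(),
    });
  }
  return segments;
}
```

Lines that don't match the assumed shape are dropped silently, so it's worth comparing the segment count against the raw line count before trusting the result.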

If you want proper structured data, you can use the JSON prompt below with the schema. This is good at identifying speakers and returning good structured data, but for some reason punctuation keeps getting left out compared to the other methods.

Transcribe this for me, following this typespec:

```typescript
type Transcript = {
  title: string;
  speakerNames: string[];
  transcript: {
    speaker: string;
    startTime: number;
    text: string;
  }[];
}
```

Respond only in JSON.

Use this Schema:

{
  "type": "object",
  "properties": {
    "title": {
      "type": "string"
    },
    "speakerNames": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "transcript": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "speaker": {
            "type": "string"
          },
          "startTime": {
            "type": "number"
          },
          "text": {
            "type": "string"
          }
        },
        "required": ["speaker", "startTime", "text"]
      }
    }
  },
  "required": ["title", "speakerNames", "transcript"]
}
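
Since the model's response is just text, it's worth checking it against the schema before using it. Here is a minimal sketch of a hand-written runtime check that mirrors the schema above. The helper names (isRecord, isTranscript, parseTranscript) are made up for illustration, and a JSON Schema validator library would work just as well.

```typescript
// Illustrative runtime check that a model response matches the schema above.

type TranscriptEntry = { speaker: string; startTime: number; text: string };
type Transcript = {
  title: string;
  speakerNames: string[];
  transcript: TranscriptEntry[];
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isTranscript(value: unknown): value is Transcript {
  if (!isRecord(value)) return false;
  if (typeof value.title !== "string") return false;
  if (!Array.isArray(value.speakerNames)) return false;
  if (!value.speakerNames.every((name) => typeof name === "string")) return false;
  if (!Array.isArray(value.transcript)) return false;
  // Every entry needs all three required fields with the right types.
  return value.transcript.every(
    (entry) =>
      isRecord(entry) &&
      typeof entry.speaker === "string" &&
      typeof entry.startTime === "number" &&
      typeof entry.text === "string"
  );
}

function parseTranscript(raw: string): Transcript {
  // JSON.parse throws on malformed JSON; the guard catches wrong shapes.
  const parsed: unknown = JSON.parse(raw);
  if (!isTranscript(parsed)) {
    throw new Error("Response does not match the Transcript schema");
  }
  return parsed;
}
```

Failing loudly here makes it easy to retry the request when the model returns something off-schema.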
