Prompt Engineering for OpenTranscribe

This directory contains comprehensive documentation and improvements for LLM prompt engineering in OpenTranscribe, based on Anthropic's official best practices.

📚 Documentation

PROMPT_ENGINEERING_GUIDE.md

Comprehensive best practices guide covering:

✅ Core principles (clarity, XML structure, examples)
✅ Advanced techniques (chain-of-thought, few-shot learning)
✅ System prompts and role definition
✅ Long context management
✅ Structured output reliability
✅ Temperature settings
✅ Application-specific patterns for summarization & speaker ID

Use this as your reference when creating or modifying prompts.

PROMPT_IMPROVEMENTS_IMPLEMENTATION.md

Implementation summary documenting:

✅ What was changed and why
✅ Expected impact and metrics
✅ How to apply updates
✅ Testing checklist
✅ Monitoring guidelines
✅ Rollback procedures

Use this to understand what was implemented and how to deploy.

Quick Start

For New Installations

The improvements are already included in the default database seed. Just run:

./opentr.sh start dev

All new installations will automatically have the enhanced prompts.

Apply Improvements to Existing Installation

Prompt improvements are stored in the database and applied automatically on backend startup via Alembic migrations. To force a clean state:

./opentr.sh reset dev

Note: This resets the database. All existing data will be lost.

After Reset:

Restart services (happens automatically with reset)
Test with a sample file:
- Upload a meeting recording
- Generate summary
- Verify improved BLUF format and structure

What Changed

Code Changes (v0.4.0)

File	Changes	Impact
`backend/app/services/llm_service.py`	Response prefilling, quote extraction, enhanced parsing	More reliable JSON, better speaker ID
`backend/alembic/versions/v190_add_collection_default_prompt.py`	Per-collection prompt storage	Domain-specific prompts per collection
`backend/alembic/versions/v351_add_ai_summary_settings.py`	Per-file AI summary enable/disable	User control over LLM API calls
`backend/app/services/llm_service.py`	Org context injection	Org-aware summaries for enterprise deployments

Key Features

Response Prefilling
- Forces JSON output format
- Reduces parsing errors
- Improves consistency
Quote Extraction
- Asks for evidence first
- Improves speaker ID accuracy from ~30% to ~90%+
- Better confidence scoring
XML Structure
- Clear prompt organization
- Prevents section confusion
- Easier maintenance
Few-Shot Examples
- Shows exact expected format
- Covers different content types
- Dramatically improves consistency

📊 Expected Improvements

Metric	Before	After	Improvement
JSON Parsing Success	~85%	~98%	+13%
Speaker ID Accuracy	~30%	~90%+	+60%+
Action Item Completeness	~70%	~90%	+20%
Output Consistency	Variable	High	Significant

🧪 Testing

Quick Test

# 1. Upload a test transcript
# 2. Generate summary
# 3. Verify output includes:

✓ BLUF (2-3 sentences, clear outcome)
✓ Well-structured JSON
✓ Action items with owners and dates
✓ Major topics with key points
✓ Speaker analysis with roles

Detailed Testing Checklist

See Testing Checklist in implementation doc.

Priority Levels

Priority 1 - Implemented

XML structure in prompts
Response prefilling for JSON
Lower temperature (0.3 → 0.1)
BLUF format guidelines
Quote extraction for speaker ID
Per-collection prompts (v0.4.0)
Organizational context injection (v0.4.0)
AI summary enable/disable toggle per file and per user (v0.4.0)

Priority 2 - Planned

Context overlap in multi-chunk processing
Enhanced error handling with retry logic
Prompt A/B testing framework

Priority 3 - Future

Tool-based structured output (guaranteed schema)
Self-correction chain for high-stakes summaries
Contextual retrieval for historical search

📖 How to Use

Creating New Prompts

Review the guide: Start with PROMPT_ENGINEERING_GUIDE.md

Follow the structure:

<task_instructions>
Clear, direct instructions
</task_instructions>

<transcript>
{content}
</transcript>

<examples>
<example>...</example>
</examples>

<output_format>
{expected_json}
</output_format>

Add examples: Include 2-3 diverse, relevant examples
Test thoroughly: Verify with different content types

Modifying Existing Prompts

Understand current behavior: Test before changes
Make incremental changes: One improvement at a time
A/B test if possible: Compare old vs new
Monitor metrics: Track quality and consistency

🔍 Monitoring

Key Metrics

JSON Parsing Success Rate (Target: >95%)
BLUF Quality (Manual review samples)
Action Item Completeness (Owner, deadline, priority)
Speaker ID Confidence (Distribution of scores)

Log Monitoring

# Check for parsing errors
docker logs opentranscribe-backend | grep "Failed to parse"

# Monitor speaker identification
docker logs opentranscribe-backend | grep "speaker_predictions"

# Watch for JSON errors
docker logs opentranscribe-backend | grep "JSONDecodeError"

🐛 Troubleshooting

Common Issues

Issue	Solution
JSON parsing errors	Check prefill logic in `_prepare_payload`
BLUF too long/vague	Review examples in database prompt
Speaker ID confidence low	Verify quote extraction is enabled
Inconsistent outputs	Check temperature setting (should be 0.1)

Getting Help

Check PROMPT_ENGINEERING_GUIDE.md best practices
Review PROMPT_IMPROVEMENTS_IMPLEMENTATION.md implementation details
Check backend logs for specific errors
Test with simple/short content first

🔄 Rollback

If issues arise with the enhanced prompts:

# Revert code changes to llm_service.py
git checkout HEAD~1 backend/app/services/llm_service.py

# Restart services
./opentr.sh restart-backend

To revert database prompts, you would need to restore from a backup or manually update the prompt text in the database.

📚 Resources

Anthropic Documentation

Research Papers

📝 Files in This Implementation

.
├── docs/
│   ├── PROMPT_ENGINEERING_README.md          # This file (quick reference)
│   ├── PROMPT_ENGINEERING_GUIDE.md           # Comprehensive best practices
│   └── PROMPT_IMPROVEMENTS_IMPLEMENTATION.md # Implementation details
├── backend/app/services/llm_service.py       # Enhanced LLM service
└── database/init_db.sql                      # Enhanced default prompts (lines 358-620)

✨ Summary

What: Implemented Anthropic's prompt engineering best practices for OpenTranscribe

Why: Improve LLM output quality, consistency, and reliability

How: XML structure, few-shot examples, response prefilling, quote extraction

Impact: Higher quality summaries, better speaker ID, more reliable JSON parsing

Next Steps:

Apply database updates
Restart services
Test with sample content
Monitor metrics
Iterate and improve

Last Updated: March 2026 (v0.4.0) Based on: Anthropic Claude Official Documentation Status: Production-deployed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prompt Engineering for OpenTranscribe

📚 Documentation

PROMPT_ENGINEERING_GUIDE.md

PROMPT_IMPROVEMENTS_IMPLEMENTATION.md

Quick Start

For New Installations

Apply Improvements to Existing Installation

What Changed

Code Changes (v0.4.0)

Key Features

📊 Expected Improvements

🧪 Testing

Quick Test

Detailed Testing Checklist

Priority Levels

Priority 1 - Implemented

Priority 2 - Planned

Priority 3 - Future

📖 How to Use

Creating New Prompts

Modifying Existing Prompts

🔍 Monitoring

Key Metrics

Log Monitoring

🐛 Troubleshooting

Common Issues

Getting Help

🔄 Rollback

📚 Resources

Anthropic Documentation

Research Papers

📝 Files in This Implementation

✨ Summary

FilesExpand file tree

PROMPT_ENGINEERING_README.md

Latest commit

History

PROMPT_ENGINEERING_README.md

File metadata and controls

Prompt Engineering for OpenTranscribe

📚 Documentation

PROMPT_ENGINEERING_GUIDE.md

PROMPT_IMPROVEMENTS_IMPLEMENTATION.md

Quick Start

For New Installations

Apply Improvements to Existing Installation

What Changed

Code Changes (v0.4.0)

Key Features

📊 Expected Improvements

🧪 Testing

Quick Test

Detailed Testing Checklist

Priority Levels

Priority 1 - Implemented

Priority 2 - Planned

Priority 3 - Future

📖 How to Use

Creating New Prompts

Modifying Existing Prompts

🔍 Monitoring

Key Metrics

Log Monitoring

🐛 Troubleshooting

Common Issues

Getting Help

🔄 Rollback

📚 Resources

Anthropic Documentation

Research Papers

📝 Files in This Implementation

✨ Summary