Backend Documentation Index

Comprehensive documentation for the OpenTranscribe backend system, organized by component and use case.

📚 Documentation Structure

🏠 Main Backend README

Start here for a backend overview and quick setup

Quick start guide
Architecture overview
Development workflow
API access points
Testing and deployment

🏗️ Application Architecture

Deep dive into application structure and design patterns

Layered architecture explanation
Directory structure and organization
Request flow and data flow
Development guidelines and patterns

📖 Component Documentation

🔐 Authentication Module

Enterprise authentication with multiple identity providers

The backend/app/auth/ module supports the following authentication methods, all of which can be enabled at the same time (hybrid auth):

Local Authentication (direct_auth.py): bcrypt_sha256 password hashing
LDAP/Active Directory (ldap_auth.py): Enterprise directory integration
OIDC/Keycloak (keycloak_auth.py): OAuth 2.0 with PKCE flow
PKI/X.509 (pki_auth.py): Certificate-based authentication (CAC/PIV)

Auth method configuration is stored in the database (Admin UI → Settings → Authentication) and takes precedence over .env variables.
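
The precedence rule can be illustrated with a minimal sketch. The function name `resolve_auth_setting`, the dictionary shapes, and the `LDAP_ENABLED` and `PKI_ENABLED` keys are hypothetical. They only illustrate a database-first lookup with a .env fallback and are not the backend's actual API.

```python
from typing import Mapping, Optional


def resolve_auth_setting(
    key: str,
    db_settings: Mapping[str, str],
    env: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the database value for ``key`` if one exists, else the .env value."""
    if key in db_settings:
        # A value saved through the Admin UI wins over the environment.
        return db_settings[key]
    return env.get(key, default)


# Hypothetical keys: the database row overrides .env, while a key that is
# absent from the database falls back to the .env value.
db_settings = {"LDAP_ENABLED": "true"}
env = {"LDAP_ENABLED": "false", "PKI_ENABLED": "true"}

ldap_enabled = resolve_auth_setting("LDAP_ENABLED", db_settings, env)
pki_enabled = resolve_auth_setting("PKI_ENABLED", db_settings, env)
```

Under this assumption, editing .env has no visible effect for any key that already has a value in the database.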

Security Components:

mfa.py - TOTP multi-factor authentication with backup codes
password_policy.py - Configurable password complexity rules
password_history.py - Password reuse prevention
lockout.py - Account lockout after failed attempts
rate_limit.py - Authentication rate limiting
session.py - Session management with token rotation
audit.py - Authentication event logging
token_service.py - JWT token management
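
For readers unfamiliar with TOTP, the following is a minimal standard-library sketch of the RFC 6238 algorithm (HMAC-SHA1 variant). It is not the code in mfa.py. The function names, the default 30-second step, and the one-step drift window are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    timestamp = time.time() if at is None else at
    counter = int(timestamp // step)
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(
    secret_b32: str,
    candidate: str,
    at: Optional[float] = None,
    digits: int = 6,
    step: int = 30,
    window: int = 1,
) -> bool:
    """Accept codes from the current step and ``window`` steps on either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + drift * step, digits, step), candidate)
        for drift in range(-window, window + 1)
    )
```

The drift window tolerates small clock differences between server and authenticator app. `hmac.compare_digest` keeps the comparison constant-time.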

Configuration Guides:

LDAP Authentication - Active Directory setup
Keycloak/OIDC Setup - OAuth 2.0 SSO
PKI Authentication - Certificate authentication
Security Overview - Security features and FedRAMP compliance
Testing Checklist - Authentication verification

🌐 API Layer

Complete API documentation and patterns

RESTful endpoint design
Authentication and authorization
Request/response patterns
Error handling standards
WebSocket integration
Adding new endpoints

🗄️ Data Models

Database schema and ORM models

Database design overview
Entity relationships
Model definitions and constraints
Query patterns and optimization
Migration strategies

🔧 Services Layer

Business logic and service patterns

Service layer principles
File management service
Transcription workflow service
External service integration
Error handling and transactions

⚡ Background Tasks

Asynchronous processing and AI workflows

Task system architecture
Transcription pipeline: unified 3-stage Celery chain (preprocess → GPU → postprocess)
Analytics and summarization
Task monitoring and error handling
Performance optimization
Selective reprocessing: stage picker for re-running specific pipeline stages
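
The three-stage chain and the stage picker can be modelled in plain Python. This sketch is not Celery code: the stage functions, the `STAGES` list, and `run_pipeline` are hypothetical stand-ins. They only show how each stage consumes the previous stage's result and how a re-run can be limited to selected stages.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Payload = Dict[str, Any]


def preprocess(payload: Payload) -> Payload:
    # Stand-in for audio extraction and normalization.
    return {**payload, "audio": f"normalized:{payload['file']}"}


def gpu(payload: Payload) -> Payload:
    # Stand-in for transcription and diarization on the GPU worker.
    return {**payload, "segments": ["segment-1", "segment-2"]}


def postprocess(payload: Payload) -> Payload:
    # Stand-in for analytics and summarization.
    return {**payload, "summary": f"{len(payload['segments'])} segments"}


# Pipeline order is fixed; the picker only chooses which stages run.
STAGES: List[Tuple[str, Callable[[Payload], Payload]]] = [
    ("preprocess", preprocess),
    ("gpu", gpu),
    ("postprocess", postprocess),
]


def run_pipeline(payload: Payload, stages: Optional[Iterable[str]] = None) -> Payload:
    """Run all stages in order, or only the selected ones when ``stages`` is given."""
    known = {name for name, _ in STAGES}
    selected = known if stages is None else set(stages)
    unknown = selected - known
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    for name, stage in STAGES:
        if name in selected:
            payload = stage(payload)
    return payload


full_result = run_pipeline({"file": "meeting.wav"})
# Re-run only the last stage against the stored result of an earlier run.
rerun_result = run_pipeline(full_result, stages={"postprocess"})
```

In this model a selective re-run reuses the stored output of the skipped stages, so only the selected stages consume worker time.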

🎙️ ASR Module

Cloud and local ASR provider abstraction (backend/app/services/asr/)

10 providers: local (WhisperX), Deepgram, AssemblyAI, OpenAI, Google, Azure, AWS, Speechmatics, Gladia, pyannote.ai
Admin-pinned local model (asr.local_model system setting, overrides WHISPER_MODEL env var)
Per-user cloud provider configuration with encrypted API key storage
DEPLOYMENT_MODE=lite disables local GPU workers entirely
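
The two configuration rules above can be written out as a short sketch. The helper names and the plain-dictionary inputs are hypothetical. Only the `asr.local_model` setting and the `WHISPER_MODEL` and `DEPLOYMENT_MODE` variables come from this document.

```python
from typing import Mapping, Optional


def resolve_local_model(
    system_settings: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """Prefer the admin-pinned model; fall back to the WHISPER_MODEL env var."""
    pinned = system_settings.get("asr.local_model")
    if pinned:
        return pinned
    return env.get("WHISPER_MODEL")


def local_workers_enabled(env: Mapping[str, str]) -> bool:
    """Local GPU workers are disabled when DEPLOYMENT_MODE is 'lite'."""
    # Case-insensitive matching is an assumption of this sketch.
    return env.get("DEPLOYMENT_MODE", "").strip().lower() != "lite"
```

The model names a deployment would pass in are not specified here; any string works for the purpose of the lookup.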

🛠️ Utilities

Common utilities and helper functions

Authentication decorators
Database helpers
Error handling utilities
Task management utilities
Testing patterns

📋 Specialized Documentation

🔒 Security Documentation

Security features and compliance

Authentication methods (Local, LDAP, OIDC, PKI)
Multi-factor authentication (MFA/TOTP)
Password policies and account lockout
Session management and rate limiting
Audit logging
FedRAMP compliance features (AC-8, IA-2, IA-5, AC-12, AU-2/AU-3)

🗃️ Database Strategy

Database management approach

Development vs production workflows
Alembic migration strategy
Schema change procedures
Troubleshooting guide

📁 Utility Scripts

Administrative and development scripts

Admin user creation
Database inspection tools
Debugging utilities
Setup scripts

Files Removed in v0.4.0

The following files were removed as part of the pipeline unification refactor. Any documentation or code references to them should be updated:

| Removed File | Replacement |
| --- | --- |
| `whisperx_service.py` | Unified ASR provider abstraction in `app/services/asr/` |
| `parallel_pipeline.py` | Celery chain: preprocess → GPU → postprocess |
| `fast_speaker_assignment.py` | Folded into unified GPU stage |
| `batched_alignment.py` | Removed; word timestamps are native to faster-whisper (cross-attention DTW) |
| `pyannote_compat.py` | PyAnnote v4 API used directly (no compat shim needed) |

Word-level timestamps are now produced natively by faster-whisper via cross-attention DTW. The ENABLE_ALIGNMENT and TRANSCRIPTION_ENGINE environment variables are no longer read: setting them has no effect and produces no warning.
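
Because these variables are ignored without a warning, a leftover entry in a .env file can go unnoticed. A small check like the following can flag them. It is not part of the backend, and the helper name is hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Variables named in this document as ignored since v0.4.0.
IGNORED_VARS = ("ENABLE_ALIGNMENT", "TRANSCRIPTION_ENGINE")


def find_ignored_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the ignored variables that are still set in ``env``."""
    source = os.environ if env is None else env
    return [name for name in IGNORED_VARS if name in source]


if __name__ == "__main__":
    for name in find_ignored_vars():
        print(f"{name} is set but has no effect; remove it from your configuration")
```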

🚀 Getting Started Guides

For New Developers

Backend README - Start here for environment setup
Application Architecture - Understand the codebase structure
API Documentation - Learn the API patterns
Adding Features Guide - Step-by-step feature development

For API Integration

API Layer Documentation
Authentication Patterns
Error Handling
Interactive API Docs: http://localhost:5174/api/docs

For Database Work

For Background Processing

Tasks Overview
Transcription Pipeline
Task Monitoring
Flower Dashboard: http://localhost:5175/flower

🔧 Development Workflows

Adding New Features

1. Planning Phase

Review Application Architecture for patterns
Check API Documentation for endpoint conventions
Review Data Models for database design

2. Implementation Phase

# Follow this order for new features:
1. Update database schema (models/ + DATABASE_APPROACH.md)
2. Create/update Pydantic schemas (schemas/)
3. Implement business logic (services/)
4. Create API endpoints (api/endpoints/)
5. Add background tasks if needed (tasks/)
6. Write comprehensive tests (tests/)

3. Documentation Phase

Update relevant README files
Add docstrings to all new functions/classes
Update API documentation examples
Add any new patterns to architecture docs

Debugging Workflows

API Issues

Check Error Handling patterns
Review API Documentation for debugging tips
Use interactive docs at http://localhost:5174/api/docs
Check logs: ./opentr.sh logs backend

Database Issues

Review Database Strategy
Use Database Scripts for inspection
Check Query Patterns
Run: python scripts/db_inspect.py

Background Task Issues

Check Task Documentation
Monitor via Flower Dashboard: http://localhost:5175/flower
Review Task Utilities
Check logs: ./opentr.sh logs celery-worker

📊 Reference Materials

API Reference

Interactive Docs: http://localhost:5174/api/docs
ReDoc: http://localhost:5174/api/redoc
Endpoint List
Authentication Guide

Database Reference

Task Reference

Utility Reference

🧪 Testing Documentation

Test Organization

tests/
├── api/endpoints/          # API endpoint tests
├── services/              # Service layer tests
├── models/               # Database model tests
├── tasks/                # Background task tests
└── utils/                # Utility function tests

Running Tests

# All tests
./opentr.sh shell backend
pytest tests/

# Specific test categories
pytest tests/api/           # API tests
pytest tests/services/      # Service tests
pytest tests/models/        # Model tests

# With coverage
pytest --cov=app tests/

Test Documentation Links

🚀 Deployment Documentation

Environment Setup

Database Deployment

Migration Strategy
Backup Procedures

Task System Deployment

🤝 Contributing Guidelines

Code Standards

Documentation Standards

Google-style docstrings for all functions and classes
Type hints throughout the codebase
README updates for new components
API documentation for new endpoints
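
As an illustration of the first two standards, here is a small function with a Google-style docstring and full type hints. The function is a made-up example and does not exist in the codebase.

```python
from typing import List


def chunk_segments(segments: List[str], size: int) -> List[List[str]]:
    """Split transcript segments into fixed-size chunks.

    Args:
        segments: Segment texts in playback order.
        size: Maximum number of segments per chunk. Must be positive.

    Returns:
        A list of chunks, each containing at most ``size`` segments.

    Raises:
        ValueError: If ``size`` is less than 1.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [segments[i:i + size] for i in range(0, len(segments), size)]
```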

Review Checklist

📞 Support and Resources

Getting Help

Main README for general questions
Troubleshooting Guide
GitHub Issues for bug reports

Useful Commands

# Development
./opentr.sh start dev           # Start development environment
./opentr.sh logs backend        # View backend logs
./opentr.sh shell backend       # Access backend container

# Database
./opentr.sh reset dev           # Reset development database
python scripts/db_inspect.py    # Inspect database state

# Testing
pytest tests/                   # Run all tests
pytest --cov=app tests/         # Run with coverage

# Monitoring
# Flower: http://localhost:5175/flower
# API Docs: http://localhost:5174/api/docs

This is living documentation; please keep it updated as the system evolves.

Last updated: 2026-03-22
Backend version: OpenTranscribe v0.4.0
Python: 3.11+
FastAPI: 0.100+
