Feature: Transcript Version Control and Change Tracking #46

@davidamacey

Feature Request: Transcript Version Control and Change Tracking

Summary

Implement comprehensive version control and change tracking for transcript edits, allowing users to see who made changes, when they were made, and what specific modifications were applied. This feature will provide full audit trails and the ability to revert to previous versions.

Current State Analysis

  • Transcript edits currently overwrite existing data; no history is preserved
  • There is no record of who made a change or when it occurred
  • Users cannot revert to a previous version or compare changes
  • Changes are written directly to the transcript_segment table without an audit trail

Proposed Solution Overview

Implement a multi-layered approach with:

  1. Transcript Version History - Complete version snapshots
  2. Change Audit Trail - Individual edit tracking
  3. User Interface - History viewing and version comparison
  4. Version Management - Restore and rollback capabilities

📋 Detailed Requirements

Core Functionality

  • Track all transcript modifications with full audit trail
  • Display chronological history of changes by user, date, and time
  • Show specific text changes (before/after) for each edit
  • Enable version comparison with diff highlighting
  • Allow reverting to any previous version
  • Preserve original AI-generated transcript as baseline
  • Support collaborative editing with conflict resolution

User Experience Requirements

  • Intuitive history viewer with timeline interface
  • Clear indication of who made each change
  • Easy-to-understand diff visualization
  • Quick revert functionality with confirmation
  • Filtering options (by user, date range, change type)
  • Export capabilities for change history

🏗️ Technical Implementation Plan

Backend Changes

1. Database Schema Updates

New Table: transcript_versions

CREATE TABLE transcript_versions (
    id SERIAL PRIMARY KEY,
    media_file_id INTEGER NOT NULL REFERENCES media_file(id) ON DELETE CASCADE,
    version_number INTEGER NOT NULL,
    created_by INTEGER NOT NULL REFERENCES "user"(id),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    change_reason TEXT,
    is_ai_generated BOOLEAN DEFAULT FALSE,
    total_segments INTEGER NOT NULL,
    UNIQUE(media_file_id, version_number)
);

New Table: transcript_version_segments

CREATE TABLE transcript_version_segments (
    id SERIAL PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES transcript_versions(id) ON DELETE CASCADE,
    segment_order INTEGER NOT NULL,
    start_time FLOAT NOT NULL,
    end_time FLOAT NOT NULL,
    text TEXT NOT NULL,
    speaker_id INTEGER REFERENCES speaker(id),
    confidence FLOAT,
    UNIQUE(version_id, segment_order)
);
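
The snapshot step implied by these two tables can be sketched in plain Python. This is a minimal in-memory illustration, not the project's implementation: `SegmentSnapshot`, `VersionSnapshot`, and `create_snapshot` are hypothetical names, and ordering segments by start time is an assumption. The version number is allocated per media file, which is what the UNIQUE(media_file_id, version_number) constraint requires.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class SegmentSnapshot:
    """Mirrors one row of transcript_version_segments (hypothetical class)."""
    segment_order: int
    start_time: float
    end_time: float
    text: str
    speaker_id: Optional[int] = None
    confidence: Optional[float] = None


@dataclass
class VersionSnapshot:
    """Mirrors one row of transcript_versions, with its segments attached."""
    media_file_id: int
    version_number: int
    created_by: int
    created_at: datetime
    total_segments: int
    change_reason: Optional[str] = None
    is_ai_generated: bool = False
    segments: List[SegmentSnapshot] = field(default_factory=list)


def create_snapshot(history, media_file_id, segments, created_by,
                    change_reason=None, is_ai_generated=False):
    """Append a full snapshot of `segments` to `history` and return it."""
    # Version numbers are counted per media file, starting at 1.
    existing = [v.version_number for v in history
                if v.media_file_id == media_file_id]
    next_number = max(existing, default=0) + 1

    # Assumption: segments are ordered by start time, then renumbered from 0
    # so that (version, segment_order) is unique within the snapshot.
    ordered = sorted(segments, key=lambda s: s.start_time)
    copies = [replace(s, segment_order=i) for i, s in enumerate(ordered)]

    snapshot = VersionSnapshot(
        media_file_id=media_file_id,
        version_number=next_number,
        created_by=created_by,
        created_at=datetime.now(timezone.utc),
        total_segments=len(copies),
        change_reason=change_reason,
        is_ai_generated=is_ai_generated,
        segments=copies,
    )
    history.append(snapshot)
    return snapshot
```

In a real service the number allocation and the inserts would need to run in one database transaction so that two concurrent edits cannot claim the same version number.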

New Table: transcript_change_log

CREATE TABLE transcript_change_log (
    id SERIAL PRIMARY KEY,
    media_file_id INTEGER NOT NULL REFERENCES media_file(id) ON DELETE CASCADE,
    segment_id INTEGER NOT NULL,
    changed_by INTEGER NOT NULL REFERENCES "user"(id),
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- One of: 'text_edit', 'speaker_change', 'timing_adjust', 'segment_split', 'segment_merge'
    change_type VARCHAR(20) NOT NULL,
    old_value JSONB, -- value before the change
    new_value JSONB, -- value after the change
    version_before INTEGER REFERENCES transcript_versions(id),
    version_after INTEGER REFERENCES transcript_versions(id)
);
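
As an illustration of what a transcript_change_log row might carry, the sketch below builds one entry from the before and after state of a segment. The function name `build_change_entry` and the choice to store only the fields that differ in old_value and new_value are assumptions; the issue does not specify the JSONB layout.

```python
import json
from datetime import datetime, timezone

# Change types listed in the schema comment for change_type.
CHANGE_TYPES = {
    "text_edit", "speaker_change", "timing_adjust",
    "segment_split", "segment_merge",
}


def build_change_entry(media_file_id, segment_id, changed_by, change_type,
                       before, after, version_before=None, version_after=None):
    """Return a dict shaped like one transcript_change_log row."""
    if change_type not in CHANGE_TYPES:
        raise ValueError(f"unknown change_type: {change_type!r}")

    # Assumption: only the fields whose values differ are stored.
    keys = set(before) | set(after)
    changed = sorted(k for k in keys if before.get(k) != after.get(k))
    if not changed:
        raise ValueError("before and after are identical; nothing to log")

    return {
        "media_file_id": media_file_id,
        "segment_id": segment_id,
        "changed_by": changed_by,
        "changed_at": datetime.now(timezone.utc).isoformat(),
        "change_type": change_type,
        # JSON text here; a PostgreSQL driver would bind this to JSONB.
        "old_value": json.dumps({k: before.get(k) for k in changed}, sort_keys=True),
        "new_value": json.dumps({k: after.get(k) for k in changed}, sort_keys=True),
        "version_before": version_before,
        "version_after": version_after,
    }
```

Storing only the changed fields keeps rows small, at the cost of needing the surrounding version snapshot to reconstruct the full segment.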

Update Existing Table: transcript_segment

ALTER TABLE transcript_segment 
ADD COLUMN updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN updated_by INTEGER REFERENCES "user"(id),
ADD COLUMN version INTEGER DEFAULT 1;

2. New API Endpoints

Version Management

  • GET /api/files/{file_id}/transcript/versions - List all versions
  • GET /api/files/{file_id}/transcript/versions/{version_id} - Get specific version
  • POST /api/files/{file_id}/transcript/versions - Create new version snapshot
  • PUT /api/files/{file_id}/transcript/restore/{version_id} - Restore to version
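
One way the restore endpoint could behave is to copy the chosen version forward as a new version rather than deleting later ones, so the audit trail stays intact. The sketch below assumes that design; `restore_version` and the dict layout are hypothetical, and the issue does not say whether restore should append or truncate.

```python
from copy import deepcopy
from typing import Dict, List


def restore_version(versions: List[Dict], media_file_id: int,
                    version_id: int, restored_by: int) -> Dict:
    """Append a new version whose segments are copied from `version_id`."""
    file_versions = [v for v in versions if v["media_file_id"] == media_file_id]
    target = next((v for v in file_versions if v["id"] == version_id), None)
    if target is None:
        raise LookupError(
            f"version {version_id} not found for media file {media_file_id}"
        )

    restored = {
        # Stand-in for the SERIAL primary key the database would assign.
        "id": max(v["id"] for v in versions) + 1,
        "media_file_id": media_file_id,
        "version_number": max(v["version_number"] for v in file_versions) + 1,
        "created_by": restored_by,
        "change_reason": f"Restored from version {target['version_number']}",
        "is_ai_generated": False,
        "total_segments": len(target["segments"]),
        # Deep copy so later edits never mutate the historical snapshot.
        "segments": deepcopy(target["segments"]),
    }
    versions.append(restored)
    return restored
```

With this approach a restore is itself reversible, because the versions it supersedes remain in the history.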

Change History

  • GET /api/files/{file_id}/transcript/history - Get change log with filters
  • GET /api/files/{file_id}/transcript/diff/{version1}/{version2} - Compare versions
  • POST /api/files/{file_id}/transcript/segments/{segment_id}/history - Log individual change

Enhanced Editing

  • Update existing PUT /api/files/{file_id}/transcript/segments/{segment_id} to include version tracking
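
A possible shape for version-aware editing is an optimistic concurrency check: the client sends the segment version it last read, and the update is rejected if the stored version has moved on. This sketch assumes the new `version` column on transcript_segment is a per-segment edit counter, which the issue does not state; `Segment`, `VersionConflict`, and `update_segment_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


class VersionConflict(Exception):
    """Raised when the segment changed since the client last read it."""


@dataclass
class Segment:
    id: int
    text: str
    version: int = 1
    updated_by: Optional[int] = None


def update_segment_text(segment: Segment, new_text: str, user_id: int,
                        expected_version: int) -> Segment:
    """Apply an edit only if the caller saw the latest version."""
    # In SQL this check would typically be part of the statement itself, e.g.
    # UPDATE ... WHERE id = :id AND version = :expected_version.
    if segment.version != expected_version:
        raise VersionConflict(
            f"segment {segment.id} is at version {segment.version}, "
            f"but the edit was based on version {expected_version}"
        )
    segment.text = new_text
    segment.version += 1
    segment.updated_by = user_id
    return segment
```

The endpoint could translate `VersionConflict` into an HTTP 409 response so the frontend can show the conflict warning described for EditTranscriptButton.svelte.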

3. New Backend Services

app/services/transcript_version_service.py

  • Version creation and management
  • Diff calculation between versions
  • Change logging utilities
  • Version restoration logic
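
The diff calculation in this service could start from Python's standard difflib module. The sketch below compares two segment texts word by word; `diff_words` is a hypothetical helper, and word-level granularity is an assumption that may suit transcripts better than character-level diffs.

```python
import difflib
from typing import List, Tuple


def diff_words(old_text: str, new_text: str) -> List[Tuple[str, str]]:
    """Return (operation, words) pairs: 'equal', 'delete', or 'insert'."""
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False disables the popularity heuristic, which can hide
    # matches on frequent words in long sequences.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    ops: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("equal", " ".join(old_words[i1:i2])))
            continue
        # A 'replace' opcode is emitted as a delete followed by an insert.
        if i2 > i1:
            ops.append(("delete", " ".join(old_words[i1:i2])))
        if j2 > j1:
            ops.append(("insert", " ".join(new_words[j1:j2])))
    return ops
```

For example, `diff_words("the quick brown fox", "the quick red fox")` yields an equal run for "the quick", a delete for "brown", an insert for "red", and an equal run for "fox", which maps directly onto before/after highlighting in a diff view.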

app/services/transcript_history_service.py

  • Change tracking and audit trail
  • History querying with filters
  • Change analytics and reporting
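
History querying with filters might look like the following in-memory sketch. In practice these conditions would become WHERE clauses on transcript_change_log; `filter_history` and its parameter names are hypothetical, and returning newest entries first is an assumption.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def filter_history(entries: Iterable[Dict],
                   changed_by: Optional[int] = None,
                   change_type: Optional[str] = None,
                   since: Optional[datetime] = None,
                   until: Optional[datetime] = None) -> List[Dict]:
    """Return matching change-log entries, newest first."""
    result = []
    for entry in entries:
        if changed_by is not None and entry["changed_by"] != changed_by:
            continue
        if change_type is not None and entry["change_type"] != change_type:
            continue
        # Both date bounds are inclusive.
        if since is not None and entry["changed_at"] < since:
            continue
        if until is not None and entry["changed_at"] > until:
            continue
        result.append(entry)
    result.sort(key=lambda e: e["changed_at"], reverse=True)
    return result
```

Each filter is optional, so the same function covers "only my changes", "only speaker changes", and date-range queries.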

4. Updated Models & Schemas

New Models:

  • TranscriptVersion (SQLAlchemy model)
  • TranscriptVersionSegment (SQLAlchemy model)
  • TranscriptChangeLog (SQLAlchemy model)

New Schemas:

  • TranscriptVersionResponse
  • TranscriptHistoryResponse
  • TranscriptDiffResponse
  • VersionCreateRequest

Frontend Changes

1. New Components

src/components/transcript/VersionHistory.svelte

  • Timeline view of all transcript versions
  • User avatars and timestamps
  • Change summaries and statistics
  • Filter controls (user, date, change type)

src/components/transcript/VersionCompare.svelte

  • Side-by-side or unified diff view
  • Highlighting of inserted, deleted, and modified text
  • Navigation between changes
  • Export diff functionality

src/components/transcript/ChangeDetails.svelte

  • Detailed view of individual changes
  • Before/after text comparison
  • User information and timestamps
  • Revert option for specific changes

src/components/transcript/VersionSelector.svelte

  • Dropdown for version selection
  • Quick preview of version info
  • Restore confirmation dialog

2. Enhanced Existing Components

TranscriptDisplay.svelte

  • Version indicator badge
  • "View History" button
  • Change indicators on modified segments
  • Collaborative editing indicators

EditTranscriptButton.svelte

  • Version awareness
  • Change reason input dialog
  • Conflict detection warnings

3. New Routes/Pages

src/routes/files/[id]/history/+page.svelte

  • Dedicated history page for transcript
  • Full timeline and version management
  • Advanced filtering and search

src/routes/files/[id]/compare/[v1]/[v2]/+page.svelte

  • Version comparison interface
  • Detailed diff visualization

4. State Management Updates

New Stores:

  • transcriptVersions.js - Version data and state
  • transcriptHistory.js - Change log and filters
  • versionComparison.js - Diff data and UI state

🎯 User Stories

Primary User Stories

  1. As a user, I want to see who last edited each part of a transcript so I can follow up with questions
  2. As a user, I want to see what changes were made to understand the editing history
  3. As a user, I want to revert accidental changes without losing other good edits
  4. As a team lead, I want to review all changes made by team members for quality control
  5. As a user, I want to see when changes were made to track project progress

Advanced User Stories

  1. As a collaborator, I want to see if someone else is editing to avoid conflicts
  2. As an admin, I want to export change history for compliance reporting
  3. As a user, I want to compare the current transcript with the original AI version
  4. As a user, I want to add notes explaining why I made specific changes
  5. As a user, I want to filter history to see only my changes or specific types of edits

🔧 Implementation Phases

Phase 1: Foundation (Week 1-2)

  • Create database schema and migrations
  • Implement basic version tracking models
  • Update transcript editing API to log changes
  • Create version snapshot on first edit

Phase 2: Core History (Week 3-4)

  • Implement version history API endpoints
  • Build basic history viewing component
  • Add version indicators to transcript display
  • Implement simple revert functionality

Phase 3: Advanced Features (Week 5-6)

  • Build version comparison and diff visualization
  • Add advanced filtering and search
  • Implement change reason tracking
  • Create comprehensive history timeline

Phase 4: Polish & Integration (Week 7-8)

  • Add collaborative editing indicators
  • Implement export functionality
  • Performance optimization for large histories
  • Comprehensive testing and bug fixes

🧪 Testing Strategy

Backend Testing

  • Unit tests for version tracking services
  • API endpoint testing for all version operations
  • Database migration and rollback testing
  • Performance testing with large transcript histories

Frontend Testing

  • Component testing for history viewers
  • Integration testing for version operations
  • User interaction testing for diff visualization
  • Responsive design testing across devices

End-to-End Testing

  • Complete editing workflow with version tracking
  • Multi-user collaborative editing scenarios
  • Version restoration and rollback workflows
  • History export and filtering functionality

📊 Success Metrics

Functionality Metrics

  • Every transcript change is recorded in the change log, with no edits missed
  • Version history is preserved for all files
  • Users can successfully revert to any previous version
  • Change attribution is correct for all edits

Performance Metrics

  • History loading takes < 2 seconds for files with 100+ versions
  • Diff calculation completes in < 1 second for typical transcripts
  • Database storage overhead < 200% of original transcript size
  • UI remains responsive during history operations

User Experience Metrics

  • Users can understand change history without training
  • Version comparison is visually clear and actionable
  • Revert operations are intuitive and safe
  • Collaborative editing conflicts are handled gracefully

🔒 Security Considerations

  • Ensure users can only view history for files they have access to
  • Implement proper authentication for all version operations
  • Prevent unauthorized version restoration
  • Audit all history-related actions in application logs
  • Protect against history manipulation or deletion
  • Ensure change attribution cannot be spoofed
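
Two of these points, access control and unspoofable attribution, can be illustrated together. In the sketch below the server checks file access and then sets `changed_by` from the authenticated session, ignoring any attribution fields in the request body. `record_change`, `AccessDenied`, and the `file_access` mapping are hypothetical stand-ins for whatever permission model the application already has.

```python
from typing import Dict, Set


class AccessDenied(Exception):
    """Raised when the session user may not access the media file."""


# Fields the client is never allowed to set directly.
SERVER_CONTROLLED_FIELDS = ("changed_by", "changed_at", "media_file_id")


def record_change(file_access: Dict[int, Set[int]], session_user_id: int,
                  media_file_id: int, payload: Dict) -> Dict:
    """Build a change entry attributed to the authenticated user."""
    # file_access maps media_file_id -> set of user ids with access.
    if session_user_id not in file_access.get(media_file_id, set()):
        raise AccessDenied(
            f"user {session_user_id} cannot access media file {media_file_id}"
        )

    # Drop any attribution the client tried to supply.
    entry = {k: v for k, v in payload.items()
             if k not in SERVER_CONTROLLED_FIELDS}
    entry["media_file_id"] = media_file_id
    entry["changed_by"] = session_user_id
    return entry
```

The timestamp is left out here on the assumption that the database default on `changed_at` supplies it.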

📝 Documentation Requirements

  • API documentation for all new endpoints
  • User guide for version control features
  • Admin guide for history management
  • Database schema documentation
  • Migration guide for existing installations

🚀 Future Enhancements

Advanced Version Control

  • Branch-based editing for major revisions
  • Merge conflict resolution interface
  • Automated change summarization using AI
  • Integration with external version control systems

Collaboration Features

  • Real-time collaborative editing with live cursors
  • Change approval workflows for team environments
  • Commenting system on specific changes
  • Change request and review process

Analytics & Insights

  • Editing patterns and productivity metrics
  • Quality improvement tracking over time
  • User contribution analytics
  • Automated change quality scoring

💡 Technical Notes

Database Considerations

  • Consider partitioning the transcript_change_log table, since it is expected to grow large
  • Implement proper indexing for version queries
  • Plan for data retention policies for old versions
  • Consider compression for historical version storage

Performance Optimizations

  • Lazy loading for version history to avoid large initial loads
  • Efficient diff algorithms for large transcripts
  • Caching strategies for frequently accessed versions
  • Pagination for change history displays
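
Pagination for the change history could use a cursor on the log id instead of page offsets, which keeps pages stable while new changes arrive. This is an in-memory sketch of that idea; `paginate_history` and the `before_id` cursor are hypothetical names, and a real query would use `WHERE id < :before_id ORDER BY id DESC LIMIT :limit`.

```python
from typing import Dict, List, Optional, Tuple


def paginate_history(entries: List[Dict], limit: int = 50,
                     before_id: Optional[int] = None
                     ) -> Tuple[List[Dict], Optional[int]]:
    """Return one page of entries (newest first) and the next cursor."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    ordered = sorted(entries, key=lambda e: e["id"], reverse=True)
    if before_id is not None:
        # Keep only entries older than the cursor.
        ordered = [e for e in ordered if e["id"] < before_id]

    page = ordered[:limit]
    # A cursor is returned only when more entries remain after this page.
    next_cursor = page[-1]["id"] if len(ordered) > limit else None
    return page, next_cursor
```

The client passes the returned cursor as `before_id` on the next request and stops when the cursor is `None`.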

Migration Strategy

  • Create initial version snapshots for existing transcripts
  • Preserve existing transcript data during schema updates
  • Provide rollback capability for the version control feature itself
  • Gradual rollout with feature flags for testing

Priority: High
Complexity: High
Estimated Effort: 6-8 weeks
Dependencies: Current transcript editing system
Related Issues: #[insert any related issues]
