[ENHANCEMENT] Implement Comprehensive Testing and Enable CI/CD Pipeline #21

@davidamacey

Objective

Implement comprehensive test coverage for both frontend and backend components, then re-enable and enhance the CI/CD pipeline to ensure application stability and prevent regressions.


Current State (as of 2026-02-06)

| Metric | Value |
|---|---|
| Passing | 124 tests |
| Failing | 0 |
| Errors | 0 |
| Skipped | 356 tests (gated by env vars or missing infrastructure) |
| Sequential runtime | 5m 27s |
| Parallel runtime | 1m 55s (-n auto, 25 workers) |
| Test files | 22 unit/integration + 4 E2E |

Recent Improvements (feat/search-and-rag branch)

  • Added TESTING guard in lifespan() — skips migrations, MinIO, OpenSearch, Celery on startup (saves ~10 min)
  • Fixed 12 test_media_security.py errors (missing fixtures, hardcoded UUIDs)
  • Fixed test_main.py module-level TestClient triggering lifespan at import time
  • Added [tool.pytest.ini_options] in pyproject.toml with -n auto parallel defaults
  • Fixed parallel test isolation (pool_pre_ping, race-safe dependency override cleanup)
  • Fixed create_mock_response() to include security headers matching production behavior
  • Added PKI E2E Playwright tests (backend/tests/e2e/test_pki.py)
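
The TESTING guard from the first bullet can be pictured roughly as follows. This is a minimal sketch, not the project's actual lifespan(): the helpers run_migrations and connect_services are hypothetical placeholders, and None stands in for the FastAPI app so the snippet runs with the standard library only.

```python
import asyncio
import os
from contextlib import asynccontextmanager

started = []  # records which startup steps ran (illustration only)


async def run_migrations():
    # Hypothetical placeholder for the real migration step
    started.append("migrations")


async def connect_services():
    # Hypothetical placeholder for MinIO, OpenSearch, and Celery setup
    started.append("services")


def is_testing() -> bool:
    # Case-insensitive so that TESTING=True and TESTING=true behave alike
    return os.environ.get("TESTING", "").strip().lower() == "true"


@asynccontextmanager
async def lifespan(app):
    # Same shape as a FastAPI lifespan handler: startup work, then yield
    if not is_testing():
        await run_migrations()
        await connect_services()
    yield


async def main():
    async with lifespan(app=None):
        pass


os.environ["TESTING"] = "True"
asyncio.run(main())
print(started)  # [] because the guard skipped every startup step
```

With the guard in place, creating a TestClient no longer waits on services that the unit tests never touch.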

How to Run Tests Today

# Activate venv
source backend/venv/bin/activate

# Run all unit/integration tests (parallel by default via pyproject.toml)
python -m pytest backend/tests/ --ignore=backend/tests/e2e

# Run sequentially with verbose output
python -m pytest backend/tests/ --ignore=backend/tests/e2e -v -o "addopts="

# Run specific test file
python -m pytest backend/tests/test_pki_auth.py -v -o "addopts="

# Run E2E tests (requires dev environment running)
pytest backend/tests/e2e/ -v --headed

# Run PKI E2E tests (requires PKI overlay running)
RUN_PKI_E2E=true pytest backend/tests/e2e/test_pki.py -v --headed

Inventory of 356 Skipped Tests

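Most skipped tests are gated by environment variables; a few use a hardcoded skip or degrade gracefully when a service is missing. They fall into three categories, plus a small remainder of scattered skips: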

Category A: Infrastructure-Gated (need running services, tests are valid)

These test real implemented features but need external services that aren't available in the default test environment.

| File | Env Var | Tests | Services Needed | Notes |
|---|---|---|---|---|
| api/endpoints/test_files.py | SKIP_S3=false | 5 | MinIO/S3 | File CRUD, upload, listing |
| api/endpoints/test_comments.py | SKIP_S3=false | 8 | MinIO/S3 | Comment CRUD on media files |
| api/endpoints/test_speakers.py | SKIP_S3=false | 7 | MinIO/S3 | Speaker CRUD |
| api/endpoints/test_tasks.py | SKIP_S3=false | 4 | MinIO/S3 | Task status, listing |
| api/endpoints/test_search.py | SKIP_OPENSEARCH=false | 3 | OpenSearch 3.4.0 | Search endpoint |
| test_search_quality.py | RUN_SEARCH_QUALITY_TESTS=true | 27 | OpenSearch + 20 indexed files | Search relevance, semantic quality |
| test_mfa_security.py | RUN_MFA_TESTS=true | 40 | Redis (optional, has fallback) | MFA token blacklist, TOTP, backup codes |
| test_llm_settings.py | RUN_LLM_TESTS=true | 16 | Database only (mocked providers) | LLM settings CRUD, encryption |
| test_pki_auth.py | RUN_PKI_TESTS=true | ~22 | Crypto libs (no external services) | Certificate parsing, validation unit tests |

Total: ~132 tests — These can be enabled with proper test infrastructure.

Category B: Feature-In-Development (tests for planned features)

These test features that are partially or not yet implemented. The tests define the expected behavior.

| File | Env Var | Tests | Feature Status | Notes |
|---|---|---|---|---|
| test_fedramp_controls.py | RUN_FEDRAMP_TESTS=true | 38 | In development | AC-10 sessions, AC-8 banners, AU-6 audit |
| test_fedramp_compliance.py | RUN_FEDRAMP_TESTS=true | 25 | Partial | Password policy (done), MFA (done), some controls planned |
| test_fips_140_3.py | RUN_FIPS_TESTS=true | 39 | Planned | PBKDF2-SHA256 600K iterations, AES-256-GCM |
| test_auth_config_service.py | RUN_AUTH_CONFIG_TESTS=true | 38 | Planned | Dynamic auth config from DB |
| test_admin_security.py | RUN_ADVANCED_ADMIN_TESTS=true | 35 | Partial | Super admin, lock/unlock, session termination |
| test_admin_endpoints.py | Hardcoded True (always skip) | 5 | Planned | Password reset, role updates, audit export |

Total: ~180 tests — These need the corresponding features to be implemented first.

Category C: External Service Tests (need live services)

| File | Env Var | Tests | Notes |
|---|---|---|---|
| test_external_llm.py | None (graceful degradation) | 19 | Tests Ollama, vLLM, OpenAI connections |

Total: ~19 tests — These test connections to external LLM providers.

Remaining (~25 tests)

Scattered skip conditions within larger test files (e.g., test_pki_auth.py has ~68 that run unconditionally + ~22 gated by RUN_PKI_TESTS).
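
As a quick sanity check on the inventory, the per-file counts from the tables above can be summed and compared against the 356 skipped tests. The PKI count is approximate in the inventory (~22), so the remainder is approximate as well.

```python
# Per-file skipped-test counts copied from the Category A, B, and C tables
category_a = {
    "test_files.py": 5,
    "test_comments.py": 8,
    "test_speakers.py": 7,
    "test_tasks.py": 4,
    "test_search.py": 3,
    "test_search_quality.py": 27,
    "test_mfa_security.py": 40,
    "test_llm_settings.py": 16,
    "test_pki_auth.py": 22,  # listed as ~22
}
category_b = {
    "test_fedramp_controls.py": 38,
    "test_fedramp_compliance.py": 25,
    "test_fips_140_3.py": 39,
    "test_auth_config_service.py": 38,
    "test_admin_security.py": 35,
    "test_admin_endpoints.py": 5,
}
category_c = {"test_external_llm.py": 19}

total_skipped = 356
totals = [sum(c.values()) for c in (category_a, category_b, category_c)]
remainder = total_skipped - sum(totals)

print(totals, remainder)  # [132, 180, 19] 25
```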


Action Plan

Phase 1: Enable Infrastructure-Gated Tests (Category A) — ~132 tests

These are the highest-value tests to ungate because they cover features that already work.

1a. Enable S3/MinIO tests (24 tests)

The test environment sets SKIP_S3=True because MinIO isn't reachable from localhost. Fix by connecting tests to the running MinIO container.

Changes needed:

  • In conftest.py, detect if MinIO is reachable at localhost:5179 and set SKIP_S3 accordingly
  • Add a minio_client fixture that connects to the dev MinIO instance
  • Create a test bucket (or use the existing one with test-prefixed paths)
  • Add cleanup in fixture teardown to remove test objects
  • Ensure test_files.py, test_comments.py, test_speakers.py, test_tasks.py use the fixture

Infrastructure: Dev environment must be running (./opentr.sh start dev)

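# Example: auto-detect MinIO in conftest.py
import os
import socket

def _is_minio_available(host="localhost", port=5179, timeout=2):
    """Return True if a TCP connection to the dev MinIO port succeeds."""
    try:
        # The context manager closes the socket once the probe is done
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False

if _is_minio_available():
    os.environ["SKIP_S3"] = "False"
    os.environ["MINIO_HOST"] = "localhost"
    os.environ["MINIO_PORT"] = "5179"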

1b. Enable OpenSearch tests (30 tests)

Changes needed:

  • Auto-detect OpenSearch at localhost:5180 in conftest.py
  • For test_search.py (3 tests): just needs basic OpenSearch connectivity
  • For test_search_quality.py (27 tests): needs indexed data — either:
    • Create a fixture that indexes test documents before the test class
    • Or gate these behind RUN_SEARCH_QUALITY_TESTS and run as integration tests with real data
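
The same detection idea can be extended to cover OpenSearch alongside MinIO. The following is a minimal sketch under assumptions: SERVICE_GATES and apply_service_gates are hypothetical names, and the ports are the dev ports listed in this issue. It uses setdefault so that a value set explicitly (for example by CI) is not overwritten, which means it would have to run before conftest.py applies its own defaults.

```python
import os
import socket

# Hypothetical mapping of skip variables to the dev ports listed in this issue
SERVICE_GATES = {
    "SKIP_S3": ("localhost", 5179),
    "SKIP_OPENSEARCH": ("localhost", 5180),
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def apply_service_gates(gates=SERVICE_GATES, environ=os.environ):
    """Set each SKIP_* variable from a reachability probe unless already set."""
    for var, (host, port) in gates.items():
        # setdefault leaves explicit overrides untouched
        environ.setdefault(var, "False" if is_port_open(host, port) else "True")
    return {var: environ[var] for var in gates}
```

A reachable port only shows that something is listening. For test_search_quality.py the indexed data still has to exist, so the RUN_SEARCH_QUALITY_TESTS gate remains useful on top of this check.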

1c. Enable MFA security tests (40 tests)

Changes needed:

  • Most tests mock Redis or exercise the fallback path, so review which ones actually need a live Redis
  • Tests that cover fail-open/fail-secure behavior can run without Redis
  • Set RUN_MFA_TESTS=true in CI when Redis is available

1d. Enable LLM settings tests (16 tests)

Changes needed:

  • These test LLM configuration CRUD, not actual LLM calls — should run without external LLM
  • Review tests and remove the RUN_LLM_TESTS gate for tests that only test database operations
  • Keep the gate only for tests that make actual API calls to LLM providers

1e. Enable PKI unit tests (22 tests)

Changes needed:

  • These are pure unit tests (certificate parsing, DN validation) — no external services needed
  • Remove the RUN_PKI_TESTS gate — they should always run
  • They already run fast and are self-contained with mock certificates

Phase 2: CI/CD Pipeline Setup

2a. GitHub Actions Workflow

# .github/workflows/tests.yml
name: Tests
on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: opentranscribe
          POSTGRES_PASSWORD: testpassword
          POSTGRES_DB: opentranscribe
        ports: ["5432:5432"]
        options: --health-cmd pg_isready --health-interval 10s
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install -r backend/requirements.txt
      - run: pip install pytest pytest-xdist
      - run: |
          cd backend
          python -m pytest tests/ --ignore=tests/e2e -n auto --tb=short
        env:
          POSTGRES_HOST: localhost
          POSTGRES_PORT: 5432
          TESTING: "True"

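  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      # A fresh runner has no backend/venv, so install into the runner's Python
      - run: pip install -r backend/requirements.txt pytest pytest-xdist
      - run: ./opentr.sh start dev
      - run: sleep 30  # Wait for services
      - run: |
          SKIP_S3=False SKIP_OPENSEARCH=False \
          python -m pytest backend/tests/ --ignore=backend/tests/e2e -v
      - if: always()  # Stop services even when the tests fail
        run: ./opentr.sh stop

  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: |
          pip install -r backend/requirements.txt pytest pytest-playwright
          # Use the Python Playwright CLI so the browser build matches pytest-playwright
          python -m playwright install --with-deps chromium
      - run: ./opentr.sh start dev
      - run: sleep 30
      - run: pytest backend/tests/e2e/ -v --browser chromium
      - if: always()
        run: ./opentr.sh stop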
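
The fixed sleep 30 in the workflow may be too short on a slow runner and wasteful on a fast one. One possible alternative is to poll each service port until it accepts connections. The sketch below assumes a hypothetical helper named wait_for_port; it is not part of the repository.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the deadline passes.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not up yet: give up only after the deadline, otherwise retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open TCP port does not guarantee that a service is ready to serve requests, so an HTTP health check (such as the MinIO /minio/health/live endpoint used in the local script) is the stricter option where one exists.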

2b. Local Integration Test Script

Create scripts/run-integration-tests.sh:

#!/bin/bash
# Run all tests including infrastructure-dependent ones
# Requires: ./opentr.sh start dev

set -e
source backend/venv/bin/activate

echo "Checking services..."
# Auto-detect available services
SKIP_S3=True SKIP_OPENSEARCH=True

curl -s http://localhost:5179/minio/health/live >/dev/null 2>&1 && SKIP_S3=False
curl -s http://localhost:5180 >/dev/null 2>&1 && SKIP_OPENSEARCH=False

echo "MinIO: $([ "$SKIP_S3" = "False" ] && echo "available" || echo "unavailable")"
echo "OpenSearch: $([ "$SKIP_OPENSEARCH" = "False" ] && echo "available" || echo "unavailable")"

SKIP_S3=$SKIP_S3 \
SKIP_OPENSEARCH=$SKIP_OPENSEARCH \
RUN_MFA_TESTS=true \
RUN_LLM_TESTS=true \
RUN_PKI_TESTS=true \
python -m pytest backend/tests/ --ignore=backend/tests/e2e -v "$@"

Phase 3: Fix Feature-In-Development Tests (Category B) — ~180 tests

These tests need the corresponding features to be implemented. Track as sub-issues:

  • FedRAMP Controls (63 tests across 2 files) — AC-10 concurrent sessions, AC-8 login banners, AU-6 audit export
  • FIPS 140-3 (39 tests) — PBKDF2-SHA256 migration, AES-256-GCM encryption
    • Decision needed: Which FIPS 140-3 algorithms to adopt
  • Auth Config Service (38 tests) — Dynamic auth configuration from database
    • Depends on: FedRAMP AC-2 account management
  • Admin Security (35 tests) — Super admin roles, account lock/unlock, session termination
    • Partially implemented, needs completion
  • Admin Endpoints (5 tests) — Remove hardcoded True skip, implement or delete

Phase 4: Database & Test Data Management

Current Behavior

  • Tests use savepoint-based transaction isolation — all test data is rolled back automatically
  • Each test gets a unique UUID-suffixed email to avoid parallel conflicts
  • No test data persists in the database after test completion
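
The savepoint mechanism can be illustrated without the project's database stack. The sketch below uses the standard-library sqlite3 module purely for demonstration; the real suite runs against PostgreSQL, and the table and the run_isolated helper here are hypothetical.

```python
import sqlite3
import uuid


def run_isolated(conn, test_body):
    """Run test_body inside a savepoint and roll everything back afterwards."""
    conn.execute("SAVEPOINT test_case")
    try:
        test_body(conn)
    finally:
        # Undo all writes made by the test, then discard the savepoint
        conn.execute("ROLLBACK TO SAVEPOINT test_case")
        conn.execute("RELEASE SAVEPOINT test_case")


# isolation_level=None puts sqlite3 in autocommit mode so the savepoint
# statements are under our control rather than the driver's
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY)")


def example_test(c):
    # UUID-suffixed email, mirroring the scheme that avoids parallel conflicts
    email = f"testuser_{uuid.uuid4().hex[:8]}@example.com"
    c.execute("INSERT INTO users (email) VALUES (?)", (email,))
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1


run_isolated(conn, example_test)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 0, the inserted user was rolled back
```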

Cleanup of Legacy Test Data

Previous test runs (before savepoint isolation was fixed) left ~10 orphaned test users:

testuser@example.com, testuser_68047a74@example.com, testuser_705f3292@example.com,
testuser_c063b59f@example.com, test-672747cb-...@example.com, etc.

  • Add a one-time cleanup migration or script to remove these
  • Or add a conftest.py session-scoped fixture that cleans up stale test users on startup
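
Either cleanup option needs a way to recognize the orphaned accounts. The sketch below infers three email patterns from the examples listed above; the pattern for the test-672747cb-... form is an assumption, because that address is truncated in this issue. The function only selects candidates and deletes nothing.

```python
import re

# Patterns inferred from the example addresses; the third one is a guess
# at the truncated "test-672747cb-..." form and may need adjusting.
LEGACY_TEST_EMAIL = re.compile(
    r"(?:testuser"
    r"|testuser_[0-9a-f]{8}"
    r"|test-[0-9a-f]{8}-[0-9a-f-]+)"
    r"@example\.com"
)


def find_legacy_test_emails(emails):
    """Return the addresses that look like leftovers from old test runs."""
    return [e for e in emails if LEGACY_TEST_EMAIL.fullmatch(e)]


candidates = find_legacy_test_emails(
    [
        "testuser@example.com",
        "testuser_68047a74@example.com",
        "alice@example.com",
    ]
)
print(candidates)  # ['testuser@example.com', 'testuser_68047a74@example.com']
```

Reviewing the candidate list by hand before deleting anything is advisable, since a real account could in principle match one of these patterns.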

Test Data Factory

  • Consider adding factory_boy for more complex test data generation
  • Create factories for: User, MediaFile, Speaker, Comment, TranscriptionSegment

Phase 5: Frontend Testing


Phase 6: Coverage & Quality Gates

  • Add pytest-cov for coverage reporting
  • Set minimum coverage threshold (target: 80%)
  • Add coverage reporting to CI (Codecov or similar)
  • Add pre-commit hook for running tests on changed files

Test Infrastructure Requirements

| Service | Required For | Port (Dev) | Available in CI |
|---|---|---|---|
| PostgreSQL | All tests | 5176 | GitHub Actions service |
| MinIO/S3 | File operations, media tests | 5179 | Docker Compose |
| OpenSearch 3.4.0 | Search tests | 5180 | Docker Compose |
| Redis | MFA, rate limiting, sessions | 6379 | GitHub Actions service |
| Frontend (Vite) | E2E tests | 5173 | Docker Compose |
| Backend (FastAPI) | E2E tests | 5174 | Docker Compose |
| HTTPS/PKI Overlay | PKI E2E tests | 5182 | Docker Compose + certs |

Environment Variables Reference

Always-set in test environment (conftest.py):

TESTING=True
SKIP_CELERY=True
SKIP_REDIS=True
SKIP_WEBSOCKET=True
RATE_LIMIT_ENABLED=false
POSTGRES_HOST=localhost
POSTGRES_PORT=5176

Service-availability gates (auto-detect or manual):

SKIP_S3=True|False         # MinIO available?
SKIP_OPENSEARCH=True|False # OpenSearch available?

Feature test gates (opt-in):

RUN_MFA_TESTS=true             # MFA security tests (40 tests)
RUN_LLM_TESTS=true             # LLM settings tests (16 tests)
RUN_PKI_TESTS=true             # PKI unit tests (22 tests)
RUN_SEARCH_QUALITY_TESTS=true  # Search quality tests (27 tests, needs indexed data)
RUN_FEDRAMP_TESTS=true         # FedRAMP tests (63 tests, features in development)
RUN_FIPS_TESTS=true            # FIPS 140-3 tests (39 tests, planned feature)
RUN_AUTH_CONFIG_TESTS=true     # Auth config tests (38 tests, planned feature)
RUN_ADVANCED_ADMIN_TESTS=true  # Admin security tests (35 tests, partial)
RUN_PKI_E2E=true               # PKI E2E browser tests (requires PKI overlay)
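
The gates in this issue appear with mixed casing (SKIP_S3=True, SKIP_S3=false, RUN_MFA_TESTS=true). How the test suite parses them is not shown here, so the following is only a sketch of a hypothetical env_flag helper that would make the casing irrelevant.

```python
import os

_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=os.environ):
    """Read a boolean environment variable, ignoring case and whitespace."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUE_VALUES


def should_skip_s3(environ=os.environ):
    # Skipping is the default, matching the SKIP_S3=True default described
    # for the test environment
    return env_flag("SKIP_S3", default=True, environ=environ)


print(env_flag("RUN_MFA_TESTS", environ={"RUN_MFA_TESTS": "True"}))  # True
print(should_skip_s3({"SKIP_S3": "false"}))  # False
```

A skip condition could then call the helper instead of comparing strings directly, so that SKIP_S3=False and SKIP_S3=false behave the same.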

Priority Order

  1. Ungate PKI unit tests — pure unit tests, no infrastructure needed, quick win (+22 tests)
  2. Ungate LLM settings tests — mostly DB operations, no external LLM needed (+16 tests)
  3. Ungate MFA tests — test fallback paths without Redis (+40 tests)
  4. Enable S3 tests with auto-detection — when dev env is running (+24 tests)
  5. Enable OpenSearch tests with auto-detection — when dev env is running (+30 tests)
  6. CI/CD pipeline — GitHub Actions with PostgreSQL service container
  7. Integration test script — local developer convenience
  8. FedRAMP/FIPS features — implement features, then ungate tests
  9. Frontend tests — vitest + Playwright expansion
  10. Coverage gates — enforce minimum coverage on PRs

Related Issues

Labels: backend, ci-cd, enhancement, frontend, testing
