[ENHANCEMENT] Implement Comprehensive Testing and Enable CI/CD Pipeline

## Objective

Implement comprehensive test coverage for both frontend and backend components, then re-enable and enhance the CI/CD pipeline to ensure application stability and prevent regressions.

---

## Current State (as of 2026-02-06)

| Metric | Value |
|--------|-------|
| **Passing** | 124 tests |
| **Failing** | 0 |
| **Errors** | 0 |
| **Skipped** | 356 tests (gated by env vars or missing infrastructure) |
| **Sequential runtime** | 5m 27s |
| **Parallel runtime** | 1m 55s (`-n auto`, 25 workers) |
| **Test files** | 22 unit/integration + 4 E2E |

### Recent Improvements (feat/search-and-rag branch)

- Added `TESTING` guard in `lifespan()` — skips migrations, MinIO, OpenSearch, Celery on startup (saves ~10 min)
- Fixed 12 `test_media_security.py` errors (missing fixtures, hardcoded UUIDs)
- Fixed `test_main.py` module-level `TestClient` triggering lifespan at import time
- Added `[tool.pytest.ini_options]` in `pyproject.toml` with `-n auto` parallel defaults
- Fixed parallel test isolation (`pool_pre_ping`, race-safe dependency override cleanup)
- Fixed `create_mock_response()` to include security headers matching production behavior
- Added PKI E2E Playwright tests (`backend/tests/e2e/test_pki.py`)

### How to Run Tests Today

```bash
# Activate venv
source backend/venv/bin/activate

# Run all unit/integration tests (parallel by default via pyproject.toml)
python -m pytest backend/tests/ --ignore=backend/tests/e2e

# Run sequentially with verbose output
python -m pytest backend/tests/ --ignore=backend/tests/e2e -v -o "addopts="

# Run specific test file
python -m pytest backend/tests/test_pki_auth.py -v -o "addopts="

# Run E2E tests (requires dev environment running)
pytest backend/tests/e2e/ -v --headed

# Run PKI E2E tests (requires PKI overlay running)
RUN_PKI_E2E=true pytest backend/tests/e2e/test_pki.py -v --headed
```

---

## Inventory of 356 Skipped Tests

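Most skipped tests are gated by environment variables. The exceptions, a hardcoded skip and graceful-degradation skips, are noted in the tables below. The tests fall into three categories, plus a small remainder: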

### Category A: Infrastructure-Gated (need running services, tests are valid)

These test **real implemented features** but need external services that aren't available in the default test environment.

| File | Env Var | Tests | Services Needed | Notes |
|------|---------|-------|-----------------|-------|
| `api/endpoints/test_files.py` | `SKIP_S3=false` | 5 | MinIO/S3 | File CRUD, upload, listing |
| `api/endpoints/test_comments.py` | `SKIP_S3=false` | 8 | MinIO/S3 | Comment CRUD on media files |
| `api/endpoints/test_speakers.py` | `SKIP_S3=false` | 7 | MinIO/S3 | Speaker CRUD |
| `api/endpoints/test_tasks.py` | `SKIP_S3=false` | 4 | MinIO/S3 | Task status, listing |
| `api/endpoints/test_search.py` | `SKIP_OPENSEARCH=false` | 3 | OpenSearch 3.4.0 | Search endpoint |
| `test_search_quality.py` | `RUN_SEARCH_QUALITY_TESTS=true` | 27 | OpenSearch + 20 indexed files | Search relevance, semantic quality |
| `test_mfa_security.py` | `RUN_MFA_TESTS=true` | 40 | Redis (optional, has fallback) | MFA token blacklist, TOTP, backup codes |
| `test_llm_settings.py` | `RUN_LLM_TESTS=true` | 16 | Database only (mocked providers) | LLM settings CRUD, encryption |
| `test_pki_auth.py` | `RUN_PKI_TESTS=true` | ~22 | Crypto libs (no external services) | Certificate parsing, validation unit tests |

**Total: ~132 tests** — These can be enabled with proper test infrastructure.

### Category B: Feature-In-Development (tests for planned features)

These test features that are **partially or not yet implemented**. The tests define the expected behavior.

| File | Env Var | Tests | Feature Status | Notes |
|------|---------|-------|----------------|-------|
| `test_fedramp_controls.py` | `RUN_FEDRAMP_TESTS=true` | 38 | In development | AC-10 sessions, AC-8 banners, AU-6 audit |
| `test_fedramp_compliance.py` | `RUN_FEDRAMP_TESTS=true` | 25 | Partial | Password policy (done), MFA (done), some controls planned |
| `test_fips_140_3.py` | `RUN_FIPS_TESTS=true` | 39 | Planned | PBKDF2-SHA256 600K iterations, AES-256-GCM |
| `test_auth_config_service.py` | `RUN_AUTH_CONFIG_TESTS=true` | 38 | Planned | Dynamic auth config from DB |
| `test_admin_security.py` | `RUN_ADVANCED_ADMIN_TESTS=true` | 35 | Partial | Super admin, lock/unlock, session termination |
| `test_admin_endpoints.py` | Hardcoded `True` (always skip) | 5 | Planned | Password reset, role updates, audit export |

**Total: ~180 tests** — These need the corresponding features to be implemented first.

### Category C: External Service Tests (need live services)

| File | Env Var | Tests | Notes |
|------|---------|-------|-------|
| `test_external_llm.py` | None (graceful degradation) | 19 | Tests Ollama, vLLM, OpenAI connections |

**Total: ~19 tests** — These test connections to external LLM providers.

### Remaining (~25 tests)

Scattered skip conditions inside larger test files that otherwise run unconditionally. `test_pki_auth.py` shows the pattern: ~68 of its tests always run, while ~22 are gated by `RUN_PKI_TESTS` (those 22 are already counted in Category A).

---

## Action Plan

### Phase 1: Enable Infrastructure-Gated Tests (Category A) — ~132 tests

These are the highest-value tests to ungate because they test **real working features**.

#### 1a. Enable S3/MinIO tests (24 tests)

The test environment sets `SKIP_S3=True` by default because MinIO is not assumed to be reachable from localhost. Fix this by detecting the running dev MinIO container and pointing the tests at it.

**Changes needed:**
- [ ] In `conftest.py`, detect if MinIO is reachable at `localhost:5179` and set `SKIP_S3` accordingly
- [ ] Add a `minio_client` fixture that connects to the dev MinIO instance
- [ ] Create a test bucket (or use the existing one with test-prefixed paths)
- [ ] Add cleanup in fixture teardown to remove test objects
- [ ] Ensure `test_files.py`, `test_comments.py`, `test_speakers.py`, `test_tasks.py` use the fixture

**Infrastructure:** Dev environment must be running (`./opentr.sh start dev`)

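```python
# Example: auto-detect MinIO in conftest.py
import os
import socket


def _is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False


# Keep this above any app imports so the settings see the updated values
if _is_port_open("localhost", 5179):
    os.environ["SKIP_S3"] = "False"
    os.environ["MINIO_HOST"] = "localhost"
    os.environ["MINIO_PORT"] = "5179"
```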

#### 1b. Enable OpenSearch tests (30 tests)

**Changes needed:**
- [ ] Auto-detect OpenSearch at `localhost:5180` in conftest.py
- [ ] For `test_search.py` (3 tests): just needs basic OpenSearch connectivity
- [ ] For `test_search_quality.py` (27 tests): needs indexed data — either:
  - Create a fixture that indexes test documents before the test class
  - Or keep them gated behind `RUN_SEARCH_QUALITY_TESTS` and run them as integration tests against real indexed data
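For the fixture option, a minimal sketch of the indexing half: generate a throwaway index name and the action dicts that opensearch-py's `helpers.bulk(client, actions)` accepts. The index naming, the helper names, and the document fields are hypothetical; the real mapping and the way the search service is pointed at a test index come from the app itself.

```python
"""Bulk-index actions for a throwaway OpenSearch test index (sketch)."""
import uuid
from collections.abc import Iterator


def make_index_name(base: str = "pytest-search") -> str:
    # Hypothetical naming; OpenSearch index names must be lowercase, and hex is.
    return f"{base}-{uuid.uuid4().hex[:8]}"


def bulk_actions(index: str, docs: list[dict]) -> Iterator[dict]:
    """Yield one 'index' action per document in the shape helpers.bulk() expects."""
    for i, doc in enumerate(docs):
        yield {"_op_type": "index", "_index": index, "_id": str(i), "_source": doc}
```

After `helpers.bulk(client, bulk_actions(index, docs))`, call `client.indices.refresh(index=index)` so the documents are searchable before the first assertion, and `client.indices.delete(index=index)` in teardown.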

#### 1c. Enable MFA security tests (40 tests)

**Changes needed:**
- [ ] Most tests mock Redis or test fallback behavior — review which actually need Redis
- [ ] Tests of fail-open/fail-secure behavior can run without Redis
- [ ] Set `RUN_MFA_TESTS=true` in CI when Redis is available
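Fail-open and fail-secure paths can be exercised with a mock that raises on every call, so no Redis instance is needed. A minimal sketch: `is_token_revoked` and its key format are hypothetical stand-ins for the app's MFA token blacklist check.

```python
"""Fail-open vs fail-secure checks with no Redis running (sketch)."""
from unittest.mock import Mock


def is_token_revoked(redis_client, jti: str, fail_secure: bool = True) -> bool:
    """Hypothetical blacklist lookup; the app's real function and key differ."""
    try:
        return bool(redis_client.exists(f"mfa:blacklist:{jti}"))
    except ConnectionError:
        # redis-py raises redis.exceptions.ConnectionError, which does NOT subclass
        # the builtin; real code should catch that type. Builtin used here for the sketch.
        # Redis unreachable: fail-secure treats the token as revoked, fail-open allows it.
        return fail_secure


def run_checks() -> None:
    down = Mock()
    down.exists.side_effect = ConnectionError("redis down")
    assert is_token_revoked(down, "abc", fail_secure=True) is True
    assert is_token_revoked(down, "abc", fail_secure=False) is False

    up = Mock()
    up.exists.return_value = 0
    assert is_token_revoked(up, "abc") is False
    up.exists.assert_called_once_with("mfa:blacklist:abc")
```

In a real test module each group of assertions would become its own `test_` function.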

#### 1d. Enable LLM settings tests (16 tests)

**Changes needed:**
- [ ] These test LLM configuration CRUD, not actual LLM calls — should run without external LLM
- [ ] Review tests and remove the `RUN_LLM_TESTS` gate for tests that only test database operations
- [ ] Keep the gate only for tests that make actual API calls to LLM providers

#### 1e. Enable PKI unit tests (22 tests)

**Changes needed:**
- [ ] These are pure unit tests (certificate parsing, DN validation) — no external services needed
- [ ] Remove the `RUN_PKI_TESTS` gate — they should always run
- [ ] They already run fast and are self-contained with mock certificates

---

### Phase 2: CI/CD Pipeline Setup

#### 2a. GitHub Actions Workflow

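```yaml
# .github/workflows/tests.yml
name: Tests
on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_USER: opentranscribe
          POSTGRES_PASSWORD: testpassword
          POSTGRES_DB: opentranscribe
        # Publish on 5176, the POSTGRES_PORT that conftest.py always sets
        ports: ["5176:5432"]
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install -r backend/requirements.txt pytest pytest-xdist
      - run: |
          cd backend
          python -m pytest tests/ --ignore=tests/e2e -n auto --tb=short
        env:
          POSTGRES_HOST: localhost
          POSTGRES_PORT: 5176
          TESTING: "True"

  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # A fresh checkout has no backend/venv, so install dependencies on the runner
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install -r backend/requirements.txt pytest pytest-xdist
      - run: ./opentr.sh start dev
      # Poll the services instead of sleeping for a fixed 30s
      - run: |
          timeout 180 bash -c 'until curl -sf http://localhost:5179/minio/health/live >/dev/null; do sleep 5; done'
          timeout 180 bash -c 'until curl -s http://localhost:5180 >/dev/null; do sleep 5; done'
      - run: python -m pytest backend/tests/ --ignore=backend/tests/e2e -v
        env:
          SKIP_S3: "False"
          SKIP_OPENSEARCH: "False"
      - if: always()  # tear down even when tests fail
        run: ./opentr.sh stop

  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      # pytest-xdist is required because pyproject.toml adds -n auto to every run
      - run: pip install -r backend/requirements.txt pytest pytest-xdist pytest-playwright
      # Use the Python package's installer so browser builds match its version
      - run: python -m playwright install --with-deps chromium
      - run: ./opentr.sh start dev
      - run: timeout 180 bash -c 'until curl -s http://localhost:5173 >/dev/null; do sleep 5; done'
      - run: pytest backend/tests/e2e/ -v --browser chromium
      - if: always()
        run: ./opentr.sh stop
```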

#### 2b. Local Integration Test Script

Create `scripts/run-integration-tests.sh`:

```bash
#!/bin/bash
# Run all tests including infrastructure-dependent ones
# Requires: ./opentr.sh start dev

set -e
cd "$(dirname "$0")/.."  # run from the repository root
source backend/venv/bin/activate

echo "Checking services..."
# Default to skipping; flip each gate only if its service responds
SKIP_S3=True
SKIP_OPENSEARCH=True

# -f makes curl fail on HTTP error statuses, not just on refused connections
curl -sf --max-time 5 http://localhost:5179/minio/health/live >/dev/null 2>&1 && SKIP_S3=False
curl -s --max-time 5 http://localhost:5180 >/dev/null 2>&1 && SKIP_OPENSEARCH=False

echo "MinIO: $([ "$SKIP_S3" = "False" ] && echo "available" || echo "unavailable")"
echo "OpenSearch: $([ "$SKIP_OPENSEARCH" = "False" ] && echo "available" || echo "unavailable")"

SKIP_S3=$SKIP_S3 \
SKIP_OPENSEARCH=$SKIP_OPENSEARCH \
RUN_MFA_TESTS=true \
RUN_LLM_TESTS=true \
RUN_PKI_TESTS=true \
python -m pytest backend/tests/ --ignore=backend/tests/e2e -v "$@"
```

---

### Phase 3: Fix Feature-In-Development Tests (Category B) — ~180 tests

These tests need the corresponding features to be implemented. Track as sub-issues:

- [ ] **FedRAMP Controls** (63 tests across 2 files) — AC-10 concurrent sessions, AC-8 login banners, AU-6 audit export
  - Related: #98 (Security & Compliance)
- [ ] **FIPS 140-3** (39 tests) — PBKDF2-SHA256 migration, AES-256-GCM encryption
  - Decision needed: Which FIPS 140-3 algorithms to adopt
- [ ] **Auth Config Service** (38 tests) — Dynamic auth configuration from database
  - Depends on: FedRAMP AC-2 account management
- [ ] **Admin Security** (35 tests) — Super admin roles, account lock/unlock, session termination
  - Partially implemented, needs completion
- [ ] **Admin Endpoints** (5 tests) — Remove hardcoded `True` skip, implement or delete
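To make the FIPS target concrete: PBKDF2-SHA256 at 600,000 iterations is available in the standard library as `hashlib.pbkdf2_hmac`. A minimal sketch of hashing and verification; the `pbkdf2_sha256$iterations$salt$digest` storage format is hypothetical, not the app's. Whether the result counts as FIPS-validated depends on the OpenSSL build underneath, not on this code.

```python
"""PBKDF2-SHA256 at 600,000 iterations using only the stdlib (sketch)."""
import base64
import hashlib
import hmac
import os

ITERATIONS = 600_000  # target iteration count from the FIPS 140-3 test plan


def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    salt = os.urandom(16)  # 128-bit random salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Hypothetical storage format: algorithm$iterations$salt$digest (base64 parts).
    return "pbkdf2_sha256${}${}${}".format(
        iterations,
        base64.b64encode(salt).decode(),
        base64.b64encode(digest).decode(),
    )


def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt_b64, digest_b64 = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, int(iterations))
    # Constant-time comparison so timing doesn't reveal how many bytes matched.
    return hmac.compare_digest(actual, expected)
```

A FIPS test can then assert on the stored iteration count as well as on successful verification.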

---

### Phase 4: Database & Test Data Management

#### Current Behavior
- Tests use **savepoint-based transaction isolation** — all test data is rolled back automatically
- Each test gets a unique UUID-suffixed email to avoid parallel conflicts
- No test data persists in the database after test completion
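The savepoint mechanism can be shown with the stdlib `sqlite3` module. This illustrates the pattern only; the suite itself does this against PostgreSQL, and `unique_email` and `demo` are hypothetical helpers for the sketch.

```python
"""Savepoint-based test isolation, shown with stdlib sqlite3 (sketch)."""
import sqlite3
import uuid


def unique_email(prefix: str = "testuser") -> str:
    # UUID-suffixed address so parallel xdist workers never collide.
    return f"{prefix}_{uuid.uuid4().hex[:8]}@example.com"


def demo() -> int:
    # isolation_level=None: sqlite3 issues no implicit BEGIN, so we control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (email TEXT UNIQUE NOT NULL)")  # autocommitted

    conn.execute("BEGIN")                # outer transaction, opened by the fixture
    conn.execute("SAVEPOINT test_case")  # per-test savepoint
    conn.execute("INSERT INTO users VALUES (?)", (unique_email(),))
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

    conn.execute("ROLLBACK TO SAVEPOINT test_case")  # teardown: undo the test's writes
    conn.execute("RELEASE SAVEPOINT test_case")
    conn.execute("ROLLBACK")             # discard the outer transaction as well

    remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    conn.close()
    return remaining  # the table survives, the test row does not
```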

#### Cleanup of Legacy Test Data
Previous test runs (before savepoint isolation was fixed) left ~10 orphaned test users:
```
testuser@example.com, testuser_68047a74@example.com, testuser_705f3292@example.com,
testuser_c063b59f@example.com, test-672747cb-...@example.com, etc.
```
- [ ] Add a one-time cleanup migration or script to remove these
- [ ] Or add a `conftest.py` session-scoped fixture that cleans up stale test users on startup
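Either option first needs to identify the stale accounts. A minimal sketch that matches the address shapes listed above; the pattern is inferred from those examples (the `test-…` form is assumed to continue with hex and hyphens), so review the matches by hand before deleting anything. `example.com` is a reserved domain, so real accounts shouldn't collide with it.

```python
"""Select leftover test accounts by email pattern (sketch)."""
import re

# Inferred from the orphaned addresses above: testuser@, testuser_<8 hex>@,
# and test-<hex/hyphens>@ (assumption), all at example.com.
STALE_TEST_EMAIL = re.compile(
    r"^(?:testuser(?:_[0-9a-f]{8})?|test-[0-9a-f-]+)@example\.com$"
)


def stale_test_emails(emails: list[str]) -> list[str]:
    """Return only the addresses that look like leftover test accounts."""
    return [e for e in emails if STALE_TEST_EMAIL.match(e)]
```

The matches can then feed a parameterized `DELETE`; the table and column names depend on the schema.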

#### Test Data Factory
- [ ] Consider adding `factory_boy` for more complex test data generation
- [ ] Create factories for: User, MediaFile, Speaker, Comment, TranscriptionSegment

---

### Phase 5: Frontend Testing

- [ ] Unit tests for Svelte components (vitest + @testing-library/svelte)
- [ ] Store tests (auth, settings, transcription stores)
- [ ] Integration tests for API client functions
- [ ] Related: #123 (E2E Test Expansion)

---

### Phase 6: Coverage & Quality Gates

- [ ] Add `pytest-cov` for coverage reporting
- [ ] Set minimum coverage threshold (target: 80%)
- [ ] Add coverage reporting to CI (Codecov or similar)
- [ ] Add pre-commit hook for running tests on changed files

---

## Test Infrastructure Requirements

| Service | Required For | Port (Dev) | Available in CI |
|---------|-------------|------------|-----------------|
| PostgreSQL | All tests | 5176 | GitHub Actions service |
| MinIO/S3 | File operations, media tests | 5179 | Docker Compose |
| OpenSearch 3.4.0 | Search tests | 5180 | Docker Compose |
| Redis | MFA, rate limiting, sessions | 6379 | GitHub Actions service |
| Frontend (Vite) | E2E tests | 5173 | Docker Compose |
| Backend (FastAPI) | E2E tests | 5174 | Docker Compose |
| HTTPS/PKI Overlay | PKI E2E tests | 5182 | Docker Compose + certs |

---

## Environment Variables Reference

### Always-set in test environment (conftest.py):
```bash
TESTING=True
SKIP_CELERY=True
SKIP_REDIS=True
SKIP_WEBSOCKET=True
RATE_LIMIT_ENABLED=false
POSTGRES_HOST=localhost
POSTGRES_PORT=5176
```

### Service-availability gates (auto-detect or manual):
```bash
SKIP_S3=True|False         # MinIO available?
SKIP_OPENSEARCH=True|False # OpenSearch available?
```

### Feature test gates (opt-in):
```bash
RUN_MFA_TESTS=true             # MFA security tests (40 tests)
RUN_LLM_TESTS=true             # LLM settings tests (16 tests)
RUN_PKI_TESTS=true             # PKI unit tests (22 tests)
RUN_SEARCH_QUALITY_TESTS=true  # Search quality tests (27 tests, needs indexed data)
RUN_FEDRAMP_TESTS=true         # FedRAMP tests (63 tests, features in development)
RUN_FIPS_TESTS=true            # FIPS 140-3 tests (39 tests, planned feature)
RUN_AUTH_CONFIG_TESTS=true     # Auth config tests (38 tests, planned feature)
RUN_ADVANCED_ADMIN_TESTS=true  # Admin security tests (35 tests, partial)
RUN_PKI_E2E=true               # PKI E2E browser tests (requires PKI overlay)
```
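The tables above spell the gate values as `true`, `True`, `false`, and `False`. If a gate compares against a single spelling, a differently capitalized value leaves it silently closed. A minimal sketch of a case-insensitive parser; `env_flag` is a hypothetical helper, not an existing function in the suite.

```python
"""Case-insensitive parsing for the SKIP_* / RUN_* test gates (sketch)."""
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean env var; unset returns `default`, unknown values raise."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly instead of quietly skipping a whole test file on a typo.
    raise ValueError(f"{name}={raw!r} is not a recognized boolean")
```

A gated module could then declare `pytestmark = pytest.mark.skipif(not env_flag("RUN_MFA_TESTS"), reason="set RUN_MFA_TESTS=true")`.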

---

## Priority Order

1. **Ungate PKI unit tests** — pure unit tests, no infrastructure needed, quick win (+22 tests)
2. **Ungate LLM settings tests** — mostly DB operations, no external LLM needed (+16 tests)
3. **Ungate MFA tests** — test fallback paths without Redis (+40 tests)
4. **Enable S3 tests with auto-detection** — when dev env is running (+24 tests)
5. **Enable OpenSearch tests with auto-detection** — when dev env is running (+30 tests)
6. **CI/CD pipeline** — GitHub Actions with PostgreSQL service container
7. **Integration test script** — local developer convenience
8. **FedRAMP/FIPS features** — implement features, then ungate tests
9. **Frontend tests** — vitest + Playwright expansion
10. **Coverage gates** — enforce minimum coverage on PRs

---

## Related Issues

- #123 — E2E Test Expansion (Gallery, Upload, Settings, Transcription, Search)
- #98 — Security & Compliance (HIPAA, SOC 2, GDPR)
- #124 — Content Security Policy Headers
