feat(timmy): content provider infrastructure and Google Drive source #232

## Summary

Refactor TMI's content extraction pipeline into a two-layer Source/Extractor architecture, add document access tracking, and implement Google Drive as the proof-of-concept service provider.

**Design spec:** [2026-04-08-content-providers-design.md](https://github.com/ericfitz/tmi/blob/dev/1.4.0/docs/superpowers/specs/2026-04-08-content-providers-design.md)
**Parent issue:** #214 (Phase 1 complete)
**Follow-up issue:** #249 (Confluence + OneDrive providers, delegated provider infrastructure)

## Scope

This issue covers the infrastructure refactor and Google Drive as the proof-of-concept service provider. Delegated provider infrastructure (token table, encryption, account linking endpoints) and additional providers are out of scope here and tracked in #249.

## Implementation Phases

1. **Source/Extractor refactor** — Split existing `ContentProvider` into `ContentSource` + `ContentExtractor` layers; introduce pipeline orchestrator. No behavior change.
2. **Document access tracking** — Add `access_status` and `content_source` fields to Document model; URL pattern matcher; creation-time detection; 422 for unconfigured providers.
3. **Google Drive source** — Operator config, service account auth, validate/request access, background poller.
4. **Timmy session integration** — Skip inaccessible documents, `skipped_sources` in session response, `refresh_sources` endpoint, `request_access` endpoint.
5. **OpenAPI spec updates** — New schemas, endpoints, modified schemas.
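
To make phase 1 concrete, here is a minimal sketch of how a source layer, an extractor layer, and a pipeline orchestrator could fit together. It is written in Python purely for illustration and is not the TMI implementation. Apart from `ContentSource` and `ContentExtractor`, every name is an assumption: `FetchedContent`, `can_handle`, `fetch`, `extract`, `Pipeline`, and the toy `text:` URI scheme.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class FetchedContent:
    """Raw bytes plus the content type reported by the source (hypothetical type)."""
    data: bytes
    content_type: str


class ContentSource(Protocol):
    """Layer 1: handles auth and fetches bytes. Method names are assumptions."""
    def can_handle(self, uri: str) -> bool: ...
    def fetch(self, uri: str) -> FetchedContent: ...


class ContentExtractor(Protocol):
    """Layer 2: turns bytes into text. Method names are assumptions."""
    def can_handle(self, content_type: str) -> bool: ...
    def extract(self, data: bytes) -> str: ...


class InlineTextSource:
    """Toy source: treats 'text:<body>' URIs as direct text, no network access."""
    def can_handle(self, uri: str) -> bool:
        return uri.startswith("text:")

    def fetch(self, uri: str) -> FetchedContent:
        return FetchedContent(uri[len("text:"):].encode("utf-8"), "text/plain")


class PlainTextExtractor:
    """Toy extractor: decodes any text/* payload as UTF-8."""
    def can_handle(self, content_type: str) -> bool:
        return content_type.startswith("text/")

    def extract(self, data: bytes) -> str:
        return data.decode("utf-8")


class Pipeline:
    """Orchestrator: first matching source fetches, first matching extractor converts."""
    def __init__(self, sources, extractors):
        self.sources = list(sources)
        self.extractors = list(extractors)

    def run(self, uri: str) -> str:
        source = next((s for s in self.sources if s.can_handle(uri)), None)
        if source is None:
            raise LookupError(f"no source can handle {uri!r}")
        fetched = source.fetch(uri)
        extractor = next(
            (e for e in self.extractors if e.can_handle(fetched.content_type)), None
        )
        if extractor is None:
            raise LookupError(f"no extractor for {fetched.content_type!r}")
        return extractor.extract(fetched.data)


pipeline = Pipeline([InlineTextSource()], [PlainTextExtractor()])
print(pipeline.run("text:hello"))  # prints: hello
```

Because the two layers only meet inside the orchestrator, a new source such as Google Drive could reuse the existing PDF or text extractors unchanged, which is the property the "no behavior change" refactor is meant to preserve.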

## Key Design Decisions

- **Two-layer pipeline**: Sources (auth + fetch bytes) separated from Extractors (bytes → text)
- **Two provider categories**: Service providers (operator credentials) and Delegated providers (per-user OAuth tokens)
- **Google Drive auth**: Regular bot account (share-with-account model), least privilege
- **Document access tracking**: `access_status` and `content_source` fields on Document model
- **Hybrid validation**: Synchronous access check at document creation, async background poller for pending access
- **Unconfigured providers**: Reject with 422 (clear, actionable error)
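
The creation-time decisions above (URL pattern matching, 422 for unconfigured providers, and a synchronous access check that falls back to a pending state) could be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the actual design: the URL patterns, the status strings `accessible` and `pending_access`, the provider keys, and the helper names `detect_provider`, `create_document`, and `check_access` are all hypothetical. Only the `access_status` and `content_source` field names come from this issue.

```python
import re
from dataclasses import dataclass

# Hypothetical URL patterns; the real matcher may recognize different hosts.
PROVIDER_PATTERNS = {
    "google_drive": re.compile(r"^https://(docs|drive)\.google\.com/"),
    "confluence": re.compile(r"^https://[^/]+\.atlassian\.net/wiki/"),
}


class UnconfiguredProviderError(Exception):
    """Maps to an HTTP 422 response at the API layer (assumed mapping)."""
    status_code = 422


@dataclass
class DocumentRecord:
    uri: str
    content_source: str  # field name from the issue; values are assumptions
    access_status: str   # field name from the issue; values are assumptions


def detect_provider(uri: str) -> str:
    """Return the first provider whose pattern matches, else plain 'http'."""
    for name, pattern in PROVIDER_PATTERNS.items():
        if pattern.search(uri):
            return name
    return "http"


def create_document(uri: str, configured: set, check_access) -> DocumentRecord:
    """Detect the provider, reject unconfigured ones, then check access once."""
    provider = detect_provider(uri)
    if provider not in configured:
        raise UnconfiguredProviderError(
            f"{provider} is not configured on this server; "
            "ask an operator to enable it"
        )
    # Synchronous check at creation; a background poller would retry pending ones.
    status = "accessible" if check_access(provider, uri) else "pending_access"
    return DocumentRecord(uri=uri, content_source=provider, access_status=status)
```

For example, calling `create_document` with a Google Drive URL, a configured set containing `"google_drive"`, and a `check_access` callable that returns `False` yields a record in the pending state, which the background poller would then pick up.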

## Acceptance Criteria

- Existing content providers (HTTP, PDF, direct text, JSON/DFD) work identically after refactor
- Google Drive documents can be added and accessed via service account
- Documents with pending access are tracked and polled
- Unconfigured provider URLs return 422 with actionable message
- Timmy sessions skip inaccessible documents and report what was skipped
- OpenAPI spec updated with new schemas and endpoints
- Unit tests for each new component
- Integration tests for Google Drive access flow
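
As a rough illustration of the session criterion above, the following Python sketch partitions documents into those a session can use and those it skips. The `skipped_sources` key comes from this issue; the dictionary shape, the `sources` key, the `reason` field, and the status strings are assumptions made for the example.

```python
def build_session_sources(documents):
    """Split documents into usable sources and skipped ones with a reason."""
    included, skipped = [], []
    for doc in documents:
        if doc["access_status"] == "accessible":
            included.append(doc["uri"])
        else:
            # Report why the document was skipped so the client can act on it.
            skipped.append({"uri": doc["uri"], "reason": doc["access_status"]})
    return {"sources": included, "skipped_sources": skipped}


documents = [
    {"uri": "https://example.com/spec.pdf", "access_status": "accessible"},
    {"uri": "https://docs.google.com/document/d/abc", "access_status": "pending_access"},
]
session = build_session_sources(documents)
print(session["skipped_sources"])
```

A unit test for this component would assert that inaccessible documents never reach the extraction pipeline and that each one appears exactly once in `skipped_sources`.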
