feat(ingest): audit and leverage structured metadata across all ingest sources #33

## Summary

We recently started collecting rich structured metadata on ingested content, but we're not fully leveraging it. This issue covers four things: audit every source type, document what's being stored, identify gaps, and wire metadata into the search, filter, and UI surfaces.

## What We're Collecting Today

**GitHub** (`crates/ingest/github/meta.rs`) — well-structured:
- Repo chunks: `gh_owner`, `gh_stars`, `gh_forks`, `gh_open_issues`, `gh_language`, `gh_topics`, `gh_created_at`, `gh_pushed_at`, `gh_is_fork`, `gh_is_archived`
- Issue chunks: `gh_issue_number`, `gh_state`, `gh_author`, `gh_created_at`, `gh_updated_at`, `gh_comment_count`, `gh_labels`, `gh_is_pr`
- PR chunks: all issue chunk fields, plus `gh_merged_at`, `gh_is_draft`

**Reddit** (`crates/ingest/reddit/meta.rs`) — audit needed, document what fields exist

**Sessions** (`crates/ingest/sessions/`) — **no structured metadata**:
- `embed_text_with_metadata(cfg, text, url, "claude_session", title)` — just a source type string
- No `session_platform`, `session_project`, `session_date`, `session_turn_count`, etc.

**Local file embeds** (`axon embed <path>`) — audit needed:
- File path, MIME type, last modified — are any of these captured in the Qdrant payload?

## What Needs To Happen

### 1. Audit pass — document everything

Add a **Qdrant Payload Fields** section to `docs/SCHEMA.md` listing every field stored per source type, its type, and example values. That section becomes the source of truth for payload fields.

### 2. Add structured metadata to sessions chunks

```rust
// Currently (no structured metadata):
embed_text_with_metadata(cfg, text, url, "claude_session", title)

// Target (structured payload):
embed_text_with_extra_payload(cfg, text, url, "claude_session", title, json!({
    "session_platform": "claude",
    "session_project": project_name,
    "session_date": session_date,       // ISO 8601
    "session_file": file_path,
    "session_turn_count": turn_count,
    "session_model": model_name,        // where parseable from export
}))
```

Same for `codex_session` and `gemini_session`.
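
Since the three session ingesters would share the same field set, one way to avoid drift is a small shared type. The following is a minimal std-only sketch, not the existing code: `SessionPlatform`, `SessionMeta`, and `to_pairs` are hypothetical names, and the `codex` and `gemini` platform strings are assumed by analogy with `"claude"`.

```rust
/// Session export platforms named in this issue (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionPlatform {
    Claude,
    Codex,
    Gemini,
}

impl SessionPlatform {
    /// Value for the `session_platform` payload field.
    pub fn as_str(self) -> &'static str {
        match self {
            SessionPlatform::Claude => "claude",
            SessionPlatform::Codex => "codex",
            SessionPlatform::Gemini => "gemini",
        }
    }

    /// Source type string passed to the embed call, e.g. "claude_session".
    pub fn source_type(self) -> String {
        format!("{}_session", self.as_str())
    }
}

/// Hypothetical container for the session fields proposed above.
#[derive(Debug, Clone)]
pub struct SessionMeta {
    pub platform: SessionPlatform,
    pub project: String,
    pub date: String,          // ISO 8601, already formatted by the caller
    pub file: String,
    pub turn_count: u32,
    pub model: Option<String>, // None when the export does not name a model
}

impl SessionMeta {
    /// Flatten into (key, value) pairs; `session_model` is omitted when unknown.
    /// Values are stringified only to keep this sketch free of dependencies.
    pub fn to_pairs(&self) -> Vec<(&'static str, String)> {
        let mut pairs = vec![
            ("session_platform", self.platform.as_str().to_string()),
            ("session_project", self.project.clone()),
            ("session_date", self.date.clone()),
            ("session_file", self.file.clone()),
            ("session_turn_count", self.turn_count.to_string()),
        ];
        if let Some(model) = &self.model {
            pairs.push(("session_model", model.clone()));
        }
        pairs
    }
}
```

In the real ingesters the payload would more likely be built with `json!` as in the target snippet, so that `session_turn_count` stays a number rather than a string.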

### 3. Add structured metadata to local file embed chunks

```rust
json!({
    "embed_source": "local_file",
    "file_path": path.to_string_lossy(),
    "file_extension": ext,
    "file_size_bytes": size,
    "file_modified_at": mtime.to_rfc3339(), // assumes `mtime` is a `chrono::DateTime`
})
```
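
The values in that payload can all be read from the filesystem. A rough std-only sketch of the collection step follows; `LocalFileMeta` and `collect_local_file_meta` are hypothetical names, and lowercasing the extension is an assumption made here so that `.MD` and `.md` filter identically.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical holder for the local-file payload fields proposed above.
#[derive(Debug)]
pub struct LocalFileMeta {
    pub file_path: String,
    pub file_extension: Option<String>, // None for files without an extension
    pub file_size_bytes: u64,
    pub file_modified_at: SystemTime,   // format as RFC 3339 before storing
}

/// Read path, extension, size, and mtime for one file.
pub fn collect_local_file_meta(path: &Path) -> io::Result<LocalFileMeta> {
    let md = fs::metadata(path)?;
    Ok(LocalFileMeta {
        file_path: path.to_string_lossy().into_owned(),
        // Lowercased so extension filters are case-insensitive (an assumption).
        file_extension: path
            .extension()
            .map(|e| e.to_string_lossy().to_lowercase()),
        file_size_bytes: md.len(),
        // `modified()` can fail on platforms that do not track mtime.
        file_modified_at: md.modified()?,
    })
}
```

The sketch leaves `file_modified_at` as a `SystemTime`. Converting it with `chrono::DateTime::<Utc>::from(mtime)` and then calling `to_rfc3339()` would produce the string form used in the payload above.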

### 4. Wire metadata into search + filter

- `axon query` / `axon ask`: add `--filter key=value` flag for Qdrant payload filtering
  - `axon query "memory leak" --filter gh_language=rust --filter gh_is_pr=false`
  - `axon query "async" --filter session_platform=claude --filter session_project=axon_rust`
- `axon sources` / `axon domains`: break down by source type, show metadata summary
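
Parsing the `--filter key=value` arguments could look roughly like the sketch below. `FilterValue`, `PayloadFilter`, and `parse_filter` are hypothetical names, and inferring the value type from the raw string is one possible design rather than a settled decision.

```rust
/// A typed filter value, inferred from the raw string after `=`.
#[derive(Debug, Clone, PartialEq)]
pub enum FilterValue {
    Bool(bool),
    Int(i64),
    Text(String),
}

/// One parsed `--filter key=value` argument.
#[derive(Debug, Clone, PartialEq)]
pub struct PayloadFilter {
    pub key: String,
    pub value: FilterValue,
}

/// Split on the first `=` and infer bool, then integer, then fall back to text.
pub fn parse_filter(arg: &str) -> Result<PayloadFilter, String> {
    let (key, raw) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got `{arg}`"))?;
    let key = key.trim();
    if key.is_empty() {
        return Err(format!("empty filter key in `{arg}`"));
    }
    let raw = raw.trim();
    let value = if let Ok(b) = raw.parse::<bool>() {
        FilterValue::Bool(b)
    } else if let Ok(n) = raw.parse::<i64>() {
        FilterValue::Int(n)
    } else {
        FilterValue::Text(raw.to_string())
    };
    Ok(PayloadFilter {
        key: key.to_string(),
        value,
    })
}
```

With this shape, `gh_is_pr=false` parses to a boolean and `gh_language=rust` to text. Each parsed pair would then presumably become one `must` condition with an exact `match` in the Qdrant filter. Type inference has a known weakness: a string field whose value happens to be `true` or `42` would be misread, so a lookup against the documented schema may be the safer long-term choice.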

### 5. Surface metadata in Cortex UI

- Stats page: metadata distribution (top languages, top labels, issues vs PRs vs files)
- GitHub-specific filters in search: open issues only, PRs only, specific language
- Session search: filter by platform, project, date range
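
The "top languages" and "top labels" figures on the stats page come down to counting payload values. A minimal sketch of that aggregation, assuming the values have already been fetched into memory (`top_values` is a hypothetical helper):

```rust
use std::collections::HashMap;

/// Count occurrences of each value and return the `n` most frequent.
/// Ties are broken alphabetically so the output is deterministic.
pub fn top_values<'a, I>(values: I, n: usize) -> Vec<(String, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(k, c)| (k.to_string(), c))
        .collect();
    // Highest count first, then by name.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(n);
    ranked
}
```

For a large collection, counting client-side like this means scrolling every point, so server-side aggregation may be preferable if the Qdrant version in use supports it.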

## Files

| File | Action |
|------|--------|
| `crates/ingest/sessions/claude.rs` | Switch to `embed_text_with_extra_payload` with session metadata |
| `crates/ingest/sessions/codex.rs` | Same |
| `crates/ingest/sessions/gemini.rs` | Same |
| `crates/vector/ops/commands/query.rs` | Add `--filter key=value` metadata filter support |
| `docs/SCHEMA.md` | Add Qdrant payload fields section — all fields per source type |

## Acceptance Criteria

- [ ] `docs/SCHEMA.md` documents all Qdrant payload fields per source type (GitHub, Reddit, YouTube, sessions, local files)
- [ ] Session chunks have structured extra payload: `session_platform`, `session_project`, `session_date`, `session_turn_count`
- [ ] Local file embed chunks have structured payload: `embed_source`, `file_path`, `file_extension`, `file_modified_at`
- [ ] `axon query --filter key=value` filters results by Qdrant payload field
- [ ] Cortex stats page shows metadata distribution
- [ ] `cargo clippy` clean, all tests pass
