
Add indexed query support for field existence, regex, and full-text search (FTS) while staying local-first #38

@mchurichi

Description


Summary

Implement Lucene-style query string support for Peek so users can query logs with syntax that is as close to Lucene QueryParser as possible, focusing on:

  • Full-text search (FTS) on message and selected fields (analyzed)
  • Field existence queries (field:*)
  • Regex queries (field:/.../)
  • Wildcards (*, ?), phrases ("..."), required/prohibited (+, -), boosting (^), and boolean logic

Maintain local-first, single-binary distribution and the no-build-step UI model.

Motivation

Peek currently supports a small Lucene-like subset evaluated via scanning (with time-range key seeking). Users want Lucene-like expressiveness, specifically:

  • field:* existence
  • field:/regex/
  • real full-text search behavior (analysis/tokenization), not substring contains

This needs to work both for querying historical logs and for realtime filtering in the UI.

Goals

  1. Accept Lucene-style query string syntax in the UI and API, staying as close to Lucene QueryParser as practical.
  2. Add FTS with an analyzer-driven inverted index (default field behavior like Lucene).
  3. Add field existence query semantics compatible with Lucene (field:*).
  4. Add regex query semantics compatible with Lucene query string (field:/.../).
  5. Keep single binary, local-only, embedded UI in pkg/server/index.html, no new frontend dependencies, immutable VanJS updates.
  6. Add Playwright E2E tests for the new query features.

Non-goals (for this issue)

  • Remote collectors or multi-user deployments
  • Distributed search or external services
  • Full Solr/Elasticsearch feature parity (faceting, aggregations, scoring explanations, etc.)
  • Perfect Lucene scoring parity (ranking differences are acceptable; correctness of filtering is the priority)

Proposed approach (recommended)

Use an embedded Go search index to avoid implementing a full Lucene parser + inverted index from scratch.

Recommendation:

  • Use Bleve's query string query support as the parsing and execution engine for Lucene-like syntax.
  • Keep BadgerDB as the source of truth for stored log entries.

Rationale:

  • Query string syntax supports phrases, field scoping, regex, required/excluded operators, and boosting.
  • Bleve supports query types we need (regexp, wildcard, fuzzy, numeric/date ranges, query string).
  • Keeps local-first and single-binary (just adds a Go dependency and an on-disk index directory).

User-visible query syntax (Lucene-style)

Default field behavior (FTS)

  • Unfielded terms query the default field (configurable; message is the recommended default, optionally alongside a composite all-field).
    • timeout refused
    • "connection refused"

Field scoping

  • service:api-gateway
  • level:ERROR

Field existence (Lucene semantics)

  • request_id:*
  • user_id:*

Semantics: field is present and has at least one term indexed.

Regex (Lucene query string style)

  • service:/^api-(gateway|edge)$/
  • user_id:/^usr-[0-9]{4}$/

Semantics: regex applies to indexed terms for that field.
Important note:

  • For keyword fields (not analyzed), the term is the full field value, so regex behaves like "regex over the full value".
  • For analyzed fields (like message), regex is term-level, not substring-over-full-text, consistent with Lucene behavior.
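The term-level behavior can be illustrated with a simplified stand-in analyzer (lowercase plus whitespace split; Lucene regex queries are implicitly anchored to the whole term):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// analyze is a stand-in for a real analyzer: lowercase + whitespace split.
func analyze(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// matchTerm reports whether any indexed term fully matches the regex,
// mirroring Lucene's implicit whole-term anchoring.
func matchTerm(pattern string, terms []string) bool {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	for _, t := range terms {
		if re.MatchString(t) {
			return true
		}
	}
	return false
}

func main() {
	terms := analyze("Connection Timeout after 30s")
	fmt.Println(matchTerm(`time.*`, terms))             // matches the single term "timeout"
	fmt.Println(matchTerm(`connection timeout`, terms)) // no single term spans two tokens
}
```

This is why `message:/connection timeout/` will not behave like a substring search over the full message: the regex only ever sees one term at a time.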

Wildcards

  • service:api*
  • request_id:req-?????? (if ? is supported)
  • message:*timeout* (term-level wildcard implications apply)

Boolean and required/prohibited clauses

  • level:ERROR AND service:api
  • +level:ERROR -service:auth

Boosting

  • error^2 timeout

Architecture changes

Storage remains unchanged

  • BadgerDB key format remains: log:{timestamp_nano}:{id}
  • LogEntry JSON stays as-is.

Add embedded index

Introduce an index directory (default under Peek data dir):

  • ~/.peek/index (or ${db_path}/index)

Add configuration:

  • [search] enabled = true|false (default false initially)
  • [search] index_path = "~/.peek/index"
  • [search] default_field = "message"
  • [search] include_in_all = ["message", "raw"] (optional)
  • [search] field_mapping_mode = "dynamic|strict"
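The `[search]` keys above might look like this in a config file (illustrative values only):

```toml
[search]
enabled = false              # scan-based filtering remains the default
index_path = "~/.peek/index"
default_field = "message"
include_in_all = ["message", "raw"]
field_mapping_mode = "dynamic"
```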

CLI flags:

  • --search (enable embedded index)
  • --search-index-path
  • --search-default-field

Index document model

Index one document per log entry with a stable doc id:

  • docID = "{timestamp_nano}:{id}"
  • Badger key can be derived: log:{timestamp_nano}:{id}
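The docID/key mapping above is trivial to implement and keep stable (helper names here are illustrative):

```go
package main

import "fmt"

// docID builds the stable index document id for a log entry.
func docID(tsNano int64, id string) string {
	return fmt.Sprintf("%d:%s", tsNano, id)
}

// badgerKey derives the BadgerDB key from a docID by prefixing "log:".
func badgerKey(doc string) []byte {
	return []byte("log:" + doc)
}

func main() {
	d := docID(1700000000000000000, "a1")
	fmt.Println(d)                    // 1700000000000000000:a1
	fmt.Println(string(badgerKey(d))) // log:1700000000000000000:a1
}
```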

Indexed fields (suggested):

  • timestamp (datetime)
  • level (keyword)
  • message (text, analyzed)
  • raw (text or keyword, optional)
  • fields.* (dynamic)
    • strings: keyword by default
    • numbers: numeric
    • booleans: boolean
    • optional: allow marking specific fields as analyzed text via config (e.g. fields.stacktrace)

Query execution path

When search index is enabled:

  • /query executes the query string against the index to obtain matching docIDs (sorted by timestamp desc if possible).
  • Fetch corresponding LogEntry values from BadgerDB and return them.

When search index is disabled:

  • Use current scan-based filtering behavior (existing query engine), preserving backward compatibility.

Realtime filtering (WS /logs)

Requirement: subscriptions should use the same query semantics as /query.

Preferred implementation:

  • Compile subscription query once.
  • For each new entry, evaluate match without running a full index query per entry per client.

Options:
A) Fast path (recommended):

  • Implement a lightweight per-entry matcher for the supported query subset (existence, term, wildcard, regex, phrase on message) using the same analyzers as indexing.
  • Use the index for historical queries, and the matcher for streaming.

B) Simpler but potentially expensive:

  • Index the new entry, then run a docID-restricted query against the index to decide whether to push to each client.
  • Add guardrails (max clients, rate limits) if this path is used.

Pick A if performance matters for 1k+ logs/sec.
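A minimal sketch of option A's compiled matcher (the `entryMatcher` type and `compile` function are hypothetical; real code would reuse the index analyzers and distinguish keyword from analyzed fields):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// entryMatcher is a compiled per-subscription matcher for one clause.
type entryMatcher struct {
	field string
	re    *regexp.Regexp // nil means existence-only check (field:*)
}

// compile turns a single field clause into a matcher, compiling the
// regex once so per-entry evaluation stays cheap.
func compile(field, pattern string) (*entryMatcher, error) {
	if pattern == "*" { // field:* existence query
		return &entryMatcher{field: field}, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &entryMatcher{field: field, re: re}, nil
}

// Match evaluates one log entry without touching the index.
func (m *entryMatcher) Match(entry map[string]string) bool {
	v, ok := entry[m.field]
	if !ok {
		return false
	}
	if m.re == nil {
		return true
	}
	// Simplified: analyzed fields should be matched per term with the real
	// analyzer; keyword fields should use the full value as a single term.
	for _, term := range strings.Fields(strings.ToLower(v)) {
		if m.re.MatchString(term) {
			return true
		}
	}
	return false
}

func main() {
	m, _ := compile("service", `^api-(gateway|edge)$`)
	fmt.Println(m.Match(map[string]string{"service": "api-gateway"}))  // true
	fmt.Println(m.Match(map[string]string{"service": "auth-service"})) // false
	e, _ := compile("request_id", "*")
	fmt.Println(e.Match(map[string]string{"request_id": "req-123"})) // true
	fmt.Println(e.Match(map[string]string{"message": "no id"}))      // false
}
```

Compiling once per subscription keeps the per-entry cost to map lookups and precompiled regex checks, which is what makes the 1k+ logs/sec target realistic.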

Migration and operational tooling

Add DB command to build/rebuild index:

  • peek db reindex (scans existing Badger logs, builds index)

Add DB command to verify index health:

  • peek db index-stats (doc count, size, last indexed timestamp)

Retention and deletes:

  • Ensure when logs are deleted (db clean, retention), the corresponding documents are removed from the index.
  • If implementing incremental deletes is complex, document that reindex is needed after bulk deletes for v1, but aim to support deletes properly.

UI changes (pkg/server/index.html only)

  • Update syntax highlighting to recognize:
    • regex literals: field:/.../
    • required/prohibited prefixes + and -
    • boosting ^n
    • existence field:*
  • Autocomplete remains based on /fields. No new UI dependencies.

Critical invariants:

  • No scroll resets when queries run, columns change, or state restores.
  • Immutable VanJS state updates.

Testing plan

Unit tests (Go)

  • Query parsing acceptance tests for:
    • field:* exists
    • field:/regex/
    • phrases "..." and field-scoped phrases message:"..."
    • required/prohibited + / -
    • wildcards * and ? (if supported)
  • Indexing tests:
    • Correct docID mapping
    • Dynamic field mapping for string/number/bool
  • Query execution tests:
    • Results match expected docIDs
    • Time range filters (timestamp:[start TO end]) behave correctly
  • Delete/retention tests:
    • Deleting logs removes index docs (or documented reindex requirement)

E2E tests (Playwright)

Add: e2e/lucene-query.spec.mjs

Test cases (minimum):

  1. Field existence
  • Seed logs where some have request_id, others do not.
  • Query request_id:*
  • Assert only logs with that field are shown.
  2. Regex on keyword field
  • Seed services: api-gateway, api-edge, auth-service
  • Query service:/^api-(gateway|edge)$/
  • Assert only the api services match.
  3. FTS on message (default field)
  • Seed messages: "connection timeout", "connection refused", "all good"
  • Query timeout
  • Assert only timeout logs match.
  • Query "connection refused"
  • Assert phrase match returns the correct entry.
  4. Required/prohibited clauses
  • Seed mixed logs
  • Query +level:ERROR -service:auth
  • Assert results include only ERROR and exclude auth.
  5. Wildcard
  • Query service:api*
  • Assert correct matches.
  6. Backward compatibility path (optional)
  • Run same dataset with search index disabled and confirm old behavior still works (or document differences if unavoidable).

Follow existing E2E conventions:

  • Use e2e/helpers.mjs startServer/stopServer pattern
  • Isolated ports and temp DB path
  • Polling assertions, avoid timing flakiness

Acceptance criteria

  • Query string syntax supports: unfielded terms (FTS), field scoping, phrases, regex, existence field:*, wildcards, boolean operators, required/prohibited + and -, and boosting (syntax accepted even if scoring differs).
  • /query returns correct results using the embedded index when enabled.
  • WS subscriptions apply the same query semantics for streaming.
  • peek db reindex builds an index for an existing DB.
  • Index stays consistent with deletes/retention, or reindex requirement is clearly documented for v1.
  • E2E tests added and passing in CI.
  • AGENTS.md updated in the same PR to reflect new commands, files, and dependencies.
  • /docs/README.md updated with technical details; /README.md updated only with user-facing query syntax and flags.

Implementation checklist

  • Add embedded search index (config, path, enable flag)
  • Define mapping (keyword vs analyzed vs numeric/datetime)
  • Index on ingest and on reindex
  • Implement /query execution via index + Badger fetch
  • Implement WS per-entry matching strategy (prefer compiled matcher)
  • Update UI syntax highlighter (index.html only)
  • Add Go unit tests for parsing/matching and index integration
  • Add e2e/lucene-query.spec.mjs
  • Update docs and AGENTS.md
