feat(query): add field existence, regex, +/- prefix, boost, and ? wildcard query support#39
…uery support Co-authored-by: mchurichi <633182+mchurichi@users.noreply.github.com>
…review Co-authored-by: mchurichi <633182+mchurichi@users.noreply.github.com>
@codex review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cf09af7297
```go
if len(value) >= 2 && value[len(value)-1] == '/' {
	return value[1 : len(value)-1]
```
Continue parsing regex after escaped trailing slash
`extractRegex` currently treats any token ending in `/` as a complete regex literal, but `readToken` can stop early at `(` or `)`. For a valid query like `path:/foo\/(bar|baz)/`, the partial token seen here is `/foo\/`; this branch returns `foo\`, and `regexp.Compile` fails on the trailing backslash, so valid regex queries are rejected whenever an escaped slash appears before a parenthesized part.
```javascript
if (c === ':' && j + 1 < text.length && text[j + 1] === '/') {
  inRegex = true
```
Limit regex-mode detection to field values that start with '/'
The highlighter switches to regex mode on any `:/` sequence, even when it appears later inside a normal value. Queries such as `url:https://example.com` are therefore split as if `https:/` started a regex literal, producing incorrect tokenization and highlighting for common URL filters. Regex mode should only trigger when the field/value delimiter is followed immediately by `/`.
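A minimal sketch of the suggested rule, assuming the highlighter sees one whitespace-delimited clause at a time. The highlighter itself is JavaScript in `pkg/server/index.html`; the check is written in Go here only to keep one language across examples, and `regexValueStart` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// regexValueStart is a hypothetical helper illustrating the review suggestion.
// Given one whitespace-delimited clause, it skips leading '+'/'-' prefixes,
// treats only the FIRST ':' as the field/value delimiter, and reports regex
// mode only when the value begins with '/'. It returns the index of that '/'
// within the original clause.
func regexValueStart(term string) (int, bool) {
	body := strings.TrimLeft(term, "+-")
	offset := len(term) - len(body)
	colon := strings.IndexByte(body, ':')
	if colon <= 0 || colon+1 >= len(body) || body[colon+1] != '/' {
		return 0, false
	}
	return offset + colon + 1, true
}

func main() {
	for _, term := range []string{
		"url:https://example.com",        // first ':' is followed by 'h': plain value
		"service:/^api-(gateway|edge)$/", // value starts with '/': regex
		"-path:/var\\/log/",              // prefix skipped, still a regex
	} {
		i, ok := regexValueStart(term)
		fmt.Println(term, i, ok)
	}
}
```

Because only the first `:` counts, `url:https://example.com` stays a plain value. A bare `https://example.com` with no field prefix would still read as field `https` followed by `/`, so quoting such values avoids the ambiguity.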
Extends the Lucene-style query parser with the missing syntax features requested in the issue:
`field:*` existence, `field:/regex/`, required/prohibited prefixes, boost acceptance, and `?` single-character wildcards. Also fixes bare quoted phrases (`"connection refused"`), which previously searched with the quotes included.

## What
### Go query engine (`pkg/query/lucene.go`)

- `ExistenceFilter`: `field:*` matches entries where the field is present
- `RegexFilter`: `field:/regex/` applies a compiled `regexp.Regexp`; `extractRegex()` handles patterns containing `()` that the token reader would otherwise cut short
- `+` prefix consumed as a no-op (default AND); `-` prefix wraps the next filter in `NotFilter`
- `stripBoost()` strips a trailing `^n` from tokens; accepted for syntax compatibility, ignored for filtering
- `?` wildcard in `WildcardFilter`: detection extended to `ContainsAny("*?")`, and `?` is converted to `.` in the regex
- Bare quoted phrases now reach `KeywordFilter` without their quotes (previously searched with literal `"` characters)

### UI syntax highlighter (`pkg/server/index.html`)

- `--peek-purple` CSS var + `.hl-regex` class for regex literals
- `inRegex` state so `(`/`)` inside `/regex/` don't terminate the token
- `+`/`-` prefixes emitted as `hl-op` in both field-scoped and bare-keyword positions
- `tokenizeValue` treats `?` as a wildcard char alongside `*`

### Tests

- `e2e/lucene-query.spec.mjs`: 8 Playwright tests covering field existence, regex with alternation, FTS keyword, quoted phrase, combined `+`/`-`, wildcard, and UI highlighting

## Why
The parser silently dropped several Lucene query constructs that users expect to work (`field:*`, `field:/regex/`, `+term`, `-term`, `term^n`), and bare quoted phrases matched nothing because the surrounding `"` characters were included in the keyword search string.

## Related Issue
Closes #9
## Testing Done

- `go test ./... -race -count=1`
- `search.spec.mjs` unaffected
- `go vet ./...` clean; CodeQL: 0 alerts

Original prompt
This section details the original issue you should resolve.
<issue_title>Add indexed query support for field existence, regex, and full-text search (FTS) while staying local-first</issue_title>
<issue_description>
## Summary
Implement Lucene-style query string support for Peek so users can query logs with syntax that is as close to Lucene QueryParser as possible, focusing on:
- Field existence (`field:*`)
- Regex (`field:/.../`)
- Wildcards (`*`, `?`), phrases (`"..."`), required/prohibited (`+`, `-`), boosting (`^`), and boolean logic

Maintain local-first, single-binary distribution and the no-build-step UI model.
## Motivation
Peek currently supports a small Lucene-like subset evaluated via scanning (with time-range key seeking). Users want Lucene-like expressiveness, specifically:
- `field:*` existence
- `field:/regex/`

This needs to work both for querying historical logs and for realtime filtering in the UI.

## Goals

- … (`field:*`).
- … (`field:/.../`).
- … `pkg/server/index.html`, no new frontend dependencies, immutable VanJS updates.

## Non-goals (for this issue)
## Proposed approach (recommended)
Use an embedded Go search index to avoid implementing a full Lucene parser + inverted index from scratch.
Recommendation:
Rationale:
## User-visible query syntax (Lucene-style)

### Default field behavior (FTS)

- `timeout refused`
- `"connection refused"`

### Field scoping

- `service:api-gateway`
- `level:ERROR`

### Field existence (Lucene semantics)

- `request_id:*`
- `user_id:*`

Semantics: field is present and has at least one term indexed.
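To make these semantics concrete for the current scan-based engine, here is a sketch of `field:*` and its negation. `LogEntry`, `Filter`, and the method names are assumptions for illustration; the PR's `ExistenceFilter` and `NotFilter` may be shaped differently, and an indexed implementation would consult the term dictionary rather than entry values.

```go
package main

import (
	"fmt"
	"strings"
)

// LogEntry is a hypothetical, trimmed-down entry type for this sketch only.
type LogEntry struct {
	Level  string
	Fields map[string]any
}

// Filter is an assumed interface; the PR's actual filter types may differ.
type Filter interface {
	Match(e LogEntry) bool
}

// ExistenceFilter models field:* for a scan-based engine. "Has at least one
// term indexed" is approximated as: the key is present, non-nil, and, for
// strings, not blank.
type ExistenceFilter struct{ Field string }

func (f ExistenceFilter) Match(e LogEntry) bool {
	v, ok := e.Fields[f.Field] // reading from a nil map is safe in Go
	if !ok || v == nil {
		return false
	}
	if s, isStr := v.(string); isStr {
		return strings.TrimSpace(s) != ""
	}
	return true
}

// NotFilter negates its inner filter, so "-request_id:*" selects entries
// that lack the field.
type NotFilter struct{ Inner Filter }

func (f NotFilter) Match(e LogEntry) bool { return !f.Inner.Match(e) }

func main() {
	has := ExistenceFilter{Field: "request_id"}
	lacks := NotFilter{Inner: has}
	for _, e := range []LogEntry{
		{Level: "ERROR", Fields: map[string]any{"request_id": "req-123456"}},
		{Level: "INFO", Fields: map[string]any{"request_id": "  "}},
		{Level: "WARN"},
	} {
		fmt.Println(e.Level, has.Match(e), lacks.Match(e))
	}
}
```

Treating a blank string as absent is a design choice that mirrors "at least one term indexed": an analyzer would emit no terms for whitespace-only input.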
### Regex (Lucene query string style)

- `service:/^api-(gateway|edge)$/`
- `user_id:/^usr-[0-9]{4}$/`

Semantics: regex applies to indexed terms for that field.
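Lucene regex queries are implicitly anchored: the pattern must match a whole term. Go's `regexp` matches substrings by default, so a scan-based `RegexFilter` would need to anchor explicitly to return the same results. A sketch under that assumption (`compileLuceneRegex` and `anyTermMatches` are hypothetical helpers; whether the PR's filter anchors this way is not stated):

```go
package main

import (
	"fmt"
	"regexp"
)

// compileLuceneRegex wraps a field:/.../ pattern so it must match an entire
// term, mirroring Lucene's anchored regex semantics. Without the wrapper,
// Go's unanchored matching would let /api/ match "my-api-v2".
func compileLuceneRegex(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(`^(?:` + pattern + `)$`)
}

// anyTermMatches reports whether at least one term for the field matches.
// For a keyword field such as service, the whole value is a single term.
func anyTermMatches(re *regexp.Regexp, terms []string) bool {
	for _, t := range terms {
		if re.MatchString(t) {
			return true
		}
	}
	return false
}

func main() {
	re, err := compileLuceneRegex(`api-(gateway|edge)`)
	if err != nil {
		panic(err)
	}
	for _, svc := range []string{"api-gateway", "api-edge", "api-gateway-v2", "auth"} {
		fmt.Println(svc, anyTermMatches(re, []string{svc}))
	}
}
```

The non-capturing group keeps top-level alternation such as `a|b` inside the anchors. Go's `regexp` uses RE2, which runs in time linear in the input, so user-supplied patterns cannot trigger catastrophic backtracking; Lucene's regex dialect and RE2 syntax still differ in places, so not every Lucene pattern carries over unchanged.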
Important note:
### Wildcards

- `service:api*`
- `request_id:req-??????` (if `?` is supported)
- `message:*timeout*` (term-level wildcard implications apply)

### Boolean and required/prohibited clauses

- `level:ERROR AND service:api`
- `+level:ERROR -service:auth`

### Boosting

- `error^2 timeout`

## Architecture changes
### Storage remains unchanged

- `log:{timestamp_nano}:{id}`

### Add embedded index
Introduce an index directory (default under Peek data dir):
- `~/.peek/index` (or `${db_path}/index`)

Add configuration:

- `[search] enabled = true|false` (default false initially)
- `[search] index_path = "~/.peek/index"`
- `[search] default_field = "message"`
- `[search] include_in_all = ["message", "raw"]` (optional)
- `[search] field_mapping_mode = "dynamic|strict"`

CLI flags:

- `--search` (enable embedded index)
- `--search-index-path`
- `--search-default-field`

### Index document model
Index one document per log entry with a stable doc id:
- `docID = "{timestamp_nano}:{id}"`
- `log:{timestamp_nano}:{id}`

Indexed fields (suggested):

- `timestamp` (datetime)
- `level` (keyword)
- `message` (text, analyzed)
- `raw` (text or keyword, optional)
- `fields.*` (dynamic)
- … (`fields.stacktrace`)

### Query execution path
When search index is enabled: