Merged
`.github/workflows/test.yml`: 1 change (0 additions, 1 deletion)
@@ -2,7 +2,6 @@ name: Tests

on:
  push:
-    branches: [main, master]
  pull_request:
    branches: [main, master]
  workflow_dispatch:
`.gitignore`: 3 changes (3 additions, 0 deletions)
@@ -42,3 +42,6 @@ htmlcov/
# Old marker files (now stored in XDG data dir)
*_marker
*_results_*.out

# Claude Code temp files
tmpclaude-*
`CLAUDE.md`: 36 changes (36 additions, 0 deletions)
@@ -34,6 +34,42 @@ ruff check src/
pytest
```

## Editable Installs

When installed with `pip install -e .` (editable mode), the `cetus` command runs directly from source code in `src/cetus/`. Any changes to the source files are immediately reflected without reinstalling.

**How it works:**
- The installed package contains a `.pth` file pointing to the source directory
- Python imports modules directly from `src/cetus/` at runtime
- Entry point scripts (like `cetus.exe`) invoke `cetus.cli:main` from source

**When to use:**
- Development and testing - changes are instant
- Debugging - breakpoints and print statements work immediately

**Note:** The venv at the repo root (`alerting_app/.venv`) is shared with the Django app. Install cetus-client from the repo root:
```bash
pip install -e "./cetus-client[dev]"
```

## Version Management

Version is defined in **one place only**: `pyproject.toml`

The `src/cetus/__init__.py` uses `importlib.metadata.version()` to read it at runtime:
```python
from importlib.metadata import version
__version__ = version("cetus-client")
```

**When bumping version:** Only update `version = "X.Y.Z"` in `pyproject.toml`

**In tests:** Never hardcode version strings. Import from the package:
```python
from cetus import __version__
assert f"cetus-client/{__version__}" in user_agent
```

## Architecture

The CLI is built with Click and uses httpx for HTTP requests. All source code is in `src/cetus/`.
`README.md`: 159 changes (128 additions, 31 deletions)
@@ -51,6 +51,91 @@ cetus query "A:192.168.1.1" --format table
cetus alerts list
```

## Operating Modes

Cetus has two primary operating modes designed for different use cases:

### Direct Mode (stdout)

**For:** Interactive exploration, piping to other tools, one-off queries

Direct mode outputs results to stdout with no state tracking. Each query is independent - you get exactly what you ask for, nothing more.

```bash
# Interactive exploration
cetus query "host:*.example.com" --format table

# Pipe to jq for processing
cetus query "host:*.example.com" | jq '.[].host'

# Chain with other tools
cetus query "A:192.168.1.*" | jq -r '.[].host' | sort -u
```

**Characteristics:**
- Results go to stdout (terminal or pipe)
- No markers - queries are stateless
- Full query results returned every time
- Default format: `json`
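The `jq -r '.[].host' | sort -u` pipeline above implies that the default `json` output is an array of objects carrying a `host` key. Under that assumption, a rough Python equivalent for environments without `jq` (the `unique_hosts` helper is hypothetical; unlike `jq`, it skips records that lack a `host` instead of emitting `null`):

```python
import json


def unique_hosts(raw: str) -> list[str]:
    """Sorted, de-duplicated host values from a JSON array of records."""
    records = json.loads(raw)
    return sorted({rec["host"] for rec in records if "host" in rec})


# Usage idea: feed it the stdout of `cetus query "A:192.168.1.*"`,
# e.g. unique_hosts(sys.stdin.read()) inside a small script.
```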

### Collector Mode (file output)

**For:** Data collection, scheduled exports, building datasets over time

Collector mode writes to files and tracks your position using markers. Subsequent runs fetch only new records since the last query, making it efficient for ongoing data collection.

```bash
# First run: fetches last 7 days, creates file
cetus query "host:*.example.com" -o results.jsonl
# Output: Wrote 1,523 records to results.jsonl

# Later runs: fetches only NEW records, appends to file
cetus query "host:*.example.com" -o results.jsonl
# Output: Resuming from: 2025-01-14T10:30:00
# Appended 47 records to results.jsonl

# No new data? File unchanged
cetus query "host:*.example.com" -o results.jsonl
# Output: Resuming from: 2025-01-14T15:42:18
# No new records (file unchanged)
```
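The marker behaviour described above can be approximated in a few lines. This is a hypothetical sketch, not the cetus implementation: the JSON marker file, keying markers by query string, and the `timestamp` field on each record are all assumptions.

```python
import json
from pathlib import Path


def load_markers(path: Path) -> dict[str, str]:
    """Read the marker store; a missing file means no markers yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_markers(path: Path, markers: dict[str, str]) -> None:
    path.write_text(json.dumps(markers, indent=2), encoding="utf-8")


def collect(records: list[dict], out_file: Path, marker_file: Path, query: str) -> int:
    """Append records newer than the stored marker to out_file as JSON Lines.

    Assumes (hypothetically) that every record has an ISO-8601 "timestamp"
    in one consistent format and timezone, so string comparison is chronological.
    """
    markers = load_markers(marker_file)
    last_seen = markers.get(query)
    new = [r for r in records if last_seen is None or r["timestamp"] > last_seen]
    if not new:
        return 0  # nothing new: leave the output file untouched
    with out_file.open("a", encoding="utf-8") as fh:
        for rec in new:
            fh.write(json.dumps(rec) + "\n")
    # Advance the marker to the newest record just written
    markers[query] = max(r["timestamp"] for r in new)
    save_markers(marker_file, markers)
    return len(new)
```

Running `collect` twice with the same input appends nothing the second time, which mirrors the "No new records (file unchanged)" case.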

**Characteristics:**
- Results written to file (`-o` or `-p`)
- Markers track last-seen record per query
- Incremental updates - only fetches new data
- Appends to existing files (or creates timestamped files with `-p`)
- Default format: `json` (recommended: `jsonl`)

**Two file output options:**

| Option | Behavior | Use Case |
|--------|----------|----------|
| `-o FILE` | Appends to same file | Cumulative dataset |
| `-p PREFIX` | Creates timestamped files | Export pipelines, archival |

**Important:** `-o` and `-p` maintain separate markers. You can use both modes
for the same query without data gaps - each tracks its own position independently.

```bash
# -o: Single cumulative file
cetus query "host:*.example.com" -o dns_data.jsonl
# Always writes to: dns_data.jsonl

# -p: Timestamped files per run
cetus query "host:*.example.com" -p exports/dns
# Creates: exports/dns_2025-01-14_10-30-00.jsonl
# Next run: exports/dns_2025-01-14_14-45-00.jsonl
```
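The `-p` names above follow a `PREFIX_YYYY-MM-DD_HH-MM-SS.jsonl` pattern. A sketch of generating such a name, plus one hypothetical way to key markers so that `-o` and `-p` keep independent positions for the same query (the key layout is an assumption, not the client's stored format):

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def prefixed_output_path(prefix: str, fmt: str = "jsonl", now: Optional[datetime] = None) -> Path:
    """Build PREFIX_YYYY-MM-DD_HH-MM-SS.FMT, matching the example file names."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return Path(f"{prefix}_{stamp}.{fmt}")


def marker_key(query: str, index: str, mode: str) -> str:
    """Hypothetical marker key: including the output mode ("o" or "p")
    gives -o and -p separate positions for the same query and index."""
    return f"{index}|{mode}|{query}"
```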

**Switching modes:** Use `--no-marker` to run a collector-mode query without markers (full re-query, overwrites file):

```bash
cetus query "host:*.example.com" --no-marker --since-days 30 -o full_export.jsonl
```

---

## Commands

### query
@@ -68,59 +153,71 @@ cetus query SEARCH [OPTIONS]
| `-i, --index` | Index: `dns`, `certstream`, `alerting` (default: dns) |
| `-m, --media` | Storage tier: `nvme` (fast), `all` (complete) |
| `-f, --format` | Output: `json`, `jsonl`, `csv`, `table` |
-| `-o, --output FILE` | Write to file instead of stdout |
-| `-d, --since-days N` | Look back N days (default: 7) |
-| `--stream` | Stream results as they arrive |
-| `--no-marker` | Disable incremental query tracking |
| `-o, --output FILE` | Collector mode: write to file (enables markers) |
| `-p, --output-prefix PREFIX` | Collector mode: timestamped files (e.g., `prefix_2025-01-14_10-30-00.jsonl`) |
| `-d, --since-days N` | Look back N days (default: 7, ignored if marker exists) |
| `--stream` | Stream results as they arrive (large queries) |
| `--no-marker` | Disable incremental tracking (full re-query) |

**Examples:**

```bash
-# Basic query
-cetus query "host:*.example.com"
-
-# Pipe to jq for processing
-cetus query "host:*.example.com" | jq '.[].host'
# Direct mode - interactive queries
cetus query "host:*.example.com" # JSON to stdout
cetus query "host:*.example.com" --format table # Human-readable
cetus query "host:*.example.com" | jq '.[].host' # Pipe to tools

-# Table format for human reading
-cetus query "A:10.0.0.1" --format table
# Collector mode - data collection
cetus query "host:*.example.com" -o results.jsonl # Incremental collection
cetus query "host:*.example.com" -p exports/dns # Timestamped exports

-# Save to file
-cetus query "host:*.example.com" -o results.json
-
-# Stream large results (uses jsonl format)
# Stream large results
cetus query "host:*" --stream -o all_records.jsonl

-# Query certificate transparency logs
# Query other indices
cetus query "leaf_cert.subject.CN:*.example.com" --index certstream
cetus query "alert_type:dns_match" --index alerting

-# Look back 30 days
-cetus query "host:example.com" --since-days 30
# Full re-query (ignore markers)
cetus query "host:*.example.com" --no-marker --since-days 30 -o full.jsonl
```

-### Incremental Queries
### Collector Mode Details

-The client tracks your queries using markers. First run fetches N days of data; subsequent runs fetch only new records.
**Markers** track your position so subsequent queries fetch only new records:

```bash
-# First run: fetches last 7 days
-cetus query "host:*.example.com" -o results.jsonl
cetus markers list # Show all markers
cetus markers clear # Clear all markers
cetus markers clear --index dns # Clear only DNS markers
```

-# Later runs: fetches only new data
-cetus query "host:*.example.com" -o results.jsonl
**Console feedback** shows what's happening:

-# Skip markers for a full query
-cetus query "host:*.example.com" --no-marker --since-days 30
```
# Starting incremental query with existing marker:
Resuming from: 2025-01-14T10:30:00
Fetched 1,523 records (page 2)...
Appended 47 records to results.jsonl in 2.34s

-Manage markers:
# No new records (file exists):
Resuming from: 2025-01-14T15:42:18
No new records (file unchanged) in 0.45s

-```bash
-cetus markers list # Show all markers
-cetus markers clear # Clear all markers
-cetus markers clear --index dns # Clear only DNS markers
# No new records (first run, no data in time range):
No new records since last query (no file written) in 0.38s
```

**Recommended format:** `jsonl` (JSON Lines)
- Efficient append operations
- Easy to process: `wc -l`, `grep`, `jq -s`
- No rewriting of existing data

Other formats:
- `csv`: Appends rows without repeating header
- `json`: Merges into existing array (requires rewriting file)
- `table`: Not recommended for file output
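Why `jsonl` appends cheaply while `json` needs a rewrite shows up clearly in a small sketch of the three strategies listed above. These helpers are illustrative and hypothetical, not the client's own code:

```python
import csv
import json
from pathlib import Path


def append_jsonl(path: Path, records: list[dict]) -> None:
    # One JSON object per line: appending never touches existing bytes.
    with path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")


def append_csv(path: Path, records: list[dict], fieldnames: list[str]) -> None:
    # Write the header only when the file is new or empty.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(records)


def merge_json(path: Path, records: list[dict]) -> None:
    # A JSON array cannot be extended in place: read, extend, rewrite the whole file.
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    existing.extend(records)
    path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
```

The cost of `merge_json` grows with the size of the existing file, which is why `jsonl` is the better fit for long-running collection.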

### alerts list

List alert definitions.
`pyproject.toml`: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "cetus-client"
-version = "0.0.2"
version = "0.0.3"
description = "CLI client for the Cetus threat intelligence alerting API"
readme = "README.md"
requires-python = ">=3.10"
`src/cetus/__init__.py`: 4 changes (3 additions, 1 deletion)
@@ -1,3 +1,5 @@
"""Cetus CLI - Client for the Cetus threat intelligence alerting API."""

-__version__ = "0.0.1"
from importlib.metadata import version

__version__ = version("cetus-client")