From 8d2bf6dc790eb4d17f5d046d774e3cd67e8e39cc Mon Sep 17 00:00:00 2001
From: Midia Kiasat
Date: Mon, 2 Mar 2026 20:21:38 +0100
Subject: [PATCH] docs: standardize readme baseline

---
 README.md | 143 +++++++++--------------------------------------------
 1 file changed, 24 insertions(+), 119 deletions(-)

diff --git a/README.md b/README.md
index 38942c5..e71af5a 100644
--- a/README.md
+++ b/README.md
@@ -1,141 +1,46 @@
 # MAILSIEVE
 
-MAILSIEVE is a command-line tool for discovering publicly listed business email addresses from domains, with an emphasis on **rate-limiting, resumability, and evidence logging**.
+## Purpose
 
-It is designed for research, compliance checks, and operational workflows where **polite crawling and auditability** matter.
+External operational tooling for intake, triage, and routing (support ops).
 
----
+## Status
 
-## Features
+- **Stability**: Experimental
+- **SemVer**: Not guaranteed until v1.0.0
+- **Security**: See **Security** section below
 
-- Domain-based email discovery
-- Safe resume via `processed.txt`
-- Polite rate-limiting and concurrency controls
-- CSV output (append-only)
-- Evidence logging (GDPR-trimmed)
-- Parallel execution with controlled fan-out
+## Scope
 
----
+- What this repo is responsible for
+- What it explicitly does **not** do
 
-## Installation
-
-Requires **Node.js ≥ 18**.
+## Quickstart
 
 ```bash
-git clone https://github.com/midiakiasat/MAILSIEVE.git
+# clone
+git clone https://github.com/Verifrax/MAILSIEVE.git
 cd MAILSIEVE
-npm install
-chmod +x batch-run.sh
-````
-
----
-
-## Basic Usage
-
-### 1. Prepare input domains
-
-Create a file named `domains.txt`:
-
-```txt
-example.com
-https://anotherdomain.it
-www.somedomain.org
-```
-
-One domain per line.
-URLs are normalized automatically.
-
----
-
-### 2. Run the batch processor
-
-```bash
-./batch-run.sh
-```
-
-MAILSIEVE will:
-
-* skip domains already listed in `processed.txt`
-* append results to `results.csv`
-* log trimmed evidence to `logs/evidence.jsonl`
 
----
-
-### 3. Check progress
-
-```bash
-wc -l domains.txt processed.txt results.csv
-tail -n 5 results.csv
+# install (adjust if needed)
+# (placeholder) npm install / pnpm install / go test ./... / etc.
 ```
 
----
-
-## Configuration (Environment Variables)
-
-You can tune behavior without editing code:
-
-### Concurrency
-
-```bash
-POLITE_CONCURRENCY=3 ./batch-run.sh
-```
-
-### Slower / safer crawling
-
-```bash
-RATE_MS=1500 TIMEOUT_MS=20000 POLITE_CONCURRENCY=1 ./batch-run.sh
-```
-
-### Verbose output
-
-```bash
-QUIET_ENV=0 ./batch-run.sh
-```
-
----
-
-## Reset a Run
-
-To start fresh:
-
-```bash
-rm -f processed.txt results.csv
-rm -rf .cache/http
-rm -f logs/evidence.jsonl
-```
-
-Then rerun:
-
-```bash
-./batch-run.sh
-```
-
----
-
-## Output Files
-
-| File | Purpose |
-| --------------------- | ------------------------- |
-| `results.csv` | Discovered emails |
-| `processed.txt` | Domains already processed |
-| `logs/evidence.jsonl` | Minimal evidence trail |
-
----
-
-## Legal & Ethical Use
+## Repository layout
 
-MAILSIEVE **only processes publicly available information**.
+- `/` Root sources
+- `/.github/` Issue + PR templates
+- `/docs/` Documentation (if present)
 
-You are responsible for ensuring that your usage complies with:
+## Security
 
-* local laws and regulations
-* website terms of service
-* data protection frameworks (e.g. GDPR)
+- Report vulnerabilities privately: **security@verifrax.org**
+- Do **not** open public issues for sensitive findings
 
-This tool is provided **as-is**, without warranty.
+## Contributing
 
----
+See `CONTRIBUTING.md`.
 
 ## License
 
-See [`LICENSE`](./LICENSE).
+MIT. See `LICENSE`.