15 changes: 14 additions & 1 deletion .gitignore
@@ -19,6 +19,7 @@ classes/
.storage/data/**
.storage/data/test/**
artipie-main/docker-compose/artipie/data/**
artipie-main/docker-compose/artipie/prod_repo/**
artipie-main/docker-compose/artipie/artifacts/npm/node_modules/**
artipie-main/docker-compose/artipie/cache/**
artipie-main/docker-compose/artipie/artifacts/php/vendor/**
@@ -29,4 +30,16 @@ artipie-main/docker-compose/artipie/artifacts/php/vendor/**
artipie-main/docker-compose/.env

# AI agent task/analysis documents - not part of product documentation
agents/
agents/

# Git worktrees
.worktrees/
/benchmark/fixtures
/benchmark/results
/docs/plans
/docs/superpowers
*.csv
*.png
/.superpowers
/artipie-ui/mockups
artipie-backfill/dependency-reduced-pom.xml
156 changes: 156 additions & 0 deletions .wiki/Configuration-HA.md
@@ -0,0 +1,156 @@
# HA Deployment

Artipie supports multi-instance deployment for high availability. Instances
coordinate through a shared PostgreSQL database and a shared Valkey (Redis-compatible)
instance so that metadata, events, and job scheduling remain consistent across the
cluster.

## Requirements

| Component | Minimum version | Purpose |
|-----------------------|-----------------|-----------------------------------------|
| PostgreSQL | 15+ | Artifacts metadata, Quartz job store |
| Valkey / Redis | 7+ | Cross-instance event bus (pub/sub) |
| S3-compatible storage | - | Shared artifact storage |
| Load balancer (nginx) | - | Request distribution and health checks |

All instances must be able to reach the same PostgreSQL database, the same
Valkey instance, and the same S3-compatible storage bucket.

## Architecture

Each Artipie instance registers itself in the `artipie_nodes` table on startup and
sends periodic heartbeats so that the cluster knows which nodes are alive.

`ClusterEventBus` uses Valkey pub/sub channels to broadcast notifications across
instances. When one instance receives an artifact upload, the event is published to
Valkey so that other instances can update caches or indexes accordingly.

Quartz is configured with JDBC clustering (`org.quartz.jobStore.isClustered = true`)
to prevent duplicate execution of scheduled jobs such as metadata flush or proxy
cache verification. Only one instance in the cluster will execute a given job trigger
at any point in time.
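
The property named above is part of Quartz's standard JDBC-store configuration. As a hedged sketch (these are standard Quartz property names, but the exact values Artipie ships with may differ), a clustered `quartz.properties` looks like:

```properties
# JDBC job store shared by all instances in the cluster
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.isClustered = true
# How often (ms) each node checks in with the cluster
org.quartz.jobStore.clusterCheckinInterval = 15000
# Must be unique per node; AUTO derives it from hostname and timestamp
org.quartz.scheduler.instanceId = AUTO
```

With `isClustered` enabled, Quartz uses row locks in the shared job-store tables to ensure only one node fires each trigger.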

## Configuration

An example HA configuration file is provided at `docs/ha-deployment/artipie-ha.yml`.
The key points are:

- **Storage must use S3** (not the local filesystem) so that all instances share the
same artifact data. A filesystem backend would result in each instance having its own
isolated copy of the data.

- **All instances must share the same PostgreSQL database.** The `artifacts_database`
section in the main configuration file must point to the same host, port, and
database name on every instance.

- **All instances must share the same Valkey instance.** The `valkey` section in the
main configuration must use the same connection details everywhere.

- **JWT secrets must be identical across instances.** If one instance issues a JWT
token, any other instance must be able to validate it. Configure the same
`jwt_secret` value on every node.

```yaml
meta:
  storage:
    type: s3
    bucket: artipie-data
    region: us-east-1
    endpoint: http://minio:9000
    credentials:
      type: basic
      accessKeyId: minioadmin
      secretAccessKey: minioadmin
  artifacts_database:
    postgres_host: postgres
    postgres_port: 5432
    postgres_database: artipie
    postgres_user: artipie
    postgres_password: artipie
    threads_count: 4
    interval_seconds: 2
  valkey:
    host: valkey
    port: 6379
```

## nginx setup

An example nginx configuration is provided at `docs/ha-deployment/nginx-ha.conf`.
The recommended setup uses:

- `least_conn` load-balancing algorithm to distribute requests to the instance with
the fewest active connections.
- `keepalive 64` on the upstream block to reuse connections to backend instances and
reduce latency.
- Passive health checks via the `/.health` endpoint. nginx marks a backend as down
after a configurable number of failed requests and re-checks periodically.

```nginx
upstream artipie {
    least_conn;
    server artipie-1:8080;
    server artipie-2:8080;
    server artipie-3:8080;
    keepalive 64;
}

server {
    listen 80;

    location / {
        proxy_pass http://artipie;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /.health {
        proxy_pass http://artipie;
        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
    }
}
```

## Docker Compose

A ready-to-use Docker Compose file for a 3-instance deployment is provided at
`docs/ha-deployment/docker-compose-ha.yml`. It includes:

- 3 Artipie instances behind an nginx load balancer
- PostgreSQL 15 for metadata and Quartz job store
- Valkey 7 for the cross-instance event bus
- MinIO for S3-compatible shared storage
- nginx configured with `least_conn` and passive health checks

To start the cluster:

```bash
cd docs/ha-deployment
docker compose -f docker-compose-ha.yml up -d
```

After startup, the Artipie API is available through the nginx load balancer on port 80
and each instance exposes its own health endpoint at `/.health`.

## Monitoring

The [health endpoint](Configuration-Health) at `GET /.health` returns per-component
status for each instance. In an HA deployment, configure your load balancer or external
monitoring system to poll `/.health` on each instance independently.

Recommended monitoring setup:

- **nginx passive health checks**: rely on the upstream `max_fails` / `fail_timeout`
directives to automatically remove unhealthy instances from the pool.
- **External monitoring**: poll each instance's `/.health` endpoint every 10 seconds.
Alert when any instance returns `unhealthy` (HTTP 503) for more than 3 consecutive
checks.
- **Prometheus metrics**: each instance exposes its own `/metrics/vertx` endpoint (see
[Metrics](Configuration-Metrics)). Aggregate across instances in Prometheus or
Grafana for a cluster-wide view.
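
The "3 consecutive checks" alert rule above can be sketched as a small helper. This is illustrative only; `should_alert` and the history format are assumptions for the example, not part of Artipie:

```python
def should_alert(status_codes: list[int], threshold: int = 3) -> bool:
    """Return True when the last `threshold` health checks all failed.

    `status_codes` holds HTTP codes from successive /.health polls,
    oldest first; 503 means the instance reported `unhealthy`.
    """
    if len(status_codes) < threshold:
        return False
    return all(code == 503 for code in status_codes[-threshold:])
```

A single blip such as `[503, 200, 503]` does not fire; only an unbroken run of failures does.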
120 changes: 120 additions & 0 deletions .wiki/Configuration-Health.md
@@ -0,0 +1,120 @@
# Health Checks

Artipie exposes a built-in health endpoint that reports the status of every major
subsystem. The endpoint is designed for load balancer integration and external
monitoring.

## Endpoint

```
GET /.health
```

No authentication is required. The endpoint is always available, even when JWT
authentication is configured for the rest of the API.

## Response format

```json
{
  "status": "healthy",
  "components": {
    "storage": {"status": "up"},
    "database": {"status": "up"},
    "valkey": {"status": "not_configured"},
    "quartz": {"status": "up"},
    "http_client": {"status": "up"}
  }
}
```

The top-level `status` field is a roll-up of the individual component statuses.
Each entry in `components` reports the state of a single subsystem.

## Status values

| Top-level status | Condition | HTTP code |
|------------------|---------------------------------------------------|-----------|
| `healthy` | All components report `up` (or `not_configured`) | 200 |
| `degraded` | Exactly one non-storage component is down | 200 |
| `unhealthy` | Storage is down, OR two or more components are down | 503 |

The distinction between `degraded` and `unhealthy` allows load balancers to keep
routing traffic to an instance that has a single non-critical failure (for example,
Valkey is temporarily unreachable) while removing instances that cannot serve
artifacts at all.
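
The roll-up rules in the table can be expressed compactly. The following is an illustrative sketch of that logic (not Artipie's actual code); it treats any status other than `down`, including `not_configured`, as healthy:

```python
def rollup(components: dict[str, str]) -> tuple[str, int]:
    """Combine per-component statuses into (top-level status, HTTP code)."""
    down = [name for name, status in components.items() if status == "down"]
    if "storage" in down or len(down) >= 2:
        return "unhealthy", 503
    if len(down) == 1:  # a single non-storage failure
        return "degraded", 200
    return "healthy", 200
```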

## Component details

### storage

Tests the primary artifact storage by calling `list(Key.ROOT)` with a 5-second
timeout. If the storage does not respond within the timeout, the component is
reported as `down`. Because storage is the most critical subsystem, a storage
failure alone is enough to mark the instance as `unhealthy`.

### database

Tests the PostgreSQL connection pool by calling `connection.isValid(5)` through
HikariCP. This component is only checked when `artifacts_database` is configured
in the main Artipie configuration file. If PostgreSQL is not configured, the
component is omitted from the response.

### valkey

Tests connectivity to the Valkey (Redis-compatible) instance used for the
cross-instance event bus. When Valkey is not configured (single-instance
deployments), the status is reported as `not_configured` rather than `down`.

### quartz

Checks the Quartz scheduler state:
`scheduler.isStarted() && !scheduler.isShutdown() && !scheduler.isInStandbyMode()`.
If the scheduler has been shut down or placed in standby mode, the component is
reported as `down`.

### http_client

Checks that the Jetty HTTP client used for proxy and remote operations is running
and operational. A `down` status typically indicates a resource exhaustion problem
(thread pool or connection pool).

## HTTP status codes

| Code | Meaning |
|------|-------------------------------------------------------------|
| 200 | Instance is `healthy` or `degraded` -- safe to route traffic |
| 503 | Instance is `unhealthy` -- remove from load balancer pool |

## Load balancer integration

Use the `/.health` endpoint as the health check target for nginx, HAProxy, AWS ALB,
or any other load balancer.

### nginx example

```nginx
upstream artipie {
    least_conn;
    server artipie-1:8080 max_fails=3 fail_timeout=30s;
    server artipie-2:8080 max_fails=3 fail_timeout=30s;
    keepalive 64;
}
```

With the configuration above, nginx uses passive health checks: after 3 consecutive
failed requests (including 503 responses from `/.health`), the backend is marked as
unavailable for 30 seconds before nginx retries it.

### Recommended settings

| Parameter | Recommended value | Description |
|-----------------|-------------------|---------------------------------------|
| Check interval | 10s | How often to poll `/.health` |
| Failure threshold | 3 | Consecutive failures before marking down |
| Success threshold | 1 | Consecutive successes before marking up |
| Timeout | 5s | Maximum time to wait for a response |

For [HA deployments](Configuration-HA), configure health checks independently for
each Artipie instance so that a single unhealthy node is removed from the pool
without affecting the others.
52 changes: 52 additions & 0 deletions .wiki/Configuration-Metadata.md
@@ -34,9 +34,61 @@ The database has a single table `artifacts` with the following structure:

All fields are NOT NULL; a UNIQUE constraint is created on `(repo_name, name, version)`.

### Full-text search (tsvector)

The `artifacts` table includes a `search_tokens` column of type `tsvector`. This column is
auto-populated by a PostgreSQL trigger on every INSERT and UPDATE. The trigger concatenates
`repo_name`, `name`, and `version` into a single text-search vector so no application-side
indexing is required.

Full-text queries use `ts_rank()` to order results by relevance. When the query string
contains wildcard characters (`*` or `?`), the search engine automatically falls back to
`LIKE`-based matching so that glob-style patterns still work.
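
The wildcard fallback amounts to translating glob syntax into a SQL `LIKE` pattern. The sketch below illustrates the idea; the function names and exact escaping are assumptions for the example, not Artipie's implementation:

```python
def is_glob(query: str) -> bool:
    """Detect whether a query should fall back to LIKE matching."""
    return "*" in query or "?" in query

def to_like_pattern(query: str) -> str:
    """Translate a glob-style query into a SQL LIKE pattern.

    `*` matches any run of characters and `?` a single character;
    literal `%` and `_` in the query are escaped so they match themselves.
    """
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%").replace("?", "_")
```

For example, a query such as `log4j*` is detected as a glob and becomes the pattern `log4j%`, while a plain query like `junit` goes through the `tsvector` path.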

### Connection pool environment variables

The following environment variables tune the HikariCP connection pool used for PostgreSQL.
They can be set as system environment variables or passed via `-D` JVM flags:

| Variable | Default | Description |
|------------------------------------|-----------|--------------------------------------------------|
| `ARTIPIE_DB_CONNECTION_TIMEOUT_MS` | 5000 | Maximum time (ms) to wait for a connection |
| `ARTIPIE_DB_IDLE_TIMEOUT_MS` | 600000 | Maximum time (ms) a connection may sit idle |
| `ARTIPIE_DB_MAX_LIFETIME_MS` | 1800000 | Maximum lifetime (ms) of a connection in the pool |

These defaults are suitable for most single-instance deployments. In HA setups with many
concurrent writers you may need to lower `ARTIPIE_DB_IDLE_TIMEOUT_MS` or raise the pool
size to avoid connection starvation.
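
For container deployments, these variables can be set per service. A hedged example for Docker Compose (the image name and values are illustrative, not defaults):

```yaml
services:
  artipie:
    image: artipie/artipie:latest
    environment:
      ARTIPIE_DB_CONNECTION_TIMEOUT_MS: "5000"
      ARTIPIE_DB_IDLE_TIMEOUT_MS: "300000"    # lowered for HA with many writers
      ARTIPIE_DB_MAX_LIFETIME_MS: "1800000"
```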

Migration note: earlier versions supported SQLite via `sqlite_data_file_path`. This is deprecated in favor of PostgreSQL.
Please migrate your data and update the configuration to use the `postgres_*` settings.

## Artifact Index (Lucene)

Artipie supports a Lucene-based artifact index for fast O(1) group repository lookups.
To enable this, add the following section to the main configuration file (`meta` section):

```yaml
meta:
  artifact_index:
    enabled: true                  # Enable Lucene artifact index (default: false)
    directory: /var/artipie/index  # Path for Lucene index files (required if enabled)
    warmup_on_startup: true        # Scan repos on startup to populate index (default: true)
```

| Field | Required | Default | Description |
|--------------------|----------|---------|----------------------------------------------------------|
| enabled | no | false | Enable or disable the Lucene artifact index |
| directory | yes* | - | Filesystem path for Lucene index files (*required if enabled) |
| warmup_on_startup | no | true | Scan all repository storage on startup to populate index |

When the index is enabled:
- On startup, `IndexWarmupService` scans all repository storage to build the initial index (unless `warmup_on_startup` is `false`)
- During warmup, group repositories fall back to querying all members (fan-out)
- Once warmup completes, group lookups return immediately from the index
- Artifact uploads and deletes automatically update the index via the event pipeline
- The REST API exposes search and stats endpoints under `/api/v1/search/`

## Maven, NPM and PyPI proxy adapters

[Maven-proxy](maven-proxy), [npm-proxy](npm-proxy) and [python-proxy](pypi-proxy) have some extra mechanism to process