15 changes: 14 additions & 1 deletion .gitignore
@@ -19,6 +19,7 @@ classes/
.storage/data/**
.storage/data/test/**
artipie-main/docker-compose/artipie/data/**
artipie-main/docker-compose/artipie/prod_repo/**
artipie-main/docker-compose/artipie/artifacts/npm/node_modules/**
artipie-main/docker-compose/artipie/cache/**
artipie-main/docker-compose/artipie/artifacts/php/vendor/**
@@ -29,4 +30,16 @@ artipie-main/docker-compose/artipie/artifacts/php/vendor/**
artipie-main/docker-compose/.env

# AI agent task/analysis documents - not part of product documentation
agents/
agents/

# Git worktrees
.worktrees/
/benchmark/fixtures
/benchmark/results
/docs/plans
/docs/superpowers
*.csv
*.png
/.superpowers
/artipie-ui/mockups
artipie-backfill/dependency-reduced-pom.xml
156 changes: 156 additions & 0 deletions .wiki/Configuration-HA.md
@@ -0,0 +1,156 @@
# HA Deployment

Artipie supports multi-instance deployment for high availability. Instances
coordinate through a shared PostgreSQL database and a shared Valkey (Redis-compatible)
instance so that metadata, events, and job scheduling remain consistent across the
cluster.

## Requirements

| Component | Minimum version | Purpose |
|-----------------------|-----------------|-----------------------------------------|
| PostgreSQL | 15+ | Artifacts metadata, Quartz job store |
| Valkey / Redis | 7+ | Cross-instance event bus (pub/sub) |
| S3-compatible storage | - | Shared artifact storage |
| Load balancer (nginx) | - | Request distribution and health checks |

All instances must be able to reach the same PostgreSQL database, the same
Valkey instance, and the same S3-compatible storage bucket.

## Architecture

Each Artipie instance registers itself in the `artipie_nodes` table on startup and
sends periodic heartbeats so that the cluster knows which nodes are alive.

`ClusterEventBus` uses Valkey pub/sub channels to broadcast notifications across
instances. When one instance receives an artifact upload, the event is published to
Valkey so that other instances can update caches or indexes accordingly.

Quartz is configured with JDBC clustering (`org.quartz.jobStore.isClustered = true`)
to prevent duplicate execution of scheduled jobs such as metadata flush or proxy
cache verification. Only one instance in the cluster will execute a given job trigger
at any point in time.
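
The property named above is part of Quartz's standard JDBC-store configuration. As a hedged sketch (these are standard Quartz property names, but the exact values Artipie ships with may differ), a clustered `quartz.properties` looks like:

```properties
# JDBC job store shared by all instances in the cluster
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
org.quartz.jobStore.isClustered = true
# How often (ms) each node checks in with the cluster
org.quartz.jobStore.clusterCheckinInterval = 15000
# Must be unique per node; AUTO derives it from hostname and timestamp
org.quartz.scheduler.instanceId = AUTO
```

With `isClustered` enabled, Quartz uses row locks in the shared job-store tables to ensure only one node fires each trigger.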

## Configuration

An example HA configuration file is provided at `docs/ha-deployment/artipie-ha.yml`.
The key points are:

- **Storage must use S3** (not the local filesystem) so that all instances share the
same artifact data. A filesystem backend would result in each instance having its own
isolated copy of the data.

- **All instances must share the same PostgreSQL database.** The `artifacts_database`
section in the main configuration file must point to the same host, port, and
database name on every instance.

- **All instances must share the same Valkey instance.** The `valkey` section in the
main configuration must use the same connection details everywhere.

- **JWT secrets must be identical across instances.** If one instance issues a JWT
token, any other instance must be able to validate it. Configure the same
`jwt_secret` value on every node.

```yaml
meta:
  storage:
    type: s3
    bucket: artipie-data
    region: us-east-1
    endpoint: http://minio:9000
    credentials:
      type: basic
      accessKeyId: minioadmin
      secretAccessKey: minioadmin
  artifacts_database:
    postgres_host: postgres
    postgres_port: 5432
    postgres_database: artipie
    postgres_user: artipie
    postgres_password: artipie
    threads_count: 4
    interval_seconds: 2
  valkey:
    host: valkey
    port: 6379
```

## nginx setup

An example nginx configuration is provided at `docs/ha-deployment/nginx-ha.conf`.
The recommended setup uses:

- `least_conn` load-balancing algorithm to distribute requests to the instance with
the fewest active connections.
- `keepalive 64` on the upstream block to reuse connections to backend instances and
reduce latency.
- Passive health checks via the `/.health` endpoint. nginx marks a backend as down
after a configurable number of failed requests and re-checks periodically.

```nginx
upstream artipie {
    least_conn;
    server artipie-1:8080;
    server artipie-2:8080;
    server artipie-3:8080;
    keepalive 64;
}

server {
    listen 80;

    location / {
        proxy_pass http://artipie;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /.health {
        proxy_pass http://artipie;
        proxy_connect_timeout 5s;
        proxy_read_timeout 5s;
    }
}
```

## Docker Compose

A ready-to-use Docker Compose file for a 3-instance deployment is provided at
`docs/ha-deployment/docker-compose-ha.yml`. It includes:

- 3 Artipie instances behind an nginx load balancer
- PostgreSQL 15 for metadata and Quartz job store
- Valkey 7 for the cross-instance event bus
- MinIO for S3-compatible shared storage
- nginx configured with `least_conn` and passive health checks

To start the cluster:

```bash
cd docs/ha-deployment
docker compose -f docker-compose-ha.yml up -d
```

After startup, the Artipie API is available through the nginx load balancer on port 80
and each instance exposes its own health endpoint at `/.health`.

## Monitoring

The [health endpoint](Configuration-Health) at `GET /.health` returns per-component
status for each instance. In an HA deployment, configure your load balancer or external
monitoring system to poll `/.health` on each instance independently.

Recommended monitoring setup:

- **nginx passive health checks**: rely on the upstream `max_fails` / `fail_timeout`
directives to automatically remove unhealthy instances from the pool.
- **External monitoring**: poll each instance's `/.health` endpoint every 10 seconds.
Alert when any instance returns `unhealthy` (HTTP 503) for more than 3 consecutive
checks.
- **Prometheus metrics**: each instance exposes its own `/metrics/vertx` endpoint (see
[Metrics](Configuration-Metrics)). Aggregate across instances in Prometheus or
Grafana for a cluster-wide view.
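
The "3 consecutive checks" alert rule above can be sketched as a small helper. This is illustrative only; `should_alert` and the history format are assumptions for the example, not part of Artipie:

```python
def should_alert(status_codes: list[int], threshold: int = 3) -> bool:
    """Return True when the last `threshold` health checks all failed.

    `status_codes` holds HTTP codes from successive /.health polls,
    oldest first; 503 means the instance reported `unhealthy`.
    """
    if len(status_codes) < threshold:
        return False
    return all(code == 503 for code in status_codes[-threshold:])
```

A single blip such as `[503, 200, 503]` does not fire; only an unbroken run of failures does.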
120 changes: 120 additions & 0 deletions .wiki/Configuration-Health.md
@@ -0,0 +1,120 @@
# Health Checks

Artipie exposes a built-in health endpoint that reports the status of every major
subsystem. The endpoint is designed for load balancer integration and external
monitoring.

## Endpoint

```
GET /.health
```

No authentication is required. The endpoint is always available, even when JWT
authentication is configured for the rest of the API.

## Response format

```json
{
  "status": "healthy",
  "components": {
    "storage": {"status": "up"},
    "database": {"status": "up"},
    "valkey": {"status": "not_configured"},
    "quartz": {"status": "up"},
    "http_client": {"status": "up"}
  }
}
```

The top-level `status` field is a roll-up of the individual component statuses.
Each entry in `components` reports the state of a single subsystem.

## Status values

| Top-level status | Condition | HTTP code |
|------------------|---------------------------------------------------|-----------|
| `healthy` | All components report `up` (or `not_configured`) | 200 |
| `degraded` | Exactly one non-storage component is down | 200 |
| `unhealthy` | Storage is down, OR two or more components are down | 503 |

The distinction between `degraded` and `unhealthy` allows load balancers to keep
routing traffic to an instance that has a single non-critical failure (for example,
Valkey is temporarily unreachable) while removing instances that cannot serve
artifacts at all.
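
The roll-up rules in the table can be expressed compactly. The following is an illustrative sketch of that logic (not Artipie's actual code); it treats any status other than `down`, including `not_configured`, as healthy:

```python
def rollup(components: dict[str, str]) -> tuple[str, int]:
    """Combine per-component statuses into (top-level status, HTTP code)."""
    down = [name for name, status in components.items() if status == "down"]
    if "storage" in down or len(down) >= 2:
        return "unhealthy", 503
    if len(down) == 1:  # a single non-storage failure
        return "degraded", 200
    return "healthy", 200
```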

## Component details

### storage

Tests the primary artifact storage by calling `list(Key.ROOT)` with a 5-second
timeout. If the storage does not respond within the timeout, the component is
reported as `down`. Because storage is the most critical subsystem, a storage
failure alone is enough to mark the instance as `unhealthy`.

### database

Tests the PostgreSQL connection pool by calling `connection.isValid(5)` through
HikariCP. This component is only checked when `artifacts_database` is configured
in the main Artipie configuration file. If PostgreSQL is not configured, the
component is omitted from the response.

### valkey

Tests connectivity to the Valkey (Redis-compatible) instance used for the
cross-instance event bus. When Valkey is not configured (single-instance
deployments), the status is reported as `not_configured` rather than `down`.

### quartz

Checks the Quartz scheduler state:
`scheduler.isStarted() && !scheduler.isShutdown() && !scheduler.isInStandbyMode()`.
If the scheduler has been shut down or placed in standby mode, the component is
reported as `down`.

### http_client

Checks that the Jetty HTTP client used for proxy and remote operations is running
and operational. A `down` status typically indicates a resource exhaustion problem
(thread pool or connection pool).

## HTTP status codes

| Code | Meaning |
|------|-------------------------------------------------------------|
| 200 | Instance is `healthy` or `degraded` -- safe to route traffic |
| 503 | Instance is `unhealthy` -- remove from load balancer pool |

## Load balancer integration

Use the `/.health` endpoint as the health check target for nginx, HAProxy, AWS ALB,
or any other load balancer.

### nginx example

```nginx
upstream artipie {
    least_conn;
    server artipie-1:8080 max_fails=3 fail_timeout=30s;
    server artipie-2:8080 max_fails=3 fail_timeout=30s;
    keepalive 64;
}
```

With the configuration above, nginx uses passive health checks: after 3 consecutive
failed requests (including 503 responses from `/.health`), the backend is marked as
unavailable for 30 seconds before nginx retries it.

### Recommended settings

| Parameter | Recommended value | Description |
|-----------------|-------------------|---------------------------------------|
| Check interval | 10s | How often to poll `/.health` |
| Failure threshold | 3 | Consecutive failures before marking down |
| Success threshold | 1 | Consecutive successes before marking up |
| Timeout | 5s | Maximum time to wait for a response |

For [HA deployments](Configuration-HA), configure health checks independently for
each Artipie instance so that a single unhealthy node is removed from the pool
without affecting the others.
52 changes: 52 additions & 0 deletions .wiki/Configuration-Metadata.md
@@ -34,9 +34,61 @@ The database has a single table `artifacts` with the following structure:

All fields are NOT NULL; a UNIQUE constraint is created on `(repo_name, name, version)`.

### Full-text search (tsvector)

The `artifacts` table includes a `search_tokens` column of type `tsvector`. This column is
auto-populated by a PostgreSQL trigger on every INSERT and UPDATE. The trigger concatenates
`repo_name`, `name`, and `version` into a single text-search vector so no application-side
indexing is required.

Full-text queries use `ts_rank()` to order results by relevance. When the query string
contains wildcard characters (`*` or `?`), the search engine automatically falls back to
`LIKE`-based matching so that glob-style patterns still work.
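
The wildcard fallback amounts to translating glob syntax into a SQL `LIKE` pattern. The sketch below illustrates the idea; the function names and exact escaping are assumptions for the example, not Artipie's implementation:

```python
def is_glob(query: str) -> bool:
    """Detect whether a query should fall back to LIKE matching."""
    return "*" in query or "?" in query

def to_like_pattern(query: str) -> str:
    """Translate a glob-style query into a SQL LIKE pattern.

    `*` matches any run of characters and `?` a single character;
    literal `%` and `_` in the query are escaped so they match themselves.
    """
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%").replace("?", "_")
```

For example, a query such as `log4j*` is detected as a glob and becomes the pattern `log4j%`, while a plain query like `junit` goes through the `tsvector` path.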

### Connection pool environment variables

The following environment variables tune the HikariCP connection pool used for PostgreSQL.
They can be set as system environment variables or passed via `-D` JVM flags:

| Variable | Default | Description |
|------------------------------------|-----------|--------------------------------------------------|
| `ARTIPIE_DB_CONNECTION_TIMEOUT_MS` | 5000 | Maximum time (ms) to wait for a connection |
| `ARTIPIE_DB_IDLE_TIMEOUT_MS` | 600000 | Maximum time (ms) a connection may sit idle |
| `ARTIPIE_DB_MAX_LIFETIME_MS` | 1800000 | Maximum lifetime (ms) of a connection in the pool |

These defaults are suitable for most single-instance deployments. In HA setups with many
concurrent writers you may need to lower `ARTIPIE_DB_IDLE_TIMEOUT_MS` or raise the pool
size to avoid connection starvation.
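
For container deployments, these variables can be set per service. A hedged example for Docker Compose (the image name and values are illustrative, not defaults):

```yaml
services:
  artipie:
    image: artipie/artipie:latest
    environment:
      ARTIPIE_DB_CONNECTION_TIMEOUT_MS: "5000"
      ARTIPIE_DB_IDLE_TIMEOUT_MS: "300000"    # lowered for HA with many writers
      ARTIPIE_DB_MAX_LIFETIME_MS: "1800000"
```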

Migration note: earlier versions supported SQLite via `sqlite_data_file_path`. This is deprecated in favor of PostgreSQL.
Please migrate your data and update the configuration to use the `postgres_*` settings.

## Artifact Index (Lucene)

Artipie supports a Lucene-based artifact index for fast O(1) group repository lookups.
To enable this, add the following section to the main configuration file (`meta` section):

```yaml
meta:
  artifact_index:
    enabled: true                  # Enable Lucene artifact index (default: false)
    directory: /var/artipie/index  # Path for Lucene index files (required if enabled)
    warmup_on_startup: true        # Scan repos on startup to populate index (default: true)
```

| Field | Required | Default | Description |
|--------------------|----------|---------|----------------------------------------------------------|
| enabled | no | false | Enable or disable the Lucene artifact index |
| directory | yes* | - | Filesystem path for Lucene index files (*required if enabled) |
| warmup_on_startup | no | true | Scan all repository storage on startup to populate index |

When the index is enabled:
- On startup, `IndexWarmupService` scans all repository storage to build the initial index (unless `warmup_on_startup` is `false`)
- During warmup, group repositories fall back to querying all members (fan-out)
- Once warmup completes, group lookups return immediately from the index
- Artifact uploads and deletes automatically update the index via the event pipeline
- The REST API exposes search and stats endpoints under `/api/v1/search/`

## Maven, NPM and PyPI proxy adapters

[Maven-proxy](maven-proxy), [npm-proxy](npm-proxy) and [python-proxy](pypi-proxy) have some extra mechanism to process