NebulaFS is a production-grade, cloud-storage style file server written in C++20 using Boost.Asio/Beast for async HTTP and Poco for configuration, logging, and utilities. It is built as a learning-by-building project that scales from a single-node file server to a distributed storage cluster.
- Async HTTP server with Boost.Asio/Beast and multi-threaded IO.
- Local filesystem storage with atomic writes and checksum-based ETags.
- SQLite metadata for buckets and objects.
- Structured logging via Poco with request correlation.
- Security-first design with OIDC/JWT auth and JWKS validation support.
```mermaid
flowchart LR
    client(("Client")) -->|"HTTPS"| gateway["HTTP Server"]
    gateway --> auth["Auth (OIDC/JWT + JWKS)"]
    gateway --> local_storage["Local Storage Engine (single_node)"]
    gateway --> local_metadata["SQLite Metadata (single_node)"]
    gateway --> metadata_svc["Metadata Service (distributed)"]
    gateway --> storage_nodes["Storage Nodes (distributed)"]
    gateway --> observability["Metrics/Health"]
```
- CMake 3.20+
- C++20 compiler
- vcpkg
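If the project uses vcpkg manifest mode, the dependency set likely resembles the sketch below. The port names are assumptions based on the libraries named above (Boost.Asio/Beast, Poco, SQLite, OpenSSL), not a copy of the repository's actual `vcpkg.json`:

```json
{
  "name": "nebulafs",
  "version-string": "0.1.0",
  "dependencies": [
    "boost-asio",
    "boost-beast",
    "poco",
    "sqlite3",
    "openssl"
  ]
}
```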
```bash
cmake --preset debug
cmake --build --preset debug
./build/debug/nebulafs --config config/server.json
```

Milestone 6 adds two internal services:

- `./build/debug/nebulafs_metadata` (metadata + placement service)
- `./build/debug/nebulafs_storage_node` (blob storage node)
Gateway distributed mode is enabled with:
- `server.mode = "distributed"`
- `distributed.metadata_base_url`
- `distributed.storage_nodes`
- `distributed.service_auth_token`
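Put together, the gateway side of a distributed deployment might look like the following sketch. The endpoint values and token are illustrative placeholders, and the exact shape of `distributed.storage_nodes` (here, an array of base URLs) is an assumption:

```json
{
  "server": { "mode": "distributed" },
  "distributed": {
    "metadata_base_url": "http://127.0.0.1:9090",
    "storage_nodes": ["http://127.0.0.1:9101", "http://127.0.0.1:9102"],
    "service_auth_token": "change-me"
  }
}
```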
Distributed mode now emits service-specific counters and latency sums via `/metrics`.
- Gateway:
  - `nebulafs_gateway_storage_put_failures_total`
  - `nebulafs_gateway_metadata_rpc_failures_total`
  - `nebulafs_gateway_replica_fallback_total`
  - `nebulafs_gateway_multipart_compose_failures_total`
  - `nebulafs_gateway_multipart_rollback_attempts_total`
  - `nebulafs_gateway_multipart_rollback_failures_total`
  - `nebulafs_gateway_distributed_cleanup_uploads_total`
  - `nebulafs_gateway_distributed_cleanup_upload_failures_total`
  - `nebulafs_gateway_distributed_cleanup_blob_deletes_total`
  - `nebulafs_gateway_distributed_cleanup_blob_delete_failures_total`
- Metadata service:
  - `nebulafs_metadata_allocate_requests_total`
  - `nebulafs_metadata_allocate_failures_total`
  - `nebulafs_metadata_allocate_latency_ms_sum`
  - `nebulafs_metadata_commit_requests_total`
  - `nebulafs_metadata_commit_failures_total`
  - `nebulafs_metadata_commit_latency_ms_sum`
- Storage node service:
  - `nebulafs_storage_node_blob_writes_total`
  - `nebulafs_storage_node_blob_write_failures_total`
  - `nebulafs_storage_node_blob_write_latency_ms_sum`
  - `nebulafs_storage_node_blob_reads_total`
  - `nebulafs_storage_node_blob_read_failures_total`
  - `nebulafs_storage_node_blob_read_latency_ms_sum`
  - `nebulafs_storage_node_blob_deletes_total`
  - `nebulafs_storage_node_blob_delete_failures_total`
  - `nebulafs_storage_node_blob_delete_latency_ms_sum`
  - `nebulafs_storage_node_blob_composes_total`
  - `nebulafs_storage_node_blob_compose_failures_total`
  - `nebulafs_storage_node_blob_compose_latency_ms_sum`
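Because each latency is exported as a `_ms_sum` counter next to a `_requests_total` counter, an average latency can be derived at scrape time. A minimal sketch against a fabricated sample (real values come from a running service's `/metrics` endpoint):

```shell
# Hypothetical sample of /metrics output; values are fabricated for illustration.
cat > /tmp/nebulafs_metrics_sample.txt <<'EOF'
nebulafs_metadata_commit_requests_total 40
nebulafs_metadata_commit_failures_total 2
nebulafs_metadata_commit_latency_ms_sum 900
EOF

# Average commit latency = latency sum / request count (900 / 40 = 22.5 ms).
awk '
  /commit_requests_total/ {n = $2}
  /commit_latency_ms_sum/ {s = $2}
  END {printf "avg_commit_latency_ms=%.1f\n", s / n}
' /tmp/nebulafs_metrics_sample.txt
```

The same pattern applies to any of the `_failures_total` counters to derive failure rates.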
`config/server.json` supports:

- `server.limits.request_timeout_ms` (default `30000`)
- `server.limits.rate_limit_rps` (default `0`, disabled)
- `server.limits.rate_limit_burst` (default `0`, disabled)
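For example, enabling a modest rate limit alongside the default timeout might look like the fragment below. The values are illustrative, and nesting `limits` under `server` is inferred from the key names above:

```json
"server": {
  "limits": {
    "request_timeout_ms": 30000,
    "rate_limit_rps": 50,
    "rate_limit_burst": 100
  }
}
```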
```bash
# Health
curl http://localhost:8080/healthz

# Create bucket
curl -X POST http://localhost:8080/v1/buckets -d '{"name":"demo"}'

# Upload object
curl -X PUT \
  --data-binary @README.md \
  http://localhost:8080/v1/buckets/demo/objects/readme.txt

# Upload object (query-style)
curl -X POST \
  --data-binary @README.md \
  "http://localhost:8080/v1/buckets/demo/objects?name=readme.txt"

# Multipart upload: initiate
UPLOAD_ID=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"object":"large.bin"}' \
  http://localhost:8080/v1/buckets/demo/multipart-uploads | jq -r .upload_id)

# Multipart upload: parts
curl -X PUT --data-binary @part1.bin \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts/1"
curl -X PUT --data-binary @part2.bin \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts/2"

# Multipart upload: complete
PART1_ETAG=$(curl -s "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts" \
  | jq -r '.parts[] | select(.part_number==1) | .etag')
PART2_ETAG=$(curl -s "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts" \
  | jq -r '.parts[] | select(.part_number==2) | .etag')
curl -X POST -H "Content-Type: application/json" \
  -d "{\"parts\":[{\"part_number\":1,\"etag\":\"$PART1_ETAG\"},{\"part_number\":2,\"etag\":\"$PART2_ETAG\"}]}" \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/complete"

# Multipart upload: abort
curl -X DELETE "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID"

# Download object
curl http://localhost:8080/v1/buckets/demo/objects/readme.txt -o readme.txt

# List objects
curl "http://localhost:8080/v1/buckets/demo/objects?prefix=read"
```

Note: multipart upload endpoints are available in both single-node and distributed mode. In distributed mode, parts are stored on storage nodes and finalized through gateway orchestration.
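The `part1.bin`/`part2.bin` files used in the multipart examples can be produced from any local file; a quick sketch using coreutils `split`:

```shell
# Work in a scratch directory so no local files are clobbered.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 3 MiB test file and split it into 2 MiB chunks
# (split emits part_aa, part_ab with default alphabetic suffixes).
head -c $((3 * 1024 * 1024)) /dev/urandom > large.bin
split -b $((2 * 1024 * 1024)) large.bin part_
mv part_aa part1.bin
mv part_ab part2.bin
```

This yields a full 2 MiB first part and a 1 MiB remainder as the second part.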
Use this setup to validate `auth.enabled=true` end-to-end.
- Start Keycloak:

```bash
docker run --name keycloak -p 8081:8080 \
  -e KEYCLOAK_ADMIN=admin \
  -e KEYCLOAK_ADMIN_PASSWORD=admin \
  quay.io/keycloak/keycloak:26.0 start-dev
```

- Set `auth` in `config/server.json`:

```json
"auth": {
  "enabled": true,
  "issuer": "http://127.0.0.1:8081/realms/master",
  "audience": "",
  "jwks_url": "http://127.0.0.1:8081/realms/master/protocol/openid-connect/certs",
  "cache_ttl_seconds": 300,
  "clock_skew_seconds": 60,
  "allowed_alg": "RS256"
}
```

- Restart NebulaFS.
- Verify the protected route without a token (should return `401`):

```bash
curl -i http://127.0.0.1:8080/v1/buckets
```

- Request a token and call the protected route:

```bash
TOKEN=$(curl -s -X POST \
  "http://127.0.0.1:8081/realms/master/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=admin-cli" \
  -d "username=admin" \
  -d "password=admin" | jq -r .access_token)
curl -i -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8080/v1/buckets
```

Troubleshooting:

- issuer mismatch: `auth.issuer` must exactly equal the token's `iss`.
- audience mismatch: set `auth.audience` to match the token's `aud`, or use an empty string to skip the check.
- jwks fetch failed: verify `auth.jwks_url` and IdP reachability from the NebulaFS process.
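For the issuer/audience mismatches above, it helps to inspect the token's claims directly. A sketch that decodes a JWT payload with standard tools; the token below is fabricated for illustration, so substitute the `$TOKEN` obtained from Keycloak:

```shell
# Build a fake JWT (header.payload.signature, base64url-encoded) for demonstration.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
HEADER=$(printf '{"alg":"RS256","typ":"JWT"}' | b64url)
PAYLOAD=$(printf '{"iss":"http://127.0.0.1:8081/realms/master","aud":"account"}' | b64url)
TOKEN="$HEADER.$PAYLOAD.fakesig"

# Decode the payload segment: undo the base64url alphabet, restore padding, decode.
decode_jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | openssl base64 -d -A
}

# Print the claims that must line up with auth.issuer and auth.audience.
decode_jwt_payload "$TOKEN" | jq -r '.iss, .aud'
```

This only decodes the payload; it does not verify the signature, which NebulaFS does against the JWKS keys.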
- OpenSSL 3 deprecation warnings appear in JWT/JWKS test helper code (`RSA_*` APIs). They are test-only warnings and do not affect runtime behavior.
- `/metrics` is currently treated as a protected endpoint when `auth.enabled=true` (only `/healthz` and `/readyz` are public).
- TLS supported via config; disabled by default for local dev.
- Auth is available via OIDC/JWT when enabled in config. Health endpoints (`/healthz`, `/readyz`) are public; all other endpoints require a valid token.
- Path traversal protection enforced in storage.
- Size limits enforced by config.
- Async IO with per-connection strands.
- Streaming request bodies to disk with size limits.
- Download supports HTTP range requests.
- Milestone 3: OIDC/JWT validation with JWKS caching (completed).
- Milestone 3.1: Startup auth config hardening (completed).
- Milestone 4: Multipart uploads and cleanup baseline (completed).
- Milestone 5: Metrics (Prometheus), rate limiting, timeouts (completed).
- Milestone 6: Distributed baseline implemented (gateway + metadata service + storage nodes + distributed CI lane).
- Milestone 7: Distributed upload maturity (streamed writes + distributed multipart baseline) (completed).
- Milestone 8: Distributed reliability hardening (compose reliability + distributed cleanup) (in progress).
- Distributed mode keeps public object CRUD routes unchanged at the gateway.
- Metadata and storage-node internal services run as separate binaries with service-token checks.
- Distributed failure correctness is covered in integration tests (read fallback, write quorum failure, token rejection).
- Distributed metrics are exposed and validated for gateway, metadata service, and storage node.
- Current limitation: distributed cleanup coordination is best-effort per gateway instance (no cluster leader election).
- Architecture: `docs/architecture.md`
- Milestone 4 design: `docs/design/milestone-4-multipart-cleanup.md`
- Milestone 6 design: `docs/design/milestone-6-distributed-mode.md`
- Threat model: `docs/threat-model.md`
- ADRs: `docs/adr/`
- Code style: `docs/code-style.md`
MIT. See LICENSE.