Jihaoyb/file_server

NebulaFS

NebulaFS is a production-grade, cloud-storage-style file server written in C++20, using Boost.Asio/Beast for async HTTP and Poco for configuration, logging, and utilities. It is a learning-by-building project that scales from a single-node file server to a distributed storage cluster.

Highlights

  • Async HTTP server with Boost.Asio/Beast and multi-threaded IO.
  • Local filesystem storage with atomic writes and checksum-based ETags.
  • SQLite metadata for buckets and objects.
  • Structured logging via Poco with request correlation.
  • Security-first design with OIDC/JWT auth and JWKS validation support.
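The atomic-write-plus-ETag pattern in the second bullet can be sketched in shell (hypothetical file names; the real engine does this in C++ inside the storage layer):

```shell
# Stage the bytes in a temp file, derive the ETag from a checksum,
# then rename into place -- rename is atomic on POSIX filesystems,
# so readers never observe a half-written object.
payload="hello world"
obj_dir=$(mktemp -d)
tmp="$obj_dir/.upload.tmp"
printf '%s' "$payload" > "$tmp"              # stage the bytes
etag=$(sha256sum "$tmp" | cut -d' ' -f1)     # checksum becomes the ETag
mv "$tmp" "$obj_dir/object.bin"              # atomic publish
echo "$etag"
```

If the process crashes mid-write, only the temp file is left behind, which a cleanup pass can reap.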

Architecture (Milestone 0–7 baseline)

flowchart LR
  client(("Client")) -->|"HTTPS"| gateway["HTTP Server"]
  gateway --> auth["Auth (OIDC/JWT + JWKS)"]
  gateway --> local_storage["Local Storage Engine (single_node)"]
  gateway --> local_metadata["SQLite Metadata (single_node)"]
  gateway --> metadata_svc["Metadata Service (distributed)"]
  gateway --> storage_nodes["Storage Nodes (distributed)"]
  gateway --> observability["Metrics/Health"]

Quickstart

Prerequisites

  • CMake 3.20+
  • C++20 compiler
  • vcpkg

Build

cmake --preset debug
cmake --build --preset debug

Run

./build/debug/nebulafs --config config/server.json
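The full schema of config/server.json is not reproduced in this README; a minimal sketch consistent with the keys referenced elsewhere in this document (server.mode, server.limits, auth) might look like:

```json
{
  "server": {
    "mode": "single_node",
    "limits": {
      "request_timeout_ms": 30000,
      "rate_limit_rps": 0,
      "rate_limit_burst": 0
    }
  },
  "auth": {
    "enabled": false
  }
}
```

Treat this as a starting point only; see the sections below for distributed-mode and auth keys.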

Distributed mode binaries

Milestone 6 adds two internal services:

  • ./build/debug/nebulafs_metadata (metadata + placement service)
  • ./build/debug/nebulafs_storage_node (blob storage node)

Gateway distributed mode is enabled with:

  • server.mode = "distributed"
  • distributed.metadata_base_url
  • distributed.storage_nodes
  • distributed.service_auth_token
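Putting those four keys together, a distributed-mode config fragment might look like this (URLs, ports, and the token value are placeholders, not defaults):

```json
{
  "server": { "mode": "distributed" },
  "distributed": {
    "metadata_base_url": "http://127.0.0.1:9090",
    "storage_nodes": [
      "http://127.0.0.1:9101",
      "http://127.0.0.1:9102"
    ],
    "service_auth_token": "change-me"
  }
}
```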

Distributed observability metrics (Milestone 6)

Distributed mode now emits service-specific counters and latency sums via /metrics.

  • Gateway:
    • nebulafs_gateway_storage_put_failures_total
    • nebulafs_gateway_metadata_rpc_failures_total
    • nebulafs_gateway_replica_fallback_total
    • nebulafs_gateway_multipart_compose_failures_total
    • nebulafs_gateway_multipart_rollback_attempts_total
    • nebulafs_gateway_multipart_rollback_failures_total
    • nebulafs_gateway_distributed_cleanup_uploads_total
    • nebulafs_gateway_distributed_cleanup_upload_failures_total
    • nebulafs_gateway_distributed_cleanup_blob_deletes_total
    • nebulafs_gateway_distributed_cleanup_blob_delete_failures_total
  • Metadata service:
    • nebulafs_metadata_allocate_requests_total
    • nebulafs_metadata_allocate_failures_total
    • nebulafs_metadata_allocate_latency_ms_sum
    • nebulafs_metadata_commit_requests_total
    • nebulafs_metadata_commit_failures_total
    • nebulafs_metadata_commit_latency_ms_sum
  • Storage node service:
    • nebulafs_storage_node_blob_writes_total
    • nebulafs_storage_node_blob_write_failures_total
    • nebulafs_storage_node_blob_write_latency_ms_sum
    • nebulafs_storage_node_blob_reads_total
    • nebulafs_storage_node_blob_read_failures_total
    • nebulafs_storage_node_blob_read_latency_ms_sum
    • nebulafs_storage_node_blob_deletes_total
    • nebulafs_storage_node_blob_delete_failures_total
    • nebulafs_storage_node_blob_delete_latency_ms_sum
    • nebulafs_storage_node_blob_composes_total
    • nebulafs_storage_node_blob_compose_failures_total
    • nebulafs_storage_node_blob_compose_latency_ms_sum
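A quick way to triage these counters from a scrape is to flag any failure counter that is non-zero. The metrics payload below is a made-up sample; in practice, pipe `curl -s http://localhost:8080/metrics` into the same filter:

```shell
# Sample /metrics lines (Prometheus text exposition: "name value").
metrics='nebulafs_gateway_storage_put_failures_total 0
nebulafs_gateway_replica_fallback_total 2
nebulafs_gateway_metadata_rpc_failures_total 1'

# Print the name of every failure counter with a non-zero value.
printf '%s\n' "$metrics" | awk '/failures_total/ && $2 > 0 {print $1}'
```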

Traffic controls

config/server.json supports:

  • server.limits.request_timeout_ms (default 30000)
  • server.limits.rate_limit_rps (default 0, disabled)
  • server.limits.rate_limit_burst (default 0, disabled)
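For example, to cap clients at 100 requests/second while allowing bursts of up to 200, the limits block would look like this (a sketch using the keys above; values are illustrative):

```json
{
  "server": {
    "limits": {
      "request_timeout_ms": 30000,
      "rate_limit_rps": 100,
      "rate_limit_burst": 200
    }
  }
}
```

With rate_limit_rps left at 0, no rate limiting is applied regardless of the burst value.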

Example API calls

# Health
curl http://localhost:8080/healthz

# Create bucket
curl -X POST http://localhost:8080/v1/buckets -d '{"name":"demo"}'

# Upload object
curl -X PUT \
  --data-binary @README.md \
  http://localhost:8080/v1/buckets/demo/objects/readme.txt

# Upload object (query-style)
curl -X POST \
  --data-binary @README.md \
  "http://localhost:8080/v1/buckets/demo/objects?name=readme.txt"

# Multipart upload: initiate
UPLOAD_ID=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"object":"large.bin"}' \
  http://localhost:8080/v1/buckets/demo/multipart-uploads | jq -r .upload_id)

# Multipart upload: parts
curl -X PUT --data-binary @part1.bin \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts/1"
curl -X PUT --data-binary @part2.bin \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts/2"

# Multipart upload: complete
PART1_ETAG=$(curl -s "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts" \
  | jq -r '.parts[] | select(.part_number==1) | .etag')
PART2_ETAG=$(curl -s "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/parts" \
  | jq -r '.parts[] | select(.part_number==2) | .etag')
curl -X POST -H "Content-Type: application/json" \
  -d "{\"parts\":[{\"part_number\":1,\"etag\":\"$PART1_ETAG\"},{\"part_number\":2,\"etag\":\"$PART2_ETAG\"}]}" \
  "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID/complete"

# Multipart upload: abort
curl -X DELETE "http://localhost:8080/v1/buckets/demo/multipart-uploads/$UPLOAD_ID"

# Download object
curl http://localhost:8080/v1/buckets/demo/objects/readme.txt -o readme.txt

# List objects
curl "http://localhost:8080/v1/buckets/demo/objects?prefix=read"

Note: multipart upload endpoints are available in both single-node and distributed mode. In distributed mode, parts are stored on storage nodes and finalized through gateway orchestration.
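The invariant that `complete` must uphold can be checked locally: parts concatenated in part-number order reproduce the original bytes exactly. A self-contained sketch (local files only, no server involved):

```shell
# Split a file into fixed-size parts, compose them back in order,
# and verify the checksums match -- the same property the gateway's
# compose step guarantees.
src=$(mktemp)
dd if=/dev/urandom of="$src" bs=1024 count=8 2>/dev/null
dir=$(mktemp -d)
split -b 4096 -d "$src" "$dir/part."          # part.00, part.01
cat "$dir"/part.* > "$dir/composed"           # compose in part order
[ "$(sha256sum < "$src")" = "$(sha256sum < "$dir/composed")" ] && echo match
```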

Authentication test (Keycloak local)

Use this to validate auth.enabled=true end-to-end.

  1. Start Keycloak:
docker run --name keycloak -p 8081:8080 \
  -e KEYCLOAK_ADMIN=admin \
  -e KEYCLOAK_ADMIN_PASSWORD=admin \
  quay.io/keycloak/keycloak:26.0 start-dev
  2. Set auth in config/server.json:
"auth": {
  "enabled": true,
  "issuer": "http://127.0.0.1:8081/realms/master",
  "audience": "",
  "jwks_url": "http://127.0.0.1:8081/realms/master/protocol/openid-connect/certs",
  "cache_ttl_seconds": 300,
  "clock_skew_seconds": 60,
  "allowed_alg": "RS256"
}
  3. Restart NebulaFS.

  4. Verify protected route without token (should be 401):

curl -i http://127.0.0.1:8080/v1/buckets
  5. Request token and call protected route:
TOKEN=$(curl -s -X POST \
  "http://127.0.0.1:8081/realms/master/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=admin-cli" \
  -d "username=admin" \
  -d "password=admin" | jq -r .access_token)

curl -i -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8080/v1/buckets

Troubleshooting:

  • issuer mismatch: auth.issuer must exactly equal token iss.
  • audience mismatch: set auth.audience to match token aud, or use empty string to skip.
  • jwks fetch failed: verify auth.jwks_url and IdP reachability from NebulaFS process.
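When chasing issuer or audience mismatches, it helps to decode the token payload locally and read iss/aud directly. The token below is a hand-built unsigned sample just to show the decoding steps; with a real token, substitute $TOKEN from the step above:

```shell
# Build a sample JWT-shaped token: header.payload.signature,
# with the payload base64url-encoded (claim values are illustrative).
payload_json='{"iss":"http://127.0.0.1:8081/realms/master","aud":"account"}'
mid=$(printf '%s' "$payload_json" | base64 -w0 | tr '+/' '-_' | tr -d '=')
TOKEN="hdr.$mid.sig"

# Decode: take the second dot-separated segment, undo base64url,
# restore stripped padding, then decode.
seg=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="$seg="; done
printf '%s' "$seg" | base64 -d
```

Compare the printed iss against auth.issuer and aud against auth.audience character-for-character.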

Known limitations (Milestone 3 baseline)

  • OpenSSL 3 deprecation warnings appear in JWT/JWKS test helper code (RSA_* APIs). They are confined to test code and do not affect runtime behavior.
  • /metrics is currently treated as a protected endpoint when auth.enabled=true (only /healthz and /readyz are public).

Security Model (Current)

  • TLS supported via config; disabled by default for local dev.
  • Auth is available via OIDC/JWT when enabled in config. Health is public; all other endpoints require a valid token.
  • Path traversal protection enforced in storage.
  • Size limits enforced by config.
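The path-traversal bullet boils down to: resolve the requested object key against the storage root and reject anything that escapes it. A shell sketch of the check (hypothetical paths; the server implements this in C++):

```shell
# Resolve the key against the storage root; a key containing "../"
# components that escapes the root must be rejected.
root=$(mktemp -d)
key="../../etc/passwd"
resolved=$(realpath -m "$root/$key")   # normalize without requiring the file to exist
case "$resolved" in
  "$root"/*) verdict=allowed ;;
  *)         verdict=rejected ;;
esac
echo "$verdict"
```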

Performance Notes (Current)

  • Async IO with per-connection strands.
  • Streaming request bodies to disk with size limits.
  • Download supports HTTP range requests.
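HTTP range requests use inclusive byte offsets. Against a running server the request would be `curl -r 4-7 http://localhost:8080/v1/buckets/demo/objects/readme.txt`; the same four-byte slice can be demonstrated locally with dd:

```shell
# "bytes=4-7" means bytes 4,5,6,7 inclusive -- four bytes.
f=$(mktemp)
printf 'NebulaFS range demo' > "$f"
slice=$(dd if="$f" bs=1 skip=4 count=4 2>/dev/null)
echo "$slice"   # the four bytes starting at offset 4
```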

Roadmap

  • Milestone 3: OIDC/JWT validation with JWKS caching (completed).
  • Milestone 3.1: Startup auth config hardening (completed).
  • Milestone 4: Multipart uploads and cleanup baseline (completed).
  • Milestone 5: Metrics (Prometheus), rate limiting, timeouts (completed).
  • Milestone 6: Distributed baseline implemented (gateway + metadata service + storage nodes + distributed CI lane).
  • Milestone 7: Distributed upload maturity (streamed writes + distributed multipart baseline) (completed).
  • Milestone 8: Distributed reliability hardening (compose reliability + distributed cleanup) (in progress).

Milestone 6 completion criteria

  • Distributed mode keeps public object CRUD routes unchanged at the gateway.
  • Metadata and storage-node internal services run as separate binaries with service-token checks.
  • Distributed failure correctness is covered in integration tests (read fallback, write quorum failure, token rejection).
  • Distributed metrics are exposed and validated for gateway, metadata service, and storage node.
  • Current limitation: distributed cleanup coordination is best-effort per gateway instance (no cluster leader election).

Docs

  • Architecture: docs/architecture.md
  • Milestone 4 design: docs/design/milestone-4-multipart-cleanup.md
  • Milestone 6 design: docs/design/milestone-6-distributed-mode.md
  • Threat model: docs/threat-model.md
  • ADRs: docs/adr/
  • Code style: docs/code-style.md

License

MIT. See LICENSE.
