API Security Best Practices

Authentication

JWT Authentication

Include the JWT token in requests:

curl -H "Authorization: Bearer <your-jwt-token>" \
     https://inferno.example.com/api/v1/models

API Key Authentication

Include the API key in requests:

curl -H "X-API-Key: <your-api-key>" \
     https://inferno.example.com/api/v1/models

Obtaining Tokens

# Login to get JWT token
curl -X POST https://inferno.example.com/api/v1/auth/login \
     -H "Content-Type: application/json" \
     -d '{"username": "admin", "password": "your-password"}'

# Response
{
  "token": "eyJ...",
  "expires_at": "2024-01-02T00:00:00Z"
}

API Key Management

Creating API Keys

curl -X POST https://inferno.example.com/api/v1/auth/api-keys \
     -H "Authorization: Bearer <admin-token>" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "production-key",
       "permissions": ["read_models", "run_inference"],
       "expires_in_days": 90
     }'

API Key Permissions

Available permissions:

read_models - List and view model information
write_models - Upload and modify models
delete_models - Delete models
run_inference - Execute model inference
manage_cache - Manage cache operations
read_metrics - View system metrics
write_config - Modify configuration
manage_users - User management
view_audit_logs - View audit logs
use_streaming - Use streaming inference
manage_queue - Manage job queue

Revoking API Keys

curl -X DELETE https://inferno.example.com/api/v1/auth/api-keys/<key-id> \
     -H "Authorization: Bearer <admin-token>"

Rate Limiting

Rate Limit Headers

Responses include rate limit information:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704067200

Handling Rate Limits

When rate limited (HTTP 429), implement exponential backoff:

import time
import requests

def make_request_with_retry(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 429:
            retry_after = int(response.headers.get('Retry-After', 60))
            time.sleep(retry_after * (2 ** attempt))
            continue

        return response

    raise Exception("Max retries exceeded")

Input Validation

Request Size Limits

Maximum request body: 10MB (configurable)
Maximum prompt length: 10,000 characters (configurable)

Content Types

Always specify the content type:

curl -X POST https://inferno.example.com/api/v1/inference \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer <token>" \
     -d '{"prompt": "Hello, world!", "model": "llama-7b"}'

Error Handling

Error Response Format

{
  "error": {
    "code": "AUTHENTICATION_FAILED",
    "message": "Invalid API key",
    "details": null
  }
}

Common Error Codes

HTTP Status	Code	Description
401	AUTHENTICATION_FAILED	Invalid or missing credentials
403	PERMISSION_DENIED	Insufficient permissions
429	RATE_LIMITED	Too many requests
400	VALIDATION_ERROR	Invalid input
500	INTERNAL_ERROR	Server error

CORS Configuration

Allowed Origins

Configure allowed origins in your config:

[server]
cors_origins = ["https://your-frontend.com"]
cors_methods = ["GET", "POST", "DELETE"]
cors_headers = ["Authorization", "Content-Type", "X-API-Key"]

Secure Communication

TLS Requirements

Minimum TLS version: 1.2
Recommended: TLS 1.3
Always use HTTPS in production

Certificate Validation

When making requests, always validate certificates:

# Good - validates certificates
requests.get("https://inferno.example.com", verify=True)

# Bad - disables certificate validation
# requests.get("https://inferno.example.com", verify=False)

Logging and Monitoring

Request Logging

All API requests are logged with:

Timestamp
Client IP
User ID (if authenticated)
Endpoint
Response status
Response time

Sensitive Data

Sensitive data is automatically redacted from logs:

API keys
JWT tokens
Passwords
Email addresses

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API Security Best Practices

Authentication

JWT Authentication

API Key Authentication

Obtaining Tokens

API Key Management

Creating API Keys

API Key Permissions

Revoking API Keys

Rate Limiting

Rate Limit Headers

Handling Rate Limits

Input Validation

Request Size Limits

Content Types

Error Handling

Error Response Format

Common Error Codes

CORS Configuration

Allowed Origins

Secure Communication

TLS Requirements

Certificate Validation

Logging and Monitoring

Request Logging

Sensitive Data

Best Practices Checklist

FilesExpand file tree

API_SECURITY.md

Latest commit

History

API_SECURITY.md

File metadata and controls

API Security Best Practices

Authentication

JWT Authentication

API Key Authentication

Obtaining Tokens

API Key Management

Creating API Keys

API Key Permissions

Revoking API Keys

Rate Limiting

Rate Limit Headers

Handling Rate Limits

Input Validation

Request Size Limits

Content Types

Error Handling

Error Response Format

Common Error Codes

CORS Configuration

Allowed Origins

Secure Communication

TLS Requirements

Certificate Validation

Logging and Monitoring

Request Logging

Sensitive Data

Best Practices Checklist