Include the JWT token in requests:

    curl -H "Authorization: Bearer <your-jwt-token>" \
      https://inferno.example.com/api/v1/models

Include the API key in requests:

    curl -H "X-API-Key: <your-api-key>" \
      https://inferno.example.com/api/v1/models

Log in to get a JWT token:

    curl -X POST https://inferno.example.com/api/v1/auth/login \
      -H "Content-Type: application/json" \
      -d '{"username": "admin", "password": "your-password"}'

    # Response
    {
      "token": "eyJ...",
      "expires_at": "2024-01-02T00:00:00Z"
    }

Create an API key using an admin token:

    curl -X POST https://inferno.example.com/api/v1/auth/api-keys \
      -H "Authorization: Bearer <admin-token>" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "production-key",
        "permissions": ["read_models", "run_inference"],
        "expires_in_days": 90
      }'

Available permissions:
- `read_models`: List and view model information
- `write_models`: Upload and modify models
- `delete_models`: Delete models
- `run_inference`: Execute model inference
- `manage_cache`: Manage cache operations
- `read_metrics`: View system metrics
- `write_config`: Modify configuration
- `manage_users`: User management
- `view_audit_logs`: View audit logs
- `use_streaming`: Use streaming inference
- `manage_queue`: Manage job queue
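
As a rough illustration, the body of an API key creation request could be assembled and checked client-side before it is sent. The sketch below assumes the permission names listed above are the complete set; `build_api_key_request` is a hypothetical helper, not part of the Inferno API.

```python
import json

# Permission names copied from the list above; assumed to be the complete set.
AVAILABLE_PERMISSIONS = frozenset({
    "read_models", "write_models", "delete_models", "run_inference",
    "manage_cache", "read_metrics", "write_config", "manage_users",
    "view_audit_logs", "use_streaming", "manage_queue",
})


def build_api_key_request(name, permissions, expires_in_days=90):
    """Return the JSON body for POST /api/v1/auth/api-keys (hypothetical helper).

    Raises ValueError on unknown permission names so typos are caught
    before the request reaches the server.
    """
    unknown = sorted(set(permissions) - AVAILABLE_PERMISSIONS)
    if unknown:
        raise ValueError(f"Unknown permissions: {', '.join(unknown)}")
    return json.dumps({
        "name": name,
        "permissions": list(permissions),
        "expires_in_days": expires_in_days,
    })


body = build_api_key_request("production-key", ["read_models", "run_inference"])
print(body)
```
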

Revoke an API key:

    curl -X DELETE https://inferno.example.com/api/v1/auth/api-keys/<key-id> \
      -H "Authorization: Bearer <admin-token>"

Responses include rate limit information:

    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 45
    X-RateLimit-Reset: 1704067200
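
A client can read these headers to pace itself before it ever receives a 429. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value 1704067200 corresponds to 2024-01-01T00:00:00Z); `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the rate limit window resets.

    Returns 0.0 while requests remain in the current window.
    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", now))
    return max(0.0, float(reset_at) - now)


# Example: the window is exhausted and resets 30 seconds from "now".
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1704067200"}
print(seconds_until_reset(exhausted, now=1704067170))
```
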

When rate limited (HTTP 429), implement exponential backoff:

    import time
    import requests

    def make_request_with_retry(url, headers, max_retries=5):
        """GET `url`, retrying with exponential backoff while the server returns 429."""
        for attempt in range(max_retries):
            response = requests.get(url, headers=headers)
            if response.status_code == 429:
                # Use the server's Retry-After value (in seconds, default 60)
                # as the base delay and double it on each attempt.
                retry_after = int(response.headers.get('Retry-After', 60))
                time.sleep(retry_after * (2 ** attempt))
                continue
            return response
        raise Exception("Max retries exceeded")

- Maximum request body: 10MB (configurable)
- Maximum prompt length: 10,000 characters (configurable)
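
Checking these limits client-side avoids a round trip that would end in a validation error. The sketch below assumes the default limits are in effect, that 10MB means 10 * 1024 * 1024 bytes, and that the body is sent as UTF-8 JSON; `check_request_size` is a hypothetical helper.

```python
import json

MAX_BODY_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"
MAX_PROMPT_CHARS = 10_000


def check_request_size(payload):
    """Serialize `payload` and raise ValueError if it exceeds the default limits."""
    prompt = payload.get("prompt", "")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"Body is {len(body)} bytes; limit is {MAX_BODY_BYTES}")
    return body
```
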

Always specify the content type:

    curl -X POST https://inferno.example.com/api/v1/inference \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <token>" \
      -d '{"prompt": "Hello, world!", "model": "llama-7b"}'

Error responses use the following format:

    {
      "error": {
        "code": "AUTHENTICATION_FAILED",
        "message": "Invalid API key",
        "details": null
      }
    }

| HTTP Status | Code | Description |
|---|---|---|
| 401 | AUTHENTICATION_FAILED | Invalid or missing credentials |
| 403 | PERMISSION_DENIED | Insufficient permissions |
| 429 | RATE_LIMITED | Too many requests |
| 400 | VALIDATION_ERROR | Invalid input |
| 500 | INTERNAL_ERROR | Server error |
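
One way to handle these codes in a client is to map each one to an exception type. The sketch below assumes error bodies are JSON objects with an `error` field containing `code`, `message`, and `details`; the exception class names and `raise_for_error` are hypothetical.

```python
class InfernoAPIError(Exception):
    """Base class for errors returned by the API (hypothetical name)."""

    def __init__(self, status, code, message, details=None):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.details = details


class AuthenticationFailed(InfernoAPIError):
    pass


class PermissionDenied(InfernoAPIError):
    pass


class RateLimited(InfernoAPIError):
    pass


class ValidationError(InfernoAPIError):
    pass


class InternalError(InfernoAPIError):
    pass


# Error codes taken from the table above.
ERROR_CLASSES = {
    "AUTHENTICATION_FAILED": AuthenticationFailed,
    "PERMISSION_DENIED": PermissionDenied,
    "RATE_LIMITED": RateLimited,
    "VALIDATION_ERROR": ValidationError,
    "INTERNAL_ERROR": InternalError,
}


def raise_for_error(status, body):
    """Raise the exception matching an error response; do nothing below HTTP 400."""
    if status < 400:
        return
    error = (body or {}).get("error", {})
    code = error.get("code", "UNKNOWN")
    cls = ERROR_CLASSES.get(code, InfernoAPIError)
    raise cls(status, code, error.get("message", ""), error.get("details"))
```

Callers can then catch `RateLimited` separately from `AuthenticationFailed`, for example to retry only the former.
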

Configure allowed origins in your config:

    [server]
    cors_origins = ["https://your-frontend.com"]
    cors_methods = ["GET", "POST", "DELETE"]
    cors_headers = ["Authorization", "Content-Type", "X-API-Key"]

- Minimum TLS version: 1.2
- Recommended: TLS 1.3
- Always use HTTPS in production
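
In Python, the minimum TLS version can also be enforced on the client. This is a minimal sketch using the standard library `ssl` module; the variable name `context` is illustrative, and how the context is attached depends on the HTTP client in use.

```python
import ssl

# create_default_context() enables certificate and hostname verification.
context = ssl.create_default_context()

# Refuse anything older than TLS 1.2; TLS 1.3 is still negotiated
# when both client and server support it.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

With the standard library, the context can be passed as `urllib.request.urlopen(url, context=context)`.
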

When making requests, always validate certificates:

    # Good - validates certificates
    requests.get("https://inferno.example.com", verify=True)

    # Bad - disables certificate validation
    # requests.get("https://inferno.example.com", verify=False)

All API requests are logged with:
- Timestamp
- Client IP
- User ID (if authenticated)
- Endpoint
- Response status
- Response time
Sensitive data is automatically redacted from logs:
- API keys
- JWT tokens
- Passwords
- Email addresses
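
To illustrate the idea, the sketch below redacts similar values from a log line on the client side. It is not the server's actual redaction logic; the regular expressions and the `redact` helper are assumptions chosen for this example.

```python
import re

# Illustrative patterns only; real redaction rules are likely more thorough.
REDACTION_PATTERNS = [
    # Bearer tokens (including JWTs) in Authorization headers
    (re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # API keys sent via the X-API-Key header
    (re.compile(r"(X-API-Key:\s*)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    # Password fields in JSON bodies
    (re.compile(r'("password"\s*:\s*")[^"]*(")'), r"\1[REDACTED]\2"),
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
]


def redact(line):
    """Apply every redaction pattern to a single log line."""
    for pattern, replacement in REDACTION_PATTERNS:
        line = pattern.sub(replacement, line)
    return line
```
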

Security best practices:
- Always use HTTPS
- Rotate API keys regularly
- Grant only the minimum required permissions
- Implement rate limiting on the client side
- Handle errors gracefully
- Log and monitor API usage
- Validate all inputs
- Keep credentials out of version control
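
One common way to keep credentials out of version control is to read them from the environment at runtime. The sketch below assumes a hypothetical environment variable named `INFERNO_API_KEY`; the `auth_headers` helper is likewise illustrative.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from an API key stored in the environment.

    INFERNO_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = env.get("INFERNO_API_KEY")
    if not api_key:
        raise RuntimeError("INFERNO_API_KEY is not set")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}
```

Failing fast when the variable is missing surfaces a misconfiguration at startup, before any unauthenticated request is sent.
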