A simple FastAPI project to demonstrate how to build an API server for an AI application using Gemini.
Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate
```

If you prefer uv, run `uv init` instead; uv creates a `.venv` for you once you add dependencies in the next step, and you can activate it with `source .venv/bin/activate`.

Install dependencies:

```bash
pip install -r requirements.txt
# or, with uv
uv add -r requirements.txt
```

Set up API Key
Set your Gemini API key as an environment variable in your terminal. You can get your API key from Google AI Studio.
```bash
export GEMINI_API_KEY="your_gemini_api_key"
```

Alternatively, create a `.env` file (for example with `touch ../.env`) and add `GEMINI_API_KEY=...` to it.

For Windows users:

```bat
:: In Command Prompt
set GEMINI_API_KEY=your_gemini_api_key
```

```powershell
# In PowerShell
$env:GEMINI_API_KEY="your_gemini_api_key"
```

The application will load this key from your environment.
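For reference, the lookup inside the application might look roughly like the sketch below; the actual loading code lives in the project source, and the use of python-dotenv here is an assumption.

```python
import os

from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # picks up GEMINI_API_KEY from a .env file, if one exists

GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    raise RuntimeError("GEMINI_API_KEY is not set")
```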
To run the application, use the following command:
```bash
uvicorn src.main:app --reload
# or, to choose the host and port explicitly
uvicorn src.main:app --host 0.0.0.0 --port 8888 --reload
```

The API will be available at http://127.0.0.1:8000 by default, or at http://127.0.0.1:8888 if you use the second command (the examples below assume port 8888).
You can send a request to the chat API without an authentication token. These requests are subject to a global rate limit.
```bash
curl -X POST "http://127.0.0.1:8888/chat" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How many peas in a pod?"}'
```
For a higher rate limit, you can authenticate by providing a JWT token. Make sure to replace YOUR_GENERATED_TOKEN with a valid token.

```bash
curl -X POST "http://127.0.0.1:8888/chat" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_GENERATED_TOKEN" \
  -d '{"prompt": "Why is the sky blue?"}'
```

Authenticated requests to the /chat endpoint carry a JWT token. For testing purposes, you can generate a valid token using jwt.io:
Algorithm: Change the algorithm to HS256.
Payload: Use the following payload. The sub field will be used as the user identifier for rate limiting.
```json
{
  "sub": "testuser",
  "name": "John Doe",
  "iat": 1516239022
}
```

Signature: In the "Verify Signature" section, use the secret key `a-string-secret-at-least-256-bits-long`. This is the same secret key that is hardcoded in src/auth/dependencies.py.
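As an alternative to jwt.io, you can mint an equivalent test token programmatically. Below is a minimal sketch using PyJWT, assuming the HS256 algorithm and the hardcoded secret described above.

```python
import jwt  # PyJWT: pip install pyjwt

SECRET = "a-string-secret-at-least-256-bits-long"

token = jwt.encode(
    {"sub": "testuser", "name": "John Doe", "iat": 1516239022},
    SECRET,
    algorithm="HS256",
)
print(token)
```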
You can now use the generated token to make authenticated requests.
You can also use the auto-generated FastAPI documentation to interact with the API.
Once the server is running, go to http://127.0.0.1:8888/docs in your browser.
You can configure the system prompt by editing the src/prompts/system_prompt.md file.
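How the file is consumed is defined in the application code; purely as an illustration, reading it at startup might look like this.

```python
from pathlib import Path

# Illustrative only: load the system prompt text from the markdown file.
SYSTEM_PROMPT = Path("src/prompts/system_prompt.md").read_text(encoding="utf-8")
```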
The API implements rate limiting to prevent abuse. You can modify these limits by changing the constants in src/auth/throttling.py:
```python
GLOBAL_RATE_LIMIT = 3
GLOBAL_TIME_WINDOW_SECONDS = 60
```
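The real limiter lives in src/auth/throttling.py; purely as an illustration, a simple in-memory sliding-window limiter built around constants like these could look as follows (everything except the two constants above is hypothetical).

```python
import time
from collections import defaultdict, deque

GLOBAL_RATE_LIMIT = 3
GLOBAL_TIME_WINDOW_SECONDS = 60

# One deque of request timestamps per caller (user id or "anonymous").
_history: dict[str, deque] = defaultdict(deque)

def allow_request(caller_id: str,
                  limit: int = GLOBAL_RATE_LIMIT,
                  window: int = GLOBAL_TIME_WINDOW_SECONDS) -> bool:
    """Return True if caller_id is still under `limit` requests per `window` seconds."""
    now = time.monotonic()
    timestamps = _history[caller_id]
    # Discard timestamps that have fallen outside the window.
    while timestamps and now - timestamps[0] > window:
        timestamps.popleft()
    if len(timestamps) >= limit:
        return False
    timestamps.append(now)
    return True
```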
The project is structured to be modular, allowing for different AI platforms to be used.

- `src/main.py`: The main FastAPI application file.
- `src/ai/base.py`: Defines the `AIPlatform` interface.
- `src/ai/gemini.py`: The implementation of the `AIPlatform` interface for Gemini.
- `src/prompts/system_prompt.md`: The system prompt for the AI.
- `src/auth/dependencies.py`: Handles JWT decoding and user identification.
- `src/auth/throttling.py`: Provides a simple in-memory rate limiter with different limits for authenticated and unauthenticated users.
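The exact interface lives in src/ai/base.py; conceptually, it can be pictured as an abstract base class along these lines (the chat method name comes from the description below, and its signature here is an assumption).

```python
from abc import ABC, abstractmethod

class AIPlatform(ABC):
    """Abstract interface that every AI backend implements."""

    @abstractmethod
    def chat(self, prompt: str) -> str:
        """Send a prompt to the underlying model and return its reply."""
        ...
```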
To use a different AI, you would create a new class that inherits from AIPlatform and implement the chat method. Then, you would update src/main.py to use your new AI class.
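For example, a hypothetical backend could look like the sketch below; the class name, its body, and the wiring shown for src/main.py are all illustrative, and the chat signature is assumed to match the interface above.

```python
from src.ai.base import AIPlatform

class EchoPlatform(AIPlatform):
    """Toy backend that echoes the prompt; replace the body with real API calls."""

    def chat(self, prompt: str) -> str:
        return f"You said: {prompt}"

# In src/main.py, construct your new platform instead of the Gemini one, e.g.:
# ai_platform = EchoPlatform()
```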