A minimal FastAPI web API for text generation using Hugging Face Transformers models, served with Ray Serve for scalable model serving. This project provides a /text/simple-gen endpoint that generates text completions using a default text generation pipeline.
- REST API for text generation (using the Hugging Face `pipeline("text-generation")`)
- Scalable model serving with Ray Serve on a Ray cluster
- Deployed on Kubernetes and managed by the KubeRay operator
- FastAPI-based, easily extendable and documented (provides OpenAPI/Swagger docs out of the box)
- Configured for easy Docker deployment using uv for ultra-fast Python package management
- Works with PyTorch (CPU) by default
- Infrastructure-as-code for Azure provisioning using Terraform, with Kubernetes manifests for AKS-based app deployment
- Python 3.13+ (see `pyproject.toml`)
- Or Docker
Note on Python and Ray versions: There is a potential version conflict. pyproject.toml specifies Python 3.13+ and ray>=2.50.0, while a comment in infra/k8s/rayservice.yml suggests ray==2.46.0 which is not compatible with Python 3.13. This README assumes the versions in pyproject.toml are correct.
- Install uv (`pip install uv` or use the pre-built binaries).
- Sync the dependencies:

  ```shell
  uv sync
  ```

- Run the app with Ray Serve:

  ```shell
  serve run serve_app:deployment_graph
  ```
You can build and run the container as follows:

```shell
docker build -t fastapi-transformers .
docker run -p 8000:8000 fastapi-transformers
```

The API is served by Ray Serve and exposes the following endpoint:
Generates text from provided input text. Uses the default text-generation pipeline from Hugging Face transformers (e.g., gpt2 or equivalent, depending on environment/model cache).
- Request body:

  ```json
  { "input": "Once upon a time" }
  ```

- Response:

  ```json
  [ { "generated_text": "Once upon a time..." } ]
  ```

  (output format depends on the underlying model)
```shell
curl -X POST http://localhost:8000/text/simple-gen \
  -H 'Content-Type: application/json' \
  -d '{"input":"Hello, world!"}'
```

- Once running, see the Swagger UI at http://localhost:8000/docs
- The OpenAPI schema is available at http://localhost:8000/openapi.json
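As an alternative to curl, the endpoint can also be called from Python using only the standard library. A small sketch (the helper names are illustrative and not part of the project):

```python
import json
import urllib.request


def build_simple_gen_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /text/simple-gen endpoint."""
    payload = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/text/simple-gen",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def simple_gen(prompt: str, base_url: str = "http://localhost:8000"):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_simple_gen_request(prompt, base_url)) as resp:
        return json.load(resp)
```

With the server running locally, `simple_gen("Hello, world!")` should return a list such as `[{"generated_text": "..."}]`, matching the response format shown above.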
```
.
├── serve_app.py             # Ray Serve application entrypoint
├── Dockerfile               # Docker container configuration
├── pyproject.toml, uv.lock  # Project dependencies (managed by uv)
├── infra/
│   ├── azure/terraform/     # Terraform for Azure resources (AKS, ACR)
│   └── k8s/
│       └── rayservice.yml   # RayService manifest for deploying the app on K8s
└── routers/
    ├── models/
    │   └── text_gen/
    │       └── simple_input.py  # Data model for text generation input
    └── text/
        └── __init__.py
```
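The data model in `routers/models/text_gen/simple_input.py` is presumably a small Pydantic model matching the request body shown above; a sketch under that assumption:

```python
from pydantic import BaseModel


class SimpleInput(BaseModel):
    """Request body for the /text/simple-gen endpoint."""

    input: str
```

FastAPI uses such a model both to validate incoming JSON and to generate the request schema in the OpenAPI docs.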
- To add new models or pipelines, create new Ray Serve deployments in `serve_app.py`.
- To change the default model, override the `pipeline("text-generation")` call in the `TextGenService` class with your desired model, e.g. `pipeline("text-generation", model="gpt2")`.
The Terraform configurations are located at infra/azure/terraform and provision the following Azure resources:
- Resource Group
- Azure Container Registry (ACR)
- Azure Kubernetes Service (AKS) cluster
To deploy the infrastructure, ensure you have the Azure CLI installed and are logged in:
```shell
az login
```

Then, from the Terraform directory:
```shell
cd infra/azure/terraform
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

After deployment, view the outputs (e.g., resource group and AKS cluster names):

```shell
terraform output
```

Configure kubectl to connect to the new AKS cluster:
```shell
az aks get-credentials --resource-group $(terraform output -raw rg_name) --name $(terraform output -raw aks_name)
kubectl get nodes
```

The application is deployed as a RayService on the AKS cluster. This requires the KubeRay operator to be installed on the cluster.
- Install the KubeRay operator:
  Follow the instructions in the KubeRay documentation to install the operator using Helm.

- Deploy the RayService:
  The Kubernetes manifest is located at `infra/k8s/rayservice.yml`. Review and adjust the `image` field to match your ACR, then deploy:

  ```shell
  kubectl apply -f infra/k8s/rayservice.yml
  ```
- Verify the deployment:
  Check the status of the RayService and the pods:

  ```shell
  kubectl get rayservice
  kubectl get pods
  ```
To access the application, you will need to port-forward the Ray Serve service:

```shell
kubectl port-forward service/fastapi-transformer-service-head-svc 8000:8000
```
This project is for educational/starter purposes. No explicit license.