A minimal FastAPI web API for text generation using Hugging Face Transformers models, served with Ray Serve for scalable model serving. This project provides a /text/simple-gen endpoint that generates text completions using a default text generation pipeline.
- REST API for text generation (using the Hugging Face `pipeline("text-generation")`)
- Scalable model serving with Ray Serve on a Ray cluster
- Deployed on Kubernetes and managed by the KubeRay operator
- FastAPI-based, easily extendable and documented (provides OpenAPI/Swagger docs out of the box)
- Configured for easy Docker deployment using uv for ultra-fast Python package management
- Works with PyTorch (CPU) by default
- Infrastructure-as-code for Azure provisioning using Terraform, with Kubernetes manifests for AKS-based app deployment
- Python 3.13+ (see `pyproject.toml`)
- Or Docker
Note on Python and Ray versions: There is a potential version conflict. pyproject.toml specifies Python 3.13+ and ray>=2.50.0, while a comment in infra/k8s/rayservice.yml suggests ray==2.46.0 which is not compatible with Python 3.13. This README assumes the versions in pyproject.toml are correct.
- Install uv (`pip install uv` or use the pre-built binaries).
- Sync the dependencies:

  ```shell
  uv sync
  ```

- Run the app with Ray Serve:

  ```shell
  serve run serve_app:deployment_graph
  ```
You can build and run the container as follows:

```shell
docker build -t fastapi-transformers .
docker run -p 8000:8000 fastapi-transformers
```

The API is served by Ray Serve and exposes the following endpoint:
Generates text from provided input text. Uses the default text-generation pipeline from Hugging Face transformers (e.g., gpt2 or equivalent, depending on environment/model cache).
- Request body:

  ```json
  { "input": "Once upon a time" }
  ```

- Response:

  ```json
  [ { "generated_text": "Once upon a time..." } ]
  ```

  (output format depends on the underlying model)
```shell
curl -X POST http://localhost:8000/text/simple-gen \
  -H 'Content-Type: application/json' \
  -d '{"input":"Hello, world!"}'
```

- Once running, see the Swagger UI at http://localhost:8000/docs
- The OpenAPI schema is available at http://localhost:8000/openapi.json
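As an alternative to curl, the endpoint can also be called from Python using only the standard library. A small sketch (the helper names are illustrative and not part of the project):

```python
import json
import urllib.request


def build_simple_gen_request(
    prompt: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /text/simple-gen endpoint."""
    payload = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/text/simple-gen",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def simple_gen(prompt: str, base_url: str = "http://localhost:8000"):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_simple_gen_request(prompt, base_url)) as resp:
        return json.load(resp)
```

With the server running locally, `simple_gen("Hello, world!")` should return a list such as `[{"generated_text": "..."}]`, matching the response format shown above.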
```
.
├── serve_app.py             # Ray Serve application entrypoint
├── Dockerfile               # Docker container configuration
├── pyproject.toml, uv.lock  # Project dependencies (managed by uv)
├── infra/
│   ├── azure/terraform/     # Terraform for Azure resources (AKS, ACR)
│   └── k8s/
│       └── rayservice.yml   # RayService manifest for deploying the app on K8s
└── routers/
    ├── models/
    │   └── text_gen/
    │       └── simple_input.py  # Data model for text generation input
    └── text/
        └── __init__.py
```
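The data model in `routers/models/text_gen/simple_input.py` is presumably a small Pydantic model matching the request body shown above; a sketch under that assumption:

```python
from pydantic import BaseModel


class SimpleInput(BaseModel):
    """Request body for the /text/simple-gen endpoint."""

    input: str
```

FastAPI uses such a model both to validate incoming JSON and to generate the request schema in the OpenAPI docs.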
- To add new models or pipelines, create new Ray Serve deployments in `serve_app.py`.
- To change the default model, override the `pipeline("text-generation")` call in the `TextGenService` class with your desired model, e.g. `pipeline("text-generation", model="gpt2")`.
The Terraform configurations are located at infra/azure/terraform and provision the following Azure resources:
- Resource Group
- Azure Container Registry (ACR)
- Azure Kubernetes Service (AKS) cluster
To deploy the infrastructure, ensure you have the Azure CLI installed and are logged in:
```shell
az login
```

Then, from the Terraform directory:
```shell
cd infra/azure/terraform
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

After deployment, view the outputs (e.g., resource group and AKS cluster names):

```shell
terraform output
```

Configure kubectl to connect to the new AKS cluster:
```shell
az aks get-credentials --resource-group $(terraform output -raw rg_name) --name $(terraform output -raw aks_name)
kubectl get nodes
```

The application is deployed as a RayService on the AKS cluster. This requires the KubeRay operator to be installed on the cluster.
- Install the KubeRay operator:
  Follow the instructions in the KubeRay documentation to install the operator using Helm.

- Deploy the RayService:
  The Kubernetes manifest is located at `infra/k8s/rayservice.yml`. Review and adjust the `image` field to match your ACR, then deploy:

  ```shell
  kubectl apply -f infra/k8s/rayservice.yml
  ```
- Verify the deployment:
  Check the status of the RayService and the pods:

  ```shell
  kubectl get rayservice
  kubectl get pods
  ```
To access the application, you will need to port-forward the Ray Serve service:

```shell
kubectl port-forward service/fastapi-transformer-service-head-svc 8000:8000
```
This project is for educational/starter purposes. No explicit license.