This project is an example of an AI inference server that can scale out using Nvidia Triton. The blockchain-based payment system is a bonus.
Demo video: LINK
Nvidia Triton is an attractive project: it supports a wide variety of AI models while delivering fast inference. From a scale-out perspective, however, it is not ideal on its own. This project implements a very simple example that addresses the scale-out issue on top of Nvidia Triton.
The system consists of the following services:

| Service | Description |
|---|---|
| Frontend + Backend Server | Provides server code to verify actual end-to-end operation. |
| Gateway | The entry point for user requests and responses. |
| Scheduler | Decides which Triton node should handle each request. This project uses a simple round-robin policy; see the sketch after this table. |
| Triton Node | Consists of two parts: a Triton Server, which performs inference using AI models, and a Manager, which manages the Triton server, announces the node with health-check messages, and forwards requests to the Triton server. |
| Health Checker | Monitors Triton nodes. Continuously maintains a list of live nodes and provides it to the Scheduler (see the sketch after this table). |
| Blockchain-Based Payment System | A payment system built on a private Ethereum network. |
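As a rough illustration of how the Health Checker's node list and the Scheduler's round-robin selection fit together, here is a minimal Go sketch. All type and function names are our own assumptions for illustration, not the project's actual code:

```go
package scheduler

import (
	"sort"
	"sync"
	"sync/atomic"
	"time"
)

// TritonNode is what a Manager reports about itself in a health-check message.
type TritonNode struct {
	Addr     string    // address of the node's Manager
	LastSeen time.Time // time of the most recent health-check message
}

// Registry is the Health Checker's view of the Triton nodes.
type Registry struct {
	mu    sync.Mutex
	nodes map[string]TritonNode
}

func NewRegistry() *Registry {
	return &Registry{nodes: make(map[string]TritonNode)}
}

// Heartbeat records a health-check message from a Manager.
func (r *Registry) Heartbeat(addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nodes[addr] = TritonNode{Addr: addr, LastSeen: time.Now()}
}

// Live returns the nodes seen within the timeout; this is the list the
// Health Checker hands to the Scheduler.
func (r *Registry) Live(timeout time.Duration) []TritonNode {
	r.mu.Lock()
	defer r.mu.Unlock()
	live := make([]TritonNode, 0, len(r.nodes))
	for _, n := range r.nodes {
		if time.Since(n.LastSeen) < timeout {
			live = append(live, n)
		}
	}
	// Sort for a stable order, so the round robin cycles predictably.
	sort.Slice(live, func(i, j int) bool { return live[i].Addr < live[j].Addr })
	return live
}

// RoundRobin is the Scheduler's node-selection policy.
type RoundRobin struct{ next uint64 }

// Pick returns the next live node in rotation, or false if none are live.
func (rr *RoundRobin) Pick(nodes []TritonNode) (TritonNode, bool) {
	if len(nodes) == 0 {
		return TritonNode{}, false
	}
	i := atomic.AddUint64(&rr.next, 1) % uint64(len(nodes))
	return nodes[i], true
}
```

In the real services the heartbeats and node list travel over the network; here they are reduced to method calls to show the data flow.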
- If you run into permission issues, prepend `sudo` to the commands.
- Server addresses can be configured in `setting.json` or in the shell scripts; a hypothetical example is shown after the commands below.
- Run each of the following blocks from the repository root, each in its own terminal.
```bash
# Frontend + Backend server
cd backend
bash quick_start.sh
```

```bash
# Gateway
cd service/gateway
bash quick_start.sh
```

```bash
# Scheduler
cd service/scheduler
bash quick_start.sh
```

```bash
# Health Checker
cd service/health-checker
bash quick_start.sh
```
```bash
# Token Manager
cd ethereum/token-manager
bash quick_start.sh
```

```bash
# Private Ethereum network
cd ethereum
bash start_ethereum.sh
```
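Once `start_ethereum.sh` is running, you can sanity-check the private network with go-ethereum's `ethclient`. A minimal sketch; the RPC endpoint and the account address below are assumptions, so adjust them to your own configuration:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ethereum/go-ethereum/common"
	"github.com/ethereum/go-ethereum/ethclient"
)

func main() {
	// Assumption: the private network exposes HTTP-RPC on geth's default port.
	client, err := ethclient.Dial("http://127.0.0.1:8545")
	if err != nil {
		log.Fatalf("dial private network: %v", err)
	}

	chainID, err := client.ChainID(context.Background())
	if err != nil {
		log.Fatalf("query chain ID: %v", err)
	}
	fmt.Println("connected, chain ID:", chainID)

	// Hypothetical account; replace with one funded in your genesis block.
	addr := common.HexToAddress("0x0000000000000000000000000000000000000000")
	balance, err := client.BalanceAt(context.Background(), addr, nil) // nil = latest block
	if err != nil {
		log.Fatalf("query balance: %v", err)
	}
	fmt.Println("balance (wei):", balance)
}
```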
```bash
# Triton node: Manager
cd gpu-node/manager
bash quick_start.sh
```

```bash
# Triton node: Triton server
cd gpu-node
bash start_triton.sh
```
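For reference, `setting.json` might look roughly like the following. Every key here is a guess for illustration only, so consult the actual file shipped with each service:

```json
{
  "gateway_addr": "0.0.0.0:8080",
  "scheduler_addr": "127.0.0.1:9000",
  "health_checker_addr": "127.0.0.1:9100",
  "triton_manager_addr": "127.0.0.1:9200",
  "ethereum_rpc": "http://127.0.0.1:8545"
}
```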
- If you run into any problems, please post them in 'Issues'.
| Repository | Description | URL |
|---|---|---|
| triton-inference-server | Connected to the Manager to serve AI models using Nvidia's Triton. This project uses version 23.12-py3. | LINK |
| go-ethereum | Used to build a private network for the blockchain-based payment system. This project uses version 1.13.15. | LINK |