Blackfish is an open source "ML-as-a-Service" (MLaaS) platform that helps researchers use state-of-the-art, open source artificial intelligence and machine learning models. With Blackfish, researchers can spin up their own versions of popular public cloud services (e.g., ChatGPT or Amazon Transcribe) using high-performance computing (HPC) resources already available on campus.
The primary goal of Blackfish is to facilitate transparent and reproducible research based on open source machine learning and artificial intelligence. We do this by providing mechanisms to run user-specified models with user-defined configurations. For academic research, open source models present several advantages over closed source models. First, whereas large-scale projects using public cloud services might cost $10K to $100K, open source models running on HPC resources deliver similar-quality results at no cost to researchers. Second, with open source models you know exactly which model you are using, and you can easily provide a copy of that model to other researchers; closed source models can and do change without notice. Third, open source models give you complete transparency into how your data is used.
Researchers should focus on research, not tooling. We try to meet researchers where they're at by providing multiple ways to work with Blackfish, including a Python API, a command-line tool (CLI), and a browser-based user interface (UI).
Don't want to install a Python package? Ask your HPC admins to install Blackfish OnDemand!
You decide what model to run (down to the Git commit) and how you want it configured. There are no unexpected (or undetected) changes in performance because the model is always the same. All services are private, so you know exactly how your data is being handled.
You have an HPC cluster. We have software to run on it.
Blackfish is a pip-installable Python package. We recommend installing Blackfish into its own virtual environment, for example:

```shell
python -m venv .venv
source .venv/bin/activate
pip install blackfish-ai
```

Or, using uv:

```shell
uv venv
uv pip install blackfish-ai
```

If installation was successful, the following command should return the path of the installed application:
```shell
which blackfish
```

Before you begin using Blackfish, you'll need to initialize the application. To do so, type

```shell
blackfish init
```

at the command line. This command will prompt you to provide details for a Blackfish profile. Let's create a default profile that will allow us to run services on compute nodes via the Slurm job scheduler:
```
name=default
type=slurm
host=localhost
user=shamu
home=/home/shamu/.blackfish
cache=/scratch/gpfs/shared/.blackfish
```
The cache is a shared directory set up by your HPC admin for storing shared model and image files. This quickstart assumes you have access to a cache directory with all required Docker images already downloaded. If your HPC system does not have a cache set up, you can point cache at the same directory used for home and add the images yourself.
Once Blackfish is properly initialized, you can run the `blackfish start` command to launch the application:

```shell
blackfish start
```

If everything is working, you should see output like the following:
```
INFO: Added class SpeechRecognition to service class dictionary. [2025-02-24 11:55:06.639]
INFO: Added class TextGeneration to service class dictionary. [2025-02-24 11:55:06.639]
WARNING: Blackfish is running in debug mode. API endpoints are unprotected. In a production
environment, set BLACKFISH_DEBUG=0 to require user authentication. [2025-02-24 11:55:06.639]
INFO: Upgrading database... [2025-02-24 11:55:06.915]
WARNING: Current configuration will not reload as not all conditions are met, please refer to documentation.
INFO: Started server process [58591]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
```

Congratulations! Blackfish is now up and running. The application serves the user interface as well as endpoints to manage services and Blackfish itself. The rest of this guide will walk through how to use the CLI to interact with these endpoints.
Let's start by exploring what services are available. In a new terminal (with your virtual environment activated), type

```shell
blackfish run --help
```

The output displays a list of available commands. One of these commands is `text-generation`, which launches a service to generate text given an input prompt or message history (for models that support chat). There are many models to choose from for this task, but Blackfish only allows you to run models you have already downloaded. To view a list of available models, run
```shell
blackfish model ls --image=text-generation --refresh
```

This command shows all models that we can pass to the `blackfish run text-generation` command. Because we haven't downloaded any models yet (unless your profile is connected to a shared model repo), our list is empty! Let's add a "tiny" model:
```shell
blackfish model add TinyLlama/TinyLlama-1.1B-Chat-v1.0  # This will take a minute...
```

Once the model has finished downloading, you can check that it is available by re-running the `blackfish model ls --refresh` command. We're now ready to spin up a text-generation service:
```shell
blackfish run --gres 1 --time 00:30:00 text-generation TinyLlama/TinyLlama-1.1B-Chat-v1.0 --api-key sealsaretasty
```

This command should produce output similar to:
```
✔ Found 49 models.
✔ Found 1 snapshots.
✔ No revision provided. Using latest available commit: fe8a4ea1ffedaf415f4da2f062534de366a451e6.
✔ Found model TinyLlama/TinyLlama-1.1B-Chat-v1.0!
✔ Started service: 55862e3b-c2c2-428d-ac2d-89bdfa911fa4
```
Take note of the returned service ID. We can use this ID to view more information about the service by running:

```shell
blackfish ls
```

The command should return a table like the following:
```
SERVICE ID     IMAGE            MODEL                               CREATED    UPDATED    STATUS   PORT  NAME             PROFILE
55862e3b-c2c2  text_generation  TinyLlama/TinyLlama-1.1B-Chat-v1.0  3 sec ago  3 sec ago  PENDING  None  blackfish-77771  default
```
As you can see, the service is still waiting in the job queue (PENDING). It might take a few minutes for a Slurm job to start, and the service needs additional time to load after it does. Until then, our service's status will be SUBMITTED, PENDING, or STARTING. Now would be a good time to make some tea 🫖
Tip
While you're doing that, note that you can obtain additional information about an individual service with the blackfish details <service_id> command. Now back to that tea...
Now that we're refreshed, let's see how our service is getting along. Re-run the command above:

```shell
blackfish ls
```

If things went well, the service's status should now be HEALTHY. At this point, we can start using the service. Let's ask an important question:
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sealsaretasty" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are an expert marine biologist."},
      {"role": "user", "content": "Why are orcas so awesome?"}
    ],
    "max_completion_tokens": 100,
    "temperature": 0.1,
    "stream": false
  }' | jq
```

This request should generate a response like the following:
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1191  100   910  100   281   1332    411 --:--:-- --:--:-- --:--:--  1743
{
  "id": "chatcmpl-93f94b03258044cba7ad8ada48b01e5b",
  "object": "chat.completion",
  "created": 1748628455,
  "model": "/data/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Orcas, also known as killer whales, are incredibly intelligent and social animals that are known for their incredible abilities. Here are some reasons why orcas are so awesome:\n\n1. Intelligence: Orcas are highly intelligent and have been observed using tools, communication, and social behavior to achieve their goals. They are also highly adaptable and can live in a variety of environments, including marine and freshwater habitats.\n\n2. Strength:",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 40,
    "total_tokens": 140,
    "completion_tokens": 100,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
```

Success! Our service is responding as expected. Feel free to play around with this model to your heart's content. It should remain available for approximately thirty minutes in total (`--time 00:30:00`).
Tip
The text generation service runs an OpenAI-compatible vLLM server. You can interact with text generation services using OpenAI's official Python library, `openai`. If you're already using `openai` to work with proprietary models like ChatGPT, your existing scripts should work with minimal modification!
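For example, a minimal script using the `openai` client might look like the following sketch. It assumes a running text-generation service on port 8080 with the API key from this quickstart; check `blackfish ls` for your service's actual port, and note that the `model` value may need to match the name the service reports in its responses.

```python
# Sketch: querying a Blackfish text-generation service with the official
# `openai` client. The port (8080), API key, and model name are assumptions
# taken from the quickstart example; adjust them for your own service.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # your service's address
    api_key="sealsaretasty",              # the --api-key passed to `blackfish run`
)

response = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    messages=[
        {"role": "system", "content": "You are an expert marine biologist."},
        {"role": "user", "content": "Why are orcas so awesome?"},
    ],
    max_tokens=100,
    temperature=0.1,
)
print(response.choices[0].message.content)
```

Because the server speaks the OpenAI chat-completions protocol, the same script can later point at a different Blackfish service (or a hosted model) by changing only `base_url`, `api_key`, and `model`.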
When you're done with the service, shut it down to return its resources to the cluster:

```shell
blackfish stop 55862e3b-c2c2-428d-ac2d-89bdfa911fa4
```

If you run `blackfish ls` once more, you should see that the service is no longer listed: `ls` only displays active services by default. You can view all services by including the `--all` flag. Services remain in your services database until you explicitly remove them, like so:

```shell
blackfish rm --filters id=55862e3b-c2c2-428d-ac2d-89bdfa911fa4
```

Using Blackfish from your laptop requires a seamless (i.e., password-less) method of communicating with remote clusters. On many systems, this is simple to set up with the `ssh-keygen` and `ssh-copy-id` utilities. First, make sure that you are connected to your institution's network or VPN (if required), then type the following at the command line:
```shell
ssh-keygen -t rsa          # generates ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa
ssh-copy-id <user>@<host>  # answer yes to transfer the public key
```

These commands create a secure public-private key pair and send the public key to the HPC server you need access to. You now have password-less access to your HPC server!
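To confirm that password-less login actually works before proceeding, you can run a quick check (replace `<user>` and `<host>` with your own values):

```shell
# BatchMode=yes disables interactive password prompts, so this fails fast
# instead of hanging if key-based login is not set up correctly.
# On success it prints the remote machine's hostname.
ssh -o BatchMode=yes <user>@<host> hostname
```

If this prints the remote hostname without asking for a password, Blackfish will be able to reach the cluster seamlessly.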
Warning
Blackfish depends on seamless interaction with your university's HPC cluster. Before proceeding, make sure that you have enabled password-less login and are connected to your institution's network or VPN, if required.
Before we start using services, we'll need to initialize Blackfish and create a profile. Type

```shell
blackfish init
```

at the command line. This command will prompt you to provide details for a default Blackfish profile. If you want to run services on your laptop by default, then your profile should look something like this:
```
name=default
type=local
home=/home/shamu/.blackfish # local directory
cache=/scratch/gpfs/shared/.blackfish # shared local directory to store model and image data
```
On the other hand, if you normally want to run services on a remote Slurm cluster, then your profile should look as follows:
```
name=default
type=slurm
host=della.princeton.edu
user=shamu
home=/home/shamu/.blackfish # directory on host
cache=/scratch/gpfs/shared/.blackfish # shared directory on host to store model and image data
```
For further details on profiles, refer to our documentation.
The current version of Blackfish does not ship with the Docker images required to run services. When running jobs locally, Docker will attempt to download the required image before starting the service, which delays the launch step. We therefore recommend pre-downloading the required images listed below.
Note
When running services on Slurm clusters, Blackfish looks for the required SIF file in $PROFILE_CACHE_DIR/images.
| Version | Text Generation | Speech Recognition | Object Detection |
|---|---|---|---|
| 0.1.0 | vllm-openai:0.8.4 | speech-recognition-inference:0.1.2 | - |
| 0.2.0 | vllm-openai:0.8.4 | speech-recognition-inference:0.1.2 | - |
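Pre-downloading might look like the following sketch. The registry paths and tags here are assumptions, not verbatim from the table above; confirm the correct image names for your Blackfish version before pulling.

```shell
# Hypothetical example: pre-pull the text-generation image for local (Docker) runs.
# The registry path (vllm/vllm-openai) and tag are assumptions; verify them first.
docker pull vllm/vllm-openai:v0.8.4

# For Slurm profiles, Blackfish expects a SIF file under $PROFILE_CACHE_DIR/images.
# Building one with Apptainer might look like:
apptainer pull vllm-openai_0.8.4.sif docker://vllm/vllm-openai:v0.8.4
```

On shared clusters, your HPC admin may have already placed these files in the shared cache directory, in which case no pulling is needed.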
Blackfish (or rather, the services Blackfish runs) does not guarantee support for every model available from the Hugging Face Model Hub. As a practical matter, however, services support nearly all "popular" models listed under their corresponding pipeline, including many quantized models (in the case of LLMs). Below is an evolving list of models that we have tested on HPC, including the resources requested and utilized by each service.
The main requirement for running online inference is sufficient GPU memory. As a rule of thumb, the minimum memory (in GB) required for a model's weights is the number of parameters (in billions) multiplied by the number of bytes per parameter (dtype bits / 8). In practice, you should budget an additional 5-10 GB for KV caching, and keep in mind that service images typically set default GPU utilization to around 90-95%.
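The rule of thumb above can be sketched as a quick calculation (the helper function here is ours for illustration, not part of Blackfish):

```python
def min_weight_memory_gb(params_billion: float, dtype_bits: int) -> float:
    """Rule-of-thumb memory (GB) needed just to hold the model weights:
    parameters (in billions) x bytes per parameter (dtype bits / 8)."""
    return params_billion * (dtype_bits / 8)

# A 70.6B-parameter model in bf16 (16-bit) weights:
weights_gb = min_weight_memory_gb(70.6, 16)
print(f"{weights_gb:.1f} GB")  # 141.2 GB, before the extra 5-10 GB for KV cache
```

This matches the measured GPU memory for the 70B-class models in the table below, which is why they need four 80 GB GPUs rather than two.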
| Model | Pipeline | Supported | Chat | Gated | Reasoning | Embedding¹ | Memory | GPU Memory | GPUs | Size | Dtype | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen/QwQ-32B | Text generation | ✅ | ✅ | | ✅ | ✅ | 16G | 61.0/160G | 4 | 32.8B | bf16 | Supports reasoning. See docs. |
| Qwen/Qwen3-32B | Text generation | ✅ | ✅ | | ✅ | ✅ | 16G | 64.4/160G | 4 | 32.8B | bf16 | Supports reasoning. See docs. |
| Qwen/Qwen2.5-72B | Text generation | ✅ | | | | ✅ | 16G | 144.8/320G | 4 | 72.7B | bf16 | Possible to fit on 2x80B by decreasing max_model_len or increasing gpu_memory_utilization. |
| Qwen/Qwen2.5-72B-Instruct | Text generation | ✅ | ✅ | | | ✅ | 16G | 144.8/320G | 4 | 72.7B | bf16 | Possible to fit on 2x80B by decreasing max_model_len or increasing gpu_memory_utilization. |
| Qwen/Qwen2.5-32B | Text generation | ✅ | | | | ✅ | 16G | 63.1/80G | 4 | 32.8B | bf16 | |
| Qwen/Qwen2.5-32B-Instruct | Text generation | ✅ | ✅ | | | ✅ | 16G | 63.1/80G | 4 | 32.8B | bf16 | |
| google/gemma-3-27b-it | Text generation | ✅ | ✅ | ✅ | | ✅ | 16G | 54.1/80G | 4 | 27.4B | bf16 | |
| google/gemma-3-1b-it | Text generation | ✅ | ✅ | ✅ | | ✅ | 8G | /10G | 4 | 27.4B | bf16 | |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Text generation | ✅ | ✅ | ✅ | | | 32G | /320G | 4 | 109B | bf16 | Supports multimodal inputs. See docs. |
| meta-llama/Llama-4-Scout-17B-16E | Text generation | ✅ | | ✅ | | | 32G | /320G | 4 | 109B | bf16 | Supports multimodal inputs. See docs. |
| meta-llama/Llama-3.3-70B-Instruct | Text generation | ✅ | ✅ | ✅ | | ✅ | 16G | 140.4/320G | 4 | 70.6B | bf16 | |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B | Text generation | ✅ | ✅ | | ✅ | ✅ | 16G | 141.2/320G | 4 | 70.6B | bf16 | Supports reasoning. See docs. |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | Text generation | ✅ | ✅ | | ✅ | ✅ | 16G | 64.6/80G | 4 | 32.8B | bf16 | Supports reasoning. See docs. |
| deepseek-ai/DeepSeek-V2-Lite | Text generation | ✅ | | | | ✅ | 16G | 30.5/40G | 4 | 15.7B | bf16 | |
| deepseek-ai/DeepSeek-V2-Lite-Chat | Text generation | ✅ | ✅ | | | ✅ | 16G | 30.5/40G | 4 | 15.7B | bf16 | |
| openai/whisper-large-v3 | Automatic speech recognition | ✅ | - | | | | | 3.6/10G | 1 | 1.54B | f16 | |
This is a monorepo containing:
| Package | Description |
|---|---|
| lib/ | Python backend (blackfish-ai) - CLI, server, services |
| web/ | Next.js frontend (blackfish-ui) - browser interface |
See the package READMEs for development setup instructions.
You can find additional details and examples on our official documentation page.
Footnotes

1. Models that can be used to retrieve embeddings with `--task embed`.