Skip to content

princeton-ddss/blackfish

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

513 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

GitHub Release GitHub Actions Workflow Status coverage PyPI License

Blackfish

album-cover

Blackfish is an open source "ML-as-a-Service" (MLaaS) platform that helps researchers use state-of-the-art, open source artificial intelligence and machine learning models. With Blackfish, researchers can spin up their own version of popular public cloud services (e.g., ChatGPT, Amazon Transcribe, etc.) using high-performance computing (HPC) resources already available on campus.

The primary goal of Blackfish is to facilitate transparent and reproducible research based on open source machine learning and artificial intelligence. We do this by providing mechanisms to run user-specified models with user-defined configurations. For academic research, open source models present several advantages over closed source models. First, whereas large-scale projects using public cloud services might cost $10K to $100K for similar quality results, open source models running on HPC resources are free to researchers. Second, with open source models you know exactly what model you are using and you can easily provide a copy of that model to other researchers. Closed source models can and do change without notice. Third, using open-source models allows complete transparency into how your data is being used.

Why should you use Blackfish?

1. It's easy! 🌈

Researchers should focus on research, not tooling. We try to meet researchers where they're at by providing multiple ways to work with Blackfish, including a Python API, a command-line tool (CLI), and a browser-based user interface (UI).

Don't want to install a Python package? Ask your HPC admins to install Blackfish OnDemand!

2. It's transparent 🧐

You decide what model to run (down to the Git commit) and how you want it configured. There are no unexpected (or undetected) changes in performance because the model is always the same. All services are private, so you know exactly how your data is being handled.

3. It's free! πŸ’Έ

You have an HPC cluster. We have software to run on it.

Installation

Blackfish is a pip-installable python package. We recommend installing Blackfish to its own virtual environment, for example:

python -m venv .venv
source .venv/bin/activate
pip install blackfish-ai

Or, using uv:

uv venv
uv pip install blackfish-ai

The following command should return the path of the installed application if installation was successful:

which blackfish

Quickstart

HPC Setup

Before you begin using Blackfish, you'll need to initialize the application. To do so, type

blackfish init

at the command line. This command will prompt you to provide details for a Blackfish profile. Let's create a default profile that will allow us to run services on compute nodes via the Slurm job scheduler:

name=default
type=slurm
host=localhost
user=shamu
home=/home/shamu/.blackfish
cache=/scratch/gpfs/shared/.blackfish

The cache is a shared directory set up by your HPC admin for storing shared model and image files. This quickstart assumes you have access to a cache directory with all required Docker images downloaded. If your HPC does not have a cache set up, you can assign the same directory used for home and add the images yourself.

Once Blackfish is properly initialized, you can run the blackfish start command to launch the application:

blackfish start

If everything is working, you should see output like the following:

INFO:     Added class SpeechRecognition to service class dictionary. [2025-02-24 11:55:06.639]
INFO:     Added class TextGeneration to service class dictionary. [2025-02-24 11:55:06.639]
WARNING:  Blackfish is running in debug mode. API endpoints are unprotected. In a production
          environment, set BLACKFISH_DEBUG=0 to require user authentication. [2025-02-24 11:55:06.639]
INFO:     Upgrading database... [2025-02-24 11:55:06.915]
WARNING:  Current configuration will not reload as not all conditions are met, please refer to documentation.
INFO:     Started server process [58591]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)

Congratulationsβ€”Blackfish is now up and running! The application serves the user interface as well as endpoints to manage services and Blackfish itself. The rest of this guide will walk through how to use the CLI to interact with these endpoints.

Running Services

Let's start by exploring what services are available. In a new terminal (with your virtual environment activated), type

blackfish run --help

The output displays a list of available commands. One of these is commands text-generation, which launches a service to generate text given an input prompt or message history (for models supporting chat). There are many models to choose from to perform this task, but Blackfish only allows you to run models you have already downloaded. To view a list of available models, run

blackfish model ls --image=text-generation --refresh

This command shows all models that we can pass to the blackfish run text-generation command. Because we haven't downloaded any models yet (unless your profile connected to a shared model repo), our list is empty! Let's add a "tiny" model:

blackfish model add TinyLlama/TinyLlama-1.1B-Chat-v1.0  # This will take a minute...

Once the model is done downloading, you can check that it is available by re-running the blackfish model ls --refresh command. We're now ready to spin up a text-generation service:

blackfish run --gres 1 --time 00:30:00 text-generation TinyLlama/TinyLlama-1.1B-Chat-v1.0 --api-key sealsaretasty

This command should produce similar output to:

βœ” Found 49 models.
βœ” Found 1 snapshots.
⚠ No revision provided. Using latest available commit: fe8a4ea1ffedaf415f4da2f062534de366a451e6.
βœ” Found model TinyLlama/TinyLlama-1.1B-Chat-v1.0!
βœ” Started service: 55862e3b-c2c2-428d-ac2d-89bdfa911fa4

Take note of the service ID returned. We can use this ID to view more information about the service by running:

blackfish ls

The command should return a table like the following:

SERVICE ID      IMAGE             MODEL                                CREATED     UPDATED     STATUS    PORT   NAME              PROFILE
55862e3b-c2c2   text_generation   TinyLlama/TinyLlama-1.1B-Chat-v1.0   3 sec ago   3 sec ago   PENDING   None   blackfish-77771   default

As you can see, the service is still waiting in the job queue (PENDING). It might take a few minutes for a Slurm job to start, and it will require additional time for the service to load after it starts. Until then, our service's status will be either SUBMITTED or STARTING. Now would be a good time to make some tea πŸ«–

Tip

While you're doing that, note that you can obtain additional information about an individual service with the blackfish details <service_id> command. Now back to that tea...

Now that we're refreshed, let's see how our service is getting along. Re-run the command above:

blackfish ls

If things went well, the service's status should now be HEALTHY. At this point, we can start using the service. Let's ask an important question:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sealsaretasty" \
  -d '{
        "messages": [
            {"role": "system", "content": "You are an expert marine biologist."},
            {"role": "user", "content": "Why are orcas so awesome?"}
        ],
        "max_completion_tokens": 100,
        "temperature": 0.1,
        "stream": false
    }' | jq

This request should generate a response like the following:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1191  100   910  100   281   1332    411 --:--:-- --:--:-- --:--:--  1743
{
  "id": "chatcmpl-93f94b03258044cba7ad8ada48b01e5b",
  "object": "chat.completion",
  "created": 1748628455,
  "model": "/data/snapshots/fe8a4ea1ffedaf415f4da2f062534de366a451e6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Orcas, also known as killer whales, are incredibly intelligent and social animals that are known for their incredible abilities. Here are some reasons why orcas are so awesome:\n\n1. Intelligence: Orcas are highly intelligent and have been observed using tools, communication, and social behavior to achieve their goals. They are also highly adaptable and can live in a variety of environments, including marine and freshwater habitats.\n\n2. Strength:",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 40,
    "total_tokens": 140,
    "completion_tokens": 100,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

Success! Our service is responding as expected. Feel free to play around with this model to your heart's delight. It should remain available for approximately thirty minutes in total (--time 00:30:00).

Tip

The text generation service runs an OpenAI-compatible vllm server. You can interact with text generation services using OpenAI's official Python library, openai. If you're already using openai to work with private models like ChatGPT, your existing scripts should work with minimal modification!

When you're done with the service, shut it down to return its resources to the cluster:

blackfish stop 55862e3b-c2c2-428d-ac2d-89bdfa911fa4

If you run blackfish ls once more, you should see that the service is no longer listed: ls only displays active services by default. You can view all services by including the --all flag. Services remain in your services database until you explicit remove them, like so:

blackfish rm --filters id=55862e3b-c2c2-428d-ac2d-89bdfa911fa4

Local Setup

Using Blackfish from your laptop requires a seamless (i.e., password-less) method of communicating with remote clusters. On many systems, this is simple to setup with the ssh-keygen and ssh-copy-id utilitites. First, make sure that you are connected to your institution's network or VPN (if required), then type the following at the command-line:

ssh-keygen -t rsa # generates ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa
ssh-copy-id <user>@<host> # answer yes to transfer the public key

These commands create a secure public-private key pair and send the public key to the HPC server you need access to. You now have password-less access to your HPC server!

Warning

Blackfish depends on seamless interaction with your university's HPC cluster. Before proceeding, make sure that you have enabled password-less login and are connected to your institutions network or VPN, if required.

Local Profile

Before we start using services, we'll need to initialize Blackfish and create a profile. Type

blackfish init

at the command line. This command will prompt you to provide details for a default Blackfish profile. If you want to run services on your laptop by default, then your profile should look something like this:

name=default
type=local
home=/home/shamu/.blackfish # local directory
cache=/scratch/gpfs/shared/.blackfish # shared local directory to store model and image data

Slurm Profile

On the other hand, if you normally want to run services on a remote Slurm cluster, then your profile should look as follows:

name=default
type=slurm
host=della.princeton.edu
user=shamu
home=/home/shamu/.blackfish # directory on host
cache=/scratch/gpfs/shared/.blackfish # shared directory on host to store model and image data

For further details on profiles, refer to our documentation.

Images

The current version of Blackfish does not ship Docker images required to run services. When running jobs locally, Docker will attempt to download the required image before starting the service, resulting in delays during the launching step. Instead, it's recommended that users pre-download the required images listed below.

Note

When running services on Slurm clusters, Blackfish looks for the required SIF file in $PROFILE_CACHE_DIR/images.

Version Text Generation Speech Recognition Object Detection
0.1.0 vllm-openai:0.8.4 speech-recognition-inference:0.1.2 -
0.2.0 vllm-openai:0.8.4 speech-recognition-inference:0.1.2 -

Models

Blackfish (or rather, the services Blackfish runs) does not guarantee support for every model available from the Hugging Face's Model Hub. As a practical matter, however, services support nearly all "popular" models listed under their corresponding pipeline, including many "quantized" models (in the case of LLMs). Below is an evolving list of models that we have tested on HPC, including the resources requested and utilized by the service.

The main requirement to run online inference is sufficient GPU memory. As a rule-of-thumb, the minimum memory required for a model is obtained by multiplying the number of parameters (in billions) times the number of bytes per parameter (dtype / 8). In practice, you need to budget an additional 5-10 GB for KV caching and keep in mind that default GPU utilization is typically set to around 90-95% by service images.

Model Pipeline Supported Chat Gated Reasoning Embedding 1 Memory GPUs Cores Size Dtype Notes
Qwen/QwQ-32B Text-generation βœ… βœ… βœ… βœ… 16G 61.0/160G 4 32.8B bf16 Supports reasonsing. See docs.
Qwen/Qwen3-32B Text-generation βœ… βœ… βœ… βœ… 16G 64.4/160G 4 32.8B bf16 Supports reasonsing. See docs.
Qwen/Qwen2.5-72B Text-generation βœ… βœ… 16G 144.8/320G 4 72.7B bf16 Possible to fit on 2x80B by decreasing max_model_len or increasing gpu_memory_utilization.
Qwen/Qwen2.5-72B-Instruct Text-generation βœ… βœ… βœ… 16G 144.8/320g 4 72.7B bf16 Possible to fit on 2x80B by decreasing max_model_len or increasing gpu_memory_utilization.
Qwen/Qwen2.5-32B Text-generation βœ… βœ… 16G 63.1/80G 4 32.8B bf16
Qwen/Qwen2.5-32B-Instruct Text-generation βœ… βœ… βœ… 16G 63.1/80G 4 32.8B bf16
google/gemma-3-27b-it Text-generation βœ… βœ… βœ… ❌ 16G 54.1/80G 4 27.4B bf16
google/gemma-3-1b-it Text-generation βœ… βœ… βœ… βœ… 8G /10G 4 27.4B bf16
meta-llama/Llama-4-Scout-17B-16E-Instruct Text-generation βœ… βœ… βœ… 32G /320G 4 109B bf16 Supports multimodal inputs. See docs.
meta-llama/Llama-4-Scout-17B-16E Text-generation βœ… βœ… 32G /320G 4 109B bf16 Supports multimodal inputs. See docs.
meta-llama/Llama-3.3-70B-Instruct Text-generation βœ… βœ… βœ… βœ… 16G 140.4/320G 4 70.6B bf16
deepseek-ai/DeepSeek-R1-Distill-Llama-70B Text generation βœ… βœ… βœ… βœ… 16G 141.2/320G 4 70.6B bf16 Supports reasonsing. See docs.
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Text generation βœ… βœ… βœ… βœ… 16G 64.6/80G 4 32.8B bf16 Supports reasonsing. See docs.
deepseek-ai/DeepSeek-V2-Lite Text generation βœ… βœ… 16G 30.5/40G 4 15.7B bf16
deepseek-ai/DeepSeek-V2-Lite-Chat Text generation βœ… βœ… βœ… 16G 30.5/40G 4 15.7B bf16
openai/whisper-large-v3 Automatic-speech-recognition βœ… - 3.6/10G 1 1.54B f16

Development

This is a monorepo containing:

Package Description
lib/ Python backend (blackfish-ai) - CLI, server, services
web/ Next.js frontend (blackfish-ui) - browser interface

See the package READMEs for development setup instructions.

Want to learn more?

You can find additional details and examples on our official documentation page.

Footnotes

  1. Models that can be used to retrieve embeddings with --task embed ↩

About

Machine learning as a service (MLaaS) for busy researchers

Resources

License

Contributing

Stars

Watchers

Forks

Sponsor this project

Packages

No packages published