GAME Matcher Overview

A RESTful standalone service for the Genomic API for Model Evaluation (GAME) that maps free-text biological terms to canonical biological entities using a local LLM.

Useful Links

👉 View Latest Release & Matcher Logic

GAME Matcher Overview

Modules of GAME

All GAME modules are inherently interoperable by communicating via the RESTful GAME API protocol. For each benchmark, the Evaluator requests a prediction from the Predictor, which consults the Matcher to determine the closest task the Predictor can complete. Once the Matcher returns the best match over HTTP, the Predictor will complete its prediction and return it to the Evaluator, which will evaluate performance [Luthra et al. (2025)].

System Requirements

To run the Matcher, the host system (e.g. a GPU node) must have:

Apptainer (formerly, Singularity): Easy-to-use standard for running application containers.
NVIDIA GPU: Required for hardware acceleration of the LLM. The --nv flag must be used when running the container to enable GPU access.

For a different GPU, like AMD's, check out the Apptainer GPU Support documentation.

Usage

The Matcher container can be downloaded from Zenodo: https://zenodo.org/records/18203297.

Download the Matcher Container
```
wget -O matcher.sif [LINK]
```

Run the Matcher Server

This single command starts the container, launches a private Ollama server inside it, and starts the FastAPI server listening for HTTP requests.

# General Usage:
# apptainer run --nv --containall <sif_file> <IP_TO_LISTEN_ON> <PORT>

# Example: Run the matcher, listening on all network interfaces on port 8080
apptainer run --nv --containall matcher.sif 0.0.0.0 8080

Automated and Externalized Matching

GAME introduces a module called “Matcher”, which automatically maps the Evaluator's requested cell type, measured molecule (TF binding molecule/ protein and histone markers), and species with what a Predictor can provide. The Matcher is designed to perform this task by interpreting the relationship between terms through lexical, syntactic, and semantic matching.

Lexical matching: handles cases of direct string correspondence, such as finding the exact token A549 within a more descriptive choice like lung adenocarcinoma cell line: A549.
Syntactic matching: addresses structural variations and common abbreviations, such as hek-293 or SKNSH to HEK293 or SK-N-SH, respectively.
Semantic matching: uses biological knowledge to connect different terms that refer to the same entity, such as mapping the description chronic myelogenous leukemia cell line to its canonical name, K562.

How Queries Work Using a Local Large Language Model (LLM)

The Matcher bundles the gemma3:12b model and all necessary Python dependencies to map fuzzy, free-text user inputs to canonical terms from a controlled vocabulary. It operates as a standalone REST API service, accepting JSON-formatted POST requests and returning the best-matched term.

Running Gemma 3 locally means that the model operates directly on the hardware of the system it is deployed in, ensuring data privacy, reducing reliance on external cloud APIs, and lowering operational costs, which are crucial for sensitive or high-throughput workflows. Gemma 3 is also very light-weight and efficient to run on a single GPU or TPU.

Prompt Engineering

The Matcher uses highly specialized prompts for each biological category (cell_type, species, binding_{molecule}). These templates provide the LLM with a persona ("You are an expert..."), strict instructions on reasoning and output format, and few-shot examples to guide its behaviour.

The Recursive Tournament Algorithm

To efficiently match a term against a list of thousands of choices, the Matcher uses a recursive tournament-style elimination process:

Divide: The full list of choices is broken into small chunks (e.g. 20 items per chunk).
Compete: The LLM finds the single best match within each chunk (the "chunk champion").
Conquer: The champions from all chunks advance to a championship round. If there are too many champions for a single final round, they compete against each other in new elimination rounds. This process repeats recursively, like a tournament bracket, narrowing the field until a single, overall winner is determined.

This divide-and-conquer method allows the Matcher to scalably and efficiently find the best match from a massive search space.

API Reference

The Matcher communicates over HTTP using a standardized REST API.

Endpoint

URL: /match
Method: POST
Content-Type: application/json

Matcher Request Payload

The request payload must be a JSON object conforming to the schema below. The API enforces strict validation logic using the pydantic library to ensure data integrity.

Validation Rules:

Paired Fields: If you provide a requested term (e.g. cell_type_requested), you must also provide the corresponding list (e.g. cell_type_list).
Minimum Requirement: The request must contain at least one valid category pair to process. Empty requests will be rejected with a 422 Unprocessable Entity error.

Key	Value type - Required/Optional	Description	Example
`cell_type_requested`	`string` - Optional (Paired)	The fuzzy input term for the cell type requested by the Evaluator	`"Leukemia cell line"`
`cell_type_list`	`array of strings` - Optional (Paired)	The list of choices the Predictor can support to match against	`["K562", "A549", "HepG2"]`
`species_requested`	`string` - Optional (Paired)	The fuzzy input term for the species requested by the Evaluator	`"h_sap"`
`species_list`	`array of strings` - Optional (Paired)	The list of choices the Predictor can support to match against	`["Homo sapiens", "Mus musculus"]`
`binding_molecule_requested`	`string` - Optional (Paired)	The fuzzy input term for the binding molecule requested by the Evaluator	`"H3K4_trimethylation"`
`binding_molecule_list`	`array of strings` - Optional (Paired)	The list of choices the Predictor can support to match against	`["CTCF", "H3K4me3", "POLR2A"]`

Matcher Response Payload

The Matcher (server) sends back a JSON payload to the Predictor (client), containing the results of the matching tasks. An _actual key will be present for each category pair that was provided in the request.

Key	Value type	Description	Example
`cell_type_actual`	`string` or `null`	The best match from the `cell_type_list`	`"K562"`
`species_actual`	`string` or `null`	The best match from the `species_list`	`"Homo sapiens"`
`binding_molecule_actual`	`string` or `null`	The best match from the `binding_molecule_list`	`"H3K4me3"`
`matcher_version`	`string`	The version of the Matcher that processed the request.	`"2.0"`

Examples of Matcher Performance

The matcher performs better when more contextual information is provided (e.g. K562 (erythroid precursors) is better than just K562).

Cell Type Requested	Matched (Enformer CAGE)	Matched (Borzoi RNA)
K562 (erythroid precursors)	`chronic myelogenous leukemia cell line:K562`	`K562`
HepG2 (hepatocytes)	`hepatocellular carcinoma cell line: HepG2 ENCODE, biol_`	`HepG2`
SK-N-SH (neuroblastoma)	`neuroblastoma cell line:NB-1`	`SK-N-SH`
induced pluripotent stem cells (iPS cells; WTC11)	`Multipotent Cord Blood Unrestricted Somatic Stem Cells,`	`WTC11`
lymphoblasts	`Burkitt's lymphoma cell line:DAUDI`	`Jurkat, Clone E6-1`
Jurkat T cells	`acute lymphoblastic leukemia (T-ALL) cell line:Jurkat`	`Jurkat, Clone E6-1`
THP-1 monocytes	`acute myeloid leukemia (FAB M5) cell line:THP-1 (fresh)`	`CD14-positive monocyte`
lymphoblasts (k562)	`chronic myelogenous leukemia cell line:K562`	`K562`

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github		.github
images		images
rest_llm_matcher		rest_llm_matcher
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GAME Matcher Overview

Useful Links

Modules of GAME

System Requirements

Usage

Automated and Externalized Matching

How Queries Work Using a Local Large Language Model (LLM)

Prompt Engineering

The Recursive Tournament Algorithm

API Reference

Endpoint

Matcher Request Payload

Matcher Response Payload

Examples of Matcher Performance

About

Uh oh!

Releases 1

Packages

Languages

de-Boer-Lab/GAME_matcher

Folders and files

Latest commit

History

Repository files navigation

GAME Matcher Overview

Useful Links

Modules of GAME

System Requirements

Usage

Automated and Externalized Matching

How Queries Work Using a Local Large Language Model (LLM)

Prompt Engineering

The Recursive Tournament Algorithm

API Reference

Endpoint

Matcher Request Payload

Matcher Response Payload

Examples of Matcher Performance

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages