A template for Python research projects. Provides standardised environment setup, config-driven parameter management, structured logging, large file sync via HuggingFace, and artifact tracking.
- Do the setup
- Check out llm-utils at its latest commit
- Create a symlink between llm-utils/setup/.venv and wherever your actual environment lives
- Stage all changes and commit
This project uses Python with uv for dependency management. See setup/README.md for full instructions.
Quick start (from project root):

```shell
git clone <url> --recursive
cd <project_name>
cd setup && uv sync
cd ..
cd llm-utils/setup && uv sync
cd ../../
source setup/.venv/bin/activate
```

Then fill in your local values in configs/private_vars.yaml (replacing any PLACEHOLDER entries) and generate the shell config:

```shell
python configs/create_env_file.py
```

To pull shared data from the HuggingFace repo:

```shell
python sync_data.py pull
```
All .yaml files in configs/ are automatically merged into the parameters dict used throughout the codebase. See configs/README.md for what each variable does and how to add new ones.
- `private_vars.yaml` — machine-specific paths and credentials. Never shared as-is.
- `project_vars.yaml` — project-level settings (seeds, result paths, etc.)
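Conceptually, the merge behaves like a shallow dict update in which later configs override earlier keys. The sketch below is illustrative only — `merge_configs` is a hypothetical helper, and the template's actual `load_parameters` may use a different override order or a deep merge:

```python
def merge_configs(*configs):
    """Shallow-merge config dicts; keys from later configs win.

    Hypothetical illustration of the merge semantics, not the
    template's actual implementation.
    """
    merged = {}
    for cfg in configs:
        merged.update(cfg)
    return merged


# Example: a private value overrides the project-level default.
project = {"seed": 42, "results_dir": "results/"}
private = {"results_dir": "/scratch/me/results/"}
params = merge_configs(project, private)
```

Here `params["results_dir"]` comes from the private config, while `params["seed"]` keeps the project-level value.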
| Function / Class | Description | Example |
|---|---|---|
| `load_parameters` | Loads and merges all YAML configs into a single dict. Safe to call with an existing dict — returns it immediately if already loaded. | `parameters = load_parameters(parameters)` |
| `log_info`, `log_warn`, `log_error`, `log_dict` | Structured logging. Always pass `parameters` to write to the log file in addition to the console. `log_error` terminates execution. | `log_info("Done", parameters=parameters)` |
| `write_meta` | Saves a hyperparameter dict to a YAML file, named by a content hash. Use this whenever you produce an artifact that has configuration you want to track. | `meta_hash = write_meta("results/run/", args, parameters)` |
| `add_meta_details` | Returns a copy of a meta dict with additional fields merged in. | `extended = add_meta_details(args, {"epoch": 5})` |
| `sync_data.py` | Syncs the local `sync_dir` with a HuggingFace Dataset repo. Use for sharing large files (models, datasets) across machines. | `python sync_data.py pull` (or `push`, `init`) |
| `utils/lm_inference.py` | LM and VLM inference via OpenAI-compatible APIs (`OpenAIModel`, `vLLMModel`, `OpenRouterModel`), Anthropic (`AnthropicModel`), and local HuggingFace models (`HuggingFaceModel`). Supports text and image inputs. | `model = OpenAIModel(model="gpt-4o"); outputs = model.infer(texts, max_new_tokens=256)` |
| `paired_bootstrap` | Statistical significance test comparing two systems via paired bootstrap resampling. Returns the achieved p-value. Optionally logs win ratios and 95% confidence intervals. | `p = paired_bootstrap(sys1_scores, sys2_scores, verbose=True, parameters=parameters)` |
| `llm-utils` (submodule) | Efficient, scalable, offline, batched LLM/VLM inference via HuggingFace Transformers and vLLM. Also supports pretraining, SFT, DPO, and unlearning. Entry point: `llm-utils/infer.py`. Called via `scripts/llm-utils.sh`. | `bash scripts/llm-utils.sh --input data.csv --model_name <name> hf --batch_size 8` |
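The resampling idea behind `paired_bootstrap` can be sketched as follows. This is a minimal hypothetical illustration (`paired_bootstrap_sketch` is not the library function; the real implementation's tie handling, p-value convention, and confidence-interval logging may differ):

```python
import random


def paired_bootstrap_sketch(scores_a, scores_b, n_resamples=1000, seed=0):
    """Paired bootstrap: resample example indices with replacement and
    count how often system B beats system A on the resampled mean.

    Returns a one-sided p-value: the fraction of resamples in which
    B does NOT beat A.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    wins_b = 0
    for _ in range(n_resamples):
        # Same indices for both systems — that is what makes it "paired".
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_b > mean_a:
            wins_b += 1
    return 1.0 - wins_b / n_resamples
```

A small p-value means system B outperformed system A in nearly every resample, i.e. the improvement is unlikely to be an artifact of which examples landed in the test set.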
Python entry points follow the click pattern. See main.py for the template:
```shell
python main.py [--global_option value] subcommand [--subcommand_option value]
```

For bash scripts, see BASH_TEMPLATE.md.
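The click pattern looks roughly like this — an illustrative sketch using click's group/command decorators, with hypothetical option names; the actual main.py template may structure things differently:

```python
import click


@click.group()
@click.option("--global_option", default=None, help="Option shared by all subcommands.")
@click.pass_context
def cli(ctx, global_option):
    # Stash global options on the context so subcommands can read them.
    ctx.obj = {"global_option": global_option}


@cli.command()
@click.option("--subcommand_option", default="value")
@click.pass_context
def subcommand(ctx, subcommand_option):
    click.echo(f"{ctx.obj['global_option']} / {subcommand_option}")


if __name__ == "__main__":
    cli()
```

Global options go before the subcommand name on the command line, mirroring the invocation shown above.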