DMP Chef is an open-source (MIT License), Python-based pipeline that drafts funder-compliant Data Management and Sharing Plans (DMPs) using a Large Language Model (LLM), such as Llama 3.3.
It supports two modes entirely in Python:
- RAG: Retrieves related guidance from an indexed document collection and uses it to ground the draft. In this mode, the pipeline can ingest documents, build and search an index, and draft a DMP.
- No-RAG: Generates the draft only from the user's project inputs (no retrieval).
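For intuition, the difference between the two modes boils down to whether the prompt is grounded in retrieved guidance. The following is a simplified, self-contained sketch: the real pipeline uses an indexed vector store rather than word overlap, and the function names here are hypothetical, not the package's actual API.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank guidance snippets by word overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(words & set(doc.lower().split())))
    return scored[:k]

def build_prompt(project_inputs: str, corpus: list[str], rag: bool) -> str:
    """Assemble the LLM prompt; RAG mode grounds it in retrieved guidance."""
    if rag:
        context = "\n".join(retrieve(project_inputs, corpus))
        return f"Guidance:\n{context}\n\nDraft a DMP for:\n{project_inputs}"
    return f"Draft a DMP for:\n{project_inputs}"

guidance = ["NIH requires a data sharing timeline.", "Describe data types and formats."]
rag_prompt = build_prompt("genomic data sharing project", guidance, rag=True)
no_rag_prompt = build_prompt("genomic data sharing project", guidance, rag=False)
```

In both modes the project inputs drive the draft; RAG only changes what surrounding context the model sees.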
This project is part of a broader extension of the DMP Tool platform. The ultimate goal is to integrate the DMP Chef pipeline into the DMP Tool platform, providing researchers with a familiar and convenient user interface that does not require any coding knowledge.
Learn more: DMP-Chef.
The overall codebase is organized in alignment with the FAIR-BioRS guidelines. All Python code follows PEP 8 conventions, including consistent formatting, inline comments, and docstrings. Project dependencies are fully captured in requirements.txt. We also retain the dmp-template used inside the prompt template for the DMP generation workflow.
- main.py — Command-line entry point for running the pipeline end-to-end.
- demo.ipynb — Jupyter demo showing how to import and run the pipeline.
```
dmpchef/
├── main.py              # CLI entry point (run pipeline end-to-end)
├── README.md            # Project overview + usage
├── requirements.txt     # Python dependencies
├── setup.py             # Packaging (editable installs via pip install -e .)
├── pyproject.toml       # Build system config (wheel builds)
├── MANIFEST.in          # Include non-code files in distributions
├── demo.ipynb           # Notebook demo: import + run generate()
├── LICENSE
├── .gitignore
├── .env                 # Local env vars (do not commit)
│
├── dmpchef/             # Installable Python package (public API)
│   ├── __init__.py      # Exports: generate, draft
│   └── api.py           # Importable API used by notebooks/backends
│
├── config/              # Configuration
│   ├── config.yaml      # Main settings (models, paths, retriever params)
│   ├── config_schema.py # Pydantic schema for DMPCHEF-Pipeline config
│   └── schema_validate.py  # Validation/schema helpers for input.json
│
├── data/                # Local workspace data + artifacts (not guaranteed in wheel)
│   ├── inputs/          # Templates + examples
│   │   ├── nih-dms-plan-template.docx  # NIH blank Word template
│   │   └── input.json   # Example request file
│   ├── vector_db/       # Vector index artifacts (e.g., FAISS)
│   │   ├── DMPtools_db/
│   │   ├── NIH_all_db/
│   │   └── NIH_sharing_db/
│   ├── data_ingestion/  # Source PDFs and text from DMPTool, NIH, NIH_sharing, etc.
│   └── outputs/         # Generated artifacts
│       ├── markdown/    # Generated Markdown DMPs
│       ├── docx/        # Generated DOCX DMPs (template-preserving)
│       ├── json/        # DMPTool-compatible JSON outputs
│       └── pdf/         # Optional PDFs converted from DOCX
│
├── src/                 # Core implementation
│   ├── __init__.py
│   ├── core_pipeline.py # Pipeline logic (RAG/no-RAG)
│   ├── Build_index.py   # Build the vector DB index
│   └── NIH_data_ingestion.py  # NIH/DMPTool crawl → export PDFs to data/database
│
├── prompt/              # Prompt templates/utilities
│   └── prompt_library.py
│
├── utils/               # Shared helpers
│   ├── config_loader.py
│   ├── model_loader.py
│   ├── dmptool_json.py
│   ├── nih_docx_writer.py
│   └── download_vector_db.py
│
├── logger/              # Logging utilities
│   ├── __init__.py
│   └── custom_logger.py
│
├── exception/           # Custom exceptions
│   ├── __init__.py
│   └── custom_exception.py
│
├── notebook_DMP_RAG/    # Notebooks/experiments (non-production)
└── venv/                # Local virtualenv
```
```
git clone https://github.com/fairdataihub/dmpchef.git
cd dmpchef
code .
```

Windows (cmd):

```
python -m venv venv
venv\Scripts\activate.bat
```

macOS/Linux:

```
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```
pip install -r requirements.txt
# or (recommended for local dev)
pip install -e .
```

Llama 3.3 (via Ollama)

- Install Ollama from: https://ollama.com/
- Pull llama3.3:

```
ollama pull llama3.3:70b
```

Use demo.ipynb.
Use main.py.
- input.json: A single JSON file (e.g., data/inputs/input.json) that tells the pipeline what to generate. Before execution, the request is validated against the schema using the schema_validate validator.
```json
{
  "config": { ... },
  "inputs": { ... }
}
```

- config.funding.agency: Funder key (string; NIH|NSF|OTHER)
- config.funding.subagency: Sub-agency (string; optional)
- config.pipeline.rag: true/false (boolean; if omitted, the pipeline uses the YAML default rag.enabled)
- config.pipeline.llm: LLM settings (e.g., provider, model_name)
- config.export: Output formats (boolean flags: md, docx, pdf, dmptool_json)
- inputs: A dictionary of user/project fields used to draft the plan, including research_context, data_types, data_source, human_subjects, consent_status, data_volume, etc.
- Markdown: The generated funder-aligned DMP narrative (currently NIH structure).
- DOCX: generated using the funder template (NIH template today) to preserve official formatting.
- PDF: created by converting the DOCX (platform-dependent; typically works on Windows/macOS with Word).
- JSON: a DMPTool-compatible JSON file.
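Each enabled export format lands in its own subfolder of data/outputs/ (per the repository tree). A minimal sketch of that routing, assuming the folder layout shown earlier; the helper name output_path is hypothetical, not part of the package API:

```python
from pathlib import Path

# Export flag -> output subfolder and file extension (per the repo layout).
EXPORT_DIRS = {"md": "markdown", "docx": "docx", "pdf": "pdf", "dmptool_json": "json"}
EXPORT_EXTS = {"md": ".md", "docx": ".docx", "pdf": ".pdf", "dmptool_json": ".json"}

def output_path(base: str, fmt: str, stem: str) -> Path:
    """Map an export flag to the destination path for a generated artifact."""
    return Path(base) / EXPORT_DIRS[fmt] / f"{stem}{EXPORT_EXTS[fmt]}"

md_path = output_path("data/outputs", "md", "my_dmp")
json_path = output_path("data/outputs", "dmptool_json", "my_dmp")
```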
This work is licensed under the MIT License. See LICENSE for more information.
Use GitHub Issues to submit feedback, report problems, or suggest improvements.
You can also fork the repository and submit a Pull Request with your changes.
If you use this code, please cite this repository using the versioned DOI on Zenodo for the specific release you used (instructions will be added once the Zenodo record is available). For now, you can reference the repository here: fairdataihub/dmpchef.