QUARC (QUAntitative Recommendations of reaction Conditions)

QUARC is a data-driven model for recommending agents, temperature, and equivalence ratios for organic synthesis (see paper).

Important

The QUARC models used in the paper rely on the NameRxn reaction classification codes as part of the model input. Specifically, the reaction class is encoded as a one-hot vector, requiring access to the full NameRxn code mapping.

Users with a Pistachio license can access the 2023Q4 reaction-type mapping (2271 NameRxn classes plus Unrecognized) from the Pistachio Reaction Types.csv file on the Pistachio webapp. Alternatively, you may email xiaoqis@mit.edu to obtain the file directly.
Users without NameRxn access can try our open-source version, which eliminates this dependency. This version is planned to be integrated into ASKCOS in the October release.
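
Because a one-hot vector has one position per entry in the mapping, its length and ordering are fixed by the mapping file. The sketch below illustrates this with a toy three-code list. The function names and the toy codes are illustrative assumptions, not QUARC's actual encoder.

```python
def build_class_index(class_codes):
    """Map each reaction class code to a fixed position in the one-hot vector."""
    return {code: position for position, code in enumerate(class_codes)}


def one_hot(rxn_class, class_index):
    """Return a list with a single 1 at the position assigned to rxn_class."""
    vector = [0] * len(class_index)
    # Raises KeyError if the code is absent from the mapping in use.
    vector[class_index[rxn_class]] = 1
    return vector


# Toy mapping for illustration only; the 2023Q4 mapping has 2272 entries.
toy_codes = ["0.0", "1.8.7", "2.1.1"]
class_index = build_class_index(toy_codes)
encoded = one_hot("1.8.7", class_index)
print(encoded)
```

A mapping with a different number or order of codes would change the vector length or move each class to a different position, so it would no longer line up with the inputs the pretrained models were trained on.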

Quick Start (Inference Only)

If you just want to predict conditions for your reactions using the provided pretrained models:

Step 1/3: Environment Setup

# 1. Create conda environment
conda env create -f environment.yml -n quarc
conda activate quarc
pip install --no-deps -e .

# 2. Configure NameRxn Code Mapping (REQUIRED)
export PISTACHIO_NAMERXN_PATH="/path/to/your/Pistachio Reaction Types.csv"

# 3. Set data paths (or use the defaults in configs/quarc_config.yaml)
# Note: "~" is not expanded inside double quotes, so use $HOME instead
export DATA_ROOT="$HOME/quarc/data"
export PROCESSED_DATA_ROOT="$HOME/quarc/data/processed"

The 2023Q4 version of Pistachio Reaction Types.csv is required for compatibility with the pretrained models (requires 2272 classes for reaction class encoding). Using a different version may cause the model to fail.

Step 2/3: Download pretrained models

sh checkpoints/download_trained_models.sh

Step 3/3: Run Predictions

# Get predictions using the example input file
python scripts/inference.py \
    --config-path configs/ffn_pipeline.yaml \
    --input data/example_input.json \
    --output predictions.json \
    --top-k 5

Results will be in predictions.json with recommended agents, temperatures, and amounts. Atom-mapped SMILES are required for the GNN models.

Usage

Inference with custom data

Input Format

[
  {
    "rxn_smiles": "[CH3:1][O:2][C:3]...",
    "rxn_class": "1.8.7",
    "doc_id": "my_reaction_1"
  }
]
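
An input file in this shape can be assembled and sanity-checked with a few lines of Python before it is passed to scripts/inference.py. This is a minimal sketch: find_problems is a hypothetical helper, the assumption that all three keys are required comes only from the example above, and the reaction shown is illustrative (its class code is copied from the example and has not been checked against the reaction).

```python
import json

# Assumption: every record needs all three keys shown in the example above.
REQUIRED_KEYS = ("rxn_smiles", "rxn_class", "doc_id")


def find_problems(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    for i, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"record {i}: missing {missing}")
        elif record["rxn_smiles"].count(">") != 2:
            # A reaction SMILES has exactly two '>' separators.
            problems.append(f"record {i}: rxn_smiles is not a reaction SMILES")
    return problems


records = [
    {
        # Illustrative atom-mapped reaction; class code not verified for it.
        "rxn_smiles": "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]>>"
                      "[CH3:1][C:2](=[O:3])[O:6][CH3:5]",
        "rxn_class": "1.8.7",
        "doc_id": "my_reaction_1",
    }
]

problems = find_problems(records)
payload = json.dumps(records, indent=2)  # text to save as input.json
print(problems or "no problems found")
```

Saving payload to input.json gives a file that can be supplied through the --input argument.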

Model Options

# FFN models (works with any SMILES)
python scripts/inference.py \
    --config-path configs/ffn_pipeline.yaml \
    --input input.json \
    --output predictions.json \
    --top-k 5

# GNN models (requires atom-mapped SMILES)
python scripts/inference.py \
    --config-path configs/gnn_pipeline.yaml \
    --input input.json \
    --output predictions.json \
    --top-k 5

# Also supports pickle input (e.g., preprocessed test sets)
python scripts/inference.py \
    --config-path configs/ffn_pipeline.yaml \
    --input data/processed/overlap/overlap_test.pickle \
    --output predictions.json \
    --top-k 5

Retraining

Note: Requires Pistachio's density data and NameRxn access

Step 1/5: Environment Setup

Create conda environment and install dependencies:

conda env create -f environment.yml -n quarc
conda activate quarc
pip install --no-deps -e .

Configure paths using one of the following options:

Option 1: Environment Variables (Recommended)

Variables can be set directly in terminal or in a .env file.

# data paths
# Note: "~" is not expanded inside double quotes, so use $HOME instead
export DATA_ROOT="$HOME/quarc/data"
export PROCESSED_DATA_ROOT="$HOME/quarc/data/processed"
export CHECKPOINTS_ROOT="$HOME/quarc/checkpoints"
export LOGS_ROOT="$HOME/quarc/logs"

# needed for inference
export PISTACHIO_NAMERXN_PATH="/path/to/Pistachio Reaction Types.csv"

# needed for preprocessing
export PISTACHIO_DENSITY_PATH="/path/to/density.tsv"
export RAW_DIR="/path/to/pistachio/extract"

Option 2: Edit Configuration Files

Edit configs/quarc_config.yaml to modify default paths.

Paths are resolved in three layers: the defaults in src/quarc/settings.py, then configs/quarc_config.yaml if present, then environment variables. Each layer overrides the previous one.
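
The layering can be pictured as successive dictionary updates. This is a minimal sketch, not the logic in src/quarc/settings.py: resolve_paths is a hypothetical helper, the YAML file is represented by an already-loaded dictionary, and all paths are made up.

```python
def resolve_paths(defaults, yaml_values, environ):
    """Apply the three layers in order; later layers win."""
    resolved = dict(defaults)
    # Layer 2: values from the config file, if it defines them.
    for key in defaults:
        if key in yaml_values:
            resolved[key] = yaml_values[key]
    # Layer 3: environment variables take final precedence.
    for key in defaults:
        if key in environ:
            resolved[key] = environ[key]
    return resolved


# Hypothetical values for illustration only.
defaults = {
    "DATA_ROOT": "/defaults/data",
    "LOGS_ROOT": "/defaults/logs",
    "CHECKPOINTS_ROOT": "/defaults/checkpoints",
}
yaml_values = {"DATA_ROOT": "/yaml/data", "LOGS_ROOT": "/yaml/logs"}
environ = {"DATA_ROOT": "/env/data"}

resolved = resolve_paths(defaults, yaml_values, environ)
for name, path in resolved.items():
    print(name, "->", path)
```

Here DATA_ROOT comes from the environment, LOGS_ROOT from the config file, and CHECKPOINTS_ROOT from the defaults. In a real session, os.environ would be passed as the third argument.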

Step 2/5: Data Preprocessing

The preprocessing pipeline transforms raw Pistachio data into ReactionDatum objects that are used for training. You can configure the preprocessing pipeline in configs/preprocess_config.yaml. The dirs section contains placeholder paths that will be overridden by the environment variables. Details of the preprocessing pipeline are described here.

# Run complete preprocessing pipeline
python scripts/preprocess.py \
    --config configs/preprocess_config.yaml \
    --all

# Or individual steps
python scripts/preprocess.py \
    --config configs/preprocess_config.yaml \
    --chunk-json \
    --collect-dedup \
    ...

Note that running the --generate-agent-class step overwrites the agent_encoder_list.json and agent_other_dict.json files provided in the data/processed/ directory. To keep the provided files, skip the --generate-agent-class step.

Step 3/5: Training

Run training for each stage:

# Example: stage 1 agent model gnn
python scripts/train.py \
    --stage 1 \
    --model-type gnn \
    --graph-hidden-size 1024 \
    --depth 2 \
    --hidden-size 2048 \
    --n-blocks 3 \
    --max-epochs 30 \
    --batch-size 512 \
    --max-lr 1e-3 \
    --logger-name stage1_gnn \
    --output-size 1376 \
    --num-classes 1376 \
    ...

Details of training parameters can be found in src/quarc/cli/quarc_parser.py. For binned classification tasks, custom binning can be specified using the --binning-path argument. Example binning configs can be found in configs/binning_config.yaml.
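
For readers unfamiliar with binned classification, the idea is that a continuous target such as temperature is replaced by the index of the interval it falls into, and the model predicts that index. The sketch below shows one common way to do this with sorted bin edges. The edges, their units, and the to_bin helper are hypothetical and do not reflect the contents of configs/binning_config.yaml.

```python
from bisect import bisect_right


def to_bin(value, edges):
    """Return the bin index for value given ascending bin edges.

    n edges define n + 1 bins: (-inf, e0), [e0, e1), ..., [e_last, +inf).
    """
    return bisect_right(edges, value)


# Hypothetical temperature edges in degrees Celsius.
temperature_edges = [0.0, 25.0, 50.0, 100.0]

labels = [to_bin(t, temperature_edges) for t in (-78.0, 25.0, 80.0, 150.0)]
print(labels)
```

With these edges a value equal to an edge goes into the bin that starts at that edge, so 25.0 is assigned to the [25, 50) bin.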

For stage 1 agent prediction, the tensorboard logger only keeps track of the greedy search accuracy. You may want to perform offline beam search evaluation to select the best checkpoint.

Step 4/5: Create Pipeline Config

To chain the individually trained models together, create a new pipeline config file using configs/ffn_pipeline.yaml as a template.

Optional: Optimize Stage Weights

By default, each stage in the pipeline is assigned an equal weight of 0.25. To improve the overall performance of chained models, you can tune these weights using hyperparameter optimization with Optuna:

python scripts/optimize_weights.py \
  --config-path configs/new_pipeline.yaml \
  --n-trials 30 \
  --sample-size 1000 \
  --use-top-k 5 # use top-5 accuracy as the objective
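
To see why stage weights can change the final ranking, consider the toy example below. It assumes that a candidate's overall score is a weighted sum of per-stage log-probabilities; this combination rule, the helper names, and the numbers are assumptions made for illustration and are not taken from the QUARC source.

```python
import math


def combined_score(stage_probs, weights):
    """Weighted sum of log-probabilities across stages (assumed rule)."""
    return sum(w * math.log(p) for p, w in zip(stage_probs, weights))


def rank(candidates, weights):
    """Sort candidates from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c["probs"], weights),
        reverse=True,
    )


# Two hypothetical candidates with made-up per-stage probabilities.
candidates = [
    {"name": "A", "probs": [0.9, 0.2, 0.5, 0.5]},
    {"name": "B", "probs": [0.5, 0.6, 0.5, 0.5]},
]

equal_weights = [0.25, 0.25, 0.25, 0.25]
skewed_weights = [0.7, 0.1, 0.1, 0.1]

top_equal = rank(candidates, equal_weights)[0]["name"]
top_skewed = rank(candidates, skewed_weights)[0]["name"]
print(top_equal, top_skewed)
```

With equal weights candidate B ranks first, while weights that emphasize the first stage put candidate A first. Tuning the weights searches for the setting that gives the best top-k accuracy on a sample of reactions.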

Tip

The optimization script uses the EnumeratePredictor to generate predictions and rank them on the fly. For faster optimization using a larger sample size, you can consider switching to the PrecomputedHierarchicalPredictor, which caches model predictions to avoid redundant computations.

Step 5/5: Inference

The new pipeline config file can be used for inference:

python scripts/inference.py \
    --config-path configs/new_pipeline.yaml \
    --input data/processed/overlap/overlap_test.pickle \
    --output predictions.json \
    --top-k 5

References

If you find our code or model useful, we kindly ask that you consider citing our work in your papers.

@article{Sun2025quarc,
  title={Data-Driven Recommendation of Agents, Temperature, and Equivalence Ratios for Organic Synthesis},
  author={Sun, Xiaoqi and Liu, Jiannan and Mahjour, Babak and Jensen, Klavs F and Coley, Connor W},
  journal={ChemRxiv},
  doi={10.26434/chemrxiv-2025-4wzkh},
  year={2025}
}
