CBRdb: A Curated Biochemical Reaction database for precise Biochemical Reaction Analysis

🗺️ Overview

We present CBR-db, a curated biochemical database that integrates and refines data from KEGG and ATLAS databases to support precise analyses of biochemical reaction data. This curation effort addresses key limitations, such as the presence of malformed chemical representations, inaccurate stoichiometry, and ambiguous or incomplete reaction entries. These key improvements in CBR-db could facilitate the accurate analysis of physicochemical and thermodynamic properties, which are essential for modeling the evolution of chemical reactions in research areas ranging from prebiotic chemistry to metabolic network expansion. Altogether, CBR-db features a high number of high-quality reactions and compounds. The CBR-db is designed to be continuously updated, incorporating the latest releases from the KEGG and ATLAS databases. Furthermore, it provides a framework so the network can be extended and further issues can be addressed.

To start everything locally, you can look at run_all.py, which will take you through the whole data pipeline. Otherwise, you can run each part by itself. To sidestep installation checkout Binder.

For further information, see the preprint on ChemRxiv.

📋 Features

Integration of KEGG and ATLAS biochemical reaction databases.
Curation of compounds and reactions.
Thermodynamic properties.
Reaction pruning based on reaction feasibility and similarity.

Example fixes and receipts of changes can be found in the data directory. See for example C_IDs_bad.dat for a list of KEGG compounds that were removed or adjusted.

💡 Getting Started

To get started with the data, you can download the compounds CBRdb_C.csv and reactions CBRdb_R.csv from the repository. For convenience, the files are also hosted on Zenodo and are stored as a compressed format to reduce download time.

import pandas as pd
import os

# Download CBRdb compounds
os.system('wget https://github.com/ELIFE-ASU/CBRdb/blob/main/CBRdb_C.csv.zip')

# Load CBRdb compounds
compounds = pd.read_csv('CBRdb_C.csv.zip')
compounds.head()

# Download CBRdb reactions
os.system('wget https://github.com/ELIFE-ASU/CBRdb/blob/main/CBRdb_R.csv.zip')

# Load CBRdb reactions
reactions = pd.read_csv('CBRdb_R.csv.zip')
reactions.head()

Below is a breakdown description of the columns in each of the data files. Each description table is collapsible for easier viewing, but the subcategories are shown for clarity. In the tables below, the Entry Title column corresponds to the column headers in the CSV files and the Description column provides a brief explanation of each entry.

CBRdb_C

General information:

Entry Title	Description
compound_id	Unique identifier for each compound in the CBRdb database.
smiles	SMILES representation of the compound.
inchi	InChI representation of the compound.
formula	Molecular formula of the compound.
molecular_weight	Molecular weight of the compound.
n_heavy_atoms	number of heavy atoms in the compound
n_chiral_centers	number of chiral centers in the compound.
formal_charge	Formal charge of the compound.
smiles_capped	SMILES representation with hydrogen capped R groups.
inchi_capped	InChI representation with hydrogen capped R groups.
nickname	Common alias for the compound; first name in KEGG.
comment	KEGG's comments with context about the compound.
CBRdb_R_ids	List of reaction IDs in CBRdb that involve this compound.

Complexity scores and indices:

Entry Title	Description
assembly_index	Assembly index of the compound.
wiener_index	Wiener index of the compound.
unique_bonds	Number of unique bonds in the compound.
spacial_score	Spacial score of the compound.
randic_index	Randic index of the compound.
proudfoot	Proudfoot score of the compound.
mc2	Molecular complexity score 2.
mc1	Molecular complexity score 1.
kirchhoff_index	Kirchhoff index of the compound.
fcfp4	FCFP4 fingerprint of the compound.
bertz	Bertz complexity of the compound.
balaban_index	Balaban index of the compound.

Properties:

Entry Title	Description
ionization_states	Ionization states of the compound. For a pH range of 4 to 10.
std_dgf	Free energy of formation using experimental values.
std_dgf_error	Error in the free energy of formation using experimental values.
std_dgf_p	Free energy of formation at pressure using experimental values.
std_dgf_p_error	Error in free energy of formation at pressure using experimental values.
d_free	Free energy of formation using quantum methods.
d_enthalpy	Enthalpy of formation using quantum methods.
d_entropy	Entropy of formation using quantum methods.
free	Free energy using quantum methods.
enthalpy	Enthalpy using quantum methods.
entropy	Entropy energy using quantum methods.
vib_energies	List of vibration modes using quantum methods.

KEGG specific information:

Entry Title	Description
kegg_type	Type for peptide or polyketide entry.
kegg_sequence	Sequence for peptide or polyketide entry.
kegg_reaction	KEGG REACTION entries associated with the compound.
kegg_pathway	KEGG PATHWAY entries associated with the compound.
kegg_organism	For peptides or polyketides, KEGG organism with sequence.
kegg_network	KEGG NETWORK entries associated with the compound.
kegg_mol_weight	Molecular weight provided by KEGG.
kegg_module	KEGG MODULE entries associated with the compound.
kegg_glycan	KEGG GLYCAN entries associated with the compound.
kegg_gene	KEGG GENES entries associated with the compound.
kegg_formula	Molecular formula provided by KEGG.
kegg_exact_mass	Exact (single-isotope) mass provided by KEGG.
kegg_enzyme	KEGG ENZYME entries associated with the compound.
kegg_drug	KEGG DRUG entries associated with the compound.
kegg_brite_full	Full KEGG BRITE hierarchies with context, all levels.
kegg_brite	KEGG BRITE hierarchy entries associated with the compound.

Database cross-references:

Entry Title	Description
PubChem	Linked PubChem identifiers for the compound.
PDB_CCD	Linked PDB CCD identifiers for the compound.
NIKKAJI	Linked NIKKAJI identifiers for the compound.
LIPIDMAPS	Linked LIPIDMAPS identifiers for the compound.
KNApSAcK	Linked KNApSAcK identifiers for the compound.
Drug_group	Linked KEGG DGROUP entries for the compound.
ChEBI	Linked ChEBI identifiers for the compound.
CAS	Linked CAS identifiers for the compound.
ATC_code	Linked ATC codes of the compound.

CBRdb_R

General information:

Entry Title	Description
id	Reaction identifier.
reaction	Reaction equation.
ec	Enzyme Commission numbers.
name	Reaction name.
smarts	SMARTS representation.
reac_sim	Closest reaction similarity.
rhea	Rhea identifiers if provided.
CBRdb_C_ids	Corresponding CBRdb C identifiers.
id_orig	Original reaction identifier(s).

KEGG specific information:

Entry Title	Description
module	KEGG MODULE identifiers.
orthology	KEGG ORTHOLOGY identifiers.
pathway	KEGG PATHWAY identifiers.
rclass	KEGG RCLASS IDs & COMPOUND pairs.
remark	KEGG remark.
comment	Unstructured KEGG comment field.
overall	Flag: Overall reaction (br08210).

ATLAS specific information:

Entry Title	Description
most_sim_kegg	Most similar KEGG reaction (MSK). *
bridgit_score	BridgIT score for most_sim_kegg. *

Data flags:

Entry Title	Description
flags	From CBRdb.df_of_suspect_reactions
balancer_failed	Flag: Balancer failed.
bool_missing_data	Flag: Missing structure(s).
bool_var_list	Flag: Variable in coefficients.
cpd_starred	Flag: Has starred compounds.
els_and_stars_balance	Flag: Elements & stars are balanced

* Provided by ATLAS; see ATLAS User Guide for most_similar_kegg header details.

To get started with deeper components of CBRdb, follow the installation instructions below. After installation, you can start using the CBRdb package in your Python scripts or Jupyter notebooks.

🔧 Installation

Here you can find the full installation instructions. See also Binder.

Using conda env

Using conda is the recommended way to install the required packages to help with dependencies. First, lets clone the repo so that we have a local copy.

git clone https://github.com/ELIFE-ASU/CBRdb.git

Change into the CBRdb directory.

cd CBRdb

Change into the build tools directory.

cd build_tools

Create the conda environment.

conda env create -f environment.yml

Activate the conda environment.

conda activate cbrdb

Lets leave the build tools directory.

cd ..

Install CBRdb.

pip install -e .

If you want to install manually, follow the instructions below.

Manually

Fresh environment

It is recommended that you start from a fresh environment to prevent issues.

conda create -n cbrdb_env python=3.13

Activate the new env.

conda activate cbrdb_env

Add the conda-forge channel.

conda config --env --add channels conda-forge

Best to make them strict

conda config --set channel_priority true

Make sure to upgrade the conda env to force the channel priority.

conda update conda --all -y

Install the requirements

conda install numpy sympy matplotlib networkx pandas rdkit chempy requests urllib3 chemparse ase pymatgen -y

Optional extras

The main optional extra to install is ORCA and/or MACE. You will need these if you are interested in doing ab initio chemistry calculations. For ORCA, head to their downloads page. For MACE, you will need to make sure you have PyTorch. Head to the official PyTorch installation instructions page. An example might look like:

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu129

Then, proceed to install MACE. When you first run MACE, downloading a model might take a while.

pip3 install mace-torch

MACE offers a massive speed-up on GPUs but can run on a CPU. For more thermodynamic calculations, we use equilibrator-api.

conda install equilibrator-api

CBR-db install

Then install CBRdb:

pip3 install git+https://github.com/ELIFE-ASU/CBRdb.git

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing in on a bug, please include as much information as possible about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

👥 Contributors

Louie Slocombe, Camerian Millsaps, M.Reza Shahjahan, and Kamesh Narasimhan.

⚖️ License

MIT License. We ask that you cite the relevant papers, please!

📚 References

If you use this code in your work, you must reference the following:

Hafner, J., MohammadiPeyhani, H., Sveshnikova, A., Scheidegger, A., & Hatzimanikatis, V. (2020). Updated ATLAS of biochemistry with new metabolites and improved enzyme prediction power. ACS synthetic biology, 9(6), 1479-1482.
Hadadi, N., Hafner, J., Shajkofci, A., Zisaki, A., & Hatzimanikatis, V. (2016). ATLAS of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies. ACS synthetic biology, 5( 10), 1155-1166.
Kanehisa, M., & Goto, S. (2000). KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research, 28(1), 27-30.

Please take a look at the .bib file.

Name		Name	Last commit message	Last commit date
Latest commit History 901 Commits
CBRdb		CBRdb
atom_tracking		atom_tracking
build_tools		build_tools
data		data
examples		examples
hpc		hpc
tests		tests
.gitignore		.gitignore
CBRdb.bib		CBRdb.bib
CBRdb_C.csv		CBRdb_C.csv
CBRdb_C.csv.zip		CBRdb_C.csv.zip
CBRdb_R.csv		CBRdb_R.csv
CBRdb_R.csv.zip		CBRdb_R.csv.zip
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CBRdb: A Curated Biochemical Reaction database for precise Biochemical Reaction Analysis

🗺️ Overview

📋 Features

💡 Getting Started

🔧 Installation

Fresh environment

Install the requirements

Optional extras

CBR-db install

⚠️ Issue Tracker

👥 Contributors

⚖️ License

📚 References

About

Uh oh!

Releases 7

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

ELIFE-ASU/CBRdb

Folders and files

Latest commit

History

Repository files navigation

CBRdb: A Curated Biochemical Reaction database for precise Biochemical Reaction Analysis

🗺️ Overview

📋 Features

💡 Getting Started

🔧 Installation

Fresh environment

Install the requirements

Optional extras

CBR-db install

⚠️ Issue Tracker

👥 Contributors

⚖️ License

📚 References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages