Lipid Nanoparticle Bayesian Optimization (LNPBO)

Figure: LNPBO is a Python implementation of Bayesian optimization tailored for lipid nanoparticle optimization.

Description

LNPBO is an open source method for implementing Bayesian optimization (BO) for lipid nanoparticle (LNP) optimization, as introduced in our paper (under review).

Things LNPBO can do:

Suggest new LNPs to make based on initial batch
Back-and-forth suggest new LNPs based on subsequent batches

To do these things, LNPBO encodes the inputted LNPs. Besides molar ratios, featurizing lipid identity involves generating encodings of Morgan fingerprints, Mordred descriptors, or LiON model fingerprints. The inputted LNPs are expected to follow a column format similar to what we introduced in LNPDB. We detail this input organization in the Usage section, where we walk through examples that illustrate what LNPBO can do.

Future version of LNPBO will also do these things:

Suggest initial LNPs to make based on ingredient list
Make landscape plots

Getting started / installation

To set up LNPBO locally, follow these steps.

We will first install this repository.

git clone https://github.com/evancollins1/LNPBO.git

We will next create the conda environment lnpbo.

conda create -n lnpbo python=3.11
conda activate lnpbo
conda install -c conda-forge chemprop=2.2.2 bayesian-optimization=3.2.0 mordred pandas jupyter

In order to use LiON fingerprint encodings, we will also need to create the conda environment lnp_ml as introduced in our prior repos. Make sure to conda deactivate from lnpbo before running the below chunk.

conda create -n lnp_ml python=3.8
conda activate lnp_ml
pip install chemprop==1.7.0

Usage

We will now walk through the capabilities of LNPBO by discussing three example use cases, each with descriptions in this README as well as Jupyter notebooks to process the provided example data (see /examples folder).

Example 1: varying ratios
Example 2: varying ratios, helper lipid
Example 3: varying rations, helper lipid, ionizable lipid

Note that LNPBO can also handle other things (e.g., varying cholesterol and PEG lipid), but these three minimal examples provide the framework to understand what LNPBO can do. Also note that the transfection results for the example data provided in this repo are only for exercise purposes, as they have been synthetically generated. Note that IL denotes ionizable lipid, HL denotes helper lipid, CHL denotes cholesterol, and PEG denotes PEG lipid.

Example 1: varying ratios

For example 1, let's say a scientist made and tested an initial batch of LNPs where only ratios were varied. More specifically, let's say that they kept constant the lipid identities (cKK-E12 IL, DOPE HL, Cholesterol, DMG-PEG2000) and only varied some lipid molar ratios (IL, HL, CHL) and the IL:mRNA mass ratio. This example could reflect when a scientist is confident in the lipid identities but wants to optimize their relative amounts. They now want LNPBO to suggest some number of new LNPs to make for the next batch based on the initial batch.

Take note of the organization of /examples/example1/example1.csv, which is the spreadsheet prepared by the scientist for the initial batch (round 0). This is the expected input organization for LNPBO. For each row (i.e., LNP formulation), the scientist should specify the following columns (i.e., structure-function data) in a .csv spreadsheet:

IL_name: name of ionizable lipid
IL_SMILES: SMILES of ionizable lipid
IL_molratio: molar ratio of ionizable lipid
IL_to_nucleicacid_massratio: ionizable lipid-to-nucleic acid mass ratio
HL_name: name of helper lipid
HL_SMILES: SMILES of helper lipid
HL_molratio: molar ratio of helper lipid
CHL_name: name of cholesterol
CHL_SMILES: SMILES of cholesterol
CHL_molratio: molar ratio of cholesterol
PEG_name: name of PEG lipid
PEG_SMILES: SMILES of PEG lipid
PEG_molratio: molar ratio of PEG lipid
Experiment_value: functional value (e.g., transfection luminescence)

In cases like this one where the lipid identities are not being varied, entering the SMILES data is optional.

For LNPBO to suggest new LNPs for the next batch based on this initial batch, follow the instructions below. Note that the code chunks below are also provided in /examples/example1/example1.ipynb.

Load packages and example1.csv (i.e., the spreadsheet prepared by the scientist for the initial batch).

import pandas as pd
import sys
from pathlib import Path
project_root = Path.cwd().resolve().parents[2]
sys.path.append(str(project_root))
from LNPBO.data.dataset import Dataset
from LNPBO.space.formulation import FormulationSpace
from LNPBO.optimization.optimizer import Optimizer

# Load dataset
dataset = Dataset.from_lnpdb_csv("example1.csv")

Next, encode the dataset and build the FormulationSpace object to be used for BO. encoding_csv_path specifies the directory for saving the encodings.

# Encode dataset
encoded_dataset = dataset.encode_dataset(
    encoding_csv_path="example1_encodings.csv",
)

# Build FormulationSpace for BO
space = FormulationSpace.from_dataset(encoded_dataset)

Next, we will initialize the Optimizer. space specifies the FormulationSpace. type specifies the acquisition function type, either "UCB" (upper confidence bound) or "EI" (expected improvement). If type="UCB", then its kappa value should be defined. If type="EI", then its xi value should be defined. kappa and xi balance exploration vs. exploitation, with higher values favoring exploration, as outlined nicely here. alpha specifies a small constant added to the diagonal of the kernel matrix of the Gaussian Process (GP), modeling experimental noise, with higher values reflecting noisier data. random_seed specifies the random seed. batch_size specifies the number of LNPs to suggest for the next batch.

# Initialize Optimizer
optimizer = Optimizer(
    space=space,
    type="UCB",
    kappa=5.0,
    alpha=1e-6,
    random_seed=42,
    batch_size=24
)

Finally, to suggest batch_size number of new LNPs for the next batch, run the following. example1_round1.csv specifies the directory for saving the BO results. Note that the resulting .csv appends the batch_size number of new LNPs to the bottom of the originally-inputted dataset (i.e., example1.csv). The Experiment_value (e.g., transfection luminescence) column for these new LNPs will be left blank, as it's now the scientist's job to test the suggested LNPs in lab.

# Perform first round of BO suggestions
round1_suggestions = optimizer.suggest(
    output_csv="example1_round1.csv"
)

If the scientist does ultimately come back with the Experiment_value results for the new round 1 LNPs, then it's possible for LNPBO to suggest another batch of LNPs based on the updated results. This use case is not described in example 1 but it is described in example 2, so see below if applicable.

Example 2: varying ratios, helper lipid

For example 2, let's say a scientist made and tested an initial batch of LNPs where ratios and HL identity were varied. More specifically, let's say that They kept constant the IL identity, CHL identity, PEG identity, and PEG molar ratio. They varied the HL identity, IL molar ratio, HL molar ratio, CHL molar ratio, and IL:mRNA mass ratio. This example could reflect when a scientist is confident in the IL identity but wants to explore alternative HLs and optimize their relative amounts. They now want LNPBO to suggest some number of new LNPs to make for the next batch based on the initial batch.

Take note of the organization of /examples/example2/example2.csv, which is the spreadsheet prepared by the scientist for the initial batch (round 0). See example 1 above for details about the organization.

For LNPBO to suggest new LNPs for the next batch based on this initial batch, follow the instructions below. Note that the code chunks below are also provided in /examples/example2/example2.ipynb.

Load packages and example2.csv (i.e., the spreadsheet prepared by the scientist for the initial batch).

import pandas as pd
import sys
from pathlib import Path
project_root = Path.cwd().resolve().parents[2]
sys.path.append(str(project_root))
from LNPBO.data.dataset import Dataset
from LNPBO.space.formulation import FormulationSpace
from LNPBO.optimization.optimizer import Optimizer

# Load dataset
dataset = Dataset.from_lnpdb_csv("example2.csv")

Next, encode the dataset and build the FormulationSpace object to be used for BO. HL_n_pcs_morgan specifies the number of principal components to encode the HLs. HL_n_pcs_mordred specifies the number of principal components to encode the HLs. If both Morgan and Mordred are specified, then they will be combined. encoding_csv_path specifies the directory for saving the encodings.

# Encode dataset
encoded_dataset = dataset.encode_dataset(
    HL_n_pcs_morgan=5,
    HL_n_pcs_mordred=5,
    encoding_csv_path="example2_encodings.csv",
)

# Build FormulationSpace for BO
space = FormulationSpace.from_dataset(encoded_dataset)

Next, we will initialize the Optimizer. See example 1 above for details about the options for Optimizer.

# Initialize Optimizer
optimizer = Optimizer(
    space=space,
    type="UCB",
    kappa=5.0,
    alpha=1e-6,
    random_seed=42,
    batch_size=24
)

Finally, to suggest batch_size number of new LNPs for the next batch, run the following. example2_round1.csv specifies the directory for saving the BO results. Note that the resulting .csv appends the batch_size number of new LNPs to the bottom of the originally-inputted dataset (i.e., example2.csv). The Experiment_value (e.g., transfection luminescence) column for these new LNPs will be left blank, as it's now the scientist's job to test the suggested LNPs in lab.

# Perform first round of BO suggestions
round1_suggestions = optimizer.suggest(
    output_csv="example2_round1.csv"
)

If the scientist does ultimately come back with the Experiment_value results for the new round 1 LNPs, then it's possible for LNPBO to suggest another batch of LNPs based on the updated results. To do this, run the following, where in this example the scientist has saved these round 1 results in example2_round1_w_results.csv. The new suggested LNPs for round 2 are included in the outputted example2_round2.csv. Note that you can also adjust the optimizer at this point too (e.g., change kappa value or batch size).

# Wrap into Dataset
encoded_dataset = optimizer.update("example2_round1_w_results.csv")

# Suggest round 2
round2_suggestions = optimizer.suggest(output_csv="example2_round2.csv")

Example 3: varying ratios, helper lipid, ionizable lipid

For example 3, let's say a scientist made and tested an initial batch of LNPs where molar ratios and IL & HL identities were varied. More specifically, let's say that they kept constant the CHL identity, PEG identity, PEG molar ratio, and IL:mRNA mass ratio. They varied the IL identity, HL identity, IL molar ratio, HL molar ratio, and CHL molar ratio. This example could reflect when a scientist wants to explore a library of ILs with different HLs and optimize their relative amounts. They now want LNPBO to suggest some number of new LNPs to make for the next batch based on the initial batch.

Take note of the organization of /examples/example3/example3.csv, which is the spreadsheet prepared by the scientist for the initial batch (round 0). See example 1 above for details about the organization.

Load packages and example3.csv (i.e., the spreadsheet prepared by the scientist for the initial batch).

import pandas as pd
import sys
from pathlib import Path
project_root = Path.cwd().resolve().parents[2]
sys.path.append(str(project_root))
from LNPBO.data.dataset import Dataset
from LNPBO.space.formulation import FormulationSpace
from LNPBO.optimization.optimizer import Optimizer

# Load dataset
dataset = Dataset.from_lnpdb_csv("example3.csv")

Next, encode the dataset and build the FormulationSpace object to be used for BO. HL_n_pcs_morgan specifies the number of principal components to encode the HLs, analagous for IL_n_pcs_morgan. HL_n_pcs_mordred specifies the number of principal components to encode the HLs, analagous for IL_n_pcs_mordred. If both Morgan and Mordred are specified for IL or HL, then they will be combined. encoding_csv_path specifies the directory for saving the encodings.

# Encode dataset
encoded_dataset = dataset.encode_dataset(
    HL_n_pcs_morgan=5,
    HL_n_pcs_mordred=5,
    IL_n_pcs_morgan=5,
    IL_n_pcs_mordred=5,
    encoding_csv_path="example3_encodings.csv",
)

# Build FormulationSpace for BO
space = FormulationSpace.from_dataset(encoded_dataset)

Alternatively, rather than encoding IL chemistry with Morgan fingerprints and Mordred descriptors, fingerprints from the LiON deep learning model can be used. These fingerprints are extracted from the penultimate linear layer of the LiON deep learning model trained on LNPDB, as introduced in our prior repo.

# Encode dataset
encoded_dataset = dataset.encode_dataset(
    HL_n_pcs_morgan=5,
    HL_n_pcs_mordred=5,
    IL_n_pcs_lion=5,
    encoding_csv_path="example3_encodings.csv",
)

# Build FormulationSpace for BO
space = FormulationSpace.from_dataset(encoded_dataset)

Next, we will initialize the Optimizer. See example 1 above for details about the options for Optimizer.

# Initialize Optimizer
optimizer = Optimizer(
    space=space,
    type="UCB",
    kappa=5.0,
    alpha=1e-6,
    random_seed=42,
    batch_size=24
)

Finally, to suggest batch_size number of new LNPs for the next batch, run the following. example3_round1.csv specifies the directory for saving the BO results. Note that the resulting .csv appends the batch_size number of new LNPs to the bottom of the originally-inputted dataset (i.e., example3.csv). The Experiment_value (e.g., transfection luminescence) column for these new LNPs will be left blank, as it's now the scientist's job to test the suggested LNPs in lab.

# Perform first round of BO suggestions
round1_suggestions = optimizer.suggest(
    output_csv="example3_round1.csv"
)

If the scientist does ultimately come back with the Experiment_value results for the new round 1 LNPs, then it's possible for LNPBO to suggest another batch of LNPs based on the updated results. This use case is not described in example 1 but it is described in example 3, so see above if applicable.

Citation

A self-driving lab for the directed evolution of mRNA lipid nanoparticles

Evan Collins, Rajith S. Manan, Samuel Detmer, Till Muser, Easwer Raman, Richard Zhu, Srinivas Balagopal, Yanshu Shi, Jungyong Ji, Akash Gupta, Yizong Hu, Suthathip Trongjit, Wontaek Chung, Haseeb Mughal, Jacob Witten, Anna Lapteva, Ashwin Pasupathy, Wonpil Im, Robert Langer, Daniel G. Anderson

Under review

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
cli		cli
data		data
examples		examples
optimization		optimization
preprocessing		preprocessing
space		space
utils		utils
.DS_Store		.DS_Store
LICENSE		LICENSE
LNPBO.png		LNPBO.png
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lipid Nanoparticle Bayesian Optimization (LNPBO)

Description

Table of contents

Getting started / installation

Usage

Example 1: varying ratios

Example 2: varying ratios, helper lipid

Example 3: varying ratios, helper lipid, ionizable lipid

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lipid Nanoparticle Bayesian Optimization (LNPBO)

Description

Table of contents

Getting started / installation

Usage

Example 1: varying ratios

Example 2: varying ratios, helper lipid

Example 3: varying ratios, helper lipid, ionizable lipid

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages