Description
Hi,
I'm using Chroma's SubstructureConditioner to generate proteins while keeping a specific active site fixed from an input structure. However, even when I specify a maximum chain length (e.g., 220 residues), Chroma often generates proteins much longer than expected.
I want to fix a set of residues in an active site from my input protein, while letting Chroma generate the rest of the sequence de novo, ensuring that the final protein does not exceed 220 residues.
Expected Behavior
- The active site residues remain fixed.
- The surrounding structure is generated de novo.
- The final protein has no more than 220 residues.
Observed Behavior
- Chroma respects the fixed residues.
- The newly generated part makes the total length much larger than 220 residues.
- The chain_lengths=[220] parameter is not strictly enforced.
Here’s an example of what I’m doing:
from chroma import Chroma, Protein, conditioners

PDB_ID = "7M8W"
chroma = Chroma()

# Load the input protein
protein = Protein.from_PDBID(PDB_ID, device="cpu")

# Selection built from the active site residues (note the leading "not")
selection_string = "not (resid 87 or resid 90 or resid 97 or resid 107 or resid 111 or resid 112 or resid 170 or resid 171 or resid 172 or resid 173 or resid 174 or resid 180 or resid 181 or resid 182 or resid 183 or resid 184 or resid 210 or resid 262 or resid 286 or resid 287 or resid 290 or resid 294)"

# Apply the SubstructureConditioner
substructure_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection=selection_string,
).to("cpu")

# Generate new structure
new_protein, trajectories = chroma.sample(
    protein_init=protein,
    chain_lengths=[220],
    conditioner=substructure_conditioner,
    langevin_factor=2.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func="langevin",
    steps=200,
    full_output=True,
)

# Save the generated structure
new_protein.to("generated_protein.cif")
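For anyone reproducing this, the long selection string above is easy to mistype, so here is a minimal sketch that builds the same string from a list of residue numbers. The helper name `build_selection` and its `invert` flag are hypothetical and not part of Chroma; only plain Python string formatting is used.

```python
# Active-site residue numbers, copied from the selection string above.
ACTIVE_SITE_RESIDS = [
    87, 90, 97, 107, 111, 112, 170, 171, 172, 173, 174,
    180, 181, 182, 183, 184, 210, 262, 286, 287, 290, 294,
]


def build_selection(resids, invert=True):
    """Join residue numbers into a 'resid A or resid B ...' selection string.

    Hypothetical helper (not a Chroma function). With invert=True the clause
    is wrapped in 'not (...)', matching the string used in this issue.
    """
    clause = " or ".join(f"resid {r}" for r in resids)
    return f"not ({clause})" if invert else clause


selection_string = build_selection(ACTIVE_SITE_RESIDS)
```

The resulting `selection_string` is character-for-character the one passed to `SubstructureConditioner` above, so it can be dropped into the example unchanged.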
Is this behavior expected when using SubstructureConditioner? Is there a way to enforce a strict length limit on the generated protein?