
Chroma Generates Proteins Longer Than Desired When Using SubstructureConditioner #63

@josefgon00

Description

Hi,
I'm using Chroma's SubstructureConditioner to generate proteins that keep a specific active site from an input structure fixed. However, even when I specify a maximum chain length (e.g., 220 residues), Chroma often generates proteins much longer than that.

I want to fix a set of active-site residues from my input protein and let Chroma generate the rest of the protein de novo, so that the final protein does not exceed 220 residues.
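As a rough feasibility check, the residue numbers in the selection string below give 22 fixed positions spanning residue numbers 87 to 294. The sketch below does that arithmetic. It assumes, hypothetically, that residue numbers are contiguous in the input chain and that the fixed residues keep their native spacing in the output, so the motif alone occupies every position from the first to the last fixed residue. `motif_budget` is an illustrative helper, not part of Chroma.

```python
# Active-site residue numbers copied from the selection string in this issue.
ACTIVE_SITE = [
    87, 90, 97, 107, 111, 112, 170, 171, 172, 173, 174,
    180, 181, 182, 183, 184, 210, 262, 286, 287, 290, 294,
]


def motif_budget(fixed: list[int], target_length: int) -> dict[str, int]:
    """Summarize how a fixed motif fits into a target chain length.

    Hypothetical assumption: fixed residues keep their native spacing, so the
    motif covers every position from min(fixed) to max(fixed).
    """
    n_fixed = len(set(fixed))
    span = max(fixed) - min(fixed) + 1  # positions covered by the motif, gaps included
    return {
        "n_fixed": n_fixed,                   # residues held in place
        "n_free": target_length - n_fixed,    # residues left to generate de novo
        "span": span,
        "slack": target_length - span,        # positions available outside the motif span
    }


print(motif_budget(ACTIVE_SITE, 220))
```

Under those assumptions the motif needs 208 positions, leaving 12 for the termini, so a 220-residue design is at least arithmetically possible.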

Expected Behavior

  • The active site residues remain fixed.

  • The surrounding structure is generated de novo.

  • The final protein has exactly 220 residues.

Observed Behavior

  • Chroma respects the fixed residues.

  • The newly generated part makes the total length much larger than 220 residues.

  • The chain_lengths=[220] parameter is not strictly enforced.
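To make the observed lengths concrete, a small helper can count residues per chain in the sampled structure and compare them with `chain_lengths`. This is a sketch under assumptions: it takes the chain map as a flat list of ints (for example `C[0].tolist()` after `X, C, S = new_protein.to_XCS()`), and it assumes a positive value c marks a residue of chain c, a negative value -c marks a residue of chain c with missing coordinates, and 0 marks padding. Check that convention against the Chroma docs. `chain_lengths_from_map` and `check_length` are hypothetical names, not Chroma functions.

```python
from collections import Counter


def chain_lengths_from_map(chain_map: list[int]) -> dict[int, int]:
    """Count residues per chain from a flat chain map.

    Assumed convention: c > 0 is a residue of chain c, -c is a residue of
    chain c with missing coordinates, and 0 is padding (skipped).
    """
    counts = Counter(abs(c) for c in chain_map if c != 0)
    return dict(sorted(counts.items()))


def check_length(chain_map: list[int], requested: list[int]) -> None:
    """Print requested vs. observed length for each chain (chains numbered from 1)."""
    observed = chain_lengths_from_map(chain_map)
    for idx, want in enumerate(requested, start=1):
        got = observed.get(idx, 0)
        status = "ok" if got == want else "MISMATCH"
        print(f"chain {idx}: requested {want}, got {got} ({status})")


# Toy chain map, not real output: 4 residues in chain 1, then 2 padding slots.
check_length([1, 1, 1, -1, 0, 0], requested=[3])
```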

Here’s an example of what I’m doing:


```python
from chroma import Chroma, Protein, conditioners

PDB_ID = "7M8W"
chroma = Chroma()

# Load the input protein
protein = Protein.from_PDBID(PDB_ID, device="cpu")

# Fixing the active site residues
selection_string = "not (resid 87 or resid 90 or resid 97 or resid 107 or resid 111 or resid 112 or resid 170 or resid 171 or resid 172 or resid 173 or resid 174 or resid 180 or resid 181 or resid 182 or resid 183 or resid 184 or resid 210 or resid 262 or resid 286 or resid 287 or resid 290 or resid 294)"

# Apply the SubstructureConditioner
substructure_conditioner = conditioners.SubstructureConditioner(
    protein=protein,
    backbone_model=chroma.backbone_network,
    selection=selection_string
).to("cpu")

# Generate new structure
new_protein, trajectories = chroma.sample(
    protein_init=protein,
    chain_lengths=[220],
    conditioner=substructure_conditioner,
    langevin_factor=2.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func="langevin",
    steps=200,
    full_output=True,
)

# Save the generated structure
new_protein.to("generated_protein.cif")
```
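The long selection string above can also be built from a list of residue numbers, which makes the motif easier to edit. This is a small sketch; `build_selection` is a hypothetical helper that only reproduces the `resid N or ...` syntax already used in this issue and makes no claim about other selection keywords Chroma supports.

```python
def build_selection(resids: list[int], invert: bool = True) -> str:
    """Join residue numbers into a 'resid A or resid B ...' selection string.

    With invert=True the clause is wrapped in 'not (...)', matching the
    string used in the issue.
    """
    clause = " or ".join(f"resid {r}" for r in resids)
    return f"not ({clause})" if invert else clause


ACTIVE_SITE = [
    87, 90, 97, 107, 111, 112, 170, 171, 172, 173, 174,
    180, 181, 182, 183, 184, 210, 262, 286, 287, 290, 294,
]
selection_string = build_selection(ACTIVE_SITE)
print(selection_string)
```

The generated string is identical to the hand-written one, so it can be passed as `selection=` unchanged.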

Is this behavior expected when using SubstructureConditioner? Is there a way to enforce a strict length limit on the generated protein?
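If the sampled length follows the structure passed as `protein_init` rather than `chain_lengths` (an assumption worth confirming in the Chroma source, not something stated here), one possible workaround would be to trim the input to a 220-residue window that still contains every fixed residue before building the conditioner. The sketch below only computes such a window from residue numbers. `crop_window` is a hypothetical helper, and the chain bounds in the example are made up; actually slicing the structure is left out because it depends on how residue numbers map to tensor indices.

```python
def crop_window(fixed: list[int], target_length: int,
                first: int, last: int) -> tuple[int, int]:
    """Return an inclusive (start, end) residue-number window of exactly
    target_length residues that contains every fixed residue.

    Assumes residue numbers run contiguously from `first` to `last` in the
    input chain. Slack around the motif is split roughly evenly, then the
    window is shifted back inside [first, last] if it runs past either end.
    """
    lo, hi = min(fixed), max(fixed)
    if lo < first or hi > last:
        raise ValueError("fixed residues fall outside the input chain")
    span = hi - lo + 1
    if span > target_length:
        raise ValueError(f"motif spans {span} residues, more than {target_length}")
    if last - first + 1 < target_length:
        raise ValueError("input chain is shorter than the target length")
    pad = target_length - span
    start = lo - pad // 2
    # Clamp so the window stays within the input chain.
    start = max(first, min(start, last - target_length + 1))
    return start, start + target_length - 1


ACTIVE_SITE = [
    87, 90, 97, 107, 111, 112, 170, 171, 172, 173, 174,
    180, 181, 182, 183, 184, 210, 262, 286, 287, 290, 294,
]
# Hypothetical chain bounds (1 to 350), for illustration only.
print(crop_window(ACTIVE_SITE, 220, first=1, last=350))  # (81, 300)
```

Centering the slack is an arbitrary choice; anchoring the window at either terminus would satisfy the same constraint.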
