Getting started

Encoding the input

Modular systems

For assembly-line biosynthesis, e.g. of non-ribosomal peptides (NRPs) and polyketides (PKs), the RAIChU input takes the form of a ClusterRepresentation:

    Input = ClusterRepresentation(
        modules: List[
            ModuleRepresentation(
                type: Str "PKS",
                subtype: Str "PKS_CIS",
                substrate: Str "ACETYL_COA",
                domains: List[
                    DomainRepresentation(
                        gene name: Str,
                        domain type: Str "AT",
                        active: Bool,
                        used: Bool), ...]), ...],
        tailoring enzymes: List[
            TailoringRepresentation(
                gene name: Str,
                tailoring enzyme type: Str "HYDROXYLASE",
                tailoring sites: List[["C_3"], ["C_10"]]), ...])

Tailoring enzymes are optional. The following example defines none; the number 5 passed to the iterative module is the number of iterations it performs:

    # Example cluster with three PKS modules and no tailoring enzymes.
    cluster_repr = ClusterRepresentation([
        # Starter module.
        ModuleRepresentation("PKS", "PKS_CIS", "ACETYL_COA", [
            DomainRepresentation("Gene 1", 'AT', None, None, True, True),
            DomainRepresentation("Gene 1", 'ACP', None, None, True, True),
        ]),
        # Iterative module: the trailing 5 is the number of iterations.
        ModuleRepresentation("PKS", "PKS_ITER", "METHYLMALONYL_COA", [
            DomainRepresentation("Gene 1", 'KS', None, None, True, True),
            DomainRepresentation("Gene 1", 'AT', None, None, True, True),
            # Second AT domain: the last argument (used) is False.
            DomainRepresentation("Gene 1", 'AT', None, None, True, False),
            DomainRepresentation("Gene 1", 'DH', None, None, True, True),
            DomainRepresentation("Gene 1", 'ER', None, None, True, True),
            DomainRepresentation("Gene 1", 'ACP', None, None, True, True),
        ], 5),
        # Final module, ending in a TE domain.
        ModuleRepresentation("PKS", "PKS_CIS", "METHYLMALONYL_COA", [
            DomainRepresentation("Gene 1", 'KS', None, None, True, True),
            DomainRepresentation("Gene 1", 'AT', None, None, True, True),
            DomainRepresentation("Gene 1", 'TE', None, None, True, True),
        ]),
    ])
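Each DomainRepresentation call in the example passes six positional values, while the schema above names only four fields. The sketch below uses a stand-in dataclass (not RAIChU's own class) to make the positions explicit. The field names `subtype` and `name` for the two `None` slots are assumptions, as is the helper `used_domain_types`; the order `active`, `used` for the two booleans follows the schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DomainSketch:
    """Illustrative stand-in only; not RAIChU's DomainRepresentation."""
    gene_name: str
    domain_type: str
    subtype: Optional[str]  # assumed meaning of the first None
    name: Optional[str]     # assumed meaning of the second None
    active: bool
    used: bool


# The iterative module from the example above, written with the stand-in.
iterative_module = [
    DomainSketch("Gene 1", "KS", None, None, True, True),
    DomainSketch("Gene 1", "AT", None, None, True, True),
    DomainSketch("Gene 1", "AT", None, None, True, False),
    DomainSketch("Gene 1", "DH", None, None, True, True),
    DomainSketch("Gene 1", "ER", None, None, True, True),
    DomainSketch("Gene 1", "ACP", None, None, True, True),
]


def used_domain_types(domains: List[DomainSketch]) -> List[str]:
    """List the types of the domains flagged both active and used."""
    return [d.domain_type for d in domains if d.active and d.used]


print(used_domain_types(iterative_module))
# ['KS', 'AT', 'DH', 'ER', 'ACP']
```

The second AT domain is dropped from the list because its `used` flag is False.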

Modular systems can also be encoded in a tab-separated file. A template for this can be found at examples/example_tab_separated_input/cluster.txt. This cluster can be run as follows:

    python raichu/validation/rerun_cluster.py examples/example_tab_separated_input/

Additionally, antiSMASH (version 7) GenBank output files can be read in and drawn automatically with antismash.py <input.gbk> <output.svg>. To generate a tab-separated input file from antiSMASH output, use the parse_antismash_to_cluster_file function in general.py.

RiPPs

RiPPs are encoded in a RiPP_Cluster by specifying the complete precursor (just for drawing marble representations), the precursor used for chemical reactions, the macrocyclizations, cleavage sites and the tailoring enzymes:

    ripp_cluster = RiPP_Cluster("trpA", "mkaekslkayawyiwy", "mkaekslkayawyiwy",
                                cleavage_sites=[CleavageSiteRepresentation("Y", 10, "follower")],
                                tailoring_enzymes_representation=[
                                    TailoringRepresentation("p450", "REDUCTASE_DOUBLE_BOND_REDUCTION", [["C_139", "C_138"]]), 
                                    TailoringRepresentation("p450", "P450_OXIDATIVE_BOND_FORMATION", [["C_139", "N_134"], ["C_120", "N_102"],["C_138", "C_107"]])])
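To make the cleavage site in the example above concrete, here is a minimal sketch of what `CleavageSiteRepresentation("Y", 10, "follower")` would select on the sequence level. It is not RAIChU code: the function `cleave` is hypothetical, positions are assumed to be 1-based, "follower" is assumed to mean the part after the cleavage site, and "leader" is an assumed counterpart value for the part before it.

```python
def cleave(sequence: str, residue: str, position: int, keep: str) -> str:
    """Cut after the residue at the given 1-based position and return one part.

    Hypothetical helper for illustration; assumes "follower" keeps the
    C-terminal part and "leader" keeps the N-terminal part.
    """
    seq = sequence.upper()
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[position - 1] != residue.upper():
        raise ValueError(
            f"expected {residue} at position {position}, found {seq[position - 1]}"
        )
    if keep == "follower":
        return seq[position:]
    if keep == "leader":
        return seq[:position]
    raise ValueError(f"unknown part to keep: {keep!r}")


print(cleave("mkaekslkayawyiwy", "Y", 10, "follower"))
# AWYIWY
```

Checking that the residue at the stated position matches the stated amino acid catches off-by-one errors before any structure is built.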

The CleavageSiteRepresentation takes the amino acid type, the position of the residue after which to cleave, and an instruction on which part of the structure to keep. The MacrocyclizationRepresentation always takes two atoms; for more complex cyclization patterns, multiple cyclizations can be chained one after another in the macrocyclizations list.

Terpenoids

Terpenoids are encoded in a Terpene_Cluster by specifying the precursor, the cyclizations and the class of terpene cyclase:

    terpene_cluster = Terpene_Cluster("limonene_synthase", "GERANYL_PYROPHOSPHATE",
                                      macrocyclisations=[MacrocyclizationRepresentation("C_13", "C_8")],
                                      terpene_cyclase_type="Class_1",
                                      tailoring_enzymes_representation=[
                                          TailoringRepresentation("pseudo_isomerase", "ISOMERASE_DOUBLE_BOND_SHIFT",
                                                                  [["C_13", "C_14", "C_14", "C_15"]]),
                                          TailoringRepresentation("prenyltransferase", "PRENYLTRANSFERASE",
                                                                  [["C_16"]], "DIMETHYLALLYL")])

As for RiPPs, each MacrocyclizationRepresentation takes exactly two atoms; for more complex cyclization patterns, multiple cyclizations can be chained one after another in the macrocyclizations list.
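Atoms in these examples are named by element symbol and index, such as C_13 or N_134. The sketch below checks that a list of cyclizations consists of two-atom pairs with well-formed labels. The helpers `parse_atom_label` and `check_cyclizations` are hypothetical and not part of RAIChU, the label pattern is inferred from the examples on this page, and the second pair is invented purely to show chaining.

```python
import re

# Element symbol, underscore, atom index (pattern inferred from the examples).
ATOM_LABEL = re.compile(r"([A-Z][a-z]?)_(\d+)")


def parse_atom_label(label):
    """Split a label such as 'C_13' into ('C', 13)."""
    match = ATOM_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not an atom label: {label!r}")
    return match.group(1), int(match.group(2))


def check_cyclizations(pairs):
    """Validate a chain of cyclizations, each given as exactly two atom labels."""
    parsed = []
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError(f"a cyclization takes two atoms, got {len(pair)}")
        parsed.append(tuple(parse_atom_label(atom) for atom in pair))
    return parsed


# First pair is from the limonene example; the second is hypothetical.
print(check_cyclizations([("C_13", "C_8"), ("C_1", "C_6")]))
# [(('C', 13), ('C', 8)), (('C', 1), ('C', 6))]
```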

Alkaloids

Alkaloids are encoded in an Alkaloid_Cluster by specifying the precursor amino acid and the tailoring reactions:

    alkaloid_cluster = Alkaloid_Cluster("phenylalanine", tailoring_enzymes_representation=[
        TailoringRepresentation("pseudo_decarboxylase", "DECARBOXYLASE", [["C_9"]]),
        TailoringRepresentation("pseudo_hydroxylase", "PRENYLTRANSFERASE", [["C_7"]], "DIMETHYLALLYL"),
        TailoringRepresentation("pseudo_decarboxylase", "HALOGENASE", [["C_10"]], "Cl"),
        TailoringRepresentation("pseudo_hydroxylase", "HYDROXYLATION", [["C_6"]]),
        TailoringRepresentation("methyltransferase", "METHYLTRANSFERASE", [["N_12"], ["C_7"], ["O_25"]]),
    ])


Tailoring enzymes

A tailoring reaction is encoded by the gene name, the name of the reaction and the atoms on which it acts:

    TailoringRepresentation("tpdF", "THIOPEPTIDE_CYCLASE", [["C_3", "C_89"]])

Note that the sites are encoded as a list of lists: each inner list holds all atoms involved in one reaction. A hydroxylase acting on two carbon atoms in two consecutive reactions is therefore encoded like this:

    TailoringRepresentation("tpdF", "HYDROXYLASE", [["C_3"], ["C_89"]])
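The difference between the two encodings above is easy to miss, so here is a small sketch that counts the atoms per reaction in a site list and rejects a flat list. The helper `atoms_per_reaction` is hypothetical and only inspects plain Python lists; it does not call RAIChU.

```python
def atoms_per_reaction(sites):
    """Return the number of atoms in each reaction of a tailoring-site list."""
    if not isinstance(sites, list) or not all(isinstance(r, list) for r in sites):
        raise TypeError("sites must be a list of lists, e.g. [['C_3'], ['C_89']]")
    return [len(reaction) for reaction in sites]


# One reaction involving two atoms (the cyclase example).
print(atoms_per_reaction([["C_3", "C_89"]]))    # [2]

# Two reactions involving one atom each (the hydroxylase example).
print(atoms_per_reaction([["C_3"], ["C_89"]]))  # [1, 1]
```

A flat list such as `["C_3", "C_89"]` raises a TypeError here, because it is ambiguous whether one reaction or two is meant.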

To list all possible tailoring sites for an enzyme on a given structure, use the get_tailoring_sites(structure, enzyme_name) function from run_raichu.py.

Running RAIChU

For modular systems, use the draw_cluster_from_modular_cluster_representation(cluster) function from general.py. It calls build_cluster() to initialize the cluster, computes the structures, applies the tailoring reactions, draws the product and then visualizes the whole cluster. For non-modular systems, the analogous functions are draw_terpene_structure_from_terpene_cluster(terpene_cluster), draw_alkaloid_structure_from_alkaloid_cluster(alkaloid_cluster) and draw_ripp_structure_from_ripp_cluster(ripp_cluster).
