Open
Description
Hi,
The perturbation names in the Frangieh 2021 processing notebook are derived with this regex line:
adata.obs['perturbation_name'] = [re.sub('[_123]+', '', s) for s in adata.obs['sgRNA']]
The pattern should be changed to '_[123]+'. The line is meant to strip the guide number, but because '[_123]+' is a character class, it currently also strips any 1, 2, or 3 that appears in the gene name.
Unique values in adata.obs["sgRNA"]:
Index(['A2M_1', 'A2M_2', 'A2M_3', 'ACSL3_1', 'ACSL3_2', 'ACSL3_3', 'ACTA2_1',
'ACTA2_2', 'ACTA2_3', 'AEBP1_1',
...
'VDAC2_3', 'WBP2_1', 'WBP2_2', 'WBP2_3', 'WNT7A_1', 'WNT7A_2',
'WNT7A_3', 'XAGE1A_1', 'XAGE1A_2', 'XAGE1A_3'],
dtype='object', length=818)
In the processed data file, the current line turns the first gene, A2M, into AM. This loses information and makes matching guides to the data difficult.
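To illustrate the difference, here is a minimal sketch that runs both patterns on a few guide names taken from the index above. The `guides` list and the variable names are just for illustration, and the end-anchored variant `_[123]+$` is my own suggestion rather than something from the notebook.

```python
import re

# A handful of sgRNA names copied from the index shown above (illustrative subset).
guides = ['A2M_1', 'ACSL3_2', 'VDAC2_3', 'WNT7A_1', 'XAGE1A_3']

# Current pattern: a character class, so every '_', '1', '2', '3' is removed.
current = [re.sub('[_123]+', '', s) for s in guides]

# Proposed pattern: an underscore followed by one or more of 1/2/3.
proposed = [re.sub('_[123]+', '', s) for s in guides]

# Optional stricter variant (an assumption, not from the notebook):
# anchor to the end of the string so only a trailing guide suffix is removed.
anchored = [re.sub(r'_[123]+$', '', s) for s in guides]

print(current)   # ['AM', 'ACSL', 'VDAC', 'WNT7A', 'XAGEA']
print(proposed)  # ['A2M', 'ACSL3', 'VDAC2', 'WNT7A', 'XAGE1A']
print(anchored)  # ['A2M', 'ACSL3', 'VDAC2', 'WNT7A', 'XAGE1A']
```

With the current pattern, several distinct genes lose digits (A2M, ACSL3, VDAC2, XAGE1A), while the proposed pattern only removes the guide suffix.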