CFF file format

I tried the provided convert-to-CFF helper script, but its output does not match the expected format, and the wiki appears to be outdated on how to use the tool. The convert_cff helper script returns a bit of a mess: sample names are cut off, columns are merged incorrectly, and it doesn't include all the columns that are "mandatory" for the CFF format (everything from t_gene1 onward seems to be missing).

I wrote my own script to match the column format on the wiki exactly:
`cff_format <- c("chr1","pos1","strand1","chr2","pos2","strand2","library","sample_name",
                "sample_type","disease","tool","split_cnt","span_cnt","t_gene1","t_area1",
                "t_gene2","t_area2")`

However, when I run MetaFusion.sh, the "reformat" step turns my "strand1" and "strand2" columns into NA columns, and when that output is passed on to the "renamed" step, I get EMPTY files. I also get a whole bunch of errors:

````
    except: raise ValueError("CFF Column pos1 value " + tmp[1] + " is not a valid integer\nInvalid entry: " + cff_line)
ValueError: CFF Column pos1 value pos1 is not a valid integer
Invalid entry: chr1	pos1	NA	chr2	pos2	NA	library	sample_name	sample_type	disease	tool	split_cnt	span_cnt	t_gene1	t_area1	t_gene2	t_area2

Annotate cff, extract sequence surrounding breakpoint
2345953 annotations from /juno/work/ccs/pintoa1/fusion_report/metafusion/MetaFusion/reference_files/ens_known_genes.renamed.ENSG.bed loaded.
29.4318819046 sec. elapsed.
Warning: Input gene annotations include multiple chr, strand, or regions (5Mb away). Skipping current gene annotation.
set([('CKS1B', 'chr1', 'f'), ('CKS1B', 'chr5', 'r')])
Warning: Input gene annotations include multiple chr, strand, or regions (5Mb away). Skipping current gene annotation.
set([('MIR4461', 'chr5', 'f'), ('MIR4461', 'chr5', 'r')])
Warning: Input gene annotations include multiple chr, strand, or regions (5Mb away). Skipping current gene annotation.
set([('C2orf27A', 'chr2', 'f'), ('C2orf27A', 'chr2', 'r')])
[.....x500]
MetaFusion.sh: line 116: [: -eq: unary operator expected
MetaFusion.sh: line 121: [: -eq: unary operator expected
MetaFusion.sh: line 127: [: -eq: unary operator expected
Merge cff by genes and breakpoints
Traceback (most recent call last):
  File "/juno/work/ccs/pintoa1/fusion_report/metafusion/MetaFusion/scripts/intersect_breakpoints_and_gene_names.py", line 41, in <module>
    df = intersect_fusions_by_breakpoints()
  File "/juno/work/ccs/pintoa1/fusion_report/metafusion/MetaFusion/scripts/intersect_breakpoints_and_gene_names.py", line 20, in intersect_fusions_by_breakpoints
    fusion=pygeneann.CffFusion(lines[0])
IndexError: list index out of range
Error in read.table(fid_intersection_file, header = TRUE, stringsAsFactors = F) : 
  no lines available in input
Execution halted
Traceback (most recent call last):
  File "/juno/work/ccs/pintoa1/fusion_report/metafusion/MetaFusion/scripts/generate_cluster_file.py", line 93, in <module>
    fusion=pygeneann.CffFusion(lines[0])
IndexError: list index out of range
````
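
The first ValueError above reports the header row itself as the invalid entry (the literal `pos1` as the value of column pos1), which suggests the header line is being parsed as data. Below is a minimal Python sketch of a pre-check that would flag this before running the pipeline. It assumes 17 tab-separated columns in the order listed above; `check_cff_lines` is a hypothetical helper, not part of MetaFusion, and it makes no claim about whether NA values are accepted.

```python
# Column order copied from the cff_format vector earlier in this issue.
CFF_COLUMNS = [
    "chr1", "pos1", "strand1", "chr2", "pos2", "strand2", "library",
    "sample_name", "sample_type", "disease", "tool", "split_cnt", "span_cnt",
    "t_gene1", "t_area1", "t_gene2", "t_area2",
]


def check_cff_lines(lines):
    """Return (line_number, message) pairs for problems found in CFF lines.

    Hypothetical helper for illustration; this is not MetaFusion's own validation.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # A header row carries the literal column name where pos1 should be.
        if len(fields) > 1 and fields[1] == "pos1":
            problems.append((number, "header row present; it would be read as data"))
            continue
        if len(fields) != len(CFF_COLUMNS):
            problems.append(
                (number, "expected %d fields, found %d" % (len(CFF_COLUMNS), len(fields)))
            )
            continue
        # pos1 and pos2 must parse as integers, as the ValueError in the log implies.
        for name in ("pos1", "pos2"):
            value = fields[CFF_COLUMNS.index(name)]
            try:
                int(value)
            except ValueError:
                problems.append((number, "%s value %r is not a valid integer" % (name, value)))
    return problems
```

Passing an open file handle, for example `check_cff_lines(open("sample.cff"))` with a hypothetical file name, returns a list of `(line_number, message)` pairs; an empty list means none of these three checks failed.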

After the "reann" step, my CFF file is completely empty, and MetaFusion then runs on all the empty files. I have successfully run your test CFF files through MetaFusion, but I cannot get a real example working.

Would it be possible to update the wiki to explain the exact CFF format: whether NAs are allowed, the data type of each column (int, string, etc.), and whether "disease" matters for the analysis? At the moment we are putting NAs in the disease column.

I'm assuming that I am NOT supposed to have a header in a CFF file and that the columns MUST be in the order I specified above?
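
If that assumption holds, which the log hints at since the header row is reported as an invalid entry, a table that still has its header could be converted along these lines. This is a minimal Python sketch under that assumption: `to_headerless_cff` is a hypothetical helper, not part of MetaFusion, and the column order is simply the one from the `cff_format` vector above. Cell values, including any NAs, are passed through untouched.

```python
import csv
import io

# Column order copied from the cff_format vector earlier in this issue.
CFF_COLUMNS = [
    "chr1", "pos1", "strand1", "chr2", "pos2", "strand2", "library",
    "sample_name", "sample_type", "disease", "tool", "split_cnt", "span_cnt",
    "t_gene1", "t_area1", "t_gene2", "t_area2",
]


def to_headerless_cff(table_text):
    """Reorder a tab-separated table with a header into CFF order, dropping the header.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t", restval="")
    missing = [name for name in CFF_COLUMNS if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing CFF columns: " + ", ".join(missing))
    # Extra input columns are ignored; only the 17 CFF columns are written.
    rows = ["\t".join(record[name] for name in CFF_COLUMNS) for record in reader]
    return "".join(row + "\n" for row in rows)
```

The function works on text rather than file paths so the reordering can be checked in isolation; reading the input file and writing the result is left to the caller.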

CFF file format #6
