C++ preprocess (m6A) creates a large number of temporary files #43

Hi all,

Not sure if this is a duplicate of #38.

I've been trying to get CHEUI up and running, but I keep hitting an issue with the preprocessing step. I'm running the C++ preprocessing script (compiled with GCC v11.3 on RHEL 9.4, commit `7b422f7808a3c2ffff56a9ead33a199824753b4e`) on an 858 GB `nanopolish eventalign` file, generated from a ~2 GB .fastq of reads.

In this instance, the preprocessing script produced an absurd number of temporary files: over 7 million before the HPC file quota ran out and the script was killed. (This was much to the chagrin of my university's HPC admin, and I promptly received a very strongly worded email advising me not to generate so many temporary files! 😅)

I've attached a small selection of ~1000 of these temporary files for debugging purposes, in case that's useful. Each file seems to be very small (a few lines at most), going by the n = 10 that I opened.
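In case it helps with triage, here is a rough sketch for checking the "few lines at most" observation across a whole directory of these files rather than a sample of 10. The helper name `summarize_tmp_files` and the directory argument are placeholders of mine, not part of CHEUI, and it assumes a `find` that supports `-maxdepth` (GNU or BSD).

```sh
# Hypothetical helper (not part of CHEUI): report how many regular files
# sit directly inside a directory and the line count of the longest one.
summarize_tmp_files() {
    dir="$1"
    # Count regular files directly inside the directory.
    n_files=$(find "$dir" -maxdepth 1 -type f | wc -l)
    # Largest line count among those files; wc's "total" rows are skipped.
    # Prints 0 if the directory is empty.
    max_lines=$(find "$dir" -maxdepth 1 -type f -exec wc -l {} + |
        awk '$2 != "total" && $1 > m { m = $1 } END { print m + 0 }')
    # The arithmetic expansion strips any padding that wc adds on some systems.
    echo "files=$((n_files)) max_lines=$max_lines"
}
```

Running `summarize_tmp_files /path/to/unzipped/sample` on the unzipped attachment would print one line of the form `files=N max_lines=M`.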

The eventalign file was generated using:
```sh
nanopolish eventalign -t {threads} \
    --reads {input.reads} \
    --bam {input.bam} \
    --genome {REF} \
    --scale-events --signal-index --samples --print-read-names > {output}
```
and I ran preprocess using:
```sh
# Must be run from this directory, or else the program crashes
cd $PATH_TO_CHEUI_PREPROCESS_DIR

./CHEUI -i $INPUT -m ../../kmer_models/model_kmer.csv -n {threads} --m6A -o $OUTPUT
```

Attached sample of temporary files: [out_A_signals+IDs.zip](https://github.com/user-attachments/files/17177384/out_A_signals%2BIDs.zip)

Let me know if there's anything else that you need.
Ollie
