@AidanMar
The primary focus should be on the files located within pipeline3. The others were earlier iterations and can be ignored. This pipeline is a major overhaul of the original. It is built with Snakemake, which calls the files in the scripts directory to synthesise the data set. This version removes much of the unnecessary writing to disk and the ad hoc data structures, and vectorises many of the processes that previously relied on heavy looping. The heavy lifting is done by pandas and numpy.
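To illustrate the kind of refactor described above, here is a minimal sketch of replacing a per-row Python loop with a vectorised pandas/numpy expression. The DataFrame and column names (`start`, `end`) are invented for the example and do not come from the pipeline itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are assumptions for this sketch.
df = pd.DataFrame({"start": [10, 40, 100], "end": [35, 90, 160]})

# Loop-based version: iterates row by row in Python (slow on large frames).
lengths_loop = [row.end - row.start for row in df.itertuples()]

# Vectorised version: one numpy subtraction over whole columns.
lengths_vec = (df["end"] - df["start"]).to_numpy()

assert np.array_equal(lengths_vec, np.array(lengths_loop))
```

The vectorised form pushes the loop into compiled numpy code, which is the pattern the overhaul applies across the pipeline's scripts.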

The next steps are to process the data the pipeline has output with CNNs. In addition, the current pipeline should have more extensive and cleaner commenting, along with documentation, to ensure that it can be read by future users.

AidanMar added 30 commits May 27, 2021 08:06
…and one for performing the downloads. The snakefile has been adapted to incorporate both of these steps into the procedure
…eparate jobs to download cds, pep and gtf files. Each job uses the new downloader.sh script, using different input arguments
…eny matrix. Scripts used to use dir_name of data_homology for homology databases. I have renamed this to homology_databases and modified the python scripts to work with the new naming convention
…way up to running the process_negative.py script successfully
…ifferent databases and supply those download links
…at not all databases have the same sets of species available. New snakemake re-config and adaptation of the ftpy.py script. The user should now run ftpy.py before running the Snakefile. After that point, the pipeline should be deterministic
…s should be a list of the species which are in the intersection of all the different input databases
Aiden Marshall and others added 30 commits August 15, 2021 11:56
…jobs kept being submitted with too little memory. Rectified the Snakefile to fix this
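For the memory fix above, a per-rule request in the Snakefile would be the usual mechanism; a minimal sketch follows, with hypothetical rule name and file paths (only the `resources: mem_mb` directive is standard Snakemake):

```
rule process_negative:
    input:
        "homology_databases/{species}.tsv"    # hypothetical path
    output:
        "results/{species}_negatives.tsv"     # hypothetical path
    resources:
        mem_mb=16000    # explicit request so the cluster scheduler allocates enough memory
    script:
        "scripts/process_negative.py"
```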
Initial commit
remove long read output from the file