genomes_log

A set of scripts to automate the download and tracking of assemblies and annotations.

organize_ncbi.py

This script takes an unzipped NCBI Datasets folder and a target location. The contents of the NCBI folder are copied into the target folder following the target_folder/species/assembly/file structure, and the metadata are saved as well. Currently the script only works with *.fna and *.gff3 files.
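A minimal sketch of the copying step, assuming each assembly sits in its own ncbi_dataset/data/<accession>/ subfolder and that a mapping from accession to species name is already available. The organize function and its species_of argument are hypothetical illustrations, not the code of organize_ncbi.py. Accepting .gff alongside .gff3 is also an assumption about how the downloaded annotation files are named.

```python
import shutil
from pathlib import Path


def organize(data_dir, target, species_of, suffixes=(".fna", ".gff", ".gff3")):
    """Copy <data_dir>/<accession>/<file> to <target>/<species>/<accession>/<file>.

    species_of is a hypothetical dict {accession: species name}; in practice
    it could be built from the metadata shipped with the download.
    """
    data_dir, target = Path(data_dir), Path(target)
    copied = []
    # Only subfolders are assemblies; metadata files next to them are skipped
    for asm_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        accession = asm_dir.name
        if accession not in species_of:
            continue  # cannot place an assembly without a species name
        # Replace spaces so the species name is a convenient folder name
        species = species_of[accession].replace(" ", "_")
        dest = target / species / accession
        for f in sorted(asm_dir.iterdir()):
            # Keep only sequence (.fna) and annotation (.gff/.gff3) files
            if f.is_file() and f.suffix in suffixes:
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest / f.name)
                copied.append(dest / f.name)
    return copied
```

Files with other suffixes are ignored, which mirrors the README's note that only sequence and annotation files are handled.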

Dependencies

NCBI Datasets command-line tool (datasets)
Python and pandas

An example workflow to get the annotated reference assemblies of SAR (taxon 2698737) at chromosome or complete level is:

# download a dataset of multiple species
datasets download genome taxon 2698737 --assembly-level chromosome,complete --annotated --reference --include genome,gff3
# or download a single assembly
datasets download genome accession GCF_028858775.2 --include gff3,genome
# unzip archive
unzip ncbi_dataset.zip
# organize the assemblies into the target folder (the species folders are created inside it)
python organize_ncbi.py --ncbi ncbi_dataset --target target_folder/
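The unzipped archive usually also contains a JSON Lines metadata report (ncbi_dataset/data/assembly_data_report.jsonl in the versions we are aware of). The sketch below flattens such a report into a TSV with one row per assembly, using only the standard library. The metadata_table function and the nested field names it reads (organism.organismName, assemblyInfo.assemblyName, assemblyInfo.assemblyLevel) are assumptions about the report layout, not a description of how organize_ncbi.py stores its metadata.

```python
import csv
import json


def metadata_table(report_path, out_tsv):
    """Flatten a JSON Lines assembly report into a TSV, one row per assembly.

    The nested key paths below are assumed, not guaranteed; missing keys
    produce empty cells instead of errors.
    """
    fields = {
        "accession": ("accession",),
        "species": ("organism", "organismName"),
        "assembly_name": ("assemblyInfo", "assemblyName"),
        "assembly_level": ("assemblyInfo", "assemblyLevel"),
    }
    rows = []
    with open(report_path) as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            row = {}
            for column, keys in fields.items():
                value = record
                # Walk down the nested dicts one key at a time
                for key in keys:
                    value = value.get(key, {}) if isinstance(value, dict) else {}
                # A dict left over means the key path was missing
                row[column] = "" if isinstance(value, dict) else value
            rows.append(row)
    with open(out_tsv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(fields), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A TSV like this can be loaded with pandas (pandas.read_csv with sep="\t") for further filtering.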

get_snapshot.py

This script takes a folder with the species/assembly/file structure and produces a snapshot that can be used to recreate the same folder structure. The snapshot is saved in the snapshot/ folder and can be tracked with git.
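A minimal sketch of what such a snapshot could look like, assuming a plain TSV with one row per file. The take_snapshot function, the column names and the size_bytes column are hypothetical choices for illustration; the actual format written by get_snapshot.py may differ. A tab-separated text file diffs cleanly, which is what makes tracking it with git useful.

```python
import csv
from pathlib import Path


def take_snapshot(root, out_tsv):
    """Write one TSV row per <root>/<species>/<assembly>/<file> path."""
    root = Path(root)
    rows = []
    # "*/*/*" matches entries exactly three levels below root
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            species, assembly, filename = path.relative_to(root).parts
            rows.append((species, assembly, filename, path.stat().st_size))
    out_tsv = Path(out_tsv)
    out_tsv.parent.mkdir(parents=True, exist_ok=True)
    with open(out_tsv, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["species", "assembly", "file", "size_bytes"])
        writer.writerows(rows)
    return rows
```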

python get_snapshot.py target_folder
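To recreate the folder structure from a snapshot, one option is to turn the recorded accessions back into a single datasets download command, reusing the accession-based form shown earlier (datasets accepts several accessions in one call). This sketch assumes the hypothetical TSV layout with an "assembly" column holding accessions; the recreate_command function is illustrative only.

```python
import csv


def recreate_command(snapshot_tsv):
    """Build one datasets command that re-downloads every snapshot assembly."""
    accessions = []
    with open(snapshot_tsv, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Several files share an assembly; keep each accession once, in order
            if row["assembly"] not in accessions:
                accessions.append(row["assembly"])
    return ("datasets download genome accession " + " ".join(accessions)
            + " --include gff3,genome")
```

The resulting archive can then be unzipped and passed to organize_ncbi.py exactly as in the workflow above.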

Repository: apollo994/genomes_log

Folders and files

Latest commit

History

Repository files navigation

genomes_log

organize_ncbi.py

Dependencies

get_snapshot.py

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages