BioMetaGenie is a bioinformatics toolkit that integrates the processes of downloading,
processing, and analyzing genomic data. By consolidating several established tools behind a single command-line interface,
BioMetaGenie provides an end-to-end workflow for genomic data preprocessing, reducing
the manual workload for researchers.

- Efficient Data Retrieval: Seamlessly download genomic data from NCBI using the SRA Toolkit.
- Automated Conversion: Effortlessly convert downloaded files to FASTQ format.
- High-Quality Trimming: Utilize TrimGalore for high-quality read trimming.
- Detailed Sequence Reporting: Generate detailed sequence status reports with Seqkit.
- Read Merging: Automatically merge paired-end reads into cohesive sequences.
- In-Depth Analysis: Leverage Parallel Meta for comprehensive sequence abundance and count analysis.

To get started with BioMetaGenie, follow these instructions:

- Prerequisites:
  - Ensure that Poetry is installed for dependency management.
- Clone the Repository:
    git clone https://github.com/nasif-raihan/BioMetaGenie.git
    cd BioMetaGenie
- Setup Third-Party Dependencies:
  - Install the required tools by executing the setup script:
    cd third_party
    bash setup-for-linux.sh
  - Currently, the setup script only supports Linux distributions. Contributions are welcome to extend cross-platform compatibility by creating setup-for-win.sh for Windows and setup-for-mac.sh for macOS.
- Make Usearch11 Executable:
    chmod +x usearch11.0.667_i86linux32
    cd ..
- Configuration:
  - Place your sample names or SRA accession numbers in the SRA_list.txt file located in the input directory.
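
Before starting a long run, it can help to sanity-check the contents of SRA_list.txt. The sketch below is not part of BioMetaGenie; the helper names are hypothetical, and it assumes one entry per line, with blank lines and `#` comments ignored. Entries that do not look like run accessions (SRR, ERR, or DRR followed by digits) are only flagged, since the file may also contain sample names.

```python
import re
from pathlib import Path

# Run accessions from SRA, ENA, and DDBJ: SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def parse_sra_list(text: str) -> list[str]:
    """Return the non-empty entries, de-duplicated, in their original order.

    Assumption: one entry per line; anything after '#' is treated as a comment.
    """
    seen: set[str] = set()
    entries: list[str] = []
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


def non_run_entries(entries: list[str]) -> list[str]:
    """Return entries that do not look like run accessions (possibly sample names)."""
    return [e for e in entries if not RUN_ACCESSION.match(e)]


def load_sra_list(path: str = "input/SRA_list.txt") -> list[str]:
    """Read and parse the list file; the default path follows the README."""
    return parse_sra_list(Path(path).read_text(encoding="utf-8"))
```

Calling `non_run_entries(load_sra_list())` from the repository root would then list any entries worth a second look before downloading.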
BioMetaGenie simplifies complex workflows into a single command.
After installation, run the following commands from the root directory (BioMetaGenie) to execute the entire process:
    make install
    poetry shell
    make run
Individual steps can also be run on their own:
- Download SRA:
    python main.py download_sra SRR123456
- Convert to FASTQ:
    python main.py convert_to_fastq SRR123456
- Download and process a list:
    python main.py process_sra_list
- Trim sequences:
    python main.py trim
- Get sample stats:
    python main.py get_sample_stats
- Merge reads:
    python main.py merge_reads sample123
- Analyze:
    python main.py analyze
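
The individual commands above can also be chained from a small driver script. The following is a minimal sketch, not the project's own runner: the function names are hypothetical, the step order simply follows the list above, and it assumes that the sample name passed to merge_reads is the same as the accession, which may not hold for your data.

```python
import subprocess


def build_pipeline(accessions: list[str], python: str = "python") -> list[list[str]]:
    """Build the argv list for each step, in the order the README lists them."""
    cmds: list[list[str]] = []
    for acc in accessions:
        cmds.append([python, "main.py", "download_sra", acc])
        cmds.append([python, "main.py", "convert_to_fastq", acc])
    cmds.append([python, "main.py", "trim"])
    cmds.append([python, "main.py", "get_sample_stats"])
    for acc in accessions:
        # Assumption: merge_reads takes a sample name equal to the accession.
        cmds.append([python, "main.py", "merge_reads", acc])
    cmds.append([python, "main.py", "analyze"])
    return cmds


def run_pipeline(accessions: list[str], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_pipeline(accessions):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the run at the first failing step.
            subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `run_pipeline(["SRR123456"])` only prints the commands, which makes it easy to review the sequence before running it from the repository root.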
All the outputs will be stored in the output directory.
We welcome contributions to BioMetaGenie! To contribute, please fork the repository and submit a pull request. Ensure that your code adheres to the project's coding standards and includes appropriate tests.
BioMetaGenie is released under the MIT License. For details, see the LICENSE file.
For support or inquiries, please use the Issues section on GitHub.
BioMetaGenie is committed to simplifying and accelerating genomic data processing, enabling researchers to concentrate on their analyses rather than on data management tasks.