1. Scorpio CreateDb Module

Mohammad Saleh Refahi edited this page Nov 17, 2024 · 2 revisions

Scorpio CreateDb User Guide

Introduction

The createdb module creates a database for the Scorpio model. This guide explains how to use the createdb.py script, covering its arguments and an example command.

Script Usage

Arguments

The script accepts the following arguments:

  • --scorpio_model (str): Path to the Scorpio model
  • --output (str): Output DB folder
  • --db_fasta (str): FASTA file
  • --val_fasta (str): Validation FASTA file
  • --max_len (int): Maximum sequence length
  • --batch_size (int): Batch size (default: 60)
  • --db_embedding (str): Embedding file
  • --val_embedding (str): Validation embedding file
  • --metadata (str): Metadata file
  • --cal_kmer_freq (bool): Calculate k-mer frequency (default: False)
  • --num_distance (int): Number of distances, a parameter for confidence score training (default: 2000)
  • --num_device (int): Number of devices to use (default: 1)
  • --required_memory_gb (int): Required memory in GB (default: 79)
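
As a rough illustration of the argument list above, the following Python sketch declares the same flags, types, and defaults with argparse. It is not the actual createdb.py source: the helper names str2bool and build_parser are hypothetical, and which flags are mandatory is not stated in this guide, so none are marked as required here. The str2bool helper is included because argparse's plain type=bool treats any non-empty string, including "False", as True.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: parse 'True'/'False'-style strings explicitly."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; defaults are taken from the list above."""
    parser = argparse.ArgumentParser(prog="scorpio createdb")
    parser.add_argument("--scorpio_model", type=str, help="Path to the Scorpio model")
    parser.add_argument("--output", type=str, help="Output DB folder")
    parser.add_argument("--db_fasta", type=str, help="FASTA file")
    parser.add_argument("--val_fasta", type=str, help="Validation FASTA file")
    parser.add_argument("--max_len", type=int, help="Maximum sequence length")
    parser.add_argument("--batch_size", type=int, default=60, help="Batch size")
    parser.add_argument("--db_embedding", type=str, help="Embedding file")
    parser.add_argument("--val_embedding", type=str, help="Validation embedding file")
    parser.add_argument("--metadata", type=str, help="Metadata file")
    parser.add_argument("--cal_kmer_freq", type=str2bool, default=False,
                        help="Calculate k-mer frequency")
    parser.add_argument("--num_distance", type=int, default=2000,
                        help="Number of distances for confidence score training")
    parser.add_argument("--num_device", type=int, default=1,
                        help="Number of devices to use")
    parser.add_argument("--required_memory_gb", type=int, default=79,
                        help="Required memory in GB")
    return parser


# Parse a subset of flags; anything omitted falls back to its documented default.
args = build_parser().parse_args([
    "--scorpio_model", "./models/Scorpio-6Freq",
    "--cal_kmer_freq", "True",
    "--max_len", "2096",
    "--batch_size", "100",
])
print(args.batch_size, args.num_device, args.cal_kmer_freq)
```

Flags left out of the call, such as --num_device and --required_memory_gb, take the defaults listed above.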

Example Command

To run the createdb module (createdb.py), use the following command:

scorpio createdb --scorpio_model "./models/Scorpio-6Freq" \
                 --db_fasta "./data/data-amr/train.fasta" \
                 --val_fasta "./data/data-amr/val.fasta" \
                 --metadata "./data/data-amr/metadata.csv" \
                 --cal_kmer_freq True \
                 --output "db_path" \
                 --max_len 2096 \
                 --num_distance 2000 \
                 --batch_size 100
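
When the command is launched from a pipeline rather than typed by hand, it can help to assemble it in Python and check the input paths first. The sketch below is one possible wrapper, assuming the scorpio executable is on the PATH; the functions build_createdb_command and missing_inputs are hypothetical helpers, not part of Scorpio. It only builds and prints the command, using the same flags and values as the example above.

```python
import shlex
from pathlib import Path


def build_createdb_command(model, db_fasta, val_fasta, metadata, output,
                           max_len=2096, num_distance=2000, batch_size=100,
                           cal_kmer_freq=True):
    """Hypothetical helper: return the example command as an argument list."""
    return [
        "scorpio", "createdb",
        "--scorpio_model", str(model),
        "--db_fasta", str(db_fasta),
        "--val_fasta", str(val_fasta),
        "--metadata", str(metadata),
        "--cal_kmer_freq", str(cal_kmer_freq),
        "--output", str(output),
        "--max_len", str(max_len),
        "--num_distance", str(num_distance),
        "--batch_size", str(batch_size),
    ]


def missing_inputs(paths):
    """Hypothetical helper: list the input paths that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


inputs = [
    "./models/Scorpio-6Freq",
    "./data/data-amr/train.fasta",
    "./data/data-amr/val.fasta",
    "./data/data-amr/metadata.csv",
]
cmd = build_createdb_command(*inputs, output="db_path")

# Report missing inputs before launching what may be a long-running job.
absent = missing_inputs(inputs)
if absent:
    print("Missing inputs:", ", ".join(absent))

# Print the shell-quoted command; to execute it, pass cmd to
# subprocess.run(cmd, check=True) instead of printing.
print(shlex.join(cmd))
```

Passing the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.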
