Description
Hi! I am setting up FIRE with Arabidopsis thaliana data (TAIR10), produced with nanopore. I have been able to run human PacBio and nanopore data without any issues, but the Arabidopsis samples fail with an error. It looks like an issue with the contig names: in TAIR10 they are 1, 2, 3, 4, 5, Pt, and Mt. The run fails when I leave out the keep_chromosomes parameter, and it still fails when I set keep_chromosomes to omit the Mt contig.
keep_chromosomes values I tried -> "^(1|2|3|4|5|Pt)+$" and "^(1|2|3|4|5|Mt|Pt)+$"
I modified the ref and ref_name parameters in the config.yaml to adjust for the genome.
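As a sanity check on the two patterns, here is a minimal sketch that tests them against the TAIR10 contig names. It assumes the pipeline applies keep_chromosomes with regex semantics equivalent to Python's `re` module, which I have not verified; the variable names are only for illustration.

```python
import re

# TAIR10 contig names as listed above.
contigs = ["1", "2", "3", "4", "5", "Mt", "Pt"]

# The two keep_chromosomes patterns that were tried.
without_mt = re.compile(r"^(1|2|3|4|5|Pt)+$")
with_mt = re.compile(r"^(1|2|3|4|5|Mt|Pt)+$")

# Keep only the contigs each pattern matches.
kept_without_mt = [c for c in contigs if without_mt.match(c)]
kept_with_mt = [c for c in contigs if with_mt.match(c)]

print(kept_without_mt)
print(kept_with_mt)
```

Under that assumption the first pattern drops only Mt and the second keeps all seven contigs, so the patterns themselves select what I intended. Note that the trailing `+` would also accept concatenations such as "12", which does not matter for TAIR10 but is looser than needed.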
Error:
localrule coverage:
input: temp/AT_control.filtered.nuc/coverage/AT_control.filtered.nuc-v0.1.1.bed.gz
output: results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-median-coverage.txt, results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-minimum-coverage.txt, results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-maximum-coverage.txt
jobid: 0
benchmark: results/AT_control.filtered.nuc/additional-outputs-v0.1.1/benchmarks/coverage/AT_control.filtered.nuc.txt
reason: Forced execution
wildcards: sm=AT_control.filtered.nuc, v=v0.1.1
resources: mem_mb=65536, mem_mib=62500, disk_mb=4096, disk_mib=3907, tmpdir=/scratch/local/jobs/31068711, runtime=200, slurm_account=pi-spott, slurm_partition=caslake
Activating conda environment: ../../../dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_
Activating conda environment: ../../../dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_
Traceback (most recent call last):
File "/project/spott/1_Shared_projects/AT_Fiber_seq/FIRE/.snakemake/scripts/tmpibtfejrf.cov.py", line 72, in <module>
df = polars_read()
File "/project/spott/1_Shared_projects/AT_Fiber_seq/FIRE/.snakemake/scripts/tmpibtfejrf.cov.py", line 53, in polars_read
pl.read_csv(
File "/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_/lib/python3.10/site-packages/polars/utils/deprecation.py", line 91, in wrapper
return function(*args, **kwargs)
File "/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_/lib/python3.10/site-packages/polars/utils/deprecation.py", line 91, in wrapper
return function(*args, **kwargs)
File "/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_/lib/python3.10/site-packages/polars/utils/deprecation.py", line 91, in wrapper
return function(*args, **kwargs)
File "/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_/lib/python3.10/site-packages/polars/io/csv/functions.py", line 499, in read_csv
df = _read_csv_impl(
File "/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_/lib/python3.10/site-packages/polars/io/csv/functions.py", line 645, in _read_csv_impl
pydf = PyDataFrame.read_csv(
polars.exceptions.ComputeError: could not parse Mt as dtype i64 at column 'column_1' (column number 1)
The current offset in the file is 1325113890 bytes.
You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `dtypes` argument
- setting `ignore_errors` to `True`,
- adding `Mt` to the `null_values` list.
Original error: ```remaining bytes non-empty```
RuleException:
CalledProcessError in file /project/spott/1_Shared_projects/AT_Fiber_seq/FIRE/workflow/rules/coverages.smk, line 48:
Command 'source /software/python-anaconda-2024.10-el8-x86_64/bin/activate '/project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_'; set -euo pipefail; python /project/spott/1_Shared_projects/AT_Fiber_seq/FIRE/.snakemake/scripts/tmpibtfejrf.cov.py' returned non-zero exit status 1.
[Sun May 18 15:54:45 2025]
Error in rule coverage:
jobid: 0
input: temp/AT_control.filtered.nuc/coverage/AT_control.filtered.nuc-v0.1.1.bed.gz
output: results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-median-coverage.txt, results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-minimum-coverage.txt, results/AT_control.filtered.nuc/additional-outputs-v0.1.1/coverage/AT_control.filtered.nuc-v0.1.1-maximum-coverage.txt
conda-env: /project/spott/dveracruz/bin/snakemake_conda_envs/04f36a1cabb48e10bcbd66f83d9ec8ab_