Description
Command run:
nextflow run -params-file params.yml -c custom.config -profile uppmax /proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik
I checked the .command files from the ANNOTATE_REPEATS process, but they gave no additional information about what the problem might be.
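Besides reading the .command files, the failed task can be re-run by hand with command tracing, which shows the exact command that returned a non-zero status. Below is a minimal sketch. `inspect_task` is a hypothetical helper name. It assumes the task's input files are still staged in its work dir; the wrapper output below mentions a scratch dir, and in that case they may not be. Re-running also rewrites the task's output files in that directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the files Nextflow leaves in a task work dir,
# then re-run the task script with xtrace so the failing command is visible.
inspect_task() {
  local work_dir="$1"
  (
    cd "$work_dir" || exit 1
    for f in .command.sh .command.err .command.out .exitcode; do
      printf '==> %s <==\n' "$f"
      cat "$f" 2>/dev/null || echo "(missing)"
    done
    # -x prints each command (prefixed with "+") before running it;
    # -e stops at the first command that fails, as the task shell does.
    bash -eux .command.sh
  )
}
```

Usage, with the work dir from the error message: `inspect_task /proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/49/5e98bafe759cc9966cd2a7f2eae1b2`. The last `+` line printed before the exit is the command that failed.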
Error message:
[41/fe26b5] process > RENAME_REPEAT_MODELER_SEQUENCES [100%] 1 of 1 ✔
[9e/bd9447] process > PFAM_TRANSPOSIBLE_ELEMENT_SEARCH [100%] 1 of 1 ✔
[f8/66daec] process > BUILD_PROTEIN_REF_BLAST_DB (1) [100%] 1 of 1 ✔
[d4/4238ee] process > BLASTX_AND_FILTER (1) [100%] 2 of 2 ✔
[bf/5590b1] process > PFAM_SCAN (2) [100%] 2 of 2 ✔
[49/5e98ba] process > ANNOTATE_REPEATS [100%] 1 of 1, failed: 1 ✘
[ca/cb7da2] process > BUILD_TREP_BLAST_DB (1) [100%] 1 of 1 ✔
[- ] process > TREP_BLASTN -
[- ] process > ADD_TREP_ANNOTATION -
[c2/3f37c5] process > CUSTOM_HMM_SCAN (2) [100%] 2 of 2 ✔
[3e/db88fa] process > MERGE_DOMAIN_TABLE (2) [100%] 2 of 2 ✔
[- ] process > REANNOTATE_REPEATS -
[- ] process > BUILD_ANNOTATED_LIB_BLAST_DB -
[- ] process > RECIPROCAL_BLASTN -
[- ] process > REDUNDANT_HITS -
Error executing process > 'ANNOTATE_REPEATS'
Caused by:
Process `ANNOTATE_REPEATS` terminated with an error exit status (1)
Command executed:
# Find unclassified consensus with TE domains
for TBL in ccandi_k.minus.predicted.pfamtbl ccandi_k.plus.predicted.pfamtbl; do
# grep #1 : Find unclassified consensus
# grep #2 : which have TE domains
# cut + uniq : and extract their id's
grep -i "#unknown" "$TBL" | \
tee "${TBL}.unclassified" | \
grep -i -w -f Pfam.Proteins_wTE_Domains.seqid | \
tee -a ccandi_k.Unclassified_consensus_TEs | \
cut -f1 -d"#" | uniq > "${TBL/.pfamtbl/.unclassified_ids}"
done
# Concatenate ids of consensus with TE domains from both strands
cat *.unclassified_ids | uniq > ccandi_k.Unclassified_consensus_TEs.ids
# Find unclassified consensus without TE domains
for UNCLASSIFIED in *.unclassified; do
# grep : Remove consensus which have TE domains
# cut + uniq : and extract their id's
grep -v -f ccandi_k.Unclassified_consensus_TEs.ids "$UNCLASSIFIED" | \
tee "${UNCLASSIFIED}.TEpurged" | \
cut -f1 -d'#' | uniq > "${UNCLASSIFIED}.TEpurged.ids"
done
# Use shell expansion to expand plus and minus strand files for unsorted inner join
grep -f *.TEpurged.ids > ccandi_k.consensus.both.strand
# ccandi_k.consensus.both.strand : Unclassified consensus sequences that have
# non-TE domains detected in both strands.
# These are tricky to annotate.
# In consensus without TE domains, remove consensus with non-TE domains on both strands
# (leaving consensus with non-TE domains on a single-strand)
for TEPURGED in *.TEpurged; do
# grep : Remove consensus with non-TE domains on both strands
# awk : then remove consensus shorter than 100 amino acids
# cut + uniq : and extract their id's
grep -v -f ccandi_k.consensus.both.strand "$TEPURGED" | \
awk '$11 >= 100' | tee "$TEPURGED.mono" | \
cut -f1 -d'#' | uniq > "$TEPURGED.mono.ids"
done
# Make a copy of repeat library to be modified.
cp ccandi_k.fasta ccandi_k.renamed.fasta
# Rename repeat model based on strand evidence.
for CONSENSUS in *.mono.ids; do
# while : for each consensus id
# echo : record id as renamed
# NAMEHASH : create a name suffix from the pfam domain table
# OLDNAME : find old name from repeat consensus library
# sed : replace Unknown with NAMEHASH
while read -r SEQID; do
echo "$SEQID" >> ccandi_k.renamed
NAMEHASH=$( grep "${SEQID}#" "${CONSENSUS/.ids/}" | \
tr -s " " " " | cut -f7 | \
sort | uniq | \
paste -s -d '-' )
OLDNAME=$( grep "${SEQID}#" ccandi_k.fasta | cut -c2- )
sed -i "s|$OLDNAME|${OLDNAME%Unknown}$NAMEHASH|g" ccandi_k.renamed.fasta
done < "$CONSENSUS"
done
cat <<-END_VERSIONS > versions.yml
"ANNOTATE_REPEATS":
awk : $( awk -W version |& head -n1 )
cat : $( cat --version |& head -n1 )
cut : $( cut --version |& head -n1 )
grep : $( grep --version |& head -n1 )
paste: $( paste --version |& head -n1 )
sed : $( sed --version |& head -n1 )
sort : $( sort --version |& head -n1 )
tee : $( tee --version |& head -n1 )
uniq : $( uniq --version |& head -n1 )
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command wrapper:
nxf-scratch-dir r49:/scratch/25481205/nxf.wzUqn0kDFO
Work dir:
/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/49/5e98bafe759cc9966cd2a7f2eae1b2
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
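The following toy sketch uses made-up IDs to show what the join step in the command above (`grep -f *.TEpurged.ids > ccandi_k.consensus.both.strand`) does. It also shows one behaviour of `grep` that is relevant when reading this log. `grep` exits with status 1 when it selects no lines. Nextflow runs task scripts with `bash -ue` unless a pipeline overrides `process.shell`. Assuming that default applies, a bare `grep` that matches nothing ends the script with exit status 1 and no output. This sketch does not confirm that this is what happened here. That scenario would correspond to no unclassified consensus ID appearing in both strand files. `join_ids`, `join_under_errexit` and `demo` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# The glob *.TEpurged.ids expands alphabetically to the minus- and plus-strand
# ID files, so grep reads patterns from the first and searches the second:
# an unsorted inner join. IDs below are made up for illustration.

# Prints lines of $2 that match an ID listed in $1.
join_ids() {
  grep -f "$1" "$2"
}

# Same join inside a script run with `bash -e`, like the task script.
# Prints "reached" only if execution gets past the grep line.
join_under_errexit() {
  bash -e -c 'grep -f "$1" "$2" > /dev/null; echo reached' _ "$1" "$2"
}

demo() {
  local dir
  dir=$(mktemp -d)
  local minus="$dir/ccandi_k.minus.predicted.pfamtbl.unclassified.TEpurged.ids"
  local plus="$dir/ccandi_k.plus.predicted.pfamtbl.unclassified.TEpurged.ids"

  printf 'rnd-1_family-1\nrnd-1_family-2\n' > "$minus"
  printf 'rnd-1_family-2\nrnd-1_family-7\n' > "$plus"
  join_ids "$minus" "$plus"   # one shared ID: prints it, status 0

  printf 'rnd-1_family-7\n' > "$plus"
  join_ids "$minus" "$plus" || echo "no shared IDs: grep exit status $?"
  join_under_errexit "$minus" "$plus" || echo "bash -e script exit status $?"
  rm -rf "$dir"
}
```

Running `demo` prints `rnd-1_family-2`, then `no shared IDs: grep exit status 1`, then `bash -e script exit status 1`. The `bash -e` script never reaches its `echo` and writes nothing, which matches an empty "Command output" with exit status 1.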
Params file (params.yml)
## The absolute path (full path, begins with / ) to the input data
## Repeat modeler library
repeat_modeler_fasta : '/proj/rosling_storage/AMF/comparative_genomics/annotation_v4/repeats/ccandi_k_combined_idrenamed_short/repeatmodeler/RM_21933.ThuApr221305072021/consensi.fa'
## Species short name for renaming sequences
species_short_name : 'ccandi_k'
## Workflow outputs
## The absolute path (full path, begins with / ) to the results folder
results : '/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/results'
## Optional inputs (Remove # to uncomment)
## protein reference
protein_reference :
- 'https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz'
# - '<additional reference1>'
# - '<additional reference2>'
## Path to key words (Describes PFAM entries with TE domains)
transposon_keywords : "./assets/pfam_te_domain_keywords.txt"
## Path to key words blacklist (Describes PFAM entries with TE domains that should be removed)
transposon_blacklist : "./assets/te_domain_keyword_blacklist.txt"
## Path to PFAM accession list of proteins with TE domains (skips PFAM_TRANSPOSIBLE_ELEMENT_SEARCH process)
#pfam_proteins_with_te_domain_list : '$baseDir/assets/pfam_te_domain_keywords.txt'
#pfam_proteins_with_te_domain_list : '$baseDir/assets/Pfam_R32.Proteins_wTE_Domains.seqid'
## PFAM HMM database path
pfam_hmm_db : "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam32.0/Pfam-A.hmm.gz"
pfam_hmm_dat : "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam32.0/Pfam-A.hmm.dat.gz"
## PFAM-A database path
pfam_a_db : "ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam32.0/Pfam-A.full.uniprot.gz"
## Workflow package manager configuration
## Use conda instead of containers
# enable_conda : false
## When using singularity, construct image from a docker image
# singularity_pull_docker_container : false
## Uppmax cluster configuration
## UPPMAX project - Needed only when running on an UPPMAX cluster
project : 'snic2022-5-42'
## Convenience for adding additional cluster options to UPPMAX
##clusterOptions : ''
Config file (custom.config)
// Nextflow configuration
// The absolute path (full path, begins with / ) to the work directory ( where intermediate results are stored )
// If you have a SNIC Storage allocation, use the nobackup folder in there.
workDir = '/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik'
// Resume analysis from the last complete process executions (not from the beginning).
resume = true
// Uncomment to enable workflow reporting
// Workflow reporting
timeline {
enabled = true
file = "/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/pipeline_info/execution_timeline.html"
}
report {
enabled = true
file = "/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/pipeline_info/execution_report.html"
}
trace {
enabled = true
file = "/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/pipeline_info/execution_trace.txt"
}
dag {
enabled = true
file = "/proj/rosling_storage/AMF/b2017181_nobackup/merce/try_RepeatDefeaters/RepeatDefeaters_ccandik/pipeline_info/pipeline_dag.svg"
}