UnicodeDecodeError #38

@lvclark

I am running into an issue with some (but not all) of the models that I generated using ESM3. The Stride output seems to include bytes that are not valid UTF-8, so the downstream Python steps fail to read the file. The error is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa5 in position 1029: invalid start byte

(or a different byte and position depending on the input file).
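For triage, here is a minimal sketch of how the offending bytes could be located and how the file could be read without raising. It is not Chainsaw's own code: the helper names `find_undecodable_bytes` and `decode_lenient` are hypothetical, and it assumes the Stride output is otherwise plain text.

```python
def find_undecodable_bytes(data: bytes, encoding: str = "utf-8"):
    """Return (offset, raw_bytes) for each span that fails to decode.

    Hypothetical helper for diagnosis only.
    """
    bad = []
    start = 0
    while start < len(data):
        try:
            data[start:].decode(encoding)
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded
            lo, hi = start + err.start, start + err.end
            bad.append((lo, data[lo:hi]))
            start = hi  # resume after the bad span
    return bad


def decode_lenient(data: bytes, encoding: str = "utf-8") -> str:
    """Decode, substituting U+FFFD for invalid bytes instead of raising."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    sample = b"ASG  ALA A\xa5  1"  # made-up line with one invalid byte
    print(find_undecodable_bytes(sample))
    print(decode_lenient(sample))
```

To apply this to a real file, read it with `open(path, "rb").read()` and pass the resulting bytes to either helper. Whether replacing the invalid bytes is safe depends on which columns of the Stride output they land in, which I have not checked.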

I have attached my input PDB and the file that was generated by Stride.

ESM3_LmjF.03.0850.pdb
ESM3_LmjF.03.0850.pdb_ss.txt

For full reproducibility, I built Chainsaw using this Dockerfile:

FROM python:3.11

RUN apt-get update && apt-get upgrade -y

COPY chainsaw /opt/chainsaw

WORKDIR /opt/chainsaw

RUN bash setup.sh

I then converted the image to Apptainer and ran it with the command:

apptainer exec --nv \
  --bind /data/hps/assoc/private/isp_annot/user/lclar5:/mnt \
  /data/hps/assoc/private/isp_annot/container/chainsaw_9ced6e6.sif \
  /opt/chainsaw/chswEnv/bin/python /opt/chainsaw/get_predictions.py \
  --structure_file /mnt/chainsaw_docker/data/models/ESM3_LmjF.03.0850.pdb \
  --output /mnt/chainsaw_docker/results/ESM3_LmjF.03.0850_chainsaw.tsv \
  --min_domain_length 100 --min_ss_components 2
