compare_gtf_to_ER_01ksb.py crashes when gene_id contains substring "ER"

## Bug: `compare_gtf_to_ER_01ksb.py` crashes when `gene_id` contains substring "ER"

### Summary
`utilities/compare_gtf_to_ER_01ksb.py` raises a `ValueError` during ER sorting if any `gene_id` includes the substring `"ER"` (e.g., `CHAER1`). The script constructs ER labels as `"{gene_id}:ER{n}"`, but the sorting key uses `split("ER")`, which breaks when `"ER"` appears earlier in the `gene_id`.

### Steps to reproduce
1) Ensure the ER GTF contains a gene whose `gene_id` includes `"ER"`, e.g.:

```gtf
chr4	TranD	exon	56697216	56697886	.	-	.	transcript_id "CHAER1"; gene_id "CHAER1";
```

2) Run the script:

```bash
python utilities/compare_gtf_to_ER_01ksb.py \
  -i <input.gtf> \
  -er <er.gtf> \
  -o <outdir>
```

### Expected behavior
Script completes successfully and writes the output CSVs:
- `*_infoERP.csv`
- `*_flagER.csv`

### Actual behavior
Script crashes while building `geneDct`:

```
ValueError: invalid literal for int() with base 10: '1:'
```

### Where it fails
The error occurs at the ER sorting step:

```python
geneDct = dict(geneERDf.groupby('gene_id').apply(
    lambda x: sorted(set(x['ER']), key=lambda x: int(x.split("ER")[1]))))
```

When `gene_id = "CHAER1"`, the script generates ER IDs like `CHAER1:ER1`.
Then `"CHAER1:ER1".split("ER")` returns `["CHA", "1:", "1"]`, so `split("ER")[1] == "1:"`, and `int("1:")` raises `ValueError`.
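The failure can be reproduced outside the script. This is a minimal sketch that assumes only the `"{gene_id}:ER{n}"` label format described above; the variable names `label` and `parts` are illustrative and not taken from the script.

```python
# Minimal reproduction of the failing sort key, independent of the script.
# Assumes the label format "{gene_id}:ER{n}" described in this issue.
label = "CHAER1:ER1"

# split("ER") splits on every occurrence, including the one inside the gene_id.
parts = label.split("ER")
print(parts)  # ['CHA', '1:', '1']

try:
    int(parts[1])  # the same expression the sort key evaluates
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '1:'
```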

### Cause
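The sort key assumes that the first `"ER"` in the label is the ER suffix delimiter, but `"ER"` can also appear inside `gene_id`. Because `split("ER")` splits on every occurrence, index `[1]` holds the ER number only when the `gene_id` itself contains no `"ER"`, so it is not robust for the `"{gene_id}:ER{n}"` label format.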
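One possible fix, sketched under the assumption that every label ends in `":ER{n}"` as described above, is to split once from the right on `":ER"` so that any `"ER"` inside the `gene_id` is ignored. The helper name `er_number` is hypothetical and does not exist in the script.

```python
def er_number(label: str) -> int:
    """Return the trailing ER index from a label shaped like '{gene_id}:ER{n}'.

    Hypothetical helper for illustration; not part of the script.
    """
    # rsplit with maxsplit=1 splits only on the last ':ER' in the label,
    # so an 'ER' (or even ':ER') inside the gene_id does not interfere.
    _, number = label.rsplit(":ER", 1)
    return int(number)


labels = {"CHAER1:ER10", "CHAER1:ER2", "CHAER1:ER1"}
print(sorted(labels, key=er_number))
# ['CHAER1:ER1', 'CHAER1:ER2', 'CHAER1:ER10']
```

In the failing statement, the inner `key=lambda x: int(x.split("ER")[1])` could then be replaced with `key=er_number`. Sorting stays numeric, so `ER10` still sorts after `ER2`.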

