fix: struct_conn bond matching when auth_asym_id differs from label_a… #74

N283T · 2026-01-17T12:15:55Z

fix(io): struct_conn bond matching when auth_asym_id differs from label_asym_id

When ptnr_label_seq_id is "." for non-polymers, the code correctly falls back to ptnr_auth_seq_id to determine the residue ID. However, the matching logic was using atom_array.res_id (which contains label_seq_id values) instead of auth_seq_id, causing inter-chain bonds to be missed.

This fix:

- Extracts auth_seq_id from atom_array annotations
- Tracks which partner uses auth_seq_id fallback
- Uses the appropriate res_ids array for matching each partner

Fixes 35 affected PDB entries including 9b5c, 8an9, 1lgc.

📋 PR Checklist

- This PR is tagged as a draft if it is still under development and not ready for review.
  This avoids auto-triggering the slower tests in the CI and needlessly wasting resources.
- I have ensured that all my commits follow Angular commit message conventions.
  Format: <type>[optional scope]: <subject>
  Example: fix(af3): add missing crop transform to the af3 pipeline

  This affects semantic versioning as follows:
  - fix: patch version increment (0.0.1 → 0.0.2)
  - feat: minor version increment (0.0.1 → 0.1.0)
  - BREAKING CHANGE: major version increment (0.0.1 → 1.0.0)
  - All other types do not affect versioning
  The format ensures readable changelogs through auto-generation from commit messages.
- I have run make format on the codebase before submitting the PR (this autoformats the code and lints it).
- I have named the PR in Angular PR message format as well (cf. above), with a sensible tagline that summarizes all the changes in the PR.
  This is useful because the name of the PR is the default commit name used when merging with squash & merge.
  Format: <type>[optional scope]: <subject>
  Example: fix(af3): add missing crop transform to the af3 pipeline

ℹ️ PR Description

What changes were made and why?

Background: mmCIF label_seq_id vs auth_seq_id

In mmCIF format, there are two residue numbering systems:

- label_seq_id: The official mmCIF identifier (primary key for cross-references)
- auth_seq_id: Author-assigned residue number (matches PDB format)

For non-polymer entities (ligands, ions, etc.), label_seq_id is typically "." (undefined), while auth_seq_id contains the actual residue number.
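To make the two numbering systems concrete, here is a minimal sketch with two made-up atom_site-style records. The helper `parse_seq_id` and the record values are hypothetical; the `-1` sentinel mirrors the `res_id` value described below for atoms whose label_seq_id is ".".

```python
# Two illustrative atom_site-style records (values are made up for this example).
records = [
    {"label_comp_id": "CYS", "label_seq_id": "10", "auth_seq_id": "10"},  # polymer residue
    {"label_comp_id": "ZN", "label_seq_id": ".", "auth_seq_id": "201"},   # non-polymer ion
]


def parse_seq_id(value: str, missing: int = -1) -> int:
    """Convert an mmCIF sequence ID string to int, mapping '.' to a sentinel."""
    return missing if value == "." else int(value)


for rec in records:
    label = parse_seq_id(rec["label_seq_id"])
    auth = parse_seq_id(rec["auth_seq_id"])
    print(rec["label_comp_id"], "label_seq_id =", label, "auth_seq_id =", auth)
```

Only the author numbering carries a usable residue number for the ion in this example.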

The Bug

In get_struct_conn_bonds() (bonds.py:428-458):

- When ptnr_label_seq_id is ".", the code correctly falls back to ptnr_auth_seq_id to get the residue ID (e.g., 201)
- However, the atom matching logic compares this against atom_array.res_id, which contains label_seq_id values
- For non-polymers loaded via load_any(), res_id is -1 (from label_seq_id=".")
- The comparison 201 == -1 fails, causing inter-chain bonds to be missed
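The failing comparison can be reproduced in isolation with plain NumPy arrays. This is a sketch, not the code in bonds.py: the array contents are invented, and the variable names are placeholders standing in for the atom_array annotations.

```python
import numpy as np

# Hypothetical annotations for three atoms: two polymer atoms and one ligand atom.
res_id = np.array([10, 11, -1])        # label_seq_id values; "." became -1 for the ligand
auth_seq_id = np.array([10, 11, 201])  # author numbering, defined for every atom

# Residue ID resolved for a struct_conn partner through the ptnr_auth_seq_id fallback.
partner_res_id = 201

mask_label = res_id == partner_res_id      # what the matching logic effectively did
mask_auth = auth_seq_id == partner_res_id  # what it needs to do for this partner

print(mask_label.any())  # False: no atom matches, so the bond is dropped
print(mask_auth.any())   # True: the ligand atom is found
```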

Why this affects load_any() but not parse()

- parse(): Uses build_template_atom_array() which sets res_id = auth_seq_id for non-polymers
- load_any(): Uses biotite directly, which follows mmCIF spec (res_id = label_seq_id)

biotite's behavior is correct per the mmCIF specification. The bug is in atomworks' get_struct_conn_bonds(), which assumes res_id always contains auth_seq_id-compatible values.

The Fix

- Extract auth_seq_id annotation from atom_array
- Track which bond partner required auth_seq_id fallback
- Use auth_seq_id for matching when fallback was used, otherwise use res_id
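The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual change to get_struct_conn_bonds(): the helpers `resolve_seq_id` and `match_partner`, their signatures, and the example arrays are all hypothetical, and chain matching is simplified to a single chain ID array.

```python
import numpy as np


def resolve_seq_id(label_seq_id: str, auth_seq_id: str):
    """Return (residue ID, used_auth_fallback) for one struct_conn partner."""
    if label_seq_id == ".":
        return int(auth_seq_id), True
    return int(label_seq_id), False


def match_partner(chain_ids, res_ids, auth_seq_ids, atom_names,
                  chain, seq_id, used_auth, atom_name):
    """Return indices of atoms matching one partner (hypothetical helper)."""
    # Compare against the numbering the partner's residue ID was taken from.
    ids = auth_seq_ids if used_auth else res_ids
    mask = (chain_ids == chain) & (ids == seq_id) & (atom_names == atom_name)
    return np.flatnonzero(mask)


# Invented example: a cysteine SG in chain A bonded to a zinc ion in chain B.
chain_id = np.array(["A", "A", "B"])
res_id = np.array([10, 11, -1])        # label_seq_id; -1 for the ion
auth_seq_id = np.array([10, 11, 201])
atom_name = np.array(["SG", "CA", "ZN"])

seq_a, auth_a = resolve_seq_id("10", "10")   # polymer partner: no fallback
seq_b, auth_b = resolve_seq_id(".", "201")   # non-polymer partner: fallback

idx_a = match_partner(chain_id, res_id, auth_seq_id, atom_name, "A", seq_a, auth_a, "SG")
idx_b = match_partner(chain_id, res_id, auth_seq_id, atom_name, "B", seq_b, auth_b, "ZN")
print(idx_a.tolist(), idx_b.tolist())
```

Tracking the fallback per partner matters because one bond can join a polymer atom (matched by res_id) to a non-polymer atom (matched by auth_seq_id).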

How were the changes tested?

- Unit test: Added regression test test_struct_conn_auth_seq_id_fallback.py with 3 representative PDB entries (9b5c, 8an9, 1lgc)
- Comprehensive verification: Tested all 35 affected PDB entries with a verification script comparing struct_conn declarations against detected bonds:
  - Before fix: load_any() detected 66/116 inter-chain bonds (56.9%)
  - After fix: load_any() detected 116/116 inter-chain bonds (100%)
- Existing tests: All existing bond-related tests pass
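The verification script itself is not included in this PR description. A comparison of that kind could look like the sketch below, which assumes each bond is represented as an unordered pair of (chain, residue number, atom name) tuples; the function `detection_rate` and the example bonds are hypothetical.

```python
def detection_rate(declared, detected):
    """Count how many declared bonds also appear among the detected bonds."""
    # frozenset makes the comparison independent of partner order.
    declared_set = {frozenset(bond) for bond in declared}
    detected_set = {frozenset(bond) for bond in detected}
    return len(declared_set & detected_set), len(declared_set)


# Invented example: two declared bonds, one of which was detected (partners swapped).
declared = [
    (("A", 10, "SG"), ("B", 201, "ZN")),
    (("A", 11, "NE2"), ("B", 201, "ZN")),
]
detected = [
    (("B", 201, "ZN"), ("A", 10, "SG")),
]

found, total = detection_rate(declared, detected)
print(f"{found}/{total} ({found / total:.1%})")

# The percentages reported above follow from the raw counts.
print(f"{66 / 116:.1%}")
print(f"{116 / 116:.1%}")
```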

Additional Notes

Affected PDB entries (35 total):

8an9, 8ano, 8aoo, 9b5c, 9b5f, 9b5g, 9b5h, 9b5i, 9b5j, 9b5m,
9b5p, 9b5q, 9b5r, 9b5s, 9b5t, 6ecd, 6ece, 6ecf, 9hfl, 1lgc,
4lke, 4lkf, 1loc, 5n22, 7new, 5ngq, 5nwk, 9o7j, 7p8q, 4pei,
6s7g, 3to6, 4uzq, 6x5r, 6x5s

Files changed:

- src/atomworks/io/utils/bonds.py: Fix matching logic
- tests/io/components/test_struct_conn_auth_seq_id_fallback.py: Regression test (new)


partrita pushed a commit to partrita/atomworks that referenced this pull request Jan 19, 2026

Merge pull request RosettaCommons#74 from baker-laboratory/feat/msa-p…

4d45b10

…reprocessing feat: MSA preprocessing; zst support
