@N283T N283T commented Jan 17, 2026

fix(io): struct_conn bond matching when auth_asym_id differs from label_asym_id

When ptnr_label_seq_id is "." for non-polymers, the code correctly falls back to ptnr_auth_seq_id to determine the residue ID. However, the matching logic was using atom_array.res_id (which contains label_seq_id values) instead of auth_seq_id, causing inter-chain bonds to be missed.

This fix:

  • Extracts auth_seq_id from atom_array annotations
  • Tracks which partner uses auth_seq_id fallback
  • Uses the appropriate res_ids array for matching each partner

Fixes 35 affected PDB entries including 9b5c, 8an9, 1lgc.

📋 PR Checklist

  • This PR is tagged as a draft if it is still under development and not ready for review.

    This avoids auto-triggering the slower tests in the CI and needlessly wasting resources.

  • I have ensured that all my commits follow Angular commit message conventions.

    Format: <type>[optional scope]: <subject>
    Example: fix(af3): add missing crop transform to the af3 pipeline

    This affects semantic versioning as follows:

    • fix: patch version increment (0.0.1 → 0.0.2)
    • feat: minor version increment (0.0.1 → 0.1.0)
    • BREAKING CHANGE: major version increment (0.0.1 → 1.0.0)
    • All other types do not affect versioning

    The format ensures readable changelogs through auto-generation from commit messages.

  • I have run make format on the codebase before submitting the PR (this autoformats the code and lints it).

  • I have named the PR in Angular PR message format as well (cf. above), with a sensible tag line that summarizes all the changes in the PR.

    This is useful as the name of the PR is the default name of the commit that will be used if you merge with a squash & merge.
    Format: <type>[optional scope]: <subject>
    Example: fix(af3): add missing crop transform to the af3 pipeline


ℹ️ PR Description

What changes were made and why?

Background: mmCIF label_seq_id vs auth_seq_id

In mmCIF format, there are two residue numbering systems:

  • label_seq_id: The official mmCIF identifier (primary key for cross-references)
  • auth_seq_id: Author-assigned residue number (matches PDB format)

For non-polymer entities (ligands, ions, etc.), label_seq_id is typically "." (undefined), while auth_seq_id contains the actual residue number.
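As a minimal illustration of the two numbering systems, the sketch below converts mmCIF sequence-ID strings to integers and maps the undefined marker "." to -1, the sentinel value this PR reports for res_id. The helper name `seq_id_to_int` and the example rows are hypothetical; they are not part of the atomworks or biotite API.

```python
def seq_id_to_int(value: str, missing: int = -1) -> int:
    """Convert an mmCIF seq_id string to int; '.' and '?' mean undefined."""
    return missing if value in (".", "?") else int(value)


# Hypothetical (label_seq_id, auth_seq_id) pairs: a polymer residue and a ligand.
rows = [("15", "27"), (".", "201")]
parsed = [(seq_id_to_int(label), seq_id_to_int(auth)) for label, auth in rows]
print(parsed)
```

The ligand row keeps a usable residue number only in its auth_seq_id column.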

The Bug

In get_struct_conn_bonds() (bonds.py:428-458):

  1. When ptnr_label_seq_id is ".", the code correctly falls back to ptnr_auth_seq_id to get the residue ID (e.g., 201)
  2. However, the atom matching logic compares this against atom_array.res_id, which contains label_seq_id values
  3. For non-polymers loaded via load_any(), res_id is -1 (from label_seq_id=".")
  4. The comparison 201 == -1 fails, causing inter-chain bonds to be missed
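The failure mode in the steps above can be reproduced with plain NumPy arrays. This is a simplified sketch, not the code in bonds.py: the arrays stand in for the atom_array annotations, and all values are made up for illustration.

```python
import numpy as np

# Hypothetical per-atom annotations: two polymer atoms and one ligand atom.
chain_id = np.array(["A", "A", "B"])
res_id = np.array([15, 16, -1])        # label_seq_id values; -1 stands for "."
auth_seq_id = np.array([27, 28, 201])  # author-assigned numbering

# struct_conn partner whose residue ID came from the ptnr_auth_seq_id fallback.
partner_chain, partner_res = "B", 201

# Comparing an auth-based residue ID against label-based res_id finds nothing.
buggy_mask = (chain_id == partner_chain) & (res_id == partner_res)
# Comparing against auth_seq_id finds the ligand atom.
fixed_mask = (chain_id == partner_chain) & (auth_seq_id == partner_res)

print(buggy_mask.any(), fixed_mask.any())
```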

Why this affects load_any() but not parse()

  • parse(): Uses build_template_atom_array() which sets res_id = auth_seq_id for non-polymers
  • load_any(): Uses biotite directly, which follows the mmCIF spec (res_id = label_seq_id)

biotite's behavior is correct per the mmCIF specification. The bug is in atomworks' get_struct_conn_bonds(), which assumes that res_id always contains auth_seq_id-compatible values.

The Fix

  1. Extract auth_seq_id annotation from atom_array
  2. Track which bond partner required auth_seq_id fallback
  3. Use auth_seq_id for matching when fallback was used, otherwise use res_id
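The three steps can be sketched as a per-partner lookup that picks the residue-ID array according to whether the fallback was used. The function name `partner_mask`, its signature, and the sample data are assumptions made for illustration; the actual change lives in get_struct_conn_bonds() and may be structured differently.

```python
import numpy as np


def partner_mask(chain_id, res_id, auth_seq_id, atom_name,
                 ptnr_chain, ptnr_label_seq_id, ptnr_auth_seq_id, ptnr_atom):
    """Boolean mask of atoms matching one struct_conn partner (hypothetical helper)."""
    # Track whether this partner needs the auth_seq_id fallback.
    used_auth_fallback = ptnr_label_seq_id in (".", "?")
    if used_auth_fallback:
        target, res_ids = int(ptnr_auth_seq_id), auth_seq_id
    else:
        target, res_ids = int(ptnr_label_seq_id), res_id
    return (chain_id == ptnr_chain) & (res_ids == target) & (atom_name == ptnr_atom)


# Hypothetical annotations: a cysteine SG in chain A bonded to a zinc ion in chain B.
chain_id = np.array(["A", "A", "B"])
res_id = np.array([15, 16, -1])
auth_seq_id = np.array([27, 28, 201])
atom_name = np.array(["SG", "CA", "ZN"])

idx_1 = np.flatnonzero(partner_mask(chain_id, res_id, auth_seq_id, atom_name,
                                    "A", "15", "27", "SG")).tolist()
idx_2 = np.flatnonzero(partner_mask(chain_id, res_id, auth_seq_id, atom_name,
                                    "B", ".", "201", "ZN")).tolist()
print(idx_1, idx_2)
```

Choosing the array per partner keeps polymer partners on the label numbering, so the lookup changes only for partners that lack a label_seq_id.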

How were the changes tested?

  1. Unit test: Added regression test test_struct_conn_auth_seq_id_fallback.py with 3 representative PDB entries (9b5c, 8an9, 1lgc)

  2. Comprehensive verification: Tested all 35 affected PDB entries with a verification script comparing struct_conn declarations against detected bonds:

    • Before fix: load_any() detected 66/116 inter-chain bonds (56.9%)
    • After fix: load_any() detected 116/116 inter-chain bonds (100%)

  3. Existing tests: All existing bond-related tests pass
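The verification script itself is not included in this PR. A comparison of this kind could look roughly like the sketch below, which assumes each bond is represented as an unordered pair of (chain, residue, atom) tuples. `detection_rate` and the sample bonds are hypothetical.

```python
def detection_rate(declared: set, detected: set) -> tuple[int, int, float]:
    """Count declared struct_conn bonds that also appear among detected bonds."""
    found = len(declared & detected)
    total = len(declared)
    return found, total, (found / total if total else 0.0)


def bond(a, b):
    # frozenset makes the pair order-independent, so (a, b) equals (b, a).
    return frozenset((a, b))


# Made-up example: three declared bonds, two of which were detected.
declared = {
    bond(("A", 15, "SG"), ("B", 201, "ZN")),
    bond(("A", 40, "NE2"), ("B", 201, "ZN")),
    bond(("A", 88, "OD1"), ("C", 301, "CA")),
}
detected = {
    bond(("B", 201, "ZN"), ("A", 15, "SG")),
    bond(("A", 40, "NE2"), ("B", 201, "ZN")),
}

found, total, rate = detection_rate(declared, detected)
print(f"{found}/{total} ({rate:.1%})")
```

Applied to the reported totals, the same arithmetic gives 66 / 116 ≈ 56.9% before the fix.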

Additional Notes

Affected PDB entries (35 total):

8an9, 8ano, 8aoo, 9b5c, 9b5f, 9b5g, 9b5h, 9b5i, 9b5j, 9b5m,
9b5p, 9b5q, 9b5r, 9b5s, 9b5t, 6ecd, 6ece, 6ecf, 9hfl, 1lgc,
4lke, 4lkf, 1loc, 5n22, 7new, 5ngq, 5nwk, 9o7j, 7p8q, 4pei,
6s7g, 3to6, 4uzq, 6x5r, 6x5s

Files changed:

  • src/atomworks/io/utils/bonds.py: Fix matching logic
  • tests/io/components/test_struct_conn_auth_seq_id_fallback.py: Regression test (new)

partrita pushed a commit to partrita/atomworks that referenced this pull request Jan 19, 2026
…reprocessing

feat: MSA preprocessing; zst support