fix: struct_conn bond matching when auth_asym_id differs from label_a… #74
fix(io): struct_conn bond matching when auth_asym_id differs from label_asym_id
When ptnr_label_seq_id is "." for non-polymers, the code correctly falls back to ptnr_auth_seq_id to determine the residue ID. However, the matching logic was using atom_array.res_id (which contains label_seq_id values) instead of auth_seq_id, causing inter-chain bonds to be missed.
This fix resolves missed inter-chain bonds in 35 affected PDB entries, including 9b5c, 8an9, and 1lgc.
📋 PR Checklist
This PR is tagged as a draft if it is still under development and not ready for review.
I have ensured that all my commits follow Angular commit message conventions.
I have run make format on the codebase before submitting the PR (this autoformats the code and lints it).
I have named the PR in Angular PR message format as well (cf. above), with a sensible tag line that summarizes all the changes in the PR.
ℹ️ PR Description
What changes were made and why?
Background: mmCIF label_seq_id vs auth_seq_id
In mmCIF format, there are two residue numbering systems:
label_seq_id: The official mmCIF identifier (primary key for cross-references)
auth_seq_id: Author-assigned residue number (matches PDB format)
For non-polymer entities (ligands, ions, etc.), label_seq_id is typically "." (undefined), while auth_seq_id contains the actual residue number.
The Bug
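At its core, the bug compares values taken from two different numbering schemes. The following is a hypothetical, self-contained illustration in plain Python, not atomworks or biotite code. The atom record, the ZN ligand, and the helper names are made up, and "." is mapped to -1 to mirror the res_id value this description reports for load_any().

```python
# Hypothetical illustration only: plain Python, not the atomworks/biotite API.

def res_id_from_label(label_seq_id: str) -> int:
    """Label-based residue ID; an undefined label_seq_id (".") is stored as -1."""
    return -1 if label_seq_id == "." else int(label_seq_id)


def partner_res_id(ptnr_label_seq_id: str, ptnr_auth_seq_id: str) -> int:
    """struct_conn partner residue ID, falling back to auth_seq_id when label_seq_id is "."."""
    if ptnr_label_seq_id == ".":
        return int(ptnr_auth_seq_id)
    return int(ptnr_label_seq_id)


# A made-up ligand atom: no label_seq_id, author residue number 201.
ligand_atom = {"label_seq_id": ".", "auth_seq_id": "201", "atom_name": "ZN"}

atom_res_id = res_id_from_label(ligand_atom["label_seq_id"])  # -1 (label numbering)
target = partner_res_id(".", "201")                           # 201 (author numbering)

# Comparing across numbering schemes: this check never succeeds for the ligand.
print(target == atom_res_id)  # False
```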
In get_struct_conn_bonds() (bonds.py:428-458):
When ptnr_label_seq_id is ".", the code correctly falls back to ptnr_auth_seq_id to get the residue ID (e.g., 201).
However, it then matched that value against atom_array.res_id, which contains label_seq_id values.
With load_any(), res_id for such non-polymer atoms is -1 (from label_seq_id = ".").
The comparison 201 == -1 fails, causing inter-chain bonds to be missed.
Why this affects load_any() but not parse()
parse(): Uses build_template_atom_array(), which sets res_id = auth_seq_id for non-polymers.
load_any(): Uses biotite directly, which follows the mmCIF spec (res_id = label_seq_id).
biotite's behavior is correct per the mmCIF specification. The bug is in atomworks' get_struct_conn_bonds(), which assumes res_id always contains auth_seq_id-compatible values.
The Fix
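The matching rule in this section can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the code in bonds.py: the Atoms container, partner_index(), and matching on (chain_id, residue ID, atom_name) are hypothetical simplifications of an AtomArray and of whatever keys get_struct_conn_bonds() actually compares.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Atoms:
    """Hypothetical stand-in for an AtomArray's per-atom annotations."""
    chain_id: List[str]
    res_id: List[int]       # label_seq_id-based; -1 where label_seq_id was "."
    auth_seq_id: List[int]  # author residue numbering
    atom_name: List[str]


def partner_index(atoms: Atoms, chain_id: str, ptnr_label_seq_id: str,
                  ptnr_auth_seq_id: str, atom_name: str) -> Optional[int]:
    """Return the index of the atom matching one struct_conn partner, or None."""
    # Fall back to auth_seq_id when label_seq_id is undefined, and remember that we did.
    used_fallback = ptnr_label_seq_id == "."
    target = int(ptnr_auth_seq_id if used_fallback else ptnr_label_seq_id)
    # Compare against the annotation that uses the same numbering scheme as `target`.
    residue_ids = atoms.auth_seq_id if used_fallback else atoms.res_id
    for i, (chain, res, name) in enumerate(zip(atoms.chain_id, residue_ids, atoms.atom_name)):
        if chain == chain_id and res == target and name == atom_name:
            return i
    return None


# Made-up example: a cysteine SG in chain A bonded to a zinc ion in chain B.
atoms = Atoms(
    chain_id=["A", "A", "B"],
    res_id=[40, 40, -1],
    auth_seq_id=[42, 42, 201],
    atom_name=["N", "SG", "ZN"],
)
print(partner_index(atoms, "B", ".", "201", "ZN"))  # 2 (matched via auth_seq_id)
print(partner_index(atoms, "A", "40", "42", "SG"))  # 1 (matched via res_id)
```

Choosing the comparison array per partner, rather than rewriting res_id, leaves biotite's label-based res_id untouched, consistent with the point that biotite's behavior is correct.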
Read the auth_seq_id annotation from atom_array.
Track whether each partner's residue ID came from the auth_seq_id fallback.
Use auth_seq_id for matching when the fallback was used; otherwise use res_id.
How were the changes tested?
Unit test: Added regression test test_struct_conn_auth_seq_id_fallback.py with 3 representative PDB entries (9b5c, 8an9, 1lgc).
Comprehensive verification: Tested all 35 affected PDB entries with a verification script comparing struct_conn declarations against detected bonds:
Before the fix, load_any() detected 66/116 inter-chain bonds (56.9%).
After the fix, load_any() detected 116/116 inter-chain bonds (100%).
Existing tests: All existing bond-related tests pass.
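The before/after comparison amounts to a set-overlap count between declared and detected inter-chain bonds. A minimal sketch of such a check follows. The actual verification script is not included in this PR description, so the (chain_id, residue_id, atom_name) atom keys and the helper names here are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical atom key: (chain_id, residue_id, atom_name).
AtomKey = Tuple[str, int, str]
Bond = FrozenSet[AtomKey]


def as_bond(a: AtomKey, b: AtomKey) -> Bond:
    """Store a bond order-independently so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))


def is_inter_chain(bond: Bond) -> bool:
    """True if the two bonded atoms belong to different chains."""
    return len({chain for chain, _, _ in bond}) == 2


def detection_rate(declared: Iterable[Bond], detected: Iterable[Bond]) -> Tuple[int, int, float]:
    """Return (found, total, percent) for declared inter-chain bonds that were detected."""
    declared_set = {b for b in declared if is_inter_chain(b)}
    found = len(declared_set & set(detected))
    total = len(declared_set)
    percent = 100.0 * found / total if total else 100.0
    return found, total, percent


declared = [
    as_bond(("A", 42, "SG"), ("B", 201, "ZN")),
    as_bond(("A", 57, "NE2"), ("B", 201, "ZN")),
]
detected = [as_bond(("B", 201, "ZN"), ("A", 42, "SG"))]
print(detection_rate(declared, detected))  # (1, 2, 50.0)
```

Applied to the reported totals, 66 detected out of 116 declared bonds gives 100 × 66 / 116 ≈ 56.9%, which matches the before-fix figure.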
Additional Notes
Affected PDB entries (35 total):
Files changed:
src/atomworks/io/utils/bonds.py: Fix matching logic
tests/io/components/test_struct_conn_auth_seq_id_fallback.py: Regression test (new)