feat(mmcif/assembly): Initialize from_mmcif by khs01654 · Pull Request #17 · seoklab/gmol-base

khs01654 · 2026-02-24T11:32:01Z

Checklist

Have you checked that there aren't other open pull requests for the same issue?
Have you linked the issue(s) you are working on (if any)?

If the change is related to the source code, tests, or build environments, please also check the following:

Does pytest -vs pass without any errors and warnings (at the project root)?
Does mypy --pretty pass without any errors and warnings (at the project root)?

If you added new feature(s), then also check the following:

Did you also add corresponding tests?

Linked Issues

Link the tracking issue(s) of this PR, with the auto-close keywords here:

Closes #...

Note

Medium Risk
Touches core mmCIF export and introduces non-trivial filtering that must keep multiple interdependent tables consistent; mistakes could silently produce invalid or incomplete mmCIF output.

Overview
Assembly now stores the originating Mmcif (metadata) and propagates it through filter, transform, and join, so derived assemblies retain mmCIF context.

Assembly.to_mmcif is expanded to emit additional mmCIF header/experimental blocks and more complete atom_site/struct_conn fields, including placeholder pdbx_struct_assembly/pdbx_struct_oper_list identity data.

Mmcif gains filter/filter_chains to subset a structure while consistently pruning dependent tables (entities, schemes, connections, assembly gens, branch links), plus a new Mmcif.to_mmcif() writer to round-trip the parsed model back to mmCIF text.

^{Written by Cursor Bugbot for commit 1b19fc2. This will update automatically on new commits. Configure here.}

codecov · 2026-02-24T11:33:08Z

Codecov Report

❌ Patch coverage is 5.88235% with 112 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.42%. Comparing base (dfc92a6) to head (1b19fc2).

Files with missing lines	Patch %	Lines
src/gmol/base/data/mmcif/parse.py	4.46%	107 Missing ⚠️
src/gmol/base/data/mmcif/assembly.py	28.57%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
- Coverage   82.58%   77.42%   -5.17%     
==========================================
  Files          14       14              
  Lines        1654     1772     +118     
  Branches      229      258      +29     
==========================================
+ Hits         1366     1372       +6     
- Misses        227      339     +112     
  Partials       61       61

Files with missing lines	Coverage Δ
src/gmol/base/data/mmcif/assembly.py	`77.29% <28.57%> (-0.74%)`	⬇️
src/gmol/base/data/mmcif/parse.py	`68.39% <4.46%> (-22.95%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

bbh-pharm · 2026-02-24T11:34:50Z

bug가 좀 있습니다..! 다시 쓰는 과정에서 없어지는 field가 있어서 읽을 때 에러가 나요....! @jnooree

src/gmol/base/data/mmcif/assembly.py

cursor · 2026-02-24T11:47:17Z

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

✅ Fixed: Polymer chains with branches misclassified as Ligand
- Added conditional check to only set chain type to Ligand for branch_scheme if the chain is not already typed as a polymer from poly_seq_scheme.
✅ Fixed: Duplicate seqres entries for polymer chains with branches
- Added deduplication logic to filter out branch schemes with seq_ids already present in poly_schemes before concatenating.
✅ Fixed: StopIteration crash on empty or invalid CIF file
- Wrapped next(read_cif()) in try/except to catch StopIteration and raise a descriptive ValueError instead.

Or push these changes by commenting:

@cursor push e941e73c8a

Preview (e941e73c8a)

diff --git a/src/gmol/base/data/mmcif/assembly.py b/src/gmol/base/data/mmcif/assembly.py
--- a/src/gmol/base/data/mmcif/assembly.py
+++ b/src/gmol/base/data/mmcif/assembly.py
@@ -549,7 +549,12 @@
         """
         from nuri.fmt import cif_ddl2_frame_as_dict, read_cif
 
-        data = next(read_cif(cif_file)).data
+        try:
+            data = next(read_cif(cif_file)).data
+        except StopIteration:
+            raise ValueError(
+                f"Empty or invalid CIF file: {cif_file}"
+            ) from None
         mmcif_dict = cif_ddl2_frame_as_dict(data)
 
         # Build CCD from embedded chem_comp data
@@ -644,9 +649,11 @@
         for chain_id, seq_infos in poly_seq_scheme.items():
             chain_types[chain_id] = polymer_mol_type(seq_infos, ccd)
         for chain_id in branch_scheme:
-            chain_types[chain_id] = MolType.Ligand
+            if chain_id not in chain_types:
+                chain_types[chain_id] = MolType.Ligand
         for chain_id in nonpoly_scheme:
-            chain_types[chain_id] = MolType.Ligand
+            if chain_id not in chain_types:
+                chain_types[chain_id] = MolType.Ligand
 
         # Fallback for chains with no scheme data
         for chain_id in struct_asym:
@@ -691,10 +698,15 @@
 
         # Set seqres and branches for each chain
         for cid, chain in chains.items():
+            poly_schemes = poly_seq_scheme.get(cid, [])
+            poly_seq_ids = {s.seq_id for s in poly_schemes}
+            branch_schemes = [
+                s
+                for s in branch_scheme.get(cid, [])
+                if s.seq_id not in poly_seq_ids
+            ]
             chain.seqres = _select_het_seq_scheme(
-                poly_seq_scheme.get(cid, [])
-                + branch_scheme.get(cid, [])
-                + nonpoly_scheme.get(cid, []),
+                poly_schemes + branch_schemes + nonpoly_scheme.get(cid, []),
                 residues,
             )
             chain.branches = _map_branches(

_{This Bugbot Autofix run was free. To enable Autofix for future PRs, go to the Cursor dashboard.}

…mcif_single

bbh-pharm · 2026-02-24T16:56:21Z

원래 Assembly.to_mmcif로 써도 다시 load_mmcif_single로 다시 읽을 수 없었는데, 그 부분 수정했습니다. (쓰는 field들 확인 부탁드립니다).
Mmcif class의 역할을 제대로 이해했는지 모르겠는데 metadata 같은 역할이었던 것 같습니다. 근데 Assembly cif를 읽을 때 다시 assembly로 만드는게 번거로워서 filter랑 mmcif 쓰는 부분 추가했는데, 이것도 확인 부탁드립니다.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.}

cursor · 2026-02-24T17:01:24Z

src/gmol/base/data/mmcif/assembly.py

+                    "0.0",
+                )
+            ],
+        )


Missing pdbx_struct_assembly_gen block in Assembly.to_mmcif()

High Severity

Assembly.to_mmcif() writes pdbx_struct_assembly and pdbx_struct_oper_list blocks but never writes the required pdbx_struct_assembly_gen block that links them together. When the output is read back, the _join_bioassembly validator finds no pdbx_struct_assembly_gen entries, leaving each BioAssembly with an empty assembly_gens list. This causes mmcif_assemblies() to call _apply_transforms with an empty generators list, ultimately crashing in Assembly.join([]) via np.concatenate on an empty list. The analogous Mmcif.to_mmcif() method correctly writes pdbx_struct_assembly_gen.

feat(mmcif/assembly): Initialize from_mmcif

98265a8

khs01654 assigned bbh-pharm and khs01654 Feb 24, 2026

khs01654 requested review from a team as code owners February 24, 2026 11:32

khs01654 added the feature New feature or request label Feb 24, 2026

bbh-pharm added the DO NOT MERGE Don't merge yet label Feb 24, 2026

cursor bot reviewed Feb 24, 2026

View reviewed changes

src/gmol/base/data/mmcif/assembly.py Outdated Show resolved Hide resolved

src/gmol/base/data/mmcif/assembly.py Outdated Show resolved Hide resolved

src/gmol/base/data/mmcif/assembly.py Outdated Show resolved Hide resolved

bbh-pharm added 2 commits February 25, 2026 01:09

feat(data/assembly): make Assembly.to_mmcif output readable by load_m…

6b43dad

…mcif_single

feat(data/mmcif): add filtering and write method in Mmcif

1b19fc2

cursor bot reviewed Feb 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

feat(mmcif/assembly): Initialize from_mmcif#17

feat(mmcif/assembly): Initialize from_mmcif#17
khs01654 wants to merge 3 commits intomainfrom
mmcif-assembly

khs01654 commented Feb 24, 2026 •

edited by cursor bot

Loading

Uh oh!

codecov bot commented Feb 24, 2026 •

edited

Loading

Uh oh!

bbh-pharm commented Feb 24, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot commented Feb 24, 2026

Uh oh!

bbh-pharm commented Feb 24, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

khs01654 commented Feb 24, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Linked Issues

Uh oh!

codecov bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

bbh-pharm commented Feb 24, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot commented Feb 24, 2026

Uh oh!

bbh-pharm commented Feb 24, 2026

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Feb 24, 2026

Choose a reason for hiding this comment

Missing pdbx_struct_assembly_gen block in Assembly.to_mmcif()

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

khs01654 commented Feb 24, 2026 •

edited by cursor bot

Loading

codecov bot commented Feb 24, 2026 •

edited

Loading

Missing `pdbx_struct_assembly_gen` block in `Assembly.to_mmcif()`