Skip to content

Conversation

@jamesmkrieger
Copy link
Contributor

Reading CIF files into PyMOL and saving them back out again in MMCIF format gives a file that ends with an atomic line with a new line character on the end.

The previous code for finding the end of atom blocks and assign the stop variable that in turn sets acount was previously resulting in this last line not being counted.

This causes 2 problems:

  1. The mask of all True based on acount for when altloc='all' had the wrong size so the files couldn't be parsed and gave an error:
(prody-github) jkrieger@CNB-WHIPPLE ~/software/scipion3/software/em/prody-github/ProDy (main) $ ipython
Python 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.30.0 -- An enhanced Interactive Python. Type '?' for help.
Ctrl click to launch VS Code Native REPL

In [1]: from prody import *

In [2]: ag = parseMMCIF('mmcif_3o21_pymol.cif', altloc='all', unite_chains=False)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 ag = parseMMCIF('mmcif_3o21_pymol.cif', altloc='all', unite_chains=False)

File ~/software/scipion3/software/em/prody-github/ProDy/prody/proteins/ciffile.py:133, in parseMMCIF(pdb, **kwargs)
    131     kwargs['title'] = title
    132 cif = openFile(pdb, 'rt')
--> 133 result = parseMMCIFStream(cif, chain=chain, segment=segment, **kwargs)
    134 cif.close()
    135 if unite_chains:

File ~/software/scipion3/software/em/prody-github/ProDy/prody/proteins/ciffile.py:240, in parseMMCIFStream(stream, **kwargs)
    237 if header or biomol or secondary:
    238     hd = getCIFHeaderDict(lines)
--> 240 _parseMMCIFLines(ag, lines, model, chain, subset, altloc, 
    241                  segment, unite_chains, report)
    243 if ag.numAtoms() > 0:
    244     LOGGER.report('{0} atoms and {1} coordinate set(s) were '
    245                   'parsed in %.2fs.'.format(ag.numAtoms(),
    246                                             ag.numCoordsets() - n_csets))

File ~/software/scipion3/software/em/prody-github/ProDy/prody/proteins/ciffile.py:521, in _parseMMCIFLines(atomgroup, lines, model, chain, subset, altloc_torf, segment, unite_chains, report)
    519     atomgroup.addCoordset(coordinates[mask][:modelSize])
    520 else:
--> 521     atomgroup._setCoords(coordinates[mask][:modelSize])
    523 atomgroup.setNames(atomnames[mask][:modelSize])
    524 atomgroup.setResnames(resnames[mask][:modelSize])

IndexError: boolean index did not match indexed array along dimension 0; dimension is 12793 but corresponding boolean dimension is 12792
  1. When altloc='A', we don't get the last row of atoms parsed, so there is one fewer atom than when parsing the original mmcif:
In [1]: from prody import *

In [2]: ag2 = parseMMCIF('mmcif_3o21_pymol.cif', altloc='A', unite_chains=False)
@> 12792 atoms and 1 coordinate set(s) were parsed in 0.26s.

In [3]: ag = parseMMCIF('prody/tests/datafiles/mmcif_3o21.cif', altloc='A', unite_chains=False)
@> 12793 atoms and 1 coordinate set(s) were parsed in 0.23s.
@> Secondary structures were assigned to 960 residues.

With the fix, both cases work fine and give the right number of atoms:

In [1]: from prody import *

In [2]: ag = parseMMCIF('mmcif_3o21_pymol.cif', altloc='A', unite_chains=False)
@> 12793 atoms and 1 coordinate set(s) were parsed in 0.28s.

In [3]: ag = parseMMCIF('mmcif_3o21_pymol.cif', altloc='all', unite_chains=False)
@> 12793 atoms and 1 coordinate set(s) were parsed in 0.25s.

@AnthonyBogetti
Copy link
Member

The changes have been tested and are confirmed to work. Thanks, James.

@jamesmkrieger
Copy link
Contributor Author

Great

@jamesmkrieger
Copy link
Contributor Author

Shall we wait for the tests PR to be merged and confirm the tests pass here too?

@AnthonyBogetti
Copy link
Member

Shall we wait for the tests PR to be merged and confirm the tests pass here too?

Yes, that sounds like a good plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants