Skip to content

Conversation

@stoneandbone
Copy link

[HZ384] Update: Enhanced Hinge Identification Function

This update refines the hinge site detection algorithm based on the methodology of [HZ384] H. Zhang, M. Gur, I. Bahar (2024), PNAS 121(49): e2414333121.

Summary of changes:

  1. Adaptive mode selection: If a GNM model is provided without specifying n_modes, the number of modes is now automatically determined to achieve 33% cumulative variance. When GNM modes are supplied, all provided modes are used for hinge detection.
  2. Hinge detection: Hinges are identified from crossover points and residues within a defined threshold band in the eigenvector.
  3. Region merging: Overlapping or adjacent hinge regions are merged, with optional user-defined spacing tolerance.
  4. Transient reduction: Transient hinge regions are filtered out to improve robustness.
  5. Terminal trimming: Hinges at N- or C-terminal ends, where loops may be presented, may be trimmed based on user preference.
  6. Output: Returns a list of hinges by mode for downstream analysis.

@AnthonyBogetti
Copy link
Member

@jamesmkrieger, this is the code from Frank for finding Hinges (PNAS ref in the main block of the PR) that I mentioned before. @stoneandbone very kindly implemented it. Does this overall implementation look okay to you?

Copy link
Contributor

@jamesmkrieger jamesmkrieger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I just have a few changes, mainly for efficiency.

I didn’t understand the bits with the bands and merging regions but I imagine they are right.

It would also be good to have some unit tests that verify the type and shape of the output as well as the numerical values and how they are affected by the parameters.

@jamesmkrieger
Copy link
Contributor

Thanks very much for making this

@AnthonyBogetti
Copy link
Member

Thanks @jamesmkrieger! I will work with @stoneandbone to address these.

@stoneandbone
Copy link
Author

@jamesmkrieger I have addressed all your suggested fixes, and @AnthonyBogetti and I added a unit test. Please review at your convenience! :)

@AnthonyBogetti
Copy link
Member

AnthonyBogetti commented Dec 2, 2025

The (random) test failures here are due to the pfam tests which should be fixed in #2189 maybe we can merge that one first.

@jamesmkrieger
Copy link
Contributor

The (random) test failures here are due to the pfam tests which should be fixed in #2189 maybe we can merge that one first.

I agree

AnthonyBogetti
AnthonyBogetti previously approved these changes Dec 3, 2025
6) Return list of hinges by mode.
:param modes: GNM model or ModeSet
:type modes: GNM or Mode instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the format that sphinx can read like in other doc strings. Also put ModeSet again in type and put a more meaningful description than the types in the bit above, linking to what was said above about selecting modes.

Double check that param is ok too. We usually put arg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user could also provide vectors as inputs directly

:param modes: GNM model or ModeSet
:type modes: GNM or Mode instance
:param n_modes: Number of modes to consider. Defaults to all.
:type n_modes: int or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if or is ok. We usually put a comma. Again remember this should be machine readable as well as human readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe handle default of None in the bit above as well or instead e.g. defaults to all modes if left as None

:param space: spacing between hinges. Defaults to None. Higher space value will minimize the number of local hinges while retaining global hinges.
:type space: int or None
:param trim: Whether to hinges at the N- or C-terminal ends by Protein length // 20. Defaults to False.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there could be a parameter for how much counts as terminal with options for a fraction as a float less than 1 or a number of residues as an int from 1 onwards. Dividing by 20 is huge for some proteins e.g. 1000//20=50 residues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having atoms as inputs would also allow you to handle multiple chains, which is currently lacking. At the moment there would be merging across chain boundaries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d suggest the default as 5 residues

raise TypeError("Input must be a GNM model or ModeSet instance.")
else:
if gnm._kirchhoff is None:
raise ValueError("Kirchhoff matrix not built. Build Kirchhoff matrix before calculating modes.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a structure could be provided to build the Kirchhoff internally. Providing a gnm object with the Kirchoff built is a bit counterintuitive


vecs = gnm[:n_modes].getEigvecs()
else:
### Use all provided modes for Hinge detection ###
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally simplify the nested if else logic or otherwise document it with a few more comments. It looks like this is the ModeSet case. You could maybe also put this input handling in a helper function so you can return the modes array. It could be easier for readability and could also be a good output to have for diagnostics.

it could also be good to return the calculated and selected gnm modes as an option more globally as well.

also, there seems to be no handling of the case where a gnm object is provided with modes calculated. In that case they should be used directly

vecs = gnm.getArray()


def detectHinges(v, threshold):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d suggest making this a separate function rather than a nested one so people can run it with their own vectors and threshold. It would also help with readability and maintainability

for i in crx:
r = [i, i + 1]

if abs(v[i]) > band and abs(v[i + 1]) > band:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again simplify nested if else blocks. ChatGPT suggests this. Please check if it makes sense

band = np.sqrt(1 / len(v)) / threshold
regs = []

for i in crx:
    # Start hinge region around the sign change
    start = i
    end = i + 1

    # Expand left
    j = start
    while j >= 0 and abs(v[j]) < band:
        start = j
        j -= 1

    # Expand right
    j = end
    while j < len(v) and abs(v[j]) < band:
        end = j
        j += 1

    # Store inclusive region
    regs.append(list(range(start, end + 1)))

"""
[HZ384] H Zhang, M Gur, I Bahar (2024) Global hinge sites of proteins as target sites for drug binding Proc Natl Acad Sci USA 121 (49), e2414333121
Updated hinge identification based on [HZ384]. This will:
1) If GNM model is provided without specification of n_modes, number of modes will be determined to achieve 33% cumulative variance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a different percentage could also be used as a decimal input to the number of modes

Copy link
Contributor

@jamesmkrieger jamesmkrieger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to reorganise for readability and usability. We’d have two step functions outside the main function that could be helpful in their own right (see below)

  • one for subsetting modes by cumulative variance that could be helpful for PCA and maybe ANM modes too
  • One for calculating hinges for a single mode

please include better control of termini and multiple chains. Also handle calculation of kirchhoff inside and provided gnm with already calculated mode

the tests would also need cases for multiple chains e.g. 3kg2 that we analysed in Dutta et 2015

I haven’t had the chance to look over the tests or to test it and I don’t think I will be able to. If you and @AnthonyBogetti are happy after these changes, then I am too

@AnthonyBogetti
Copy link
Member

@stoneandbone Please see and address James' most recent round of comments.

@AnthonyBogetti AnthonyBogetti dismissed jamesmkrieger’s stale review December 19, 2025 19:23

James' changes were addressed, and due to lack of time on his end for a re-review, we have dismissed the review.

@AnthonyBogetti AnthonyBogetti requested review from jamesmkrieger and removed request for jamesmkrieger December 19, 2025 19:24
@AnthonyBogetti
Copy link
Member

Miraculously, the current tests all passed.

@stoneandbone
Copy link
Author

stoneandbone commented Dec 19, 2025 via email

return gnm.getArray()

if not isinstance(gnm, GNMBase):
raise TypeError("Input must be a GNM model or ModeSet instance.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe allow gnm=None if atoms is not None as a new gnm can be instantiated? People are unlikely to input a newly instantiated gnm object themselves

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants