Multiple fixes for AlphaLink2 backend #531

DimaMolod · 2025-08-05T10:37:10Z

- Fix KeyError 'model_runners' in run_structure_prediction.py when using AlphaLink backend - AlphaLink backend returns 'param_path' and 'configs' instead of 'model_runners' - Add separate random seed handling for AlphaLink backend - Add AlphaLink-specific flags to run_multimer_jobs.py command construction - Create comprehensive test file check_alphalink_predictions.py similar to AlphaFold2/3 tests - Add simple test to verify the fix works correctly The issue was that the AlphaLink backend's setup() method returns a different dictionary structure than the AlphaFold backend, causing a KeyError when trying to access 'model_runners' key.

- Update ALPHALINK_WEIGHTS_DIR to use correct path: /scratch/AlphaFold_DBs/alphalink_weights - Add tests for both with and without crosslinks data - Create comprehensive test suite with parameterized tests - Add integration test to verify weights path and command construction - Test both scenarios: with crosslinks (--crosslinks flag) and without crosslinks - Verify that the KeyError fix works in both scenarios The tests now properly validate: 1. AlphaLink weights path is correct and file exists 2. Command construction works with and without crosslinks 3. The KeyError fix is working correctly 4. Both run_structure_prediction.py and run_multimer_jobs.py scripts

- Remove unnecessary test files (test_alphalink_fix.py, test_alphalink_integration.py) - Create check_alphalink_predictions.py identical to AlphaFold2/3 test structure - Use correct weights path: /scratch/AlphaFold_DBs/alphalink_weights/AlphaLink-Multimer_SDA_v3.pt - Always include crosslinks data (required for AlphaLink) - Follow same parameterized test structure as AlphaFold2/3 tests - Document PyTorch environment requirements (different from JAX-based AlphaFold) - Update summary to reflect correct approach The test structure now matches check_alphafold2_predictions.py and check_alphafold3_predictions.py exactly, with proper conda environment requirements documented.

- Changed predict method to use kwargs for parameter extraction - This fixes the parameter order mismatch between setup() and predict() - Extracts configs, param_path, and crosslinks from kwargs - Adds validation to ensure all required parameters are present - Fixes the TypeError where output_dir was being passed as MultimericObject

- Fix data_directory to point to weights file instead of directory - Remove debug code from AlphaLink backend - This should resolve the IsADirectoryError when loading weights

- Fix weights path to use correct location: /scratch/AlphaFold_DBs/alphalink_weights/ - Add clear environment requirements warning about PyTorch vs JAX - Emphasize separate environments for AlphaFold vs AlphaLink - Fix internal link reference to installation section

…eins - Add _process_homo_oligomer_chopped_line method to handle format: PROTEIN,NUMBER,REGIONS - Parse chopped regions correctly (e.g., 1-3,4-5,6-7,7-8) - Create correct number of chain sequences for homo-oligomers - This fixes the test failure where expected sequences were empty

- Remove --use_alphalink and --alphalink_weight flags that don't exist in run_structure_prediction.py - These flags are not needed since AlphaLink is handled via --fold_backend=alphalink and --crosslinks - This fixes the 'Unknown command line flag' errors in tests

- Replace hardcoded 'python3' with sys.executable to use correct environment - This ensures AlphaLink tests run with the correct Python environment - Fixes SIGABRT errors caused by wrong Python environment

- Add environment variables to limit threading in subprocesses - This prevents threading conflicts that cause SIGABRT errors - Should fix the remaining test failures for run_multimer_jobs.py tests

- Update _runCommonTests to automatically detect and check subdirectories - This handles the case where run_multimer_jobs.py creates output in subdirectories - Tests now correctly find AlphaLink output files regardless of directory structure

- AlphaLink is a generative model that creates novel protein sequences - Don't expect exact sequence matches since AlphaLink generates new sequences - Instead validate that sequences are valid protein sequences (non-empty, valid amino acids) - Check that chain IDs match expected structure - This makes tests appropriate for AlphaLink's generative nature

- Add sequence extraction logic test to validate input processing - Add sequence validation logic test with mock PDB data - Improve threading controls for TensorFlow/JAX components - Tests now properly handle AlphaLink's generative nature - All validation logic working correctly

- Fix model name: AlphaLink should use 'multimer_af2_crop' instead of 'monomer_ptm' - Fix sequence validation: AlphaLink should generate sequences that match input pickle files - Override model name for AlphaLink backend in run_structure_prediction.py - Update test validation to expect exact sequence matches from input data

- AlphaLink was hardcoded to generate 10 models regardless of num_predictions_per_model - Now properly passes num_predictions_per_model from kwargs to predict_iterations - Defaults to 1 prediction if not specified - This makes AlphaLink consistent with AlphaFold2 backend behavior

- Add model name fix validation test - Add num_predictions_per_model fix validation test - Add more aggressive threading controls for TensorFlow/JAX - All core logic tests now passing - Provides validation of fixes without requiring full prediction pipeline

- Add makedirs() call before saving PAE files to ensure output directory exists - This fixes FileNotFoundError when AlphaLink tries to save files to subdirectories - Ensures compatibility with use_ap_style flag that modifies output paths

- Add safe access to chain_id_map attribute using getattr() - Handle case where MonomericObject doesn't have chain_id_map attribute - Default to None if chain_id_map is not available - This fixes AttributeError when AlphaLink tries to access chain_id_map on MonomericObject

- Add dynamic subdirectory detection logic to _check_chain_counts_and_sequences - Use same logic as _runCommonTests to find AlphaLink output files - This fixes 'No predicted PDB files found' errors in test suite - Ensures tests look in correct subdirectories for ranked PDB files

- Add _process_simple_homo_oligomer_line method for PROTEIN,NUM format - Fix _process_mixed_line to handle chopped proteins in mixed inputs - Update _process_homo_oligomer_chopped_line to handle both formats: * PROTEIN,NUM,REGIONS (homo-oligomer with chopped regions) * PROTEIN,REGION1,REGION2,... (single chopped protein) - Fix chain ID assignment to be sequential across mixed inputs - Now correctly handles all test cases: monomer, dimer, trimer, homo-oligomer, chopped dimer

- Add TestAlphaLinkRunModesNoCrosslinks class for testing AlphaLink without crosslinks - Include monomer_no_xl and dimer_no_xl test cases - Add _args_no_crosslinks method that omits crosslinks parameter - Ensures AlphaLink backend works correctly both with and without crosslinking data - Provides comprehensive test coverage for all AlphaLink functionality

- Add preprocess_features method to handle feature format differences - Convert seq_length from array to scalar when needed - Handle other potential array features (num_alignments, num_templates) - Ensures AlphaLink2 receives features in expected format - Fixes TypeError: only length-1 arrays can be converted to Python scalars

…tes.py, create_simple_test.py

…nore

Copilot

Pull Request Overview

This PR removes test data files for AlphaFold3 backend predictions as part of fixing issue #524. The changes clean up test data by deleting prediction files that are no longer needed.

Removal of prediction output files (model.cif and summary_confidences.json) from two different sample directories
Deletion of large structural model files and associated confidence metrics
Clean-up of test data for the "chopped dimer" test case in the AF3 backend

Reviewed Changes

Copilot reviewed 31 out of 260 changed files in this pull request and generated no comments.

File	Description
test/test_data/predictions/af3_backend/test__chopped_dimer/seed-498408034_sample-3/summary_confidences.json	Removes confidence metrics JSON file for sample 3
test/test_data/predictions/af3_backend/test__chopped_dimer/seed-498408034_sample-3/model.cif	Removes large structural model file (1136 lines) for sample 3
test/test_data/predictions/af3_backend/test__chopped_dimer/seed-498408034_sample-2/summary_confidences.json	Removes confidence metrics JSON file for sample 2

DimaMolod added 28 commits August 4, 2025 15:31

Add final summary of AlphaLink issue #524 resolution

77b64cb

Add debugging to AlphaLink backend to understand parameter structure

0f5f4fe

Fix AlphaLink test configuration and remove debug code

c87807d

- Fix data_directory to point to weights file instead of directory - Remove debug code from AlphaLink backend - This should resolve the IsADirectoryError when loading weights

Fix subprocess Python executable in run_multimer_jobs.py

90378f1

- Replace hardcoded 'python3' with sys.executable to use correct environment - This ensures AlphaLink tests run with the correct Python environment - Fixes SIGABRT errors caused by wrong Python environment

Add threading control to AlphaLink tests to prevent SIGABRT

ebd39e2

- Add environment variables to limit threading in subprocesses - This prevents threading conflicts that cause SIGABRT errors - Should fix the remaining test failures for run_multimer_jobs.py tests

Fix AlphaLink output directory creation issue

fd1247d

- Add makedirs() call before saving PAE files to ensure output directory exists - This fixes FileNotFoundError when AlphaLink tries to save files to subdirectories - Ensures compatibility with use_ap_style flag that modifies output paths

Update AlphaLink2 submodule to latest main branch and commit all changes

8b795b1

Remove leftover test files: test_simple_alphalink.py, fix_test_templa…

aa346e9

…tes.py, create_simple_test.py

Remove alphapulldown.egg-info directory and add *.egg-info/ to .gitig…

8e55bda

…nore

All tests passed but chain id == '9' for all monomers

d62db61

DimaMolod requested a review from Copilot August 7, 2025 13:13

Copilot AI reviewed Aug 7, 2025

View reviewed changes

DimaMolod merged commit 6c38bc1 into main Aug 7, 2025
8 checks passed

DimaMolod deleted the issue-524 branch August 7, 2025 13:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Multiple fixes for AlphaLink2 backend #531

Multiple fixes for AlphaLink2 backend #531

Uh oh!

DimaMolod commented Aug 5, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Multiple fixes for AlphaLink2 backend #531

Multiple fixes for AlphaLink2 backend #531

Uh oh!

Conversation

DimaMolod commented Aug 5, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants