Description
I ran the last step of eval_samples.sh. This is my command:
$ export precomputed_arrays='/home/liang/temp/precomputed/precomputed_arrays'
$ accelerate launch --multi_gpu --main_process_port 29511 --num_processes 1 \
    /home/liang/SurfDock/inference_accelerate.py \
    --data_csv /home/liang/SurfDock/data/eval_sample_dirs/SurfDock_eval_samples/txnl1_human_10panx_acetate_24c/input_csv_files/txnl1_human_10panx_acetate_24c.csv \
    --model_dir /home/liang/SurfDock/model_weights/docking \
    --ckpt best_ema_inference_epoch_model.pt \
    --confidence_model_dir /home/liang/SurfDock/model_weights/posepredict \
    --confidence_ckpt best_model.pt \
    --save_docking_result \
    --mdn_dist_threshold_test 3.0 \
    --esm_embeddings_path /home/liang/SurfDock/data/eval_sample_dirs/SurfDock_eval_samples/txnl1_human_10panx_acetate_24c/txnl1_human_10panx_acetate_24c_embedding/esm_embedding_pocket_output/esm_embedding_pocket_output_for_train/esm2_3billion_pdbbind_embeddings.pt \
    --run_name /home/liang/SurfDock/model_weights/posepredict_test_dist_3.0 \
    --project txnl1_human_10panx_acetate_24c \
    --out_dir /home/liang/SurfDock/docking_result/txnl1_human_10panx_acetate_24c \
    --batch_size 40 --batch_size_molecule 1 \
    --samples_per_complex 40 --save_docking_result_number 40 \
    --head_index 0 --tail_index 10000 \
    --inference_mode evaluate \
    --wandb_dir /home/liang/temp/docking_result/test_workdir
The following problem occurred. (I have confirmed that the preceding steps work: the surface and embedding files are generated normally.)
The following values were not passed to accelerate launch and had defaults used instead:
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
/home/liang/temp/precomputed/precomputed_arrays
/home/liang/temp/precomputed/precomputed_arrays
2025-06-27 14:51:27.579 | INFO | main::90 - Runing inference script in path: /home/liang/SurfDock/bash_scripts
2025-06-27 14:51:27.579 | INFO | main::91 - Runing inference with args: Namespace(config=None, data_csv='/home/liang/SurfDock/data/eval_sample_dirs/SurfDock_eval_samples/txnl1_human_10panx_acetate_24c/input_csv_files/txnl1_human_10panx_acetate_24c.csv', model_dir='/home/liang/SurfDock/model_weights/docking', ckpt='best_ema_inference_epoch_model.pt', confidence_model_dir='/home/liang/SurfDock/model_weights/posepredict', confidence_ckpt='best_model.pt', save_docking_result=True, ligand_to_pocket_center=False, keep_input_pose=False, use_noise_to_rank=False, num_cpu=None, run_name='/home/liang/SurfDock/model_weights/posepredict_test_dist_3.0', project='txnl1_human_10panx_acetate_24c', surface_path='/PDBBind_processed_8A_surface/', esm_embeddings_path='/home/liang/SurfDock/data/eval_sample_dirs/SurfDock_eval_samples/txnl1_human_10panx_acetate_24c/txnl1_human_10panx_acetate_24c_embedding/esm_embedding_pocket_output/esm_embedding_pocket_output_for_train/esm2_3billion_pdbbind_embeddings.pt', out_dir='/home/liang/SurfDock/docking_result/txnl1_human_10panx_acetate_24c', batch_size=40, batch_size_molecule=1, cache_path='/PDBBIND/cache_PDBBIND_pocket_8A', data_dir='/PDBBIND/PDBBind_pocket_8A/', split_path='/data/splits/timesplit_test', no_overlap_names_path='~/data/splits/timesplit_test_no_rec_overlap', no_model=False, no_random=False, no_final_step_noise=False, ode=False, wandb=False, wandb_dir='/home/liang/temp/docking_result/test_workdir', inference_steps=20, limit_complexes=0, num_workers=1, num_process=20, tqdm=False, save_visualisation=False, samples_per_complex=40, save_docking_result_number=40, actual_steps=None, inference_mode='evaluate', head_index=0, tail_index=10000, ligandsMaxAtoms=80, random_seed=42, force_optimize=False, mdn_dist_threshold_test=3.0)
device cuda:0 is used!
2025-06-27 14:51:33.152 | INFO | main:main_function:165 - loaded model weight for score model
2025-06-27 14:51:38.111 | INFO | main:main_function:188 - t schedule:[1. 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35
0.3 0.25 0.2 0.15 0.1 0.05]
2025-06-27 14:51:38.112 | INFO | main:main_function:189 - Loading data ...........
0%| | 0/1 [00:00<?, ?it/s]2025-06-27 14:51:38.119 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:51:38.120 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:51:38.128 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:51:38.128 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:51:38.128 | INFO | score_in_place_dataset.score_dataset:get_complex:161 - ligs: /home/liang/SurfDock/data/eval_sample_dirs/SurfDock_eval_samples/txnl1_human_10panx_acetate_24c/txnl1_human_10panx_acetate_24c_data/txnl1_human_10panx_acetate_24c/txnl1_human_10panx_acetate_24c_ligand.sdf
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.82s/it]
2025-06-27 14:51:39.949 | INFO | score_in_place_dataset.score_dataset:get_complex:171 - Processing txnl1████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.82s/it]
2025-06-27 14:51:39.977 | INFO | datasets.process_mols:extract_receptor_structure:359 - Found 33 LM embeddings for 33 residues
2025-06-27 14:51:40.437 | INFO | main:main_function:260 - Protein txnl1_human_10panx_acetate_24c_protein_processed_8A Size of test dataset: 1
2025-06-27 14:52:41.863 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90 | 0/1 [00:00<?, ?it/s]
2025-06-27 14:52:41.864 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:52:41.869 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:52:42.541 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:53:42.674 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:53:42.676 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:53:42.680 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:53:43.393 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:54:41.219 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:54:41.220 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:54:41.223 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:54:41.692 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:55:38.663 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:55:38.663 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:55:38.667 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:55:39.124 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:56:34.652 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:56:34.653 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:56:34.656 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:56:35.115 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:57:29.965 | INFO | datasets.process_mols:read_molecule:659 - Can't kekulize mol. Unkekulized atoms: 40 83 84 85 86 87 88 89 90
2025-06-27 14:57:29.966 | INFO | datasets.process_mols:read_molecule:660 - RDKit was unable to read the molecule.
2025-06-27 14:57:29.969 | WARNING | main:main_function:370 - Using non corrected RMSD because of the error:
2025-06-27 14:57:30.424 | ERROR | main:main_function:407 - Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
2025-06-27 14:57:30.425 | ERROR | main:main_function:411 - Skip by five times Failed on :['txnl1_txnl1_human_10panx_acetate_24c_ligand.sdf_file_inner_idx_0'], error of :operands could not be broadcast together with shapes (93,3) (89,3)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [05:49<00:00, 349.40s/it]
2025-06-27 14:57:30.681 | INFO | main:main_function:441 - Protein txnl1_human_10panx_acetate_24c_protein_processed_8A used time: 352.5637400150299█████████████████████| 1/1 [05:49<00:00, 349.40s/it]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [05:52<00:00, 352.57s/it]
2025-06-27 14:57:30.686 | INFO | main:main_function:445 - Docking time used for one moleculer: 352.5684072971344
2025-06-27 14:57:30.686 | INFO | main:main_function:446 - Docking time used: 352.5684072971344
2025-06-27 14:57:30.686 | INFO | main:main_function:447 - Sampling conformers number: 40
2025-06-27 14:57:30.686 | INFO | main:main_function:448 - Output conformers number: 40
2025-06-27 14:57:30.687 | INFO | main:main_function:449 - Docking output molecule number: 1
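There seem to be two problems here. First, the molecule cannot be kekulized, which looks like an RDKit issue. I modified the source code so that the program continues even when kekulization fails, but I do not know whether this affects the later steps. Second, the message "error of :operands could not be broadcast together with shapes (93,3) (89,3)" indicates that two coordinate arrays have mismatched shapes (93 rows versus 89 rows).
How should this be solved?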
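To make the second error concrete, here is a minimal pure-Python sketch of the NumPy-style broadcasting rule: two shapes are compatible only if, compared from the last axis backwards, each pair of dimensions is equal or one of them is 1. The function name `broadcast_compatible` is hypothetical and is not part of SurfDock or NumPy; it only illustrates why (93,3) and (89,3) are rejected.

```python
def broadcast_compatible(shape_a, shape_b):
    """Return True if two shapes can be combined under NumPy-style broadcasting."""
    # Walk the axes from the trailing one backwards; zip() stops at the
    # shorter shape, which matches treating missing leading axes as size 1.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


# The two shapes reported in the log: 93 rows of xyz versus 89 rows of xyz.
print(broadcast_compatible((93, 3), (89, 3)))  # False
# A single xyz row can be combined with any number of rows.
print(broadcast_compatible((93, 3), (1, 3)))   # True
```

The last axis (3 coordinates) matches, so the failure comes from the first axis alone: the two arrays hold different numbers of atoms, 93 and 89.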
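I obtained my SDF file as follows: I first docked with a different method to get a PDB file of the complex, then converted and exported the ligand with PyMOL.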
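Also, can this software search for pockets automatically? I see that the code only considers the pocket within 8 Å of the ligand, so it cannot cover the whole protein surface and can only search the ligand's neighborhood. (This is from the compute_inp_surface() function in SurfDock/comp_surface/prepare_target/computeTargetMesh_test_samples.py.) Is this the intended design?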
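For readers unfamiliar with ligand-centered pocket definitions, the sketch below shows a generic distance cutoff: a residue belongs to the pocket if any of its atoms lies within 8 Å of any ligand atom. This is an illustration of the idea only, not the actual compute_inp_surface() implementation; the function name `residues_near_ligand`, the input layout, and the coordinates are all assumptions.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=8.0):
    """Return residue ids having any atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) pairs (hypothetical layout)
    ligand_coords: iterable of (x, y, z) tuples
    """
    ligand_coords = list(ligand_coords)
    near = set()
    for residue_id, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_coords):
            near.add(residue_id)
    return near


# Made-up coordinates: two residues close to the ligand, one far away.
protein_atoms = [
    ("A:10", (0.0, 0.0, 0.0)),
    ("A:11", (5.0, 0.0, 0.0)),
    ("A:50", (20.0, 0.0, 0.0)),
]
ligand_coords = [(1.0, 0.0, 0.0)]
print(sorted(residues_near_ligand(protein_atoms, ligand_coords)))
```

With a definition like this, the pocket always depends on where the reference ligand sits, which is why regions of the surface far from the ligand are never included.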