Pull Request Overview
This PR optimizes the NMF workflow by fixing memory issues, improving JSON serialization, and enhancing the evaluation process. The changes focus on handling sparse matrices efficiently and preventing memory crashes when processing large datasets.
- Removes memory-intensive sparse matrix conversion that was causing crashes
- Adds explicit float conversion for JSON serialization compatibility
- Improves tqdm integration with joblib for better progress tracking
  group_sample_indices = [
      idx for idx, sample_id in enumerate(all_samples_list)
-     if sample_to_cancer_type_map.get(sample_id) in group_cancer_codes
+     if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
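The slice c_type[:4] does not raise an IndexError when c_type is shorter than 4 characters: Python slicing returns the shorter string, which then matches only if group_cancer_codes contains that exact shorter value. If only codes of at least 4 characters should ever match, make that explicit with a length check: if (c_type := sample_to_cancer_type_map.get(sample_id)) and len(c_type) >= 4 and c_type[:4] in group_cancer_codes. Note that the slice still assumes c_type is a string; a non-string value in the map would raise a TypeError.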
-     if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
+     if (c_type := sample_to_cancer_type_map.get(sample_id)) and len(c_type) >= 4 and c_type[:4] in group_cancer_codes
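For reference, a minimal sketch of how the filter behaves on short, empty, and missing values. The identifiers mirror the diff, but the sample IDs, cancer type strings, and group codes are made up for illustration and are not taken from the PR.

```python
# Hypothetical data: names follow the diff, values are invented for illustration.
all_samples_list = ["s1", "s2", "s3", "s4"]
sample_to_cancer_type_map = {
    "s1": "BRCA-US",  # first four characters match a group code
    "s2": "LU",       # shorter than four characters
    "s3": "",         # empty string is falsy, so the walrus check skips it
    # "s4" is absent, so .get() returns None and the check skips it
}
group_cancer_codes = {"BRCA", "LUAD"}

# Slicing past the end of a string returns the shorter string; it does not raise.
short_slice = "LU"[:4]

# Same condition as the new line in the diff (requires Python 3.8+ for :=).
group_sample_indices = [
    idx for idx, sample_id in enumerate(all_samples_list)
    if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
]
selected_samples = [all_samples_list[idx] for idx in group_sample_indices]
```

With this data only "s1" is selected. The short value "LU" is sliced without error and simply fails the membership test, so the length check changes behavior only when group_cancer_codes itself contains entries shorter than 4 characters.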
  selected_samples = [all_samples_list[idx] for idx in group_sample_indices]

  # Export selected sample IDs to preprocessed_data
  export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
The hardcoded "preprocessed_data" directory may not exist, in which case opening export_path for writing fails with a FileNotFoundError. Consider calling os.makedirs(os.path.dirname(export_path), exist_ok=True) before writing the file, or making the directory path configurable.
- export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
+ export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
+ os.makedirs(os.path.dirname(export_path), exist_ok=True)
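A sketch of the export step with the directory made configurable, as the comment suggests. The helper name export_selected_samples and its out_dir parameter are assumptions for illustration and do not appear in the PR; the file name pattern is the one from the diff.

```python
import json
import os
import tempfile


def export_selected_samples(selected_samples, group_name, out_dir="preprocessed_data"):
    """Write sample IDs to <out_dir>/selected_samples_<group_name>.json (hypothetical helper)."""
    export_path = os.path.join(out_dir, f"selected_samples_{group_name}.json")
    # Create the target directory if needed; exist_ok avoids an error when it already exists.
    os.makedirs(os.path.dirname(export_path), exist_ok=True)
    with open(export_path, "w") as fh:
        json.dump(selected_samples, fh, indent=2)
    return export_path


# Usage: write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "preprocessed_data")
    path = export_selected_samples(["s1", "s2"], "groupA", out_dir=out_dir)
    file_name = os.path.basename(path)
    with open(path) as fh:
        round_trip = json.load(fh)
```

One caveat with this pattern: os.path.dirname returns an empty string for a bare file name, and os.makedirs("") raises, so the call is only safe when the path includes a directory component, as it does here.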