Update nmf_workflow.py#1

Draft
vedatonuryilmaz wants to merge 1 commit into MVP from fix-paralleljobs

@vedatonuryilmaz
Owner

No description provided.

Copilot AI left a comment

Pull Request Overview

This PR optimizes the NMF workflow by fixing memory issues, improving JSON serialization, and enhancing the evaluation process. The changes focus on handling sparse matrices efficiently and preventing memory crashes when processing large datasets.

  • Removes memory-intensive sparse matrix conversion that was causing crashes
  • Adds explicit float conversion for JSON serialization compatibility
  • Improves tqdm integration with joblib for better progress tracking
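
The float-conversion point can be illustrated with a small sketch. The diff for that change is not shown in this conversation, so everything below is an assumption: the metric names and the `to_jsonable` helper are hypothetical, and `Decimal` stands in for any numeric scalar type that the standard `json` encoder rejects, to keep the example dependency-free.

```python
import json
from decimal import Decimal


def to_jsonable(metrics):
    """Return a copy of `metrics` with every value coerced to a built-in float."""
    return {name: float(value) for name, value in metrics.items()}


# Hypothetical evaluation results. Decimal is a stand-in for a numeric
# scalar type that json.dumps does not know how to encode.
raw_metrics = {
    "reconstruction_error": Decimal("0.125"),
    "silhouette": Decimal("0.5"),
}

try:
    json.dumps(raw_metrics)
except TypeError as exc:
    # The encoder raises TypeError for types it does not recognize.
    print(f"without conversion: {exc}")

# After explicit conversion the dictionary serializes cleanly.
encoded = json.dumps(to_jsonable(raw_metrics))
print(encoded)
```

Converting at the point where results are collected keeps the serialization call itself simple, at the cost of an extra pass over the values.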


  group_sample_indices = [
      idx for idx, sample_id in enumerate(all_samples_list)
-     if sample_to_cancer_type_map.get(sample_id) in group_cancer_codes
+     if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
Copilot AI Aug 14, 2025

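Slicing with c_type[:4] cannot raise an IndexError: Python clamps slice bounds, so a value shorter than four characters is returned unchanged and simply fails the membership test unless that short string is itself in group_cancer_codes. If short values should never match, make that explicit with a length check: if (c_type := sample_to_cancer_type_map.get(sample_id)) and len(c_type) >= 4 and c_type[:4] in group_cancer_codes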

Suggested change
- if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
+ if (c_type := sample_to_cancer_type_map.get(sample_id)) and len(c_type) >= 4 and c_type[:4] in group_cancer_codes
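
To check how the filter behaves on short or missing values, it can be run on toy data. The sketch below is a minimal illustration, not the workflow's actual inputs: the sample IDs, cancer type strings, and group codes are invented, and only the variable names come from the diff. Python clamps slice bounds, so slicing a short string returns the string unchanged and does not raise.

```python
# Hypothetical data; only the variable names are taken from the diff.
all_samples_list = ["s1", "s2", "s3", "s4"]
sample_to_cancer_type_map = {"s1": "BRCA-US", "s2": "LU", "s3": None}
group_cancer_codes = {"BRCA", "LUAD"}

# Slicing past the end of a string is safe: the result is just shorter.
short_slice = "LU"[:4]

# Filter as written in the PR. None and missing keys are rejected by the
# truthiness check before the slice is evaluated.
group_sample_indices = [
    idx for idx, sample_id in enumerate(all_samples_list)
    if (c_type := sample_to_cancer_type_map.get(sample_id)) and c_type[:4] in group_cancer_codes
]

# Filter with the explicit length check from the suggested change.
group_sample_indices_checked = [
    idx for idx, sample_id in enumerate(all_samples_list)
    if (c_type := sample_to_cancer_type_map.get(sample_id))
    and len(c_type) >= 4
    and c_type[:4] in group_cancer_codes
]

print(short_slice, group_sample_indices, group_sample_indices_checked)
```

On this data both variants select the same indices. The two would differ only if `group_cancer_codes` itself contained a code shorter than four characters. A truthy non-string value in the map would raise a `TypeError` in either variant, which the length check does not guard against.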

selected_samples = [all_samples_list[idx] for idx in group_sample_indices]

# Export selected sample IDs to preprocessed_data
export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
Copilot AI Aug 14, 2025

The hardcoded "preprocessed_data" directory may not exist, in which case opening the export file for writing fails. Consider calling os.makedirs(os.path.dirname(export_path), exist_ok=True) before writing the file, or making the directory path configurable.

Suggested change
- export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
+ export_path = os.path.join("preprocessed_data", f"selected_samples_{group_name}.json")
+ os.makedirs(os.path.dirname(export_path), exist_ok=True)
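
Both options in the comment, creating the directory first and making it configurable, can be combined in one helper. This is a sketch under assumptions: the function name `export_selected_samples` and its `output_dir` parameter are hypothetical and do not appear in the PR, and the demo writes into a temporary directory rather than the repository.

```python
import json
import os
import tempfile


def export_selected_samples(selected_samples, group_name, output_dir="preprocessed_data"):
    """Write the selected sample IDs to JSON, creating output_dir if needed."""
    export_path = os.path.join(output_dir, f"selected_samples_{group_name}.json")
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(os.path.dirname(export_path), exist_ok=True)
    with open(export_path, "w", encoding="utf-8") as fh:
        json.dump(selected_samples, fh, indent=2)
    return export_path


# Demo in a throwaway location so nothing is written to the working tree.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "preprocessed_data")  # does not exist yet
    path = export_selected_samples(["s1", "s2"], "groupA", out_dir)
    file_name = os.path.basename(path)
    with open(path, encoding="utf-8") as fh:
        round_trip = json.load(fh)

print(file_name, round_trip)
```

Passing the directory as a parameter keeps the default from the PR while letting tests and alternative pipelines redirect the output.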

2 participants