Conversation
Pull Request Overview
This pull request introduces version 2025.0.2 of the ESAT project with significant enhancements focused on GPU acceleration, improved testing infrastructure, and enhanced data handling capabilities. The most substantial changes include adding GPU support via the candle-core crate, implementing comprehensive test coverage for data imputation, and extending the data visualization and analysis capabilities.
- GPU acceleration support for the Rust backend via CUDA and the candle-core crate
- Comprehensive data imputation module with scikit-learn integration and test coverage
- Enhanced data visualization capabilities including ridgeline plots, correlation heatmaps, and 2D histograms
- Improved error handling and progress callback mechanisms
Reviewed Changes
Copilot reviewed 32 out of 54 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/data/test_imputer.py | New comprehensive test suite for data imputation functionality |
| rust/lib.rs | Major refactor adding GPU support, new matrix operations, and modernized API |
| pyproject.toml | Version bump and GPU feature flag addition |
| esat/data/impute.py | New data imputation module with multiple strategies |
| esat/data/datahandler.py | Enhanced with visualization capabilities and location-based data handling |
| esat/model/sa.py | Updated to support GPU acceleration and improved callback handling |
| esat/model/batch_sa.py | Enhanced multiprocessing with logging and progress reporting |
| esat/utils.py | Improved timestep calculation with better edge case handling |
| eval/*.py | Updated runtime evaluation scripts with GPU benchmarking |
| notebooks/*.ipynb | New visualization examples and updated simulation parameters |
```rust
let s_inv_data: Vec<f64> = s_data
    .iter()
    .map(|&x: &f64| if x > 1e-12 { 1.0 / x } else { 0.0 })
    .collect();
```
Converting the tensor to f64 and then to a vector may be inefficient. Consider using the tensor's native dtype if possible, or performing the conversion only when necessary.
Suggested change:

```rust
let s_data = s.to_vec1::<f32>()?; // Extract as f32 vector
let s_data: Vec<f64> = s_data.iter().map(|&x| x as f64).collect(); // Convert to f64 if needed
let s_inv_data: Vec<f64> = s_data
    .iter()
    .map(|&x: &f64| if x > 1e-12 { 1.0 / x } else { 0.0 })
    .collect();
```
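The guarded-reciprocal pattern in the diff above can be sketched in NumPy without a dtype round-trip. This is an illustrative sketch, not ESAT's actual code; the `1e-12` threshold is taken from the diff, and the function name is hypothetical:

```python
import numpy as np

def safe_reciprocal(s, eps=1e-12):
    """Invert entries elementwise, mapping near-zero values to 0 instead of inf.

    Stays in s's native dtype, avoiding a round-trip through float64.
    """
    s = np.asarray(s)
    out = np.zeros_like(s)     # same shape and dtype as the input
    mask = s > eps             # only invert values safely above the threshold
    out[mask] = 1.0 / s[mask]
    return out
```

For example, `safe_reciprocal(np.array([2.0, 0.0, 1e-13], dtype=np.float32))` inverts only the first entry and keeps the result in `float32`.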
```rust
    &device
).map_err(|e| PyErr::new::<pyo3::exceptions::PyRuntimeError, _>(format!("{}", e)))?;

// Call GPU update function (to be implemented)
```
The tensor creation pattern is repeated multiple times with similar parameters. Consider extracting this into a helper function to reduce code duplication and improve maintainability.
Suggested change:

```rust
// Convert matrices to correct type for device
let result = if use_gpu {
    // Convert to Candle Tensor using the selected device
    let v_t = create_tensor_from_array(&v_arr, &device, false)?;
    let u_t = create_tensor_from_array(&u_arr, &device, false)?;
    let we_t = create_tensor_from_array(&we_arr, &device, false)?;
    let w_t = create_tensor_from_array(&w_arr, &device, true)?;
    let h_t = create_tensor_from_array(&h_arr, &device, true)?;
    // Call GPU update function (to be implemented)
```
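The deduplication the review suggests can be illustrated in Python. `make_array` below is a hypothetical stand-in for a `create_tensor_from_array`-style helper: the point is that the shared construction parameters (dtype, copy policy, and in the Rust case the device) live in one place instead of being repeated at every call site:

```python
import numpy as np

def make_array(data, dtype=np.float32, copy=True):
    """Hypothetical helper: centralizes the repeated construction parameters."""
    return np.array(data, dtype=dtype, copy=copy)

# One short call per matrix instead of repeating the full construction pattern:
v_t = make_array([[1.0, 2.0]])
u_t = make_array([[3.0, 4.0]])
we_t = make_array([[5.0, 6.0]])
```

Changing the dtype or copy policy later then touches one function rather than every call site.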
```rust
// Convert result to Python object
match result {
    Ok((final_w, final_h, q, converged, converge_i, q_list_full)) => {
        let result_w = final_w.t().to_pyarray(py).reshape(w.dims())?;
```
The variable `w` is used here but it is not in scope. This should likely reference the original `w` parameter from the function signature; as written it will cause a compilation error.
```python
self.missing_value = missing_value

if sklearn is None:
    raise ImportError("scikit-learn is required for data imputation. Import esat[data] to install it.")
```
The error message suggests importing 'esat[data]', but according to `pyproject.toml` the optional dependency is defined as `data = ["scikit-learn"]`, so the correct installation command is `pip install esat[data]`.
Suggested change:

```python
raise ImportError("scikit-learn is required for data imputation. Run 'pip install esat[data]' to install it.")
```
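A common pattern for optional dependencies like this is to attempt the import once at module load and fail with an actionable message on first use. This is a generic sketch under that assumption, not ESAT's actual module; `require_sklearn` is a hypothetical helper name:

```python
try:
    import sklearn
except ImportError:  # scikit-learn is an optional extra, absence is tolerated here
    sklearn = None

def require_sklearn():
    """Return the sklearn module, or raise with install instructions if missing."""
    if sklearn is None:
        raise ImportError(
            "scikit-learn is required for data imputation. "
            "Run 'pip install esat[data]' to install it."
        )
    return sklearn
```

Deferring the error to the point of use keeps the rest of the package importable without the optional extra.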
```python
logger.warning(f"Unknown file type provided. Ext: {ext}, file: {filepath}")
#TODO: Add custom exception for unknown file types.
return None
return data
```
The TODO comment indicates incomplete error handling. Consider implementing a custom exception for unknown file types instead of returning None, which could lead to unexpected behavior downstream.
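One way to resolve the TODO is a small custom exception so callers can distinguish an unsupported file type from other failures instead of testing for `None`. The class name, loader function, and supported extensions below are all illustrative, not ESAT's actual API:

```python
class UnknownFileTypeError(ValueError):
    """Raised when a file extension is not supported by the loader."""

    def __init__(self, ext, filepath):
        self.ext = ext
        self.filepath = filepath
        super().__init__(f"Unknown file type provided. Ext: {ext}, file: {filepath}")

def load_data(filepath):
    """Illustrative loader: raise on unknown extensions instead of returning None."""
    ext = filepath.rsplit(".", 1)[-1].lower()
    if ext not in ("csv", "txt"):
        raise UnknownFileTypeError(ext, filepath)
    return ext  # stand-in for the parsed data
```

Callers can then catch `UnknownFileTypeError` explicitly, and a forgotten check fails loudly rather than propagating `None` downstream.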
```python
    progress_callback=cb
)
return model_i, sa
```
The variable `self.progress_callback` is accessed outside of the class context. This appears to be a standalone function, so `self` is not available; it should reference the `progress_callback` parameter instead.
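Since the worker runs outside the class (typically so it can be pickled for multiprocessing), the callback has to arrive as a parameter rather than through `self`. A minimal sketch of the parameterized form, with illustrative names not taken from ESAT:

```python
def run_model(model_i, progress_callback=None):
    """Standalone worker: the callback is a parameter, not an attribute on self."""
    for step in range(3):  # stand-in for the actual training iterations
        if progress_callback is not None:
            progress_callback(model_i, step)
    return model_i

# Usage: the caller supplies the callback when dispatching the worker.
progress_log = []
run_model(7, progress_callback=lambda m, s: progress_log.append((m, s)))
```

Making the callback optional (`None` by default) also lets the same worker run silently in batch contexts.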
```python
# q_targets[k] = v["pmf-Q"]
project_dir = os.path.join("D:\\", "git", "esat")

analysis_file_path = os.path.join(project_dir, "eval", "results", "runtime_analysis.json")
```
The hardcoded absolute path makes this script non-portable. Consider using relative paths or environment variables to make the script work across different development environments.
Suggested change:

```python
# Dynamically determine the project directory as the parent of this script's directory
project_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
analysis_file_path = os.path.join(project_dir, "eval", "results", "runtime_analysis.json")
```
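An equivalent, slightly more idiomatic form uses `pathlib`. Like the suggestion above, it assumes the script lives one level below the project root; the helper function is hypothetical:

```python
from pathlib import Path

def project_paths(script_file):
    """Derive project-relative paths from a script's own location."""
    project_dir = Path(script_file).resolve().parent.parent
    analysis_file_path = project_dir / "eval" / "results" / "runtime_analysis.json"
    return project_dir, analysis_file_path
```

In the script itself this would be called as `project_paths(__file__)`, removing the hardcoded `D:\git\esat` path entirely.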
This pull request introduces several important updates to the ESAT project, focusing on build system improvements, documentation updates, and dependency upgrades. The most significant changes include enhancements to the build workflows for CPU and GPU wheels, updates to dependencies in the Rust backend, and refreshed documentation reflecting the new release year and improved installation instructions.
Build system and workflow improvements:

- Updated `.github/workflows/python-package.yml` to separately build and upload CPU and GPU wheels for all supported OSes and Python versions, improving clarity and artifact management. [1] [2]
- Updated `.github/workflows/documentation.yml` to use the latest version of the `peaceiris/actions-gh-pages` action and adjusted the Sphinx build and output directories for consistency.

Dependency upgrades (Rust backend):

- Upgraded dependencies in `Cargo.toml`, including `numpy`, `nalgebra`, `ndarray`, `pyo3`, `indicatif`, and `console`, and added `candle-core` for GPU support.

Documentation updates:

- Added `sphinx-book-theme` as a documentation requirement in `doc-requirements.txt`.

Other improvements:
These changes collectively improve the maintainability, usability, and clarity of the ESAT project for both developers and users.