Conversation
t-koretsune
commented
Jan 1, 2026
- add logging and documents
- refactoring IO
- modify reading routine
- check immn file extension
There was a problem hiding this comment.
Pull request overview
This PR refactors the symwannier core module by adding comprehensive logging support, improving documentation, and consolidating I/O operations. The changes enhance code maintainability and provide better debugging capabilities without altering the underlying computational algorithms.
- Added structured logging with configurable levels throughout the codebase
- Introduced a shared
io_utils.pymodule to centralize file I/O operations and eliminate code duplication - Enhanced input validation with early error detection in the Wannierize class
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| src/symwannier/win.py | Added logging support and docstrings; replaced assertion with descriptive FileNotFoundError |
| src/symwannier/wannierize.py | Added comprehensive logging, input validation method, configurable log levels, and extensive docstrings for core algorithms |
| src/symwannier/timedata.py | Integrated logging for timing outputs; replaced print statements with logger calls |
| src/symwannier/sym.py | Added logging support and docstrings; replaced print statements with logger calls |
| src/symwannier/nnkp.py | Added logging, file existence validation, and docstrings |
| src/symwannier/mmn.py | Refactored file reading to use shared I/O utilities; added IBZ detection via file extension; optimized data parsing with bulk reads |
| src/symwannier/io_utils.py | New shared utility module for opening plain or gzipped text files with consistent error handling |
| src/symwannier/expand_wannier_inputs.py | Integrated logging and updated file readers to use logging |
| src/symwannier/eig.py | Added logging support, docstrings, and improved error messages |
| src/symwannier/amn.py | Refactored to use shared I/O utilities; optimized file reading; added docstrings |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| lines = fp.readlines() | ||
| num_bands, nk, num_wann = [ int(x) for x in lines[1].split() ] | ||
| dat = np.genfromtxt(lines[2:]).reshape(nk, num_wann, num_bands, 5) | ||
| flat = np.fromstring("".join(lines[2:]), sep=" ") |
There was a problem hiding this comment.
Using np.fromstring is deprecated as of NumPy 1.14 and will be removed in future versions. The recommended replacement is np.frombuffer for bytes. Consider using a more modern approach that doesn't rely on deprecated functions.
| flat = np.fromstring("".join(lines[2:]), sep=" ") | |
| flat_str = " ".join(lines[2:]) | |
| flat = np.fromiter((float(val) for val in flat_str.split()), dtype=float) |
| amn_sym = self._symmetrize_Gk_internal(amn) | ||
| diff = np.sum(np.abs(amn_sym - amn))/self.sym.nks | ||
| print(" symmetrize Gk diff1 = {:15.5e}".format(diff)) | ||
| self.log.info("symmetrize Gk diff1 = %.5e", diff) |
There was a problem hiding this comment.
The logger formatting string uses old-style % formatting while most other logging calls in the file use f-strings or format method. For consistency, consider using the logging module's built-in parameter substitution (e.g., self.log.info("symmetrize Gk diff1 = %.5e", diff)) consistently throughout the file.
| self.log.info("symmetrize Gk diff1 = %.5e", diff) | |
| self.log.info(f"symmetrize Gk diff1 = {diff:.5e}") |
| fp : IO[str] | ||
| Opened file-like object in text mode. | ||
| used_path : str | ||
| Actual path used (plain or .gz). |
There was a problem hiding this comment.
The documentation incorrectly describes the return type as IO[str], but the function uses type annotations from the typing module. The actual return type is a tuple Tuple[IO[str], str] as correctly shown in the return type annotation, but the docstring says "fp : IO[str]" and "used_path : str" in separate lines under Returns, which doesn't match standard NumPy docstring format for tuples. The Returns section should clarify this is a tuple or use the tuple notation.
| fp : IO[str] | |
| Opened file-like object in text mode. | |
| used_path : str | |
| Actual path used (plain or .gz). | |
| fp_and_path : Tuple[IO[str], str] | |
| A tuple ``(fp, used_path)`` where ``fp`` is the opened file-like | |
| object in text mode and ``used_path`` is the actual path used | |
| (plain or ``.gz``). |
| lines = fp.readlines() | ||
| num_bands, nk, num_wann = [ int(x) for x in lines[1].split() ] | ||
| dat = np.genfromtxt(lines[2:]).reshape(nk, num_wann, num_bands, 5) | ||
| flat = np.fromstring("".join(lines[2:]), sep=" ") |
There was a problem hiding this comment.
Joining all remaining lines into a single string for parsing is inefficient for large amn files. This creates a potentially very large temporary string object before parsing. Consider using np.loadtxt or np.genfromtxt directly on the file pointer (after skipping headers) to avoid the memory overhead of concatenating all lines into one string.
| Parameters | ||
| ---------- | ||
| path : str | ||
| Base path (without .gz). |
There was a problem hiding this comment.
The docstring's description of the path parameter states "Base path (without .gz)" but the function actually accepts paths with or without .gz extension and checks both. The description should clarify that the path can be either the base name or include .gz, or the implementation should enforce that .gz is not part of the input path.
| Base path (without .gz). | |
| Path to the file, with or without a ``.gz`` extension. The function | |
| first tries the given path as-is, then falls back to ``path + ".gz"``. |
| flat = np.fromstring(" ".join(block_lines), sep=" ") | ||
| data = flat.reshape(self.num_bands, self.num_bands, 2) |
There was a problem hiding this comment.
Joining all block_lines into a single string just to parse them is inefficient for large datasets. With large num_bands values, this creates unnecessary temporary string objects and copies. Consider using np.loadtxt with a StringIO object or parsing the lines more directly to avoid the memory overhead of string concatenation.
| flat = np.fromstring(" ".join(block_lines), sep=" ") | |
| data = flat.reshape(self.num_bands, self.num_bands, 2) | |
| # Parse block_lines directly without concatenating into a single large string | |
| block_array = np.loadtxt(block_lines) | |
| data = block_array.reshape(self.num_bands, self.num_bands, 2) |
| if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk: | ||
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | ||
|
|
||
| if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk: |
There was a problem hiding this comment.
The validation uses getattr(self.eig, "nk", None) which will return None if the attribute doesn't exist, but then checks if it's truthy before comparing. If self.eig.nk is 0 (a valid value), this check will incorrectly skip validation. Consider using hasattr(self.eig, "nk") instead to properly check for attribute existence.
| if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk: | |
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | |
| if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk: | |
| if hasattr(self.amn, "nk") and self.amn.nk != self.nnkp.nk: | |
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | |
| if hasattr(self.eig, "nk") and self.eig.nk != self.nnkp.nk: |
| sym : Sym, optional | ||
| Symmetry data. If provided, Amn is symmetrized/expanded accordingly. | ||
| log : logging.Logger, optional | ||
| Logger to use; if not provided a module logger is created. |
There was a problem hiding this comment.
The docstring contains a typo: "Logger to use; if not provided a module logger is created" is missing a comma. It should read "Logger to use; if not provided, a module logger is created" for proper grammar.
| Logger to use; if not provided a module logger is created. | |
| Logger to use; if not provided, a module logger is created. |
| log_level : int or str, optional | ||
| Logging level (e.g., logging.DEBUG, logging.INFO, or 'DEBUG', 'INFO'). | ||
| log : logging.Logger, optional | ||
| Logger to use; if not provided a module logger is created. |
There was a problem hiding this comment.
The docstring contains a typo: "Logger instance" could be more descriptive. However, more importantly, line 44 has "Logger to use; if not provided a module logger is created" which is missing a comma. It should read "Logger to use; if not provided, a module logger is created" for proper grammar.
| Logger to use; if not provided a module logger is created. | |
| Logger to use; if not provided, a module logger is created. |
| if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk: | ||
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | ||
|
|
||
| if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk: |
There was a problem hiding this comment.
The validation uses getattr(self.amn, "nk", None) which will return None if the attribute doesn't exist, but then checks if it's truthy before comparing. If self.amn.nk is 0 (a valid value), this check will incorrectly skip validation. Consider using hasattr(self.amn, "nk") instead to properly check for attribute existence.
| if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk: | |
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | |
| if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk: | |
| if hasattr(self.amn, "nk") and self.amn.nk is not None and self.amn.nk != self.nnkp.nk: | |
| raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk)) | |
| if hasattr(self.eig, "nk") and self.eig.nk is not None and self.eig.nk != self.nnkp.nk: |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@t-koretsune I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you. |
[WIP] Refactor core module with logging and I/O improvements