refactor/core cleanup by t-koretsune · Pull Request #3 · wannier-utils-dev/symWannier

t-koretsune · 2026-01-01T23:16:12Z

add logging and documents
refactoring IO
modify reading routine
check immn file extension

Copilot

Pull request overview

This PR refactors the symwannier core module by adding comprehensive logging support, improving documentation, and consolidating I/O operations. The changes enhance code maintainability and provide better debugging capabilities without altering the underlying computational algorithms.

Added structured logging with configurable levels throughout the codebase
Introduced a shared io_utils.py module to centralize file I/O operations and eliminate code duplication
Enhanced input validation with early error detection in the Wannierize class

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
src/symwannier/win.py	Added logging support and docstrings; replaced assertion with descriptive FileNotFoundError
src/symwannier/wannierize.py	Added comprehensive logging, input validation method, configurable log levels, and extensive docstrings for core algorithms
src/symwannier/timedata.py	Integrated logging for timing outputs; replaced print statements with logger calls
src/symwannier/sym.py	Added logging support and docstrings; replaced print statements with logger calls
src/symwannier/nnkp.py	Added logging, file existence validation, and docstrings
src/symwannier/mmn.py	Refactored file reading to use shared I/O utilities; added IBZ detection via file extension; optimized data parsing with bulk reads
src/symwannier/io_utils.py	New shared utility module for opening plain or gzipped text files with consistent error handling
src/symwannier/expand_wannier_inputs.py	Integrated logging and updated file readers to use logging
src/symwannier/eig.py	Added logging support, docstrings, and improved error messages
src/symwannier/amn.py	Refactored to use shared I/O utilities; optimized file reading; added docstrings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-01T23:22:49Z

src/symwannier/amn.py

        lines = fp.readlines()
        num_bands, nk, num_wann = [ int(x) for x in lines[1].split() ]
-        dat = np.genfromtxt(lines[2:]).reshape(nk, num_wann, num_bands, 5)
+        flat = np.fromstring("".join(lines[2:]), sep=" ")


Using np.fromstring is deprecated as of NumPy 1.14 and will be removed in future versions. The recommended replacement is np.frombuffer for bytes. Consider using a more modern approach that doesn't rely on deprecated functions.

Suggested change

flat = np.fromstring("".join(lines[2:]), sep=" ")

flat_str = " ".join(lines[2:])

flat = np.fromiter((float(val) for val in flat_str.split()), dtype=float)

Copilot · 2026-01-01T23:22:50Z

src/symwannier/amn.py

        amn_sym = self._symmetrize_Gk_internal(amn)
        diff = np.sum(np.abs(amn_sym - amn))/self.sym.nks
-        print("  symmetrize Gk diff1 = {:15.5e}".format(diff))
+        self.log.info("symmetrize Gk diff1 = %.5e", diff)


The logger formatting string uses old-style % formatting while most other logging calls in the file use f-strings or format method. For consistency, consider using the logging module's built-in parameter substitution (e.g., self.log.info("symmetrize Gk diff1 = %.5e", diff)) consistently throughout the file.

Suggested change

self.log.info("symmetrize Gk diff1 = %.5e", diff)

self.log.info(f"symmetrize Gk diff1 = {diff:.5e}")

Copilot · 2026-01-01T23:22:50Z

src/symwannier/io_utils.py

+    fp : IO[str]
+        Opened file-like object in text mode.
+    used_path : str
+        Actual path used (plain or .gz).


The documentation incorrectly describes the return type as IO[str], but the function uses type annotations from the typing module. The actual return type is a tuple Tuple[IO[str], str] as correctly shown in the return type annotation, but the docstring says "fp : IO[str]" and "used_path : str" in separate lines under Returns, which doesn't match standard NumPy docstring format for tuples. The Returns section should clarify this is a tuple or use the tuple notation.

Suggested change

fp : IO[str]

Opened file-like object in text mode.

used_path : str

Actual path used (plain or .gz).

fp_and_path : Tuple[IO[str], str]

A tuple ``(fp, used_path)`` where ``fp`` is the opened file-like

object in text mode and ``used_path`` is the actual path used

(plain or ``.gz``).

Copilot · 2026-01-01T23:22:50Z

src/symwannier/amn.py

        lines = fp.readlines()
        num_bands, nk, num_wann = [ int(x) for x in lines[1].split() ]
-        dat = np.genfromtxt(lines[2:]).reshape(nk, num_wann, num_bands, 5)
+        flat = np.fromstring("".join(lines[2:]), sep=" ")


Joining all remaining lines into a single string for parsing is inefficient for large amn files. This creates a potentially very large temporary string object before parsing. Consider using np.loadtxt or np.genfromtxt directly on the file pointer (after skipping headers) to avoid the memory overhead of concatenating all lines into one string.

Copilot · 2026-01-01T23:22:50Z

src/symwannier/io_utils.py

+    Parameters
+    ----------
+    path : str
+        Base path (without .gz).


The docstring's description of the path parameter states "Base path (without .gz)" but the function actually accepts paths with or without .gz extension and checks both. The description should clarify that the path can be either the base name or include .gz, or the implementation should enforce that .gz is not part of the input path.

Suggested change

Base path (without .gz).

Path to the file, with or without a ``.gz`` extension. The function

first tries the given path as-is, then falls back to ``path + ".gz"``.

Copilot · 2026-01-01T23:22:52Z

src/symwannier/mmn.py

+            flat = np.fromstring(" ".join(block_lines), sep=" ")
+            data = flat.reshape(self.num_bands, self.num_bands, 2)


Joining all block_lines into a single string just to parse them is inefficient for large datasets. With large num_bands values, this creates unnecessary temporary string objects and copies. Consider using np.loadtxt with a StringIO object or parsing the lines more directly to avoid the memory overhead of string concatenation.

Suggested change

flat = np.fromstring(" ".join(block_lines), sep=" ")

data = flat.reshape(self.num_bands, self.num_bands, 2)

# Parse block_lines directly without concatenating into a single large string

block_array = np.loadtxt(block_lines)

data = block_array.reshape(self.num_bands, self.num_bands, 2)

Copilot · 2026-01-01T23:22:52Z

src/symwannier/wannierize.py

+        if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk:
+            raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))
+
+        if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk:


The validation uses getattr(self.eig, "nk", None) which will return None if the attribute doesn't exist, but then checks if it's truthy before comparing. If self.eig.nk is 0 (a valid value), this check will incorrectly skip validation. Consider using hasattr(self.eig, "nk") instead to properly check for attribute existence.

Suggested change

if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk:

raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))

if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk:

if hasattr(self.amn, "nk") and self.amn.nk != self.nnkp.nk:

raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))

if hasattr(self.eig, "nk") and self.eig.nk != self.nnkp.nk:

Copilot · 2026-01-01T23:22:53Z

src/symwannier/amn.py

+        sym : Sym, optional
+            Symmetry data. If provided, Amn is symmetrized/expanded accordingly.
+        log : logging.Logger, optional
+            Logger to use; if not provided a module logger is created.


The docstring contains a typo: "Logger to use; if not provided a module logger is created" is missing a comma. It should read "Logger to use; if not provided, a module logger is created" for proper grammar.

Suggested change

Logger to use; if not provided a module logger is created.

Logger to use; if not provided, a module logger is created.

Copilot · 2026-01-01T23:22:53Z

src/symwannier/wannierize.py

+    log_level : int or str, optional
+        Logging level (e.g., logging.DEBUG, logging.INFO, or 'DEBUG', 'INFO').
+    log : logging.Logger, optional
+        Logger to use; if not provided a module logger is created.


The docstring contains a typo: "Logger instance" could be more descriptive. However, more importantly, line 44 has "Logger to use; if not provided a module logger is created" which is missing a comma. It should read "Logger to use; if not provided, a module logger is created" for proper grammar.

Suggested change

Logger to use; if not provided a module logger is created.

Logger to use; if not provided, a module logger is created.

Copilot · 2026-01-01T23:22:53Z

src/symwannier/wannierize.py

+        if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk:
+            raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))
+
+        if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk:


The validation uses getattr(self.amn, "nk", None) which will return None if the attribute doesn't exist, but then checks if it's truthy before comparing. If self.amn.nk is 0 (a valid value), this check will incorrectly skip validation. Consider using hasattr(self.amn, "nk") instead to properly check for attribute existence.

Suggested change

if getattr(self.amn, "nk", None) and self.amn.nk != self.nnkp.nk:

raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))

if getattr(self.eig, "nk", None) and self.eig.nk != self.nnkp.nk:

if hasattr(self.amn, "nk") and self.amn.nk is not None and self.amn.nk != self.nnkp.nk:

raise ValueError("amn nk={} inconsistent with nnkp nk={}".format(self.amn.nk, self.nnkp.nk))

if hasattr(self.eig, "nk") and self.eig.nk is not None and self.eig.nk != self.nnkp.nk:

t-koretsune · 2026-01-01T23:30:20Z

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot · 2026-01-01T23:30:28Z

@t-koretsune I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

[WIP] Refactor core module with logging and I/O improvements

t-koretsune added 4 commits January 1, 2026 09:56

add logging and documents

f1d2e1b

refactoring IO

6f0e0f0

modify reading routine

5f12a88

check immn file extension

553a59c

t-koretsune requested a review from Copilot January 1, 2026 23:18

Copilot started reviewing on behalf of t-koretsune January 1, 2026 23:18 View session

Copilot AI reviewed Jan 1, 2026

View reviewed changes

Initial plan

79fd36b

Copilot AI mentioned this pull request Jan 1, 2026

[WIP] Refactor core module with logging and I/O improvements #4

Merged

8 tasks

Merge pull request #4 from wannier-utils-dev/copilot/sub-pr-3

58d646c

[WIP] Refactor core module with logging and I/O improvements

t-koretsune merged commit 781caff into develop Jan 1, 2026

t-koretsune deleted the refactor/core-cleanup branch January 2, 2026 04:52

	flat = np.fromstring("".join(lines[2:]), sep=" ")
	flat_str = " ".join(lines[2:])
	flat = np.fromiter((float(val) for val in flat_str.split()), dtype=float)

	self.log.info("symmetrize Gk diff1 = %.5e", diff)
	self.log.info(f"symmetrize Gk diff1 = {diff:.5e}")

	Base path (without .gz).
	Path to the file, with or without a ``.gz`` extension. The function
	first tries the given path as-is, then falls back to ``path + ".gz"``.

		flat = np.fromstring(" ".join(block_lines), sep=" ")
		data = flat.reshape(self.num_bands, self.num_bands, 2)

-            flat = np.fromstring(" ".join(block_lines), sep=" ")
-            data = flat.reshape(self.num_bands, self.num_bands, 2)
+            # Parse block_lines directly without concatenating into a single large string
+            block_array = np.loadtxt(block_lines)
+            data = block_array.reshape(self.num_bands, self.num_bands, 2)

	Logger to use; if not provided a module logger is created.
	Logger to use; if not provided, a module logger is created.

Conversation

t-koretsune commented Jan 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

t-koretsune commented Jan 1, 2026

Uh oh!

Copilot AI commented Jan 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants