Enhance GUI and benchmarking tools, add WPR kernels for the post path, and improve the scheduling curve #106
Conversation
The float-to-bool conversion of the mv operator will proceed after thorough testing
- Implemented `COBA_2005_binary_fcnmv_boundary_CsvOuput.py` for benchmarking post- and pre-synaptic connection updates using JAX.
- Created `boundary_dis.py`, a GUI application to visualize performance boundaries and speedup analysis with interactive features.
- Developed `dev_COBA_binary_fcnmv.py` for benchmarking across various connection probabilities and counts, enhancing simulation capabilities.
- Added error handling and CSV recording to capture benchmarking results effectively.
Reviewer's Guide

Adds warp-per-row (WPR) CUDA kernels and an auto-dispatch path for binary fcnmv scatter, updates the Python FFI/kernel selection to use boolean spikes consistently, and enhances the benchmarking/GUI tooling for exploring performance boundaries and backend comparisons.

Sequence diagram for the binary fcnmv scatter path with WPR/TPR auto-dispatch

sequenceDiagram
participant Caller as python_op
participant KernelSel as _binary_fcnmv_cuda_kernel
participant JAX as jax_ffi
participant FFIHomo as binary_fcnmv_scatter_homo
participant FFIHetero as binary_fcnmv_scatter_hetero
participant WPRKern as _bs_wpr_*_kern
participant TPRKern as _bs_tpr_*_kern
Caller->>KernelSel: call(transpose=True, mode, spikes, weights, indices)
KernelSel->>KernelSel: build kernel_name using suffix _bool
KernelSel->>KernelSel: spikes = u.math.asarray(spikes, dtype=bool)
KernelSel->>JAX: jax.ffi.ffi_call(kernel_name, out_info)
JAX-->>Caller: returns callable kernel(weights, indices, spikes_bool)
Caller->>JAX: kernel(weights, indices, spikes_bool)
JAX->>FFIHomo: binary_fcnmv_scatter_homo(weights, indices, spikes_bool, output, stream)
JAX->>FFIHetero: binary_fcnmv_scatter_hetero(weights, indices, spikes_bool, output, stream)
activate FFIHomo
FFIHomo->>FFIHomo: read n_pre, n_conn, n_post from tensors
FFIHomo->>FFIHomo: cudaMemsetAsync(output)
FFIHomo->>FFIHomo: if n_pre == 0 return
FFIHomo->>FFIHomo: compute bsz = 256
alt n_conn * 2084000 > n_pre * 1539
FFIHomo->>FFIHomo: warps_per_block = bsz / 32
FFIHomo->>FFIHomo: n_blocks_wpr = ceil(n_pre / warps_per_block)
FFIHomo->>WPRKern: launch _bs_wpr_homo_kern<<<n_blocks_wpr, 256>>>
else
FFIHomo->>FFIHomo: n_blocks_tpr = ceil(n_pre / bsz)
FFIHomo->>TPRKern: launch _bs_tpr_homo_kern<<<n_blocks_tpr, 256>>>
end
deactivate FFIHomo
activate FFIHetero
FFIHetero->>FFIHetero: same dispatch logic for hetero weights
alt n_conn * 2084000 > n_pre * 1539
FFIHetero->>WPRKern: launch _bs_wpr_hetero_kern
else
FFIHetero->>TPRKern: launch _bs_tpr_hetero_kern
end
deactivate FFIHetero
WPRKern-->>JAX: complete scatter accumulation
TPRKern-->>JAX: complete scatter accumulation
JAX-->>Caller: updated output tensor
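The dispatch decision in the diagram can be mirrored in plain Python to make the crossover easy to probe. This is a sketch, not the actual FFI code: the constants 2084000 and 1539 are taken from the PR's inlined threshold, and the function name is illustrative.

```python
def launch_plan(n_pre: int, n_conn: int, bsz: int = 256):
    """Return (kernel_kind, n_blocks) the way the FFI entry points choose them."""
    # Mirror of the inlined CUDA predicate:
    #   (int64_t)n_conn * 2084000 > (int64_t)n_pre * 1539
    # Python ints are arbitrary precision, so no overflow-widening cast is needed.
    if n_conn * 2084000 > n_pre * 1539:          # WPR wins at high fan-out per row
        warps_per_block = bsz // 32              # 8 warps in a 256-thread block
        n_blocks = -(-n_pre // warps_per_block)  # ceil(n_pre / warps_per_block)
        return "wpr", n_blocks
    n_blocks = -(-n_pre // bsz)                  # ceil(n_pre / bsz)
    return "tpr", n_blocks
```

For example, 1000 rows with 1000 connections each lands on the WPR path, while a million rows with 10 connections each stays on TPR.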
Sequence diagram for CSV_record tagging and row enrichment

sequenceDiagram
actor Dev as benchmark_script
participant BT as BenchmarkToolsModule
participant CSV as CSV_record
Dev->>BT: csv_recorder = CSV_record(...)
Dev->>CSV: add_tag("warp_or_thread", "tpr")
CSV->>CSV: _tags["warp_or_thread"] = "tpr"
CSV->>CSV: ensure fieldnames contains tag key
loop for each measurement
Dev->>CSV: add_row(raw_row)
CSV->>CSV: merged_row = dict(_tags)
CSV->>CSV: merged_row.update(raw_row)
CSV->>CSV: extend fieldnames with keys from merged_row
CSV->>CSV: rows.append(merged_row)
end
Dev->>CSV: record_finish(run_tag)
CSV->>CSV: _write_csv(file_name, rows, fieldnames, mode)
CSV-->>Dev: CSV file with tag columns merged into all rows
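The tag/row merging above can be sketched in a few lines. The class name here is illustrative; the real implementation is `CSV_record` in `BenchmarkTools.py`.

```python
class TagRecorder:
    """Minimal model of CSV_record's tagging: tags become columns that are
    merged into every subsequently added row."""

    def __init__(self):
        self._tags, self.fieldnames, self.rows = {}, [], []

    def add_tag(self, name, value):
        self._tags[name] = value
        if name not in self.fieldnames:      # ensure fieldnames contains the tag key
            self.fieldnames.append(name)

    def add_row(self, raw_row):
        merged = dict(self._tags)            # tags first ...
        merged.update(raw_row)               # ... so measurement values win on a clash
        for key in merged:                   # extend fieldnames with any new keys
            if key not in self.fieldnames:
                self.fieldnames.append(key)
        self.rows.append(merged)
```

Because the tag dict is copied first and the raw row is applied on top, a tag such as `warp_or_thread` appears in every row without the benchmark loop having to repeat it.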
Class diagram for the CSV_record benchmarking helper and the PerformanceBoundaryApp GUI

classDiagram
class CSV_record {
+str name
+str suffix
+Path output_dir
+bool append
+list fieldnames
+list rows
+dict _tags
+CSV_record(name, operator, suite, duration, conn)
+_write_csv(file_name, rows, fieldnames, mode)
+add_tag(tag_name, tag_value)
+add_row(row)
+single_COBA_data_add(operator, data_type, backend, mode, conn_num, scale, elapsed, rate, duration, homo)
+print_header(operator, data_type, backend, mode, conn_num, duration, homo, prob)
+print_table_header(show_conn)
+print_row(scale, neuron_count, elapsed, rate, conn_num)
+record_finish(tag)
}
class BenchmarkToolsModule {
+generate_params(dis_type, _N, limit_gb, target_samples, data_size, scale_max, conn_max) list
+memory_limit(conn_nums, scale, _N, limit, data_type) bool
}
class PerformanceBoundaryApp {
-Tk root
-DataFrame df
-dict comboboxes
-Notebook notebook
+PerformanceBoundaryApp(root)
+load_data()
+update_plots()
+export_image()
+_setup_ui()
+_on_compare_field_changed(event)
+_rebuild_filter_row()
+_on_auto_speedup_changed()
+_subtitle(include_baseline) str
+_render_scatter(tab, x, y, z, _N, limit_gb, x_min, x_max, y_max, z_label, cmap_name, subtitle)
+_render_interpolation(tab, x, y, z, _N, limit_gb, x_min, x_max, y_max, z_label, cmap_name, subtitle)
+_render_speedup_interp(tab, grid_x, grid_y, grid_z_masked, _N, limit_gb, x_min, x_max, y_max, subtitle)
+_draw_boundaries(ax, _N, limit_gb, x_min, x_max)
+_draw_contours_and_labels(ax, grid_x, grid_y, grid_z_masked, z_pts)
+_draw_custom_contours(ax, grid_x, grid_y, grid_z_masked)
}
class COBABenchmarkScripts {
+benchmark_post_conn(conn_num, conn_prob, data_type, duration, homo, backend, probs_or_conn, _N)
+benchmark_pre_conn(conn_num, conn_prob, data_type, duration, homo, backend, probs_or_conn, _N)
}
BenchmarkToolsModule --> CSV_record : uses
COBABenchmarkScripts --> CSV_record : instantiates
COBABenchmarkScripts --> BenchmarkToolsModule : calls
PerformanceBoundaryApp --> BenchmarkToolsModule : consumes CSV output
Hey - I've found 4 issues, and left some high level feedback:
- The WPR/TPR auto-dispatch threshold for scatter (`(int64_t)n_conn * 2084000 > (int64_t)n_pre * 1539`) is currently inlined as magic numbers inside the FFI macros; consider extracting this into a small helper or named constants with a short comment (e.g. referring to the fit from `boundary_dis.py`) so it’s easier to adjust and reason about across architectures.
- In `_binary_fcnmv_cuda_kernel` the non-TPR path now always targets `_bool` kernels and casts `spikes` to boolean, but there is still leftover code computing `spk_f` in the original dtype that is immediately overwritten; it would be clearer to remove the dead path and centralize the boolean casting in one place so the semantics of `spikes` for both gather and scatter are unambiguous.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The WPR/TPR auto-dispatch threshold for scatter (`(int64_t)n_conn * 2084000 > (int64_t)n_pre * 1539`) is currently inlined as magic numbers inside the FFI macros; consider extracting this into a small helper or named constants with a short comment (e.g. referring to the fit from `boundary_dis.py`) so it’s easier to adjust and reason about across architectures.
- In `_binary_fcnmv_cuda_kernel` the non-TPR path now always targets `_bool` kernels and casts `spikes` to boolean, but there is still leftover code computing `spk_f` in the original dtype that is immediately overwritten; it would be clearer to remove the dead path and centralize the boolean casting in one place so the semantics of `spikes` for both gather and scatter are unambiguous.
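One way to realize the first suggestion is shown below. The names are hypothetical and the real code lives in C++ FFI macros; Python is used here only for brevity.

```python
# Crossover constants from this PR's inlined check. The fit itself comes from
# the boundary_dis.py analysis and may shift on other GPU architectures.
WPR_CONN_COEFF = 2_084_000
WPR_PRE_COEFF = 1_539


def prefer_wpr(n_pre: int, n_conn: int) -> bool:
    """Single, commented home for the WPR/TPR dispatch predicate,
    instead of repeating the magic numbers in each FFI macro."""
    return n_conn * WPR_CONN_COEFF > n_pre * WPR_PRE_COEFF
```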
## Individual Comments
### Comment 1
<location path="brainevent/_fcn/binary_fcnmv.cu" line_range="184" />
<code_context>
+ if (row >= n_pre) return; \
+ if (!IS_ACTIVE(__ldg(&spikes[row]))) return; \
+ const int32_t* i_row = indices + (size_t)row * n_conn; \
+ ACC_T w0 = READ_W(weights[0]); \
+ for (int k = lane; k < n_conn; k += 32) { \
+ int idx = __ldg(&i_row[k]); \
</code_context>
<issue_to_address>
**issue (bug_risk):** Homogeneous WPR kernel uses weights[0] for all rows instead of weights[row].
`w0` is loaded as `READ_W(weights[0])`, so all rows share the same weight instead of using their own. Since `row = tid / 32`, this should likely be `READ_W(weights[row])` to keep per-row homogeneous weights; otherwise rows that need different weights will all use `weights[0]`, corrupting the results.
</issue_to_address>
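A toy NumPy model makes the effect of the suspected bug concrete. The helper is hypothetical; `per_row=False` reproduces the `weights[0]` behaviour the reviewer flags, while `per_row=True` models the proposed `weights[row]` fix.

```python
import numpy as np


def scatter_homo(weights, indices, spikes, n_post, per_row=True):
    out = np.zeros(n_post)
    for row, active in enumerate(spikes):
        if not active:                                 # inactive rows contribute nothing
            continue
        w = weights[row] if per_row else weights[0]    # the flagged line
        for idx in indices[row]:
            out[idx] += w                              # models the warp's atomicAdd
    return out
```

With weights `[2.0, 3.0]` the two variants diverge as soon as row 1 fires, which is the corruption described above.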
### Comment 2
<location path="dev/fcn/dev_COBA_binary_fcnmv.py" line_range="103" />
<code_context>
+ t1 = time.time()
+ elapsed = t1 - t0
+ csv_recorder.print_row(s, n, elapsed, float(rate))
+ csv_recorder.single_COBA_data_add('fcnmv', data_type, back, 'post', cn, s, elapsed, float(rate), duration, homo=('homo' if homo else 'hetero'))
+
+ except Exception as e:
</code_context>
<issue_to_address>
**issue (bug_risk):** NameError risk: `cn` is used in the prob-based branch where only `actual_conn_num` is defined.
In the `probs_or_conn != 'conn'` branch of `benchmark_post_conn`, within the `for s in scales:` loop, `single_COBA_data_add` is called with `cn`, which is not defined there. This will raise a `NameError` at runtime; it should likely use `actual_conn_num` instead.
</issue_to_address>
### Comment 3
<location path="brainevent/_fcn/binary.py" line_range="304" />
<code_context>
else:
spk_f = (spikes > 0).astype(weights.dtype)
+ spk_f = u.math.asarray(spikes, dtype=bool)
if transpose:
</code_context>
<issue_to_address>
**question (bug_risk):** Overwriting `spk_f` with a bool array may break expected numeric behavior in the non-CUDA path.
In the CPU path, `spk_f` is first computed with the correct numeric dtype (`weights.dtype`), then immediately replaced by a boolean array via `u.math.asarray(spikes, dtype=bool)`. This negates the prior dtype handling and may change semantics (e.g., when multiplying by `weights` or relying on numeric values) or add implicit casts. If the CPU path should operate on booleans, the earlier dtype-specific logic can be removed; otherwise, this new assignment should be removed or use a separate variable instead of overwriting `spk_f`.
</issue_to_address>
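The semantic difference the reviewer asks about is easy to pin down: `asarray(spikes, dtype=bool)` maps any nonzero value, including negatives, to `True`, whereas the old `(spikes > 0)` path does not. A sketch of one centralized conversion that preserves the original thresholding, using NumPy in place of `u.math` and a hypothetical helper name:

```python
import numpy as np


def to_bool_spikes(spikes):
    """Single conversion point shared by the gather and scatter paths.
    Keeps the old float path's `> 0` semantics rather than raw truthiness."""
    return np.asarray(spikes) > 0
```

On `[-1.0, 0.0, 2.0]`, `np.asarray(..., dtype=bool)` yields `[True, False, True]` but `to_bool_spikes` yields `[False, False, True]`, so whichever semantics are intended should be chosen deliberately in one place.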
### Comment 4
<location path="dev/fcn/boundary_dis.py" line_range="232" />
<code_context>
+ parts.append(f"Baseline [{cmp_field}]: {self.combo_baseline.get()}")
+ return " │ ".join(parts)
+
+ def update_plots(self):
+ if self.df is None or self.df.empty:
+ return
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the repeated grid construction, axis styling, hover handling, speedup bounds, and filter-mask logic into small helper methods to shorten the main plotting methods and make the code easier to follow and test.
One targeted way to reduce complexity here without changing behavior is to extract some of the duplicated logic into small helpers. That keeps `PerformanceBoundaryApp` intact but makes the core methods shorter and easier to follow.
### 1. Centralize grid + validity computation
`update_plots` and `_render_interpolation` both build `grid_x/y`, run `griddata`, and apply the `_N` / `limit_gb` constraints. You can encapsulate that in a helper:
```python
def _make_valid_grid(self, x, y, z, _N, limit_gb, x_max, y_max):
grid_x, grid_y = np.mgrid[0 : x_max * 1.1 : 200j, 0 : y_max * 1.2 : 200j]
grid_z = griddata((x, y), z, (grid_x, grid_y), method='linear')
limit_bytes = limit_gb * (1024 ** 3)
valid = (grid_y <= grid_x * _N) & (grid_y * grid_x * 8 * _N <= limit_bytes)
grid_z_masked = np.where(valid, grid_z, np.nan)
return grid_x, grid_y, grid_z_masked
```
Usage:
```python
# in _render_interpolation
grid_x, grid_y, grid_z_masked = self._make_valid_grid(x, y, z, _N, limit_gb, x_max, y_max)
im = ax.pcolormesh(grid_x, grid_y, grid_z_masked, shading='auto', cmap=cmap_name)
...
# in update_plots speedup section (after computing grid_sp)
grid_x, grid_y, grid_sp_masked = self._make_valid_grid(
xt, yt, grid_sp, _N, limit_gb, x_max, y_max
)
self._render_speedup_interp(self.tabs["Speedup Interp"], grid_x, grid_y, grid_sp_masked,
subtitle=sub_sp, **kw_common)
```
This removes the duplicated validity logic and ensures boundary conditions stay consistent.
### 2. Unify common axis styling
`_render_scatter`, `_render_interpolation`, and `_render_speedup_interp` repeat axis labels, limits, grid, and title construction.
Extract a simple `_style_axes` helper:
```python
def _style_axes(self, ax, title, subtitle, _N, x_min, x_max, y_max):
ax.set_title(f"{title}\n{subtitle}", fontsize=10, loc='left', pad=8)
ax.set_xlabel(f"Scale (N = {_N:,} elements per scale unit)")
ax.set_ylabel("Connection Number (synapses / neuron)")
ax.set_ylim(max(y_max * 1.2, 100), 0)
ax.set_xlim(0, x_max * 1.1)
ax.grid(True, linestyle='--', alpha=0.3)
```
Then in each render method:
```python
# _render_scatter
self._draw_boundaries(ax, _N, limit_gb, x_min, x_max)
self._style_axes(ax, title, subtitle, _N, x_min, x_max, y_max)
# _render_interpolation
self._draw_boundaries(ax, _N, limit_gb, x_min, x_max)
self._style_axes(ax, title, subtitle, _N, x_min, x_max, y_max)
# _render_speedup_interp
self._draw_boundaries(ax, _N, limit_gb, x_min, x_max)
self._style_axes(ax, title, subtitle, _N, x_min, x_max, y_max)
```
If you want the scatter grid to be a bit darker (`alpha=0.5`) you can keep that as a parameter:
```python
def _style_axes(self, ax, title, subtitle, _N, x_min, x_max, y_max, grid_alpha=0.3):
...
ax.grid(True, linestyle='--', alpha=grid_alpha)
```
### 3. Generalize hover setup
`_setup_scatter_hover` and `_setup_interp_hover` are structurally identical except for how the value is resolved. You can factor them into a single `_setup_hover` and pass in a resolver:
```python
def _setup_hover(self, tab, z_label, resolver):
canvas, ax = tab['canvas'], tab['ax']
annot = ax.annotate(
"", xy=(0, 0), xytext=(15, 15), textcoords="offset points",
bbox=dict(boxstyle="round,pad=0.4", fc="lightyellow", ec="gray", alpha=0.92),
fontsize=8, visible=False, zorder=10,
)
def on_hover(event):
if event.inaxes != ax or event.xdata is None or event.ydata is None:
if annot.get_visible():
annot.set_visible(False)
canvas.draw_idle()
return
resolved = resolver(event)
if resolved is None:
if annot.get_visible():
annot.set_visible(False)
else:
x, y, val, outside = resolved
annot.xy = (x, y)
if outside:
annot.set_text(
f"Scale: {x:.4g}\n"
f"Conn: {y:.0f}\n"
"[outside valid domain]"
)
else:
annot.set_text(
f"Scale: {x:.4g}\n"
f"Conn: {y:.0f}\n"
f"{z_label}: {val:.4g}"
)
annot.set_visible(True)
canvas.draw_idle()
tab['hover_cid'] = canvas.mpl_connect("motion_notify_event", on_hover)
```
Resolvers:
```python
def _setup_scatter_hover(self, tab, x, y, z, z_label):
sc = tab['scatter']
xc, yc, zc = x.copy(), y.copy(), z.copy()
def resolver(event):
cont, ind = sc.contains(event)
if not cont:
return None
i = ind["ind"][0]
return xc[i], yc[i], zc[i], False
self._setup_hover(tab, z_label, resolver)
def _setup_interp_hover(self, tab, grid_x, grid_y, grid_z, z_label):
gx_col = grid_x[:, 0].copy()
gy_row = grid_y[0, :].copy()
gz = grid_z.copy()
def resolver(event):
ix = int(np.argmin(np.abs(gx_col - event.xdata)))
iy = int(np.argmin(np.abs(gy_row - event.ydata)))
val = gz[ix, iy]
outside = np.isnan(val)
return event.xdata, event.ydata, val if not outside else np.nan, outside
self._setup_hover(tab, z_label, resolver)
```
This keeps behavior but consolidates the shared annotation / event wiring code.
### 4. Extract speedup color bounds logic
`_render_speedup_interp` mixes UI state reading, defaults, and constraints. Extract that into a small helper so `_render_speedup_interp` focuses on rendering:
```python
def _compute_speedup_bounds(self, z_finite):
if self.var_auto_speedup.get():
if len(z_finite) > 0:
yellow_min = min(float(z_finite.min()), 0.5)
blue_max = max(float(z_finite.max()), 1.5)
else:
yellow_min, blue_max = 0.5, 1.5
else:
try:
yellow_min = float(self.entry_yellow_depth.get())
blue_max = float(self.entry_blue_depth.get())
except ValueError:
yellow_min, blue_max = 0.5, 2.0
yellow_min = min(yellow_min, 0.9999) # must be < 1.0
blue_max = max(blue_max, 1.0001) # must be > 1.0
if yellow_min >= blue_max:
yellow_min, blue_max = 0.5, 2.0
return yellow_min, blue_max
```
Then:
```python
z_finite = grid_z_masked[np.isfinite(grid_z_masked)]
yellow_min, blue_max = self._compute_speedup_bounds(z_finite)
grid_sp_clipped = np.where(
np.isfinite(grid_z_masked),
np.clip(grid_z_masked, yellow_min, blue_max),
np.nan,
)
...
```
### 5. Decouple filter mask building from widgets
The filtering logic inside `update_plots` is simple but could be isolated to make `update_plots` shorter and more testable:
```python
def _build_filter_mask(self, df):
mask = pd.Series(True, index=df.index)
for col, cb in self.comboboxes.items():
val = cb.get()
if val:
mask &= (df[col].astype(str) == str(val))
return mask
```
Usage:
```python
df_f = self.df[self._build_filter_mask(self.df)]
```
These small extractions should significantly reduce the cognitive load of `update_plots` and the render methods, while preserving all existing behavior and keeping your current class structure.
</issue_to_address>
This pull request introduces a major performance and scalability improvement to the binary fully-connected matrix-vector (fcnmv) CUDA kernels, specifically optimizing the scatter mode for high fan-out (large n_conn) scenarios. The update adds new warp-per-row (WPR) CUDA kernels, implements an auto-dispatch mechanism to select between thread-per-row (TPR) and WPR based on problem size, and refines the Python interface accordingly. Additionally, there are minor improvements in code organization and initialization.
CUDA kernel improvements:

- Added new warp-per-row kernels (`_bs_wpr_homo_kern`, `_bs_wpr_hetero_kern`) for scatter mode, which improve performance for large n_conn by parallelizing atomicAdd operations across a warp.
- Updated the scatter entry points (`binary_fcnmv_scatter_homo`, `binary_fcnmv_scatter_hetero`) to auto-dispatch between TPR and WPR kernels based on a cubic threshold function of n_pre and n_conn, ensuring optimal kernel selection for different matrix sizes. [1] [2]

Python interface and logic:

- Refactored `_binary_fcnmv_cuda_kernel` to use the new kernel names and ensure the spikes array is always boolean for the scatter/gather kernels.

Documentation and code organization:

- Updated comments in `binary_fcnmv.cu` to explain the new TPR/WPR auto-dispatch and performance crossover, replacing outdated comments. [1] [2]
- Moved initialization of the `_tags` dictionary in `BenchmarkTools.py` (renamed from `CsvOutput.py`).

These changes significantly improve the scalability and efficiency of the binary fcnmv scatter operation, especially for large, high-fan-out neural network layers.
Summary by Sourcery
Introduce warp-per-row CUDA scatter kernels with auto-dispatch, update Python bindings to use boolean spike inputs, and extend benchmarking/GUI tooling for backend-focused analysis and performance boundary exploration.