fix: resolve entity unavailability, stale tokens, and unresponsive controls by hugo-brito · Pull Request #109 · cdpuk/ha-bestway

hugo-brito · 2026-04-05T20:49:25Z

Summary

Four code fixes for V02 (AWS IoT) device support, plus repo maintenance fixes for tests and CI. Addresses persistent entity unavailability, authentication failures, sluggish dashboard controls, and log spam from missing keys.

Files changed: 8 | +365 -34 | 78 tests passing

Code fixes

1. Ignore unreliable `is_online` flag (`entity.py`)

The Bestway/Gizwits cloud API frequently reports is_online: false even when the device is functioning normally and controllable via the official app. The API continues returning valid device state data regardless of this flag.

The base BestwayEntity.available property gated all entities on self.bestway_device.is_online, causing all spa entities to become permanently unavailable. The connectivity binary sensor continues to report the raw is_online value as a diagnostic indicator.

Fixes #89, #93, #100
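A minimal sketch of the availability rule described above, using hypothetical stand-in classes (`FakeDevice`, `FakeCoordinator`, `SketchEntity`) rather than the integration's real ones. It assumes availability depends only on the last poll succeeding and the device being present in the polled data, which is what the regression tests listed under Testing suggest:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device record returned by the cloud API."""

    is_online: bool


class FakeCoordinator:
    """Hypothetical stand-in for the integration's data update coordinator."""

    def __init__(self, devices: dict, last_update_success: bool = True) -> None:
        self.devices = devices
        self.last_update_success = last_update_success


class SketchEntity:
    """Illustrative entity; not the real BestwayEntity."""

    def __init__(self, coordinator: FakeCoordinator, device_id: str) -> None:
        self.coordinator = coordinator
        self.device_id = device_id

    @property
    def device(self) -> Optional[FakeDevice]:
        return self.coordinator.devices.get(self.device_id)

    @property
    def available(self) -> bool:
        # is_online is deliberately not consulted here: the entity is
        # available when the last poll succeeded and the device is known.
        return self.coordinator.last_update_success and self.device is not None


offline_but_polled = SketchEntity(
    FakeCoordinator({"spa": FakeDevice(is_online=False)}), "spa"
)
missing_device = SketchEntity(FakeCoordinator({}), "spa")
failed_poll = SketchEntity(
    FakeCoordinator({"spa": FakeDevice(is_online=True)}, last_update_success=False),
    "spa",
)
```

In this sketch the raw `is_online` value stays on the device record, so a separate diagnostic sensor could still expose it.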

2. Always re-authenticate on startup for AWS IoT (`__init__.py`)

Stored tokens expire server-side without any expiry field the integration can check. When a stale token existed, the integration skipped authentication and proceeded to refresh_bindings(), which failed with "Token is not authorized", leaving all entities unavailable until the next HA restart.

Now always requests a fresh token on startup. Re-authenticating is a single POST request (cheap).

Related: #86
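The startup change can be sketched roughly as follows. `FakeApi`, its methods, and the token strings are hypothetical and only mimic the behavior described above (a stale stored token is rejected by `refresh_bindings()`); the real client's interface may differ:

```python
import asyncio
from typing import Optional


class FakeApi:
    """Hypothetical API client used only for this illustration."""

    def __init__(self) -> None:
        self.auth_calls = 0

    async def authenticate(self) -> str:
        # Stands in for the single POST request that returns a fresh token.
        self.auth_calls += 1
        return f"fresh-token-{self.auth_calls}"

    async def refresh_bindings(self, token: str) -> list:
        # Mimics the server rejecting an expired token.
        if not token.startswith("fresh-"):
            raise PermissionError("Token is not authorized")
        return ["spa"]


async def setup_entry_sketch(api: FakeApi, stored_token: Optional[str]) -> list:
    # The stored token is intentionally ignored: it carries no expiry field,
    # so there is no reliable way to tell whether it is still valid.
    token = await api.authenticate()
    return await api.refresh_bindings(token)


api = FakeApi()
devices = asyncio.run(setup_entry_sketch(api, stored_token="stale-token"))
```

The trade-off in this sketch is one extra request per startup in exchange for never reaching `refresh_bindings()` with an expired token.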

3. Optimistic state updates for responsive controls (`switch.py`, `climate.py`, `select.py`)

Switch toggles waited for a full API round-trip (2-10s) before reflecting the state change in the UI. If the post-command refresh failed (e.g., temporary API timeout), the UI reverted the toggle, requiring the user to toggle twice.

- Switches now use optimistic state tracking - the UI updates instantly, then a non-blocking background refresh confirms the actual state.
- Climate and select entities now use async_request_refresh() (non-blocking, debounced) instead of async_refresh() (blocking).
- Select (bubbles) was missing a refresh call entirely.
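A simplified, synchronous sketch of the optimistic pattern for switches. `SketchSwitch` and its counters are hypothetical; in the real entity the command is sent to the API and the refresh is requested through the coordinator, both of which are only simulated here:

```python
from typing import Optional


class SketchSwitch:
    """Illustrative switch with optimistic state; not the real entity class."""

    def __init__(self, polled_state: bool) -> None:
        self.polled_state = polled_state  # last state reported by a poll
        self._optimistic_state: Optional[bool] = None
        self.refresh_requests = 0  # stands in for async_request_refresh() calls

    @property
    def is_on(self) -> bool:
        # Prefer the optimistic value until real data arrives.
        if self._optimistic_state is not None:
            return self._optimistic_state
        return self.polled_state

    def turn_on(self) -> None:
        # The command would be sent here; the UI state flips immediately
        # and a background refresh is requested instead of awaited.
        self._optimistic_state = True
        self.refresh_requests += 1

    def handle_coordinator_update(self, polled_state: bool) -> None:
        # Real data replaces the optimistic guess.
        self.polled_state = polled_state
        self._optimistic_state = None


switch = SketchSwitch(polled_state=False)
switch.turn_on()
state_after_toggle = switch.is_on
switch.handle_coordinator_update(polled_state=True)
state_after_poll = switch.is_on
```

If the post-command poll reports a different value, the entity in this sketch simply falls back to the polled state once the update arrives.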

4. Safe attribute access for `Tunit` key (`climate.py`)

When the API returns a partial state (e.g., after a timeout or during startup before the first full poll), the Tunit key may be missing from attrs. Using dict indexing caused a KeyError that spammed the logs (~200 occurrences over 4 days). Changed to .get("Tunit", 1) to safely default to Celsius (matching the no-status fallback behavior).
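For illustration, the difference between indexing and `.get()` on a partial state. The `partial_state` payload and the helper names are made up for this example:

```python
def read_unit_strict(attrs: dict) -> int:
    # Dict indexing raises KeyError when the key is absent.
    return attrs["Tunit"]


def read_unit_safe(attrs: dict) -> int:
    # Falls back to 1 (treated as Celsius in this PR) when the key is absent.
    return attrs.get("Tunit", 1)


partial_state = {"power": 1}  # hypothetical payload without Tunit

try:
    read_unit_strict(partial_state)
    strict_result = "ok"
except KeyError as err:
    strict_result = f"KeyError: {err}"

safe_result = read_unit_safe(partial_state)
```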

Repo maintenance fixes

5. Defensive `async_unload_entry` (`__init__.py`)

Use .get(DOMAIN, {}).pop(entry.entry_id, None) to handle the case where hass.data[DOMAIN] was never populated (e.g., failed setup). Prevents KeyError during unload of partially-initialized entries.
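A small sketch of why the chained `.get(...).pop(...)` is safe. Plain dicts stand in for `hass.data`, and the `DOMAIN` value and entry IDs are assumptions for the example:

```python
DOMAIN = "bestway"  # assumed domain string for this illustration


def unload_sketch(hass_data: dict, entry_id: str):
    # .get() tolerates a missing domain bucket; pop() with a default
    # tolerates a missing entry. Neither raises KeyError.
    return hass_data.get(DOMAIN, {}).pop(entry_id, None)


never_populated: dict = {}
populated = {DOMAIN: {"entry-1": "coordinator"}}

result_when_empty = unload_sketch(never_populated, "entry-1")
result_when_present = unload_sketch(populated, "entry-1")
```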

6. Config flow test teardown (`test_config_flow.py`)

- Patched both async_setup_entry and async_unload_entry in the bypass_setup_fixture to prevent teardown KeyError when HA manages entries that were never fully initialized.
- Added verify_cleanup fixture override that joins daemon threads before the upstream thread assertion runs. The upstream verify_cleanup races with _run_safe_shutdown_loop from asyncio's shutdown_default_executor().
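The thread-joining idea can be sketched outside pytest as a context manager. This is not the fixture from the PR; `join_new_daemon_threads` is a hypothetical helper that records the threads alive before the body runs and joins any daemon threads that appeared afterwards:

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def join_new_daemon_threads(timeout: float = 5.0):
    # Snapshot the threads that exist before the wrapped code runs.
    before = set(threading.enumerate())
    yield
    # Join daemon threads started in the meantime so that a later
    # "no lingering threads" check does not race with them.
    for thread in set(threading.enumerate()) - before:
        if thread.daemon and thread is not threading.current_thread():
            thread.join(timeout)


with join_new_daemon_threads():
    worker = threading.Thread(target=time.sleep, args=(0.1,), daemon=True)
    worker.start()

worker_still_running = worker.is_alive()
```

A pytest fixture would follow the same shape, with the snapshot before `yield` and the joins after it.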

7. CI workflow fixes (`.github/workflows/test.yaml`)

Fixed two copy-paste bugs from the givenergy_local project:

- `branches: [master]` -> `branches: [main]` (push-triggered CI never ran)
- `--cov custom_components.givenergy_local` -> `--cov custom_components.bestway` (coverage measured wrong package)

Testing

Tested on a V02 Airjet spa (UltraFit, product_id T53NN8) over 5 days (April 5-10):

- All 10 entities remain available across multiple HA restarts
- Controls (power, filter, jets, bubbles, heater, temperature) respond instantly
- Zero "Token is not authorized" errors over the monitoring period
- Connectivity binary sensor correctly reports raw is_online value
- Tunit KeyError log spam eliminated

Test suite: 78 passed, 0 errors

11 regression tests in test_availability_and_controls.py exercising the actual entity classes:

- Entity availability with is_online=True, is_online=False, missing device, coordinator failure
- Switch optimistic state: _assumed_state flag, optimistic turn on, state cleared on coordinator update
- Climate Tunit safety: Tunit present (Celsius), Tunit=0 (Fahrenheit), Tunit missing (defaults to Celsius), no status (Celsius)

Compatibility

- V01 (Gizwits) configurations continue to function - the available property change applies to the shared base class, and the climate/switch changes use the same patterns for both backends.
- No migration required.

The Bestway/Gizwits cloud API frequently reports is_online as false even when the device is functioning normally and controllable via the official app. The API continues returning valid device state data regardless of this flag. This causes all spa entities to become permanently unavailable in Home Assistant despite the integration successfully polling data. Remove the is_online check from the base entity available property. The connectivity binary sensor continues to report the raw is_online value as a diagnostic indicator. Fixes cdpuk#89, cdpuk#93, cdpuk#100

Stored tokens expire server-side without any expiry field the integration can check. When a stale token was present, the integration skipped authentication and proceeded to refresh_bindings(), which failed with 'Token is not authorized', leaving all entities unavailable. Always request a fresh token on startup. Re-authenticating is a single POST request and avoids silent auth failures.

Switches now update the UI immediately on toggle, then schedule a non-blocking background refresh. Previously, toggles waited for a full API round-trip (2-10s) before reflecting the change, causing the UI to appear unresponsive or 'bounce' the toggle back.

Changes:
- switch.py: Add optimistic state tracking. UI updates instantly, cleared when real data arrives from coordinator poll.
- climate.py: Replace blocking async_refresh() with non-blocking async_request_refresh() for hvac mode and temperature changes.
- select.py: Add async_request_refresh() after bubbles selection (was missing entirely).

When the API returns a partial state (e.g., after a timeout or during startup before the first full poll), the Tunit key may be missing from attrs. Using dict indexing causes a KeyError that spams the logs. Use .get() to safely default to Celsius.

Cover the 4 fixes with unit tests:
- Entity available when is_online=False (the core fix)
- Entity unavailable when device missing or coordinator fails
- Switch optimistic state tracking (_assumed_state, _optimistic_state)
- Switch optimistic state cleared on coordinator update
- Climate Tunit .get() safety with missing/present/zero values

Tests follow existing repo patterns and require the same pytest-homeassistant-custom-component framework (CI only, not Windows).

_handle_coordinator_update calls async_write_ha_state which requires a real HA instance. Patch it in the unit test since we only need to verify the optimistic state is cleared.

test_entity_fixes.py -> test_availability_and_controls.py

hugo-brito · 2026-04-09T17:14:10Z

@cdpuk, could you please review?

cdpuk · 2026-04-09T20:42:45Z

Thanks for the collection of fixes - I don't have a V2 device so very much appreciate community contributions. I'll give this a quick check over the next few days. Could you just take a look at why the pre-commit checks are failing?

- Remove duplicate __init__ in switch.py (ruff F811)
- Remove unused imports in tests (ruff F401)
- Apply ruff formatting to entity.py, __init__.py, tests

test_config_flow: The bypass_setup_fixture patched async_setup_entry but not async_unload_entry, causing KeyError during teardown. Added unload patch. The remaining xfail is a daemon thread from asyncio internals (_run_safe_shutdown_loop) that the test framework rejects - pre-existing on main, not caused by our changes.

__init__.py: Use .get() in async_unload_entry to handle the case where hass.data[DOMAIN] was never populated (e.g., failed setup).

test.yaml: Fix CI workflow - branch was 'master' (repo uses 'main'), and coverage target was 'givenergy_local' (copy-paste from another project).

.get('Tunit') returned None (falsy) when the key was absent, falling through to Fahrenheit. Use .get('Tunit', 1) so missing key defaults to Celsius, matching the no-status fallback behavior. Rewrote Tunit tests to exercise the actual AirjetV01HydrojetSpaThermostat entity instead of duplicating the if/else logic inline.
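The fall-through described in this commit can be reproduced in isolation. The two helpers below are illustrative only and assume a truthy Tunit means Celsius, as the tests in this PR indicate:

```python
def unit_without_default(attrs: dict) -> str:
    # A missing key yields None, which is falsy, so this picks Fahrenheit.
    return "C" if attrs.get("Tunit") else "F"


def unit_with_default(attrs: dict) -> str:
    # A missing key yields 1, which is truthy, so this picks Celsius.
    return "C" if attrs.get("Tunit", 1) else "F"


missing = {}
explicit_fahrenheit = {"Tunit": 0}

before_fix = unit_without_default(missing)
after_fix = unit_with_default(missing)
```

Both helpers agree whenever the key is present; they differ only for the missing-key case.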

The upstream verify_cleanup fixture races with a daemon thread spawned by shutdown_default_executor(). Override it in test_config_flow.py to join lingering threads before teardown assertions. Removes the xfail marker - test_successful_config_flow now passes deterministically. 78 passed, 0 xfailed.

hugo-brito · 2026-04-10T19:18:27Z

@cdpuk no worries. I added a few more fixes and updated the PR description. The precommit check is now passing.

Collapse multi-line constructor calls that fit on one line, matching ruff v0.15.1 from pre-commit config.

The event_loop fixture was removed in newer pytest-asyncio versions used by CI. Rewrite verify_cleanup override to take no parameters - just track threads before/after yield and join any daemon threads (like _run_safe_shutdown_loop) before the upstream assertion runs.

hugo-brito added 7 commits April 5, 2026 11:41

test: fix coordinator update test for unit test context

6896ea4

_handle_coordinator_update calls async_write_ha_state which requires a real HA instance. Patch it in the unit test since we only need to verify the optimistic state is cleared.

test: rename test file to describe what is tested

8cdd347

test_entity_fixes.py -> test_availability_and_controls.py

hugo-brito added 4 commits April 10, 2026 18:53

fix: resolve pre-commit CI failures

d103367

- Remove duplicate __init__ in switch.py (ruff F811)
- Remove unused imports in tests (ruff F401)
- Apply ruff formatting to entity.py, __init__.py, tests

hugo-brito added 2 commits April 10, 2026 21:30

style: apply ruff format to test helpers

4a59710

Collapse multi-line constructor calls that fit on one line, matching ruff v0.15.1 from pre-commit config.

cdpuk approved these changes Apr 12, 2026

cdpuk merged commit 1b65f9a into cdpuk:main Apr 12, 2026
3 checks passed

This was referenced Apr 12, 2026

All spa entities become unavailable ~every 5m even though sensor always shows connected #93

Closed

Again: Bestway API sending data, but device & entities not available #100

Closed

hugo-brito deleted the fix/ignore-unreliable-is-online branch April 16, 2026 08:34

cdpuk merged 13 commits into cdpuk:main from hugo-brito:fix/ignore-unreliable-is-online