
fix: resolve entity unavailability, stale tokens, and unresponsive controls #109

Merged
cdpuk merged 13 commits into cdpuk:main from hugo-brito:fix/ignore-unreliable-is-online
Apr 12, 2026


@hugo-brito
Contributor

@hugo-brito hugo-brito commented Apr 5, 2026

Summary

Four code fixes for V02 (AWS IoT) device support, plus repo maintenance fixes for tests and CI. Addresses persistent entity unavailability, authentication failures, sluggish dashboard controls, and log spam from missing keys.

Files changed: 8 | +365 -34 | 78 tests passing

Code fixes

1. Ignore unreliable is_online flag (entity.py)

The Bestway/Gizwits cloud API frequently reports is_online: false even when the device is functioning normally and controllable via the official app. The API continues returning valid device state data regardless of this flag.

The base BestwayEntity.available property gated every entity on self.bestway_device.is_online, so all spa entities became permanently unavailable even though polling succeeded. The is_online check is now removed from available; the connectivity binary sensor continues to report the raw is_online value as a diagnostic indicator.

Fixes #89, #93, #100
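A minimal sketch of the availability rule after this change, using hypothetical stand-ins (FakeDevice, FakeCoordinator, SketchEntity) rather than the integration's real classes. It assumes availability now depends only on the coordinator's last update succeeding and the device being present in its data, which matches the cases listed under Testing:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device record returned by the cloud API."""

    device_id: str
    is_online: bool


class FakeCoordinator:
    """Hypothetical stand-in for the integration's update coordinator."""

    def __init__(self, devices: Dict[str, FakeDevice], last_update_success: bool = True):
        self.devices = devices
        self.last_update_success = last_update_success


class SketchEntity:
    """Illustrative entity; not the real BestwayEntity."""

    def __init__(self, coordinator: FakeCoordinator, device_id: str):
        self.coordinator = coordinator
        self.device_id = device_id

    @property
    def bestway_device(self) -> Optional[FakeDevice]:
        return self.coordinator.devices.get(self.device_id)

    @property
    def available(self) -> bool:
        # is_online is deliberately not consulted here: the cloud reports it
        # as False even while it keeps returning valid state data.
        return self.coordinator.last_update_success and self.bestway_device is not None
```

With this rule, an entity whose device reports is_online=False stays available, while a missing device or a failed coordinator update still marks it unavailable.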

2. Always re-authenticate on startup for AWS IoT (__init__.py)

Stored tokens expire server-side without any expiry field the integration can check. When a stale token existed, the integration skipped authentication and proceeded to refresh_bindings(), which failed with "Token is not authorized", leaving all entities unavailable until the next HA restart.

The integration now always requests a fresh token on startup. Re-authenticating costs a single POST request, so the extra call is cheap.

Related: #86
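The startup flow can be sketched roughly as follows. FakeApi, its authenticate() method, and setup_entry() are hypothetical names for illustration; only refresh_bindings() and the error message come from the description above:

```python
class FakeApi:
    """Hypothetical API client; the real integration's client differs."""

    def __init__(self):
        self.token = None
        self.auth_calls = 0

    async def authenticate(self):
        # Stands in for the single POST request that issues a new token.
        self.auth_calls += 1
        self.token = f"fresh-{self.auth_calls}"
        return self.token

    async def refresh_bindings(self):
        # The server rejects stale tokens; the client cannot detect expiry
        # itself because no expiry field is provided.
        if self.token is None or self.token.startswith("stale"):
            raise PermissionError("Token is not authorized")
        return ["spa"]


async def setup_entry(api, stored_token):
    """Sketch of startup: never trust the stored token."""
    api.token = stored_token
    # Previously (assumed): authentication was skipped when stored_token was set.
    # Now: always fetch a fresh token before touching the bindings.
    await api.authenticate()
    return await api.refresh_bindings()
```

Running setup_entry with a stale stored token succeeds in this sketch because the token is replaced before refresh_bindings() is called.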

3. Optimistic state updates for responsive controls (switch.py, climate.py, select.py)

Switch toggles waited for a full API round-trip (2-10s) before reflecting the state change in the UI. If the post-command refresh failed (e.g., temporary API timeout), the UI reverted the toggle, requiring the user to toggle twice.

  • Switches now use optimistic state tracking - the UI updates instantly, then a non-blocking background refresh confirms the actual state.
  • Climate and select entities now use async_request_refresh() (non-blocking, debounced) instead of async_refresh() (blocking).
  • Select (bubbles) was missing a refresh call entirely.
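A rough sketch of the optimistic pattern for switches, assuming a hypothetical SketchSwitch wired to three callables (read_state, send_command, request_refresh). The _optimistic_state name follows the commit message; everything else is illustrative and not the actual switch.py code:

```python
class SketchSwitch:
    """Illustrative switch with optimistic state tracking."""

    def __init__(self, read_state, send_command, request_refresh):
        self._read_state = read_state            # returns the last polled state
        self._send_command = send_command        # coroutine: sends the command
        self._request_refresh = request_refresh  # coroutine: schedules a refresh
        self._optimistic_state = None

    @property
    def is_on(self):
        # Prefer the optimistic value until real data arrives.
        if self._optimistic_state is not None:
            return self._optimistic_state
        return self._read_state()

    async def _async_set(self, value):
        await self._send_command(value)
        self._optimistic_state = value   # UI reflects the change immediately
        await self._request_refresh()    # confirmation happens in the background

    async def async_turn_on(self):
        await self._async_set(True)

    async def async_turn_off(self):
        await self._async_set(False)

    def handle_coordinator_update(self):
        # Fresh data from the coordinator replaces the optimistic value.
        self._optimistic_state = None
```

The key design point is that is_on never blocks on the API: the polled value only takes over again once the coordinator delivers an update.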

4. Safe attribute access for Tunit key (climate.py)

When the API returns a partial state (e.g., after a timeout or during startup before the first full poll), the Tunit key may be missing from attrs. Dict indexing raised a KeyError that spammed the logs (~200 occurrences over 4 days). The lookup is now .get("Tunit", 1), which defaults to Celsius when the key is absent (matching the no-status fallback behavior); a bare .get("Tunit") would return None and fall through to Fahrenheit.
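The difference between the lookups can be illustrated with a small standalone function. temperature_unit() and the unit strings are hypothetical; the only detail taken from this PR is that a truthy Tunit means Celsius and 0 means Fahrenheit:

```python
CELSIUS = "celsius"
FAHRENHEIT = "fahrenheit"


def temperature_unit(attrs):
    """Return the display unit for a (possibly partial) attrs dict."""
    if attrs is None:
        # No status at all: fall back to Celsius.
        return CELSIUS
    # attrs["Tunit"] would raise KeyError on a partial state, and
    # attrs.get("Tunit") would yield None (falsy) and select Fahrenheit.
    return CELSIUS if attrs.get("Tunit", 1) else FAHRENHEIT
```

Under these assumptions, temperature_unit({}) and temperature_unit(None) both give Celsius, while temperature_unit({"Tunit": 0}) gives Fahrenheit.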

Repo maintenance fixes

5. Defensive async_unload_entry (__init__.py)

Use .get(DOMAIN, {}).pop(entry.entry_id, None) to handle the case where hass.data[DOMAIN] was never populated (e.g., failed setup). Prevents KeyError during unload of partially-initialized entries.
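The same expression on a plain dict shows why it is safe. forget_entry() is a hypothetical helper, and the DOMAIN value is assumed from the custom_components.bestway package name:

```python
DOMAIN = "bestway"  # assumed value, for illustration only


def forget_entry(hass_data, entry_id):
    """Remove and return an entry's data, tolerating a missing domain or entry."""
    # .get(DOMAIN, {}) survives a never-populated domain;
    # .pop(entry_id, None) survives a never-stored entry.
    return hass_data.get(DOMAIN, {}).pop(entry_id, None)
```

Calling forget_entry({}, "abc") returns None instead of raising, which is the partially-initialized case described above.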

6. Config flow test teardown (test_config_flow.py)

  • Patched both async_setup_entry and async_unload_entry in the bypass_setup_fixture to prevent teardown KeyError when HA manages entries that were never fully initialized.
  • Added verify_cleanup fixture override that joins daemon threads before the upstream thread assertion runs. The upstream verify_cleanup races with _run_safe_shutdown_loop from asyncio's shutdown_default_executor().
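The thread-joining idea behind the verify_cleanup override can be sketched outside pytest as a context manager. join_new_daemon_threads() is a hypothetical name, and the real override is a fixture whose exact body is not shown here:

```python
import threading
from contextlib import contextmanager


@contextmanager
def join_new_daemon_threads(timeout=5.0):
    """Join daemon threads started inside the block before returning."""
    before = set(threading.enumerate())
    yield
    for thread in set(threading.enumerate()) - before:
        # Only wait for daemon threads that appeared during the block,
        # such as an executor shutdown helper; never join ourselves.
        if thread.daemon and thread is not threading.current_thread():
            thread.join(timeout)
```

In a fixture, the code before the yield would run ahead of the test and the join loop afterwards, so any later assertion about lingering threads no longer races with a short-lived daemon thread.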

7. CI workflow fixes (.github/workflows/test.yaml)

Fixed two copy-paste bugs from the givenergy_local project:

  • branches: [master] -> branches: [main] (push-triggered CI never ran)
  • --cov custom_components.givenergy_local -> --cov custom_components.bestway (coverage measured wrong package)

Testing

Tested on a V02 Airjet spa (UltraFit, product_id T53NN8) over 5 days (April 5-10):

  • All 10 entities remain available across multiple HA restarts
  • Controls (power, filter, jets, bubbles, heater, temperature) respond instantly
  • Zero "Token is not authorized" errors over the monitoring period
  • Connectivity binary sensor correctly reports raw is_online value
  • Tunit KeyError log spam eliminated

Test suite: 78 passed, 0 errors

11 regression tests in test_availability_and_controls.py exercising the actual entity classes:

  • Entity availability with is_online=True, is_online=False, missing device, coordinator failure
  • Switch optimistic state: _assumed_state flag, optimistic turn on, state cleared on coordinator update
  • Climate Tunit safety: Tunit present (Celsius), Tunit=0 (Fahrenheit), Tunit missing (defaults to Celsius), no status (Celsius)

Compatibility

  • V01 (Gizwits) configurations continue to function: the available property change applies to the shared base class, and the climate/switch changes use the same patterns for both backends.
  • No migration required.

The Bestway/Gizwits cloud API frequently reports is_online as false
even when the device is functioning normally and controllable via the
official app. The API continues returning valid device state data
regardless of this flag.

This causes all spa entities to become permanently unavailable in
Home Assistant despite the integration successfully polling data.

Remove the is_online check from the base entity available property.
The connectivity binary sensor continues to report the raw is_online
value as a diagnostic indicator.

Fixes cdpuk#89, cdpuk#93, cdpuk#100
Stored tokens expire server-side without any expiry field the
integration can check. When a stale token was present, the integration
skipped authentication and proceeded to refresh_bindings(), which
failed with 'Token is not authorized', leaving all entities
unavailable.

Always request a fresh token on startup. Re-authenticating is a
single POST request and avoids silent auth failures.
Switches now update the UI immediately on toggle, then schedule a
non-blocking background refresh. Previously, toggles waited for a
full API round-trip (2-10s) before reflecting the change, causing
the UI to appear unresponsive or 'bounce' the toggle back.

Changes:
- switch.py: Add optimistic state tracking. UI updates instantly,
  cleared when real data arrives from coordinator poll.
- climate.py: Replace blocking async_refresh() with non-blocking
  async_request_refresh() for hvac mode and temperature changes.
- select.py: Add async_request_refresh() after bubbles selection
  (was missing entirely).
When the API returns a partial state (e.g., after a timeout or during
startup before the first full poll), the Tunit key may be missing
from attrs. Using dict indexing causes a KeyError that spams the logs.
Use .get() to safely default to Celsius.
Cover the 4 fixes with unit tests:
- Entity available when is_online=False (the core fix)
- Entity unavailable when device missing or coordinator fails
- Switch optimistic state tracking (_assumed_state, _optimistic_state)
- Switch optimistic state cleared on coordinator update
- Climate Tunit .get() safety with missing/present/zero values

Tests follow existing repo patterns and require the same
pytest-homeassistant-custom-component framework (CI only, not Windows).
_handle_coordinator_update calls async_write_ha_state which requires
a real HA instance. Patch it in the unit test since we only need to
verify the optimistic state is cleared.
test_entity_fixes.py -> test_availability_and_controls.py
@hugo-brito
Contributor Author

@cdpuk, could you please review?

@cdpuk
Owner

cdpuk commented Apr 9, 2026

Thanks for the collection of fixes - I don't have a V2 device so very much appreciate community contributions. I'll give this a quick check over the next few days. Could you just take a look at why the pre-commit checks are failing?

- Remove duplicate __init__ in switch.py (ruff F811)
- Remove unused imports in tests (ruff F401)
- Apply ruff formatting to entity.py, __init__.py, tests
test_config_flow: The bypass_setup_fixture patched async_setup_entry
but not async_unload_entry, causing KeyError during teardown. Added
unload patch. The remaining xfail is a daemon thread from asyncio
internals (_run_safe_shutdown_loop) that the test framework rejects -
pre-existing on main, not caused by our changes.

__init__.py: Use .get() in async_unload_entry to handle the case
where hass.data[DOMAIN] was never populated (e.g., failed setup).

test.yaml: Fix CI workflow - branch was 'master' (repo uses 'main'),
and coverage target was 'givenergy_local' (copy-paste from another
project).
.get('Tunit') returned None (falsy) when the key was absent, falling
through to Fahrenheit. Use .get('Tunit', 1) so missing key defaults to
Celsius, matching the no-status fallback behavior.

Rewrote Tunit tests to exercise the actual AirjetV01HydrojetSpaThermostat
entity instead of duplicating the if/else logic inline.
The upstream verify_cleanup fixture races with a daemon thread spawned
by shutdown_default_executor(). Override it in test_config_flow.py to
join lingering threads before teardown assertions. Removes the xfail
marker - test_successful_config_flow now passes deterministically.

78 passed, 0 xfailed.
@hugo-brito
Contributor Author

@cdpuk no worries. I added a few more fixes and updated the PR description. The precommit check is now passing.

Collapse multi-line constructor calls that fit on one line, matching
ruff v0.15.1 from pre-commit config.
The event_loop fixture was removed in newer pytest-asyncio versions
used by CI. Rewrite verify_cleanup override to take no parameters -
just track threads before/after yield and join any daemon threads
(like _run_safe_shutdown_loop) before the upstream assertion runs.
@cdpuk cdpuk merged commit 1b65f9a into cdpuk:main Apr 12, 2026
3 checks passed
@hugo-brito hugo-brito deleted the fix/ignore-unreliable-is-online branch April 16, 2026 08:34


Development

Successfully merging this pull request may close these issues.

is_online entity consistently reports as "false", even though SPA is running, online and connecting through the Bestway App

2 participants