feat: add user-hs-resolver task #765

Merged
ok300 merged 34 commits into pubky:feat/dx-events-by-user from ok300:ok300-user-hs-resolver
Apr 3, 2026

ok300 commented Mar 12, 2026

This PR adds a third periodic task, user-hs-resolver. It periodically loops over known users, resolves each user's homeserver (HS) public key (PK) as configured in their PKDNS record, and persists this mapping.

It exposes a utility

/// Returns all user IDs hosted on a given homeserver.
pub async fn get_user_ids_by_homeserver(hs_id: &str) -> Result<Vec<String>, DynError> { ... }

which can be integrated in EventProcessor::poll_events to fetch the events of all users in a given HS.
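
To illustrate the lookup this utility provides, here is a minimal in-memory sketch. It is not the PR's implementation: the real function is async and reads the persisted mapping, whereas the `HostedBy` struct and its methods below are hypothetical stand-ins that use a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the persisted user-to-homeserver mapping.
/// Key: user ID, value: homeserver ID.
pub struct HostedBy {
    user_to_hs: HashMap<String, String>,
}

impl HostedBy {
    pub fn new() -> Self {
        Self { user_to_hs: HashMap::new() }
    }

    /// Records (or overwrites) the homeserver a user currently resolves to.
    pub fn set(&mut self, user_id: &str, hs_id: &str) {
        self.user_to_hs.insert(user_id.to_string(), hs_id.to_string());
    }

    /// Returns all user IDs hosted on the given homeserver.
    /// The result is sorted only to make the output deterministic.
    pub fn user_ids_by_homeserver(&self, hs_id: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .user_to_hs
            .iter()
            .filter(|(_, hs)| hs.as_str() == hs_id)
            .map(|(user, _)| user.clone())
            .collect();
        ids.sort();
        ids
    }
}
```

A caller such as an event poller could iterate over the returned IDs and fetch events per user; how that is wired into EventProcessor::poll_events is left open here.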

Builds on top of / targets the feature branch of #726.


This PR brings the following related changes:

  • differentiate between active and orphan HSs: an active HS has at least one active user pointing to it (HOSTED_BY), an orphan HS has none
    • external homeservers are sorted by number of active users, so that the most-used external HSs are prioritized
    • the external homeservers query skips orphan HSs, so the event processor does not deal with such homeservers until at least one user points to them
  • circuit breaker to identify and (temporarily) back off from homeservers that may be slow or offline
  • configurable fields for the user_hs_resolver scheduled task
  • TTL for the user-to-HS mapping (HOSTED_BY.resolved_at + configurable hs_resolver_ttl), so that recently resolved PKDNS records (user-to-HS mappings) are not resolved again too soon; resolution is time-intensive (a full PKDNS lookup takes ~2s per record)
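
The active/orphan distinction and the priority ordering can be sketched roughly as follows. This is an in-memory illustration only: the PR performs this with a graph query, and the function name `active_homeservers_by_priority`, its parameters, and the tie-break by ID are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: given all known homeserver IDs and the
/// (user_id, hs_id) pairs of the HOSTED_BY relationship, returns only the
/// homeservers with at least one user, most-used first.
/// Ties are broken by homeserver ID (an assumption, for determinism).
pub fn active_homeservers_by_priority(
    known_hs: &[&str],
    hosted_by: &[(&str, &str)],
) -> Vec<String> {
    // Count users per homeserver.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for (_user, hs) in hosted_by {
        *counts.entry(*hs).or_insert(0) += 1;
    }

    // Orphan homeservers have no entry in `counts` and are dropped here.
    let mut active: Vec<(&str, usize)> = known_hs
        .iter()
        .filter_map(|hs| counts.get(hs).map(|n| (*hs, *n)))
        .collect();

    // Descending by user count, then ascending by ID.
    active.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    active.into_iter().map(|(hs, _)| hs.to_string()).collect()
}
```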

Open tasks

  • Clarify handling of deleted users in new queries (deleted = not just [DELETED] username, but also absence of PKDNS record pointing to any HS)

Copilot AI and others added 2 commits March 19, 2026 18:31
`sort_by_failures` (#89)

* Initial plan

* Replace UserHsFailures JSON storage with Redis Sorted Set

Replace the JSON-based Redis storage (UserHsFailures struct with
Serialize/Deserialize fields) with a Redis Sorted Set approach:

- Key: Sorted:Users:HsResolutionFailures
- Member: user PK (user_id)
- Score: failure count

Changes:
- get() now uses ZSCORE (check_sorted_set_member)
- increment() now uses ZINCRBY (increment_score_index_sorted_set)
- remove() now uses ZREM (remove_from_index_sorted_set)
- sort_by_failures() fetches all scores in one call via
  try_from_index_sorted_set and builds a HashMap for lookup
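
As a rough illustration of the sorted-set semantics listed above, the sketch below mimics ZSCORE, ZINCRBY and ZREM with a `HashMap`. It does not talk to Redis and is not the code from this commit; the `FailureScores` type is hypothetical, and ordering users with the fewest failures first is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the failures sorted set
/// (member: user ID, score: failure count).
#[derive(Default)]
pub struct FailureScores {
    scores: HashMap<String, f64>,
}

impl FailureScores {
    /// Like ZSCORE: None when the member is absent.
    pub fn get(&self, user_id: &str) -> Option<f64> {
        self.scores.get(user_id).copied()
    }

    /// Like ZINCRBY with increment 1: creates the member at 0 if absent,
    /// then returns the new score.
    pub fn increment(&mut self, user_id: &str) -> f64 {
        let score = self.scores.entry(user_id.to_string()).or_insert(0.0);
        *score += 1.0;
        *score
    }

    /// Like ZREM: returns whether the member existed.
    pub fn remove(&mut self, user_id: &str) -> bool {
        self.scores.remove(user_id).is_some()
    }

    /// Orders users by failure count, fewest first (assumed direction).
    /// Users without an entry count as zero failures; the sort is stable.
    pub fn sort_by_failures(&self, mut user_ids: Vec<String>) -> Vec<String> {
        user_ids.sort_by(|a, b| {
            let sa = self.get(a).unwrap_or(0.0);
            let sb = self.get(b).unwrap_or(0.0);
            sa.total_cmp(&sb)
        });
        user_ids
    }
}
```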

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Use total_cmp for f64 sorting in sort_by_failures

Address code review feedback: replace partial_cmp with total_cmp
for f64 comparisons since Redis scores are always valid numbers.
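
For context, `partial_cmp` on `f64` returns an `Option<Ordering>` (it is `None` when an operand is NaN), so a sort closure has to unwrap or handle it, while `total_cmp` returns an `Ordering` directly. A minimal sketch, with `sort_scores` as a hypothetical helper name:

```rust
/// Sorts scores in ascending order using the total order on f64.
/// No unwrap is needed because `total_cmp` always returns an `Ordering`.
pub fn sort_scores(mut scores: Vec<f64>) -> Vec<f64> {
    scores.sort_by(|a, b| a.total_cmp(b));
    scores
}
```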

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Introduce UserHsFailures::get_all, remove get, update sort_by_failures
  and tests

- Add get_all() returning HashMap<String, f64> from the sorted set
- Use get_all() at the start of sort_by_failures
- Remove get() method (was only used in tests)
- Update test_user_hs_failures_increment_and_remove to use get_all() +
  lookup

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Re-introduce UserHsFailures::get

* Avoid unnecessary failure-state deletes in user homeserver resolver
  (#90)

* Initial plan

* fix: only clear hs failures for users that had failures

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* refactor: carry failure scores through resolver ordering

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot]
<198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Simplify sort_by_failures

* Simplify tests

---------

Co-authored-by: copilot-swe-agent[bot]
<198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
ok300 force-pushed the ok300-user-hs-resolver branch from fa724d2 to bb640d7 on March 20, 2026 12:21
ok300 and others added 4 commits March 20, 2026 13:27
…rphan homeservers (#92)

* Initial plan

* Rename get_all_from_graph to get_all_active_from_graph with orphan-filtering query and test

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Move test_get_all_active_excludes_orphan_homeservers to nexus-watcher, use WatcherTest::create_user

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Fix error message

* Simplify tests

* chore: rename var

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
ok300 force-pushed the ok300-user-hs-resolver branch 2 times, most recently from fccd6ef to bdaabea on March 21, 2026 12:52
ok300 force-pushed the ok300-user-hs-resolver branch from bdaabea to de11d89 on March 21, 2026 13:04
…d vectors (#94)

* Initial plan

* Change get_all_active_homeservers to get_all_homeservers: return all homeservers with active user counts, sorted descending

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/a3229ace-3d19-48fd-9eaf-844eb0f54d33

* Address code review: add length validation and replace unwrap() with expect() in tests

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/a3229ace-3d19-48fd-9eaf-844eb0f54d33

* Simplify get_all_homeservers query to return rows of (id, active_users) instead of two collected vectors

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/960c0e17-79c1-4ebb-8315-23239a8258c1

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
ok300 force-pushed the ok300-user-hs-resolver branch from 106ce46 to be1c20f on March 21, 2026 18:04
ok300 and others added 12 commits March 22, 2026 12:53
…#98)

Add a `resolved_at` timestamp to the `HOSTED_BY` relationship and a
configurable `hs_resolver_ttl` (default 1 hour) so that the periodic
user-hs-resolver only re-resolves users whose mapping is stale or
missing, instead of resolving every user on every tick.
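
The staleness rule described above could look roughly like this. The function name `needs_resolution`, the use of plain seconds, and treating a mapping as stale exactly at `resolved_at + ttl` are assumptions made for this sketch, not details taken from the commit.

```rust
/// Hypothetical sketch of the TTL check.
/// `resolved_at`: timestamp of the last successful resolution, if any.
/// `now` and `ttl_secs` use the same unit (seconds assumed here).
/// Returns true when the user's mapping is missing or stale.
pub fn needs_resolution(resolved_at: Option<u64>, now: u64, ttl_secs: u64) -> bool {
    match resolved_at {
        // Never resolved: always resolve.
        None => true,
        // Resolved before: resolve again only once the TTL has elapsed.
        Some(t) => now >= t.saturating_add(ttl_secs),
    }
}
```

With the default `hs_resolver_ttl` of 1 hour, a user resolved at t = 1000 would be skipped by every tick until t = 4600.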

Co-authored-by: Claude <noreply@anthropic.com>
* Initial plan

* Extract hs_resolver_sleep into WatcherConfig

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/cda7bdfe-e2ad-4c96-be76-15130ad896e2

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
…Entry (#99)

* Initial plan

* Update get_all_homeservers to exclude orphan homeservers and remove HomeserverEntry

- Rename query to get_all_homeservers_with_active_users, use MATCH instead
  of OPTIONAL MATCH to exclude homeservers with no active users
- Remove HomeserverEntry struct, return Vec<String> of HS IDs instead
- Simplify callers in processor_runner.rs (production and mock)
- Update integration test to verify orphan HSes are excluded

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/9560a0d1-51ca-48e1-8a9b-51f7792c41fa

* Fix failing tests: create active users for test homeservers

Update create_random_homeservers_and_persist to accept create_active_users
parameter. When Some(n), creates n User nodes linked via HOSTED_BY to the
homeserver, ensuring they are returned by get_all_from_graph() which now
excludes orphan homeservers.

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/1c61bdbc-21b4-4882-8890-46151a5a1b7a

* Simplify get_all_homeservers_with_active_users query

* Clarify rustdoc for get_all_from_graph

* Rename Homeserver::get_all_from_graph

* Clarify doc

* Expand active_homeservers.rs with more test variations

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
* Initial plan

* Remove UserHsFailures, sort_by_failures, and associated Redis Sorted Set

Simplify user_hs_resolver::run() to iterate user IDs directly,
logging a warning on failure and continuing without reordering.
Remove the two tests that exercised the removed functionality.
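
The simplified loop can be pictured as below. This is a synchronous sketch with a caller-supplied `resolve` closure standing in for the PKDNS lookup; the signature, the return value, and the use of `eprintln!` in place of the project's logging are all assumptions.

```rust
/// Hypothetical sketch: resolves each user in the given order.
/// A failure is logged as a warning and the loop continues with the next
/// user, without reordering. Returns the number of successful resolutions.
pub fn run<F>(user_ids: &[String], mut resolve: F) -> usize
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut resolved = 0;
    for user_id in user_ids {
        match resolve(user_id.as_str()) {
            // A real implementation would persist the mapping here.
            Ok(_hs_id) => resolved += 1,
            Err(e) => eprintln!("WARN: failed to resolve homeserver for {user_id}: {e}"),
        }
    }
    resolved
}
```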

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/e33413d6-ba72-4835-ae05-eb24e35cb3e7

* Simplify rustdoc

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
* Add Redis-backed circuit breaker for homeserver health tracking

Detect offline/unresponsive homeservers and skip them using a three-state
circuit breaker pattern (Closed → Open → HalfOpen) persisted in Redis.

- Closed: normal operation, failures counted
- Open: homeserver skipped after 5 consecutive failures
- HalfOpen: after 5min cooldown, one probe attempt allowed

On success the circuit resets; on probe failure it reopens. The circuit
breaker integrates into `external_homeservers_by_priority` (filtering)
and `run_external_homeservers` (outcome recording). Redis errors fail
open to avoid accidentally blocking healthy homeservers.
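
The state machine described in this commit message can be sketched in memory as follows. It is not the deleted or original `HomeserverCircuitBreaker` code: the type and method names below are hypothetical, Redis persistence and the fail-open behaviour on Redis errors are left out, and time is passed in explicitly as seconds so the transitions are easy to follow.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Values taken from the commit message: 5 consecutive failures, 5 min cooldown.
pub const FAILURE_THRESHOLD: u32 = 5;
pub const COOLDOWN_SECS: u64 = 300;

/// Hypothetical in-memory circuit breaker for a single homeserver.
pub struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    opened_at: u64,
}

impl CircuitBreaker {
    pub fn new() -> Self {
        Self { state: CircuitState::Closed, consecutive_failures: 0, opened_at: 0 }
    }

    pub fn state(&self) -> CircuitState {
        self.state
    }

    /// Returns true if the homeserver may be contacted at `now`.
    /// After the cooldown, Open moves to HalfOpen and grants exactly one probe.
    pub fn allow_request(&mut self, now: u64) -> bool {
        match self.state {
            CircuitState::Closed => true,
            // The single probe was already granted; wait for its outcome.
            CircuitState::HalfOpen => false,
            CircuitState::Open => {
                if now.saturating_sub(self.opened_at) >= COOLDOWN_SECS {
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Any success resets the circuit.
    pub fn record_success(&mut self) {
        self.state = CircuitState::Closed;
        self.consecutive_failures = 0;
    }

    /// Counts failures while Closed; a failed probe reopens the circuit.
    pub fn record_failure(&mut self, now: u64) {
        match self.state {
            CircuitState::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= FAILURE_THRESHOLD {
                    self.state = CircuitState::Open;
                    self.opened_at = now;
                }
            }
            CircuitState::HalfOpen => {
                self.state = CircuitState::Open;
                self.opened_at = now;
            }
            // Already open: nothing to record.
            CircuitState::Open => {}
        }
    }
}
```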

* Refactor circuit breaker to use RedisOps trait helpers

Replace direct get_redis_conn / AsyncCommands usage with the RedisOps
trait methods (try_from_index_json, put_index_json,
remove_from_index_multiple_json), consistent with how other models
like Homeserver interact with Redis.

https://claude.ai/code/session_017CE3zgaXzpziLUt3nUfu5X

---------

Co-authored-by: Claude <noreply@anthropic.com>
ok300 marked this pull request as ready for review March 23, 2026 11:24
ok300 added the 🕸️ decentralization Distributed events from homeservers label Mar 23, 2026
ok300 and others added 13 commits March 23, 2026 18:53
- Delete nexus-common/src/models/circuit_breaker.rs
  (HomeserverCircuitBreaker struct, CircuitState enum, and all tests)
- Remove pub mod circuit_breaker from models/mod.rs
- Remove HomeserverCircuitBreaker::filter_available() call from
  processor_runner.rs
- Remove HomeserverCircuitBreaker::record_success/failure() calls from
  tevent_processor_runner.rs

Co-authored-by: Claude <noreply@anthropic.com>
ok300 merged commit 6d6fae4 into pubky:feat/dx-events-by-user Apr 3, 2026
ok300 added a commit that referenced this pull request Apr 3, 2026

5 participants