feat: add user-hs-resolver task #765
Merged
ok300 merged 34 commits into pubky:feat/dx-events-by-user from Apr 3, 2026
12 tasks
`sort_by_failures` (#89)

* Initial plan

* Replace UserHsFailures JSON storage with Redis Sorted Set

Replace the JSON-based Redis storage (UserHsFailures struct with Serialize/Deserialize fields) with a Redis Sorted Set approach:
- Key: Sorted:Users:HsResolutionFailures
- Member: user PK (user_id)
- Score: failure count

Changes:
- get() now uses ZSCORE (check_sorted_set_member)
- increment() now uses ZINCRBY (increment_score_index_sorted_set)
- remove() now uses ZREM (remove_from_index_sorted_set)
- sort_by_failures() fetches all scores in one call via try_from_index_sorted_set and builds a HashMap for lookup

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Use total_cmp for f64 sorting in sort_by_failures

Address code review feedback: replace partial_cmp with total_cmp for f64 comparisons since Redis scores are always valid numbers.

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Introduce UserHsFailures::get_all, remove get, update sort_by_failures and tests

- Add get_all() returning HashMap<String, f64> from the sorted set
- Use get_all() at the start of sort_by_failures
- Remove get() method (was only used in tests)
- Update test_user_hs_failures_increment_and_remove to use get_all() + lookup

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Re-introduce UserHsFailures::get

* Avoid unnecessary failure-state deletes in user homeserver resolver (#90)

* Initial plan

* fix: only clear hs failures for users that had failures

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* refactor: carry failure scores through resolver ordering

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Simplify sort_by_failures

* Simplify tests

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
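For readers following the `total_cmp` change, here is a minimal sketch of how ordering by failure score might look. It is not the code from this PR: the function signature, the plain `HashMap` standing in for the result of `get_all()`, the ascending order (fewest failures first), and the zero default for users without a score are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Orders user IDs by failure score, lowest first, and carries the score along.
/// `failures` is a stand-in for the member -> score pairs read from the sorted set;
/// users missing from the map are assumed to have zero failures.
fn sort_by_failures(user_ids: Vec<String>, failures: &HashMap<String, f64>) -> Vec<(String, f64)> {
    let mut scored: Vec<(String, f64)> = user_ids
        .into_iter()
        .map(|id| {
            let score = failures.get(&id).copied().unwrap_or(0.0);
            (id, score)
        })
        .collect();

    // `total_cmp` defines a total order over f64, so the comparator never has to
    // handle the `None` case that `partial_cmp` returns for NaN.
    // `sort_by` is stable: users with equal scores keep their input order.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored
}
```

With scores `a = 3.0` and `c = 1.0` and no entry for `b`, the input `[a, b, c]` comes back in the order `b`, `c`, `a`.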
fa724d2 to bb640d7
…rphan homeservers (#92)

* Initial plan

* Rename get_all_from_graph to get_all_active_from_graph with orphan-filtering query and test

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Move test_get_all_active_excludes_orphan_homeservers to nexus-watcher, use WatcherTest::create_user

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>

* Fix error message

* Simplify tests

* chore: rename var

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
fccd6ef to bdaabea
bdaabea to de11d89
…d vectors (#94)

* Initial plan

* Change get_all_active_homeservers to get_all_homeservers: return all homeservers with active user counts, sorted descending

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/a3229ace-3d19-48fd-9eaf-844eb0f54d33

* Address code review: add length validation and replace unwrap() with expect() in tests

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/a3229ace-3d19-48fd-9eaf-844eb0f54d33

* Simplify get_all_homeservers query to return rows of (id, active_users) instead of two collected vectors

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/960c0e17-79c1-4ebb-8315-23239a8258c1

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
106ce46 to be1c20f
…#98)

Add a `resolved_at` timestamp to the `HOSTED_BY` relationship and a configurable `hs_resolver_ttl` (default 1 hour) so that the periodic user-hs-resolver only re-resolves users whose mapping is stale or missing, instead of resolving every user on every tick.

Co-authored-by: Claude <noreply@anthropic.com>
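The staleness rule described in this commit can be sketched as a small predicate. This is an illustration under assumptions, not the PR's implementation: the function name `needs_resolution`, the constant name, the millisecond Unix timestamps, and the "age greater than or equal to TTL counts as stale" boundary are all hypothetical. Only the one-hour default comes from the commit message.

```rust
use std::time::Duration;

/// One hour, matching the default stated in the commit message.
/// The constant name itself is hypothetical.
const DEFAULT_HS_RESOLVER_TTL: Duration = Duration::from_secs(60 * 60);

/// Returns true when a user's homeserver mapping should be resolved again.
/// `resolved_at_ms` models the `resolved_at` value as Unix milliseconds (an assumption),
/// with `None` meaning the user has no `HOSTED_BY` mapping yet.
fn needs_resolution(resolved_at_ms: Option<u64>, now_ms: u64, ttl: Duration) -> bool {
    match resolved_at_ms {
        // Missing mapping: always resolve.
        None => true,
        Some(resolved_at) => {
            // saturating_sub keeps a timestamp slightly in the future from underflowing;
            // such a mapping is treated as freshly resolved.
            let age_ms = now_ms.saturating_sub(resolved_at);
            u128::from(age_ms) >= ttl.as_millis()
        }
    }
}
```

With this shape, each tick only pays the slow PKDNS lookup for users where the predicate is true.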
* Initial plan

* Extract hs_resolver_sleep into WatcherConfig

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/cda7bdfe-e2ad-4c96-be76-15130ad896e2

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
…Entry (#99)

* Initial plan

* Update get_all_homeservers to exclude orphan homeservers and remove HomeserverEntry

- Rename query to get_all_homeservers_with_active_users, use MATCH instead of OPTIONAL MATCH to exclude homeservers with no active users
- Remove HomeserverEntry struct, return Vec<String> of HS IDs instead
- Simplify callers in processor_runner.rs (production and mock)
- Update integration test to verify orphan HSes are excluded

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/9560a0d1-51ca-48e1-8a9b-51f7792c41fa

* Fix failing tests: create active users for test homeservers

Update create_random_homeservers_and_persist to accept create_active_users parameter. When Some(n), creates n User nodes linked via HOSTED_BY to the homeserver, ensuring they are returned by get_all_from_graph() which now excludes orphan homeservers.

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/1c61bdbc-21b4-4882-8890-46151a5a1b7a

* Simplify get_all_homeservers_with_active_users query

* Clarify rustdoc for get_all_from_graph

* Rename Homeserver::get_all_from_graph

* Clarify doc

* Expand active_homeservers.rs with more test variations

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
…00-user-hs-resolver
* Initial plan

* Remove UserHsFailures, sort_by_failures, and associated Redis Sorted Set

Simplify user_hs_resolver::run() to iterate user IDs directly, logging a warning on failure and continuing without reordering. Remove the two tests that exercised the removed functionality.

Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/e33413d6-ba72-4835-ae05-eb24e35cb3e7

* Simplify rustdoc

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
* Add Redis-backed circuit breaker for homeserver health tracking

Detect offline/unresponsive homeservers and skip them using a three-state circuit breaker pattern (Closed → Open → HalfOpen) persisted in Redis.

- Closed: normal operation, failures counted
- Open: homeserver skipped after 5 consecutive failures
- HalfOpen: after 5min cooldown, one probe attempt allowed

On success the circuit resets; on probe failure it reopens.

The circuit breaker integrates into `external_homeservers_by_priority` (filtering) and `run_external_homeservers` (outcome recording). Redis errors fail open to avoid accidentally blocking healthy homeservers.

* Refactor circuit breaker to use RedisOps trait helpers

Replace direct get_redis_conn / AsyncCommands usage with the RedisOps trait methods (try_from_index_json, put_index_json, remove_from_index_multiple_json), consistent with how other models like Homeserver interact with Redis.

https://claude.ai/code/session_017CE3zgaXzpziLUt3nUfu5X

---------

Co-authored-by: Claude <noreply@anthropic.com>
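The three-state pattern described in this commit can be illustrated with an in-memory sketch. It is not the `HomeserverCircuitBreaker` from this PR: persistence in Redis is left out, and the type layout, the `allow` method, and the choice to block further requests while a probe is pending are assumptions. The threshold of 5 failures and the 5 minute cooldown are taken from the commit message.

```rust
use std::time::{Duration, Instant};

const FAILURE_THRESHOLD: u32 = 5; // consecutive failures before opening
const COOLDOWN: Duration = Duration::from_secs(5 * 60); // wait before a probe

#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed { failures: u32 },
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
}

impl CircuitBreaker {
    fn new() -> Self {
        Self { state: CircuitState::Closed { failures: 0 } }
    }

    /// Returns true if the homeserver may be contacted at `now`.
    fn allow(&mut self, now: Instant) -> bool {
        let state = self.state;
        match state {
            CircuitState::Closed { .. } => true,
            // Assumption: a probe is already pending, so nothing else is let through.
            CircuitState::HalfOpen => false,
            CircuitState::Open { since } => {
                if now.saturating_duration_since(since) >= COOLDOWN {
                    // Cooldown elapsed: permit exactly one probe attempt.
                    self.state = CircuitState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// Any success resets the circuit to Closed with a zero failure count.
    fn record_success(&mut self) {
        self.state = CircuitState::Closed { failures: 0 };
    }

    /// Counts a failure while Closed; opens on the threshold or on a failed probe.
    fn record_failure(&mut self, now: Instant) {
        let state = self.state;
        self.state = match state {
            CircuitState::Closed { failures } if failures + 1 < FAILURE_THRESHOLD => {
                CircuitState::Closed { failures: failures + 1 }
            }
            _ => CircuitState::Open { since: now },
        };
    }
}
```

A caller would check `allow` before polling a homeserver and report the outcome with `record_success` or `record_failure`. Passing `now` in explicitly keeps the state machine testable without sleeping.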
- Delete nexus-common/src/models/circuit_breaker.rs (HomeserverCircuitBreaker struct, CircuitState enum, and all tests)
- Remove pub mod circuit_breaker from models/mod.rs
- Remove HomeserverCircuitBreaker::filter_available() call from processor_runner.rs
- Remove HomeserverCircuitBreaker::record_success/failure() calls from tevent_processor_runner.rs

Co-authored-by: Claude <noreply@anthropic.com>
Agent-Logs-Url: https://github.com/ok300/pubky-nexus/sessions/4f2aae79-6241-4e7e-a594-f64f4c9273ca
Co-authored-by: ok300 <106775972+ok300@users.noreply.github.com>
ok300-user-hs-resolver
ok300 added a commit that referenced this pull request on Apr 3, 2026
Ari4ka approved these changes on Apr 3, 2026
This PR adds a 3rd periodic task called `user-hs-resolver`. It periodically loops over known users, resolves their HS PK as configured in their PKDNS record, and persists this mapping.

It exposes a utility which can be integrated in `EventProcessor::poll_events` to fetch the events of all users in a given HS.

Builds on top of / targets the feature branch of #726.
This PR brings the following related changes:

- … `HOSTED_BY`)
- `user_hs_resolver` scheduled task
- … `HOSTED_BY.resolved_at` + configurable `hs_resolver_ttl`), such that recently resolved PKDNS records (user-to-HS mappings) are not resolved again soon, since this is a time-intensive operation (full PKDNS lookup takes ~2s per record)

Open tasks

- … `[DELETED]` username, but also absence of PKDNS record pointing to any HS)