Skip to content

fix: ingest unknown users#795

Open
tipogi wants to merge 1 commit intofeat/dx-events-by-userfrom
bug/new-unknown-user
Open

fix: ingest unknown users#795
tipogi wants to merge 1 commit intofeat/dx-events-by-userfrom
bug/new-unknown-user

Conversation

@tipogi
Copy link
Copy Markdown
Collaborator

@tipogi tipogi commented Apr 3, 2026

Pre-submission Checklist

For tests to work you need a working neo4j and redis instance with the example dataset in docker/db-graph

  • Testing: Implement and pass new tests for the new features/fixes, cargo nextest run.
  • Performance: Ensure new code has relevant performance benchmarks, cargo bench -p nexus-webapi

What this PR fixes

Wrong thing was persisted for unknown users (living outside the default homeserver). When an event referenced a user Nexus did not have yet, the code resolved their homeserver and called Homeserver::persist_if_unknown, so Nexus mainly created/stored a homeserver in the graph and cache. The referenced user still was not represented as an ingested User in the graph in that path.

This PR

Ingests the user (minimal profile) when they are unknown in the graph, writes the USER_HS_CURSOR mapping in Redis, and aligns the bootstrap route naming/docs with user ingestion.

@tipogi tipogi added this to the 2026-Q1 milestone Apr 3, 2026
@tipogi tipogi self-assigned this Apr 3, 2026
@tipogi tipogi added 🐞 bug Something isn't working 🕸️ decentralization Distributed events from homeservers labels Apr 3, 2026
@tipogi tipogi requested review from aintnostressin and ok300 and removed request for ok300 April 3, 2026 17:02

/// If a referenced user is unknown, not ingested in the graph yet, resolves their homeserver
/// and persists the user node in the graph.
pub async fn maybe_ingest_for_user(user_id: &str) -> ModelResult<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub async fn maybe_ingest_for_user(user_id: &str) -> ModelResult<()> {
#[tracing::instrument(name = "user.ingest", skip_all)]
pub async fn maybe_ingest_for_user(user_id: &str) -> ModelResult<()> {

The tracing annotation was not moved together with the method.

pub async fn ingest_user_handler(Path(user_id): Path<String>) -> Result<()> {
debug!("PUT {INGEST_USER_ROUTE}, user_id:{user_id}");

PubkyId::try_from(&user_id)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not keep this PubkyId validation? It fails early, in case API is called with malformed PK.

/// ### Arguments
///
/// - `referenced_post_uri`: The parent post (if current post is a reply to it), or a reposted post (if current post is a Repost)
pub async fn maybe_ingest_for_post(referenced_post_uri: &ParsedUri) -> ModelResult<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, "ingesting" meant "ingesting a HS".

Now with this PR, "ingesting" means "ingesting a user".

Then maybe_ingest_for_post becomes a bit confusing: what does it mean to ingest a user for a post?

Therefore I would rename maybe_ingest_for_post -> maybe_ingest_author_of_post.

(A)


/// If a referenced user is unknown, not ingested in the graph yet, resolves their homeserver
/// and persists the user node in the graph.
pub async fn maybe_ingest_for_user(user_id: &str) -> ModelResult<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new meaning (see (A)), I would rename maybe_ingest_for_user -> maybe_ingest_user.

OperationOutcome::MissingDependency => {
if let Err(e) = Homeserver::maybe_ingest_for_user(followee_id.as_str()).await {
if let Err(e) = UserDetails::maybe_ingest_for_user(followee_id.as_str()).await {
tracing::error!("Failed to ingest homeserver: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracing::error!("Failed to ingest homeserver: {e}");
tracing::error!("Failed to ingest user: {e}");

OperationOutcome::MissingDependency => {
if let Err(e) = Homeserver::maybe_ingest_for_user(tagged_user_id.as_str()).await {
if let Err(e) = UserDetails::maybe_ingest_for_user(tagged_user_id.as_str()).await {
tracing::error!("Failed to ingest homeserver: {e}");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tracing::error!("Failed to ingest homeserver: {e}");
tracing::error!("Failed to ingest user: {e}");

let hs_id = &hs_pk.into_inner().to_z32();

user_details
.put_to_graph()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe_ingest_for_user only writes the new UserDetails to graph.

I think we should add it to the index, too, by adding

user_details.put_index_json(&[user_id], None, None).await?;

right after put_to_graph.

Ok(())
}

/// If a referenced post is hosted on a new, unknown homeserver, this method triggers ingestion of that user.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If a referenced post is hosted on a new, unknown homeserver, this method triggers ingestion of that user.
/// If a referenced post is authored by a new, unknown user, this method triggers ingestion of that user.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is some duplication between

  • nexus-watcher/tests/homeserver/active_homeservers.rs
  • nexus-watcher/tests/user_ingestion/active_homeservers.rs

I'd say they should be consolidated into the first path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🐞 bug Something isn't working 🕸️ decentralization Distributed events from homeservers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants