Conversation

@RyanPotat
Contributor

Adds a route to check for a user's previous name history, showing the first and last date seen for each username.

Should still respect user opt-outs.

The query itself consistently takes about 100 ms on my local instance, scanning 1.5 billion rows.

I'm completely new to Rust, so I mainly copied your existing implementations and probably missed something.

Tested and worked fine locally.

Owner

@boring-nick boring-nick left a comment

This is definitely a useful endpoint, but I'm worried about the database load it can cause by scanning the whole table by user id without a channel.

Ideally this should be precalculated as a materialized view/projection that can be quickly queried by user id. But that can be done later; a result cache will suffice for now.

pub first_timestamp: String,
}

pub type PreviousNames = Vec<PreviousName>;
Owner

It doesn't make sense to make a type alias for such a simple type

Contributor Author

Makes sense, I will simplify to a single type.

src/db/mod.rs Outdated
Comment on lines 407 to 411
    let sanitized_user_login = if name_history_row.user_login.starts_with(':') {
        name_history_row.user_login.chars().skip(1).collect::<String>()
    } else {
        name_history_row.user_login.clone()
    };
Owner

This can just be:

    let sanitized_user_login = name_history_row.user_login.trim_start_matches(':');
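
One behavioural difference is worth noting when making this swap. The sketch below uses hypothetical helper names that are not from the PR. It contrasts stripping exactly one leading colon, as the original if/else does, with `trim_start_matches`, which strips every leading colon. For logins with at most one leading `:` the two agree.

```rust
/// Strips a single leading ':' if present (same result as the original if/else).
/// Hypothetical helper, for illustration only.
fn strip_one_colon(login: &str) -> &str {
    login.strip_prefix(':').unwrap_or(login)
}

/// Strips every leading ':' (what the suggested one-liner does).
/// Hypothetical helper, for illustration only.
fn strip_all_colons(login: &str) -> &str {
    login.trim_start_matches(':')
}
```

Both helpers borrow from the input instead of allocating a new `String`, which is part of why the one-liner is cheaper than `chars().skip(1).collect()`.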

src/db/mod.rs Outdated
pub first_timestamp: i32,
}

let query = "SELECT user_login, toDateTime((MAX(timestamp))) AS last_timestamp, toDateTime(MIN(timestamp)) AS first_timestamp FROM message_structured WHERE user_id = ? GROUP BY user_login".to_owned();
Owner

Since this query does a large scan and can be potentially heavy, the result should be cached. You should add this at the end of the query so the result is cached for 10 minutes:

    SETTINGS use_query_cache = 1, query_cache_ttl = 600
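
As a rough sketch of what appending that clause could look like, assuming the query is built as a plain string: the helper name `with_query_cache` and its parameters are hypothetical and not part of the PR.

```rust
/// Appends the ClickHouse query-cache settings to a SELECT statement.
/// Hypothetical helper; `ttl_secs` is the cache lifetime in seconds.
fn with_query_cache(select: &str, ttl_secs: u32) -> String {
    format!("{select} SETTINGS use_query_cache = 1, query_cache_ttl = {ttl_secs}")
}
```

Calling `with_query_cache(query, 600)` yields the original statement followed by the settings clause above, so repeated requests within 10 minutes can be served from the cache.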

Contributor Author

I will definitely add this, makes sense. I was not aware this was an in-house feature.

@RyanPotat
Contributor Author

You are not wrong about the database load concerns. I asked the largest rustlog instance I know (@ZonianMidian's) to run this query directly, and it was extremely slow. I have already made your corrections, but I will also attempt to split the query up and avoid the GROUP BY clause, which showed much better results on both rustlog instances.

@RyanPotat
Contributor Author

I have split the query to vastly improve query speed on databases that have not been optimized recently.

As for adding a projection/materialized view: this route will most likely see low use, so the performance cost on inserts could make it a net negative. I'm not sure.

@RyanPotat RyanPotat requested a review from boring-nick February 6, 2025 06:55
Owner

@boring-nick boring-nick left a comment

Overall this approach seems fine; it should be much faster than the GROUP BY version, since it now always filters by user id.
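
To make the shape of the split concrete, here is an in-memory sketch of the two steps as they can be inferred from the snippets in this review: first collect the distinct logins for a user id, then look up the first and last timestamp per login. The `Row` struct and all function names are hypothetical stand-ins. The PR itself issues ClickHouse queries and does not iterate over rows in Rust.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory row standing in for a `message_structured` record.
struct Row {
    user_id: u64,
    user_login: String,
    timestamp: i64,
}

/// Step 1: distinct logins ever seen for this user id.
fn distinct_logins(rows: &[Row], user_id: u64) -> BTreeSet<&str> {
    rows.iter()
        .filter(|r| r.user_id == user_id)
        .map(|r| r.user_login.as_str())
        .collect()
}

/// Step 2: first and last timestamp for one (user id, login) pair.
fn first_and_last(rows: &[Row], user_id: u64, login: &str) -> Option<(i64, i64)> {
    rows.iter()
        .filter(|r| r.user_id == user_id && r.user_login == login)
        .map(|r| r.timestamp)
        .fold(None, |acc, ts| match acc {
            None => Some((ts, ts)),
            Some((lo, hi)) => Some((lo.min(ts), hi.max(ts))),
        })
}

/// Runs both steps and returns (login, first seen, last seen) per login,
/// ordered by login because step 1 uses a BTreeSet.
fn name_history(rows: &[Row], user_id: u64) -> Vec<(String, i64, i64)> {
    distinct_logins(rows, user_id)
        .into_iter()
        .filter_map(|login| {
            first_and_last(rows, user_id, login)
                .map(|(first, last)| (login.to_owned(), first, last))
        })
        .collect()
}
```

Every lookup in step 2 is keyed by both user id and login, which mirrors the `.bind(user_id).bind(&login)` calls in the reviewed code.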

You should also run cargo clippy on your code; it might have some minor nitpicks regarding style.

src/db/mod.rs Outdated
Comment on lines 410 to 427
    let name_history_rows = sanitized_user_logins
        .into_iter()
        .map(|login| {
            let query = history_query.clone();
            async move {
                let query = db.query(&query).bind(user_id).bind(&login);
                query
                    .fetch_one::<SingleNameHistory>()
                    .await
                    .map(|history| (login, history))
                    .map_err(Error::from)
            }
        })
        .collect::<futures::stream::FuturesOrdered<_>>()
        .collect::<Vec<_>>()
        .await
        .into_iter()
        .collect::<Result<Vec<_>>>()?;
Owner

The triple .collect() is needlessly convoluted, you can write this using try_join_all:

    let name_history_rows = try_join_all(sanitized_user_logins.into_iter().map(|login| {
        let query = history_query.clone();
        async move {
            let query = db.query(&query).bind(user_id).bind(&login);
            query
                .fetch_one::<SingleNameHistory>()
                .await
                .map(|history| (login, history))
        }
    }))
    .await?;

This also makes the map_err redundant

Contributor Author

Makes sense, this is much cleaner

src/db/mod.rs Outdated
    let sanitized_user_logins = distinct_logins
        .iter()
        .map(|login| login.trim_start_matches(':').to_owned())
        .collect::<Vec<String>>();
Owner

This collect is redundant, since you continue iterating over the logins later.
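
A minimal sketch of the idea, with hypothetical function names: return the sanitizing iterator as-is and let the later step consume it, so no intermediate `Vec<String>` is allocated. The `describe` function is only a placeholder for the per-login query step.

```rust
/// Sanitizes logins lazily; nothing is allocated until the iterator is consumed.
/// Hypothetical helper, for illustration only.
fn sanitized<'a>(logins: &'a [String]) -> impl Iterator<Item = String> + 'a {
    logins
        .iter()
        .map(|login| login.trim_start_matches(':').to_owned())
}

/// Hypothetical consumer standing in for the later per-login step.
/// It chains directly onto the iterator, so there is only one collect.
fn describe(logins: &[String]) -> Vec<String> {
    sanitized(logins)
        .map(|login| format!("lookup:{login}"))
        .collect()
}
```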

Comment on lines 188 to 189
pub last_timestamp: String,
pub first_timestamp: String,
Owner

You can use DateTime<Utc> directly in the response type, it will get serialized into an RFC3339 string in json

@boring-nick boring-nick merged commit f99b909 into boring-nick:master Feb 7, 2025
@RyanPotat RyanPotat deleted the impr/add-name-change-route branch February 7, 2025 19:33
2 participants