-
Notifications
You must be signed in to change notification settings - Fork 17
Impr/add name change route #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Impr/add name change route #35
Conversation
boring-nick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a useful endpoint, but I'm worried about the database load it can cause by scanning the whole table by user id without a channel.
Ideally this should be precalculated as a materialized view/projection that can be quickly queried by user id. But that can be done later, result cache will suffice for now.
src/web/schema.rs
Outdated
| pub first_timestamp: String, | ||
| } | ||
|
|
||
| pub type PreviousNames = Vec<PreviousName>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to make a type alias for such a simple type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I will simplify to a single type.
src/db/mod.rs
Outdated
| let sanitized_user_login = if name_history_row.user_login.starts_with(':') { | ||
| name_history_row.user_login.chars().skip(1).collect::<String>() | ||
| } else { | ||
| name_history_row.user_login.clone() | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be let sanitized_user_login = name_history_row.user_login.trim_start_matches(':');
src/db/mod.rs
Outdated
| pub first_timestamp: i32, | ||
| } | ||
|
|
||
| let query = "SELECT user_login, toDateTime((MAX(timestamp))) AS last_timestamp, toDateTime(MIN(timestamp)) AS first_timestamp FROM message_structured WHERE user_id = ? GROUP BY user_login".to_owned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this query does a large scan and can be potentially heavy, the result should be cached. You should add this at the end of the query so the result is cached for 10 minutes:
SETTINGS use_query_cache = 1, query_cache_ttl = 600
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will definitely add this, makes sense. I was not aware this was a in-house feature.
|
You are not wrong about the database load concerns, I asked the largest rustlog instance I know (@ZonianMidian's) to run this query directly, and it was extremely slow. I have already made your corrections, but I will also attempt to split up the query up and avoid a group by clause, which showed much better results on both rustlog instances. |
|
I have split the query to vastly improve query speed on databases that have not been optimized recently. As far as adding a projection/materialized view, this is a route that most likely will get low use, the performance tradeoff on inserts could be a negative net benefit? I'm not sure. |
boring-nick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this approach seems fine, should be much faster than the GROUP BY version since now it always filters by user id.
You should also run cargo clippy on your code, it might have some minor nitpicks regarding style
src/db/mod.rs
Outdated
| let name_history_rows = sanitized_user_logins | ||
| .into_iter() | ||
| .map(|login| { | ||
| let query = history_query.clone(); | ||
| async move { | ||
| let query = db.query(&query).bind(user_id).bind(&login); | ||
| query | ||
| .fetch_one::<SingleNameHistory>() | ||
| .await | ||
| .map(|history| (login, history)) | ||
| .map_err(Error::from) | ||
| } | ||
| }) | ||
| .collect::<futures::stream::FuturesOrdered<_>>() | ||
| .collect::<Vec<_>>() | ||
| .await | ||
| .into_iter() | ||
| .collect::<Result<Vec<_>>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The triple .collect() is needlessly convoluted, you can write this using try_join_all:
let name_history_rows = try_join_all(sanitized_user_logins.into_iter().map(|login| {
let query = history_query.clone();
async move {
let query = db.query(&query).bind(user_id).bind(&login);
query
.fetch_one::<SingleNameHistory>()
.await
.map(|history| (login, history))
}
}))
.await?;This also makes the map_err redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, this is much cleaner
src/db/mod.rs
Outdated
| let sanitized_user_logins = distinct_logins | ||
| .iter() | ||
| .map(|login| login.trim_start_matches(':').to_owned()) | ||
| .collect::<Vec<String>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This collect is redundant, you continue iterating over the logins later
src/web/schema.rs
Outdated
| pub last_timestamp: String, | ||
| pub first_timestamp: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use DateTime<Utc> directly in the response type, it will get serialized into an RFC3339 string in json
Adds a route to check for a user's previous name history, showing the first and last date seen for each username.
Should still respect user opt-outs.
The query itself consistently takes around ~100ms on my local instance, scanning 1.5 billion rows.
Very (completely) new to rust so I mainly just copied your existing implementations of stuff, and probably missed something.
Tested and worked fine locally.