Impr/add name change route #35

RyanPotat · 2025-01-21T10:20:01Z

Adds a route to check for a user's previous name history, showing the first and last date seen for each username.

Should still respect user opt-outs.

The query itself consistently takes around 100 ms on my local instance, scanning 1.5 billion rows.

I'm very (completely) new to Rust, so I mainly copied your existing implementations and probably missed something.

Tested and worked fine locally.

boring-nick

This is definitely a useful endpoint, but I'm worried about the database load it can cause by scanning the whole table by user id without a channel.

Ideally this should be precalculated as a materialized view/projection that can be quickly queried by user id. But that can be done later; a result cache will suffice for now.

boring-nick · 2025-02-05T07:28:20Z

src/web/schema.rs

+    pub first_timestamp: String,
+}
+
+pub type PreviousNames = Vec<PreviousName>;


It doesn't make sense to make a type alias for such a simple type

Makes sense, I will simplify to a single type.

boring-nick · 2025-02-05T07:30:33Z

src/db/mod.rs

+            let sanitized_user_login = if name_history_row.user_login.starts_with(':') {
+                name_history_row.user_login.chars().skip(1).collect::<String>()
+            } else {
+                name_history_row.user_login.clone()
+            };


This can just be let sanitized_user_login = name_history_row.user_login.trim_start_matches(':');
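As a minimal sketch of the suggestion above (the function names are hypothetical, not from the PR): `trim_start_matches` borrows from the input and strips every leading `:`, whereas the original branch removed at most one character. If only a single colon should ever be dropped, `strip_prefix` is the closer equivalent.

```rust
/// Strips every leading ':' from a login, borrowing from the input
/// instead of allocating a new String.
fn sanitize_login(login: &str) -> &str {
    login.trim_start_matches(':')
}

/// Alternative that removes at most one leading ':', which matches the
/// original if/else branch exactly.
fn strip_one_colon(login: &str) -> &str {
    login.strip_prefix(':').unwrap_or(login)
}
```

Both return `&str`, so a caller that needs an owned value can add `.to_owned()` at the point of use.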

boring-nick · 2025-02-05T07:37:08Z

src/db/mod.rs

+        pub first_timestamp: i32,
+    }
+
+    let query = "SELECT user_login, toDateTime((MAX(timestamp))) AS last_timestamp, toDateTime(MIN(timestamp)) AS first_timestamp FROM message_structured WHERE user_id = ? GROUP BY user_login".to_owned();


Since this query does a large scan and can be potentially heavy, the result should be cached. You should add this at the end of the query so the result is cached for 10 minutes:

SETTINGS use_query_cache = 1, query_cache_ttl = 600

I will definitely add this, makes sense. I was not aware this was an in-house feature.
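A rough sketch of how the settings clause could be attached to the query from the diff. The constant and helper names here are hypothetical; only the SQL text and the two settings come from the review above.

```rust
/// Settings clause quoted in the review: enable the ClickHouse query cache
/// and keep results for 600 seconds (10 minutes).
const QUERY_CACHE_SETTINGS: &str = "SETTINGS use_query_cache = 1, query_cache_ttl = 600";

/// Hypothetical helper: appends the settings clause to a SELECT statement.
fn with_query_cache(select: &str) -> String {
    format!("{} {}", select.trim_end(), QUERY_CACHE_SETTINGS)
}

/// The GROUP BY query from the diff, with the cache settings appended.
fn name_history_query() -> String {
    with_query_cache(
        "SELECT user_login, toDateTime(MAX(timestamp)) AS last_timestamp, \
         toDateTime(MIN(timestamp)) AS first_timestamp \
         FROM message_structured WHERE user_id = ? GROUP BY user_login",
    )
}
```

The clause has to come last in the statement, which is why the helper appends rather than interpolates.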

RyanPotat · 2025-02-05T10:01:33Z

You are not wrong about the database load concerns. I asked the largest rustlog instance I know (@ZonianMidian's) to run this query directly, and it was extremely slow. I have already made your corrections, but I will also attempt to split the query up and avoid a GROUP BY clause, which showed much better results on both rustlog instances.

RyanPotat · 2025-02-06T06:54:46Z

I have split the query to vastly improve query speed on databases that have not been optimized recently.

As for adding a projection/materialized view: this route will most likely see low use, so the performance cost on inserts could outweigh the benefit? I'm not sure.
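The SQL of the split version is not quoted in this thread, so the following is only a guess at its shape, inferred from the later diff that binds `user_id` and a login to a per-name query. The constant names and the exact statements are assumptions; the table and column names are taken from the original GROUP BY query.

```rust
/// Assumed first step: list the distinct logins seen for one user id.
const DISTINCT_LOGINS_QUERY: &str =
    "SELECT DISTINCT user_login FROM message_structured WHERE user_id = ?";

/// Assumed second step, run once per login: first and last time that
/// login was seen for the user. No GROUP BY is needed because each run
/// is already restricted to a single (user_id, user_login) pair.
const SINGLE_NAME_HISTORY_QUERY: &str = "SELECT toDateTime(MIN(timestamp)) AS first_timestamp, \
     toDateTime(MAX(timestamp)) AS last_timestamp \
     FROM message_structured WHERE user_id = ? AND user_login = ?";

/// Counts `?` bind placeholders so a caller can check how many values to bind.
fn placeholder_count(query: &str) -> usize {
    query.matches('?').count()
}
```

Each statement filters by `user_id`, which is the property the review below credits for the speedup.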

boring-nick

Overall this approach seems fine, should be much faster than the GROUP BY version since now it always filters by user id.

You should also run cargo clippy on your code, it might have some minor nitpicks regarding style

boring-nick · 2025-02-07T08:20:25Z

src/db/mod.rs

+    let name_history_rows = sanitized_user_logins
+        .into_iter()
+        .map(|login| {
+            let query = history_query.clone();
+            async move {
+                let query = db.query(&query).bind(user_id).bind(&login);
+                query
+                    .fetch_one::<SingleNameHistory>()
+                    .await
+                    .map(|history| (login, history))
+                    .map_err(Error::from)
+            }
+        })
+        .collect::<futures::stream::FuturesOrdered<_>>()
+        .collect::<Vec<_>>()
+        .await
+        .into_iter()
+        .collect::<Result<Vec<_>>>()?;


The triple .collect() is needlessly convoluted, you can write this using try_join_all:

let name_history_rows = try_join_all(sanitized_user_logins.into_iter().map(|login| {
    let query = history_query.clone();
    async move {
        let query = db.query(&query).bind(user_id).bind(&login);
        query
            .fetch_one::<SingleNameHistory>()
            .await
            .map(|history| (login, history))
    }
}))
.await?;

This also makes the map_err redundant

Makes sense, this is much cleaner

boring-nick · 2025-02-07T08:20:47Z

src/db/mod.rs

+    let sanitized_user_logins = distinct_logins
+        .iter()
+        .map(|login| login.trim_start_matches(':').to_owned())
+        .collect::<Vec<String>>();


This collect is redundant, you continue iterating over the logins later
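To illustrate the point about the redundant collect, here is a small synchronous sketch. The function name and the `format!` step are hypothetical stand-ins for building the per-login query. The sanitizing `map` stays lazy and feeds directly into the next adapter, so no intermediate `Vec<String>` is allocated.

```rust
/// Builds one label per login without an intermediate Vec<String>:
/// the sanitizing `map` is chained straight into the next step.
fn describe_logins(distinct_logins: &[String], user_id: &str) -> Vec<String> {
    distinct_logins
        .iter()
        .map(|login| login.trim_start_matches(':').to_owned())
        // Stand-in for the per-login work done later in the real code.
        .map(|login| format!("{user_id}:{login}"))
        .collect()
}
```

Only the final result is collected; the same chaining applies when the second step produces futures instead of strings.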

boring-nick · 2025-02-07T08:28:35Z

src/web/schema.rs

+    pub last_timestamp: String,
+    pub first_timestamp: String,


You can use DateTime<Utc> directly in the response type, it will get serialized into an RFC3339 string in json

RyanPotat added 2 commits January 21, 2025 08:39

feat: add namechagne route

45d15bc

fix: actually works now

f2afe5c

boring-nick requested changes Feb 5, 2025

RyanPotat added 2 commits February 6, 2025 05:50

fix: add cache, simplify deduplication, remove redundant type

b15b7ce

impr: split query to avoid aggregation

fe52cb8

RyanPotat requested a review from boring-nick February 6, 2025 06:55

boring-nick requested changes Feb 7, 2025

fix: update type, simplify mapping, clippy nitpick

f67e62b

boring-nick merged commit f99b909 into boring-nick:master Feb 7, 2025

RyanPotat deleted the impr/add-name-change-route branch February 7, 2025 19:33
