
Multithread timeseries Take 2#219

Merged
msweier merged 15 commits into main from multithread_timeseries2
Oct 6, 2025

Conversation

@msweier
Collaborator

@msweier msweier commented Sep 30, 2025

Hi Guys,

I refactored my first attempt at chunked multi-threading for timeseries to make it easier to follow/maintain (hopefully). By default, chunked timeseries multi-threading is on. For example, a query longer than 2 weeks will spawn a thread for each 2-week chunk until it hits the max_worker parameter. A store longer than 2 weeks does the same until it hits the max worker.

**Multi-threading to get_timeseries**

The function defaults to multi-threading, uses up to a max_worker number of threads (~20), and chunks data into windows of 14 days by default (assuming 15-minute data). It uses a helper function get_ts_extents from the catalog to check the extents of the timeseries, so that data is not requested for times outside the extents when the request starts before 2014.
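The chunked-read pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `fetch_one` stands in for whatever single-chunk request the library makes, and the chunking/threading parameters mirror the defaults described in the description.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def chunk_range(begin, end, days=14):
    """Split [begin, end] into consecutive windows of at most `days` days."""
    chunks = []
    start = begin
    while start < end:
        stop = min(start + timedelta(days=days), end)
        chunks.append((start, stop))
        start = stop
    return chunks


def fetch_chunked(fetch_one, begin, end, days=14, max_workers=20):
    """Fetch each window on its own thread; pool.map keeps results in
    window order, so the concatenated output stays chronological."""
    windows = chunk_range(begin, end, days)
    workers = min(max_workers, max(len(windows), 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda w: fetch_one(*w), windows))
    out = []
    for r in results:
        out.extend(r)
    return out
```

Because the thread count is capped at `min(max_workers, number_of_chunks)`, a 1-month query spawns only as many threads as it has 2-week chunks.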

**Multi-threading to store_timeseries**

The function defaults to multi-threading, uses up to a max_worker number of threads (~20), and chunks data into batches of roughly 700 values by default (~14 days of 15-minute data).
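The store side chunks by value count rather than by date span. A minimal sketch, assuming a `store_one` callable that posts a single batch (a placeholder, not the library's actual API):

```python
from concurrent.futures import ThreadPoolExecutor


def store_chunked(store_one, values, chunk_size=700, max_workers=20):
    """Split `values` into fixed-size batches (~700, i.e. ~14 days of
    15-minute data) and store each batch on its own thread."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    workers = min(max_workers, max(len(chunks), 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so any exception from a worker propagates.
        list(pool.map(store_one, chunks))
    return len(chunks)
```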

I added a store and read test of roughly 1 month, which checks to make sure only 2 threads are used.

This PR replaces #210

@msweier msweier requested review from Enovotny and krowvin September 30, 2025 15:23
@msweier msweier mentioned this pull request Sep 30, 2025
Collaborator

@Enovotny Enovotny left a comment


Overall looks good. I just have some additions to the tests. Also, did you do any performance testing to see what the optimal chunk size would be? Is it 2 weeks, or would a larger size be more efficient?

# Generate 15-minute interval timestamps
dt = pd.date_range(
    start=START_DATE_CHUNK_MULTI, end=END_DATE_CHUNK_MULTI, freq="15T", tz="UTC"
)

Change `15T` to `15min`, based on the deprecation warning from the test.
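For reference, recent pandas versions deprecate the `T` alias in favor of `min`; the corrected call would look like this (dates are placeholders for the test constants):

```python
import pandas as pd

# "15min" is the current frequency alias; "15T" emits a FutureWarning
# on recent pandas versions.
dt = pd.date_range(
    start="2025-02-01", end="2025-02-02", freq="15min", tz="UTC"
)
```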

@msweier
Collaborator Author

msweier commented Oct 6, 2025

Overall looks good. I just have some additions to the tests. Also, did you do any performance testing to see what the optimal chunk size would be? Is it 2 weeks, or would a larger size be more efficient?
api_read_performance_result.csv

I did some testing last week on reads for chunk size and number of max_workers (I assume stores behave similarly). Generally, 14-day chunks are good for queries up to about a year, and beyond that you are better off using roughly 30-day chunks. For max_worker threads, 20 is good for lengths below a year; beyond that, bumping to 30 helps. We could code this default to scale with the query length, but I think that would just complicate things. The user can raise the chunk size and max_workers if they are pulling POR (period-of-record) data.

@Enovotny
Collaborator

Enovotny commented Oct 6, 2025

How does the 30-day size run with data less than a year? If it is comparable to 14, I would put it at 30 as the default. Same for workers. If 14 is faster than 30 for smaller pulls, I would agree and leave it; maybe add a note to the parameter that lets the user know this.

Also, I have additional asserts to add: one to make sure the number of values returned is the same as the number stored, and one to make sure no null values are returned.

@msweier
Collaborator Author

msweier commented Oct 6, 2025

How does the 30-day size run with data less than a year? If it is comparable to 14, I would put it at 30 as the default. Same for workers. If 14 is faster than 30 for smaller pulls, I would agree and leave it; maybe add a note to the parameter that lets the user know this.

Yeah for smaller pulls, 14 is faster. Same with max_workers (20 is better than 30 for small pulls). I added a note to the documentation. Probably worth revisiting once CDA gets performance improvements.

Also, I have additional asserts to add: one to make sure the number of values returned is the same as the number stored, and one to make sure no null values are returned.

Added those asserts.
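As a sketch, the two requested asserts could look like this (the frame and column names are placeholders, not the actual test fixtures):

```python
import pandas as pd


def check_roundtrip(stored_df: pd.DataFrame, returned_df: pd.DataFrame) -> None:
    """Asserts requested in review: same number of values come back as
    were stored, and no nulls in the returned values."""
    assert len(returned_df) == len(stored_df), "row count mismatch"
    assert not returned_df["value"].isnull().any(), "null values returned"
```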

@sonarqubecloud

sonarqubecloud bot commented Oct 6, 2025

@msweier
Collaborator Author

msweier commented Oct 6, 2025

Updated tests to compare the entire read/written DataFrame with assert_frame_equal.
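`pandas.testing.assert_frame_equal` checks shape, dtypes, index, and values in one call, subsuming the count and null asserts. A minimal usage sketch with placeholder frames standing in for the stored and read-back timeseries:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Placeholder frames; the real test compares the stored DataFrame against
# the one read back through the chunked, multi-threaded path.
written = pd.DataFrame(
    {
        "date-time": pd.date_range("2025-02-01", periods=4, freq="15min", tz="UTC"),
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
read_back = written.copy()

# Raises AssertionError on any mismatch in shape, dtypes, index, or values.
assert_frame_equal(written, read_back)
```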

@msweier msweier marked this pull request as ready for review October 6, 2025 19:59
@msweier msweier merged commit 1769a73 into main Oct 6, 2025
9 checks passed
@Enovotny Enovotny deleted the multithread_timeseries2 branch October 15, 2025 21:12