Let estimated rows work with public data by mccalluc · Pull Request #889 · opendp/dp-wizard

mccalluc · 2026-03-05T20:43:08Z

Fix Estimated rows isn't used with public data #861
Simplify UI code in analysis_panel/__init__.py: Minimize the code inside the if-else, and move the return outside.
Use polars sample to generate a LazyFrame of the requested size, either smaller or larger than the original. (LazyFrame -> DataFrame -> LazyFrame is kludgy, but I don't have a better alternative.)
Inside make_accuracy_histogram, rename row_count parameter to max_length, since that's what it is used for.

For reviewer:

Adding more computation will slow down the interface a little bit. With a thousand-line file I don't notice a difference, but is this the direction we want to go in? (See Latency is too high for epsilon slider #863)
The only change in the UI is in the plot, so I don't have a good idea for an automated test. Let me know if you have ideas.

ekraffmiller · 2026-03-09T14:21:33Z

@mccalluc I tried this with a 1000 row file and it worked fine, but when I tried a 10,000 row file the app seemed to freeze up. Is this too big for DP Wizard?
PUMS5extract10000.csv

mccalluc · 2026-03-09T15:06:58Z

@ekraffmiller, thanks for the failing example! Which step are you stuck at?

(We are reading the entire file to infer schema, and then the sampling and preview time could also slow things down. Both of those could be done with just the first n rows, at the risk of being surprised if later rows aren't like the earlier rows.)

ekraffmiller · 2026-03-09T16:21:22Z

@mccalluc screen-capture (3).webm
Here is a video. Basically, after I select the file, the page becomes unresponsive: I try to update the other input fields and I can't. Eventually I get an error from Chrome that the page is unresponsive.

mccalluc · 2026-03-09T17:12:10Z

@ekraffmiller: The video isn't working for me (the errors are different in Firefox and Chrome), but that's just FYI. I have enough information. Thank you!

The naive approach to sampling is almost certainly O(n^2). I think we could instead pick row indexes by taking some large-ish prime, modulo the number of rows.
Filed another issue for more general profiling: Profile large CSVs... and warn on upload #901. In my court.
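
As a rough sketch of the prime-stride idea mentioned above (not code from this PR): step through the rows by a fixed prime, modulo the number of rows. The function name `stride_indexes` and the default prime 7919 are assumptions made for illustration. When the prime shares no factor with the row count, the first `row_count` indexes are all distinct, and larger requests wrap around and repeat rows. The cost is linear in the sample size, but the selection is deterministic rather than random.

```python
from math import gcd


def stride_indexes(row_count: int, sample_size: int, prime: int = 7919) -> list[int]:
    """Pick sample_size row indexes in [0, row_count) using a prime stride.

    Hypothetical helper for illustration only.
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    if gcd(prime, row_count) != 1:
        # If the stride shares a factor with row_count, some rows
        # would never be visited; the caller should pick another prime.
        raise ValueError("prime must not share a factor with row_count")
    # i * prime mod row_count visits every row once before repeating.
    return [(i * prime) % row_count for i in range(sample_size)]


downsample = stride_indexes(row_count=10, sample_size=5)
full_cycle = stride_indexes(row_count=10, sample_size=10)
upsample = stride_indexes(row_count=4, sample_size=6)
```

The resulting indexes could then be used to select rows from the public data, for example by integer indexing into an already collected DataFrame.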

ekraffmiller · 2026-03-09T18:57:15Z

Ok, do you want to work more on this, or consider the size issue separately?

mccalluc · 2026-03-09T20:02:18Z

@ekraffmiller : Let me move this back to draft: I think the sampling does fill a gap, but with it probably being O(n^2), it's not something to merge now. Thanks for catching this.

mccalluc added 3 commits March 5, 2026 14:33

keep defaults in one place

7273a2d

factor ui out of if-then

686dd31

sample from public data

2e20116

github-project-automation bot added this to DP Wizard Mar 5, 2026

github-project-automation bot moved this to Pending in DP Wizard Mar 5, 2026

ekraffmiller self-assigned this Mar 9, 2026

Merge branch 'main' into 861-estimated-rows-and-public-data

d7c8929

mccalluc marked this pull request as draft March 9, 2026 20:02

mccalluc mentioned this pull request Mar 11, 2026

Move Privacy Budget and Simulation below #906

Open

mccalluc wants to merge 4 commits into main from 861-estimated-rows-and-public-data
