@mccalluc I tried this with a 1,000-row file and it worked fine, but when I tried a 10,000-row file, the app seemed to freeze. Is this too big for DP Wizard?
@ekraffmiller, thanks for the failing example! Which step are you stuck at? (We read the entire file to infer the schema, and the sampling and preview steps could also slow things down. Both could be done with just the first n rows, at the risk of being surprised if later rows don't look like the earlier ones.)
@mccalluc screen-capture (3).webm
@ekraffmiller: The video isn't playing for me (the errors differ between Firefox and Chrome, just FYI), but I have enough information. Thank you!
OK, do you want to keep working on this, or handle the size issue separately?
@ekraffmiller: Let me move this back to draft. I think the sampling does fill a gap, but since it is probably O(n^2), it's not something to merge now. Thanks for catching this.
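For comparison, a stdlib-only sketch of resizing a sample to a requested size, smaller or larger than the original, using a plain list of rows as a stand-in for the LazyFrame. The name `resample` and the rule of sampling with replacement only when growing are assumptions for illustration, not this PR's code. It makes no claim about where the suspected O(n^2) cost comes from; it only shows that the resizing step itself can stay linear.

```python
import random


def resample(rows, target_size, seed=None):
    """Return exactly target_size rows drawn from rows.

    Shrinking samples without replacement; growing samples with
    replacement, since there aren't enough distinct rows. Either way the
    work is linear in len(rows) + target_size."""
    if target_size < 0:
        raise ValueError("target_size must be non-negative")
    rng = random.Random(seed)  # pass a seed for reproducible output
    if target_size <= len(rows):
        return rng.sample(rows, target_size)  # without replacement
    if not rows:
        raise ValueError("cannot grow an empty sample")
    return rng.choices(rows, k=target_size)  # with replacement


rows = [{"age": a} for a in range(1_000)]
print(len(resample(rows, 100, seed=0)))     # 100
print(len(resample(rows, 10_000, seed=0)))  # 10000
```

With Polars, the same decision would presumably be written by collecting to a DataFrame `df` and calling `df.sample(n=target_size, with_replacement=target_size > df.height, seed=seed).lazy()`, which is the LazyFrame -> DataFrame -> LazyFrame round trip described in the PR.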
- `analysis_panel/__init__.py`: Minimize the code inside the if-else, and move the return outside.
- `sample`: generate a LazyFrame of the requested size, either smaller or larger than the original. (LazyFrame -> DataFrame -> LazyFrame is kludgy, but I don't have a better alternative.)
- `make_accuracy_histogram`: rename the `row_count` parameter to `max_length`, since that's what it is used for.

For reviewer: