Make list diff reuse _obj_diff results instead of making duplicate calls #74
bfrobin446 wants to merge 1 commit into xlwings:master from
The backtracking pass of the list diff algorithm in `_list_diff_0` was calling `_obj_diff` on pairs of objects that had already been diffed in the forward pass earlier in the `_list_diff` method. This commit saves the `_obj_diff` results from the forward pass and reuses them in the backtracking pass so we don't duplicate the recursive calls.
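As a rough illustration of the idea (not the actual code in this PR), here is a minimal LCS-style sketch in which the forward pass stores each pairwise result in a 2D array and the backtracking pass reads it back instead of recomputing it. The names `similarity` and `match_pairs` are hypothetical stand-ins; `similarity` plays the role of an expensive recursive diff and counts its own calls so the saving can be checked.

```python
from typing import Any, List, Tuple


def similarity(a: Any, b: Any, counter: List[int]) -> float:
    """Hypothetical stand-in for an expensive recursive diff.

    Returns 1.0 for equal items and 0.0 otherwise, and counts its calls.
    """
    counter[0] += 1
    return 1.0 if a == b else 0.0


def match_pairs(xs: List[Any], ys: List[Any], counter: List[int]) -> List[Tuple[int, int]]:
    n, m = len(xs), len(ys)
    # cache[i][j] keeps the result computed for (xs[i], ys[j]) in the forward pass.
    cache = [[0.0] * m for _ in range(n)]
    # score[i][j] is the best total similarity for xs[:i] against ys[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: each pair is compared exactly once.
    for i in range(n):
        for j in range(m):
            s = similarity(xs[i], ys[j], counter)
            cache[i][j] = s
            score[i + 1][j + 1] = max(score[i][j] + s, score[i][j + 1], score[i + 1][j])

    # Backtracking pass: reuse cached results, no further similarity() calls.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = cache[i - 1][j - 1]
        if s > 0 and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

In this sketch the number of `similarity` calls is exactly `len(xs) * len(ys)`, regardless of how many cells the backtracking pass visits. The trade-off is the up-front allocation of the full `cache` array.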
I'm running some benchmarks to see how this affects performance, and it looks like this is noticeably slower, which is unexpected. Do you happen to have any benchmarks that you've run? I'd be interested in results for both memory usage and execution time. I picked a random test that touches this code path. At first glance, I suspect this is caused by preallocating the entire 2D array of
Also, thanks for taking the time to make this PR, @bfrobin446 :)
I’d expect the difference to be the size and structure of the array entries. If the child objects are expensive to diff (perhaps especially if there are arrays at several different levels of nesting), the balance will shift in favor of avoiding duplicate work. I can’t share the files I was profiling on (employer internal), but I’ll see if I can generate a set of files that shows the behavior I was seeing.
Another option would be to replace the preallocated array with a dictionary that stores only the row/column combinations that the forward pass touched. Once I have a set of reproduction files that I can share, I'll see if that gets us a better compromise between performance in the normal case and performance in the pathological case.
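A minimal sketch of what that dictionary-based variant might look like, assuming results are keyed by `(row, column)` and filled in lazily. The class name `PairCache` and the `diff` callable are hypothetical and only illustrate the storage strategy, not the library's real interface.

```python
from typing import Any, Callable, Dict, List, Tuple


class PairCache:
    """Sparse cache of pairwise diff results, keyed by (row, column).

    Only the pairs that are actually requested are computed and stored,
    so nothing is allocated for cells that are never touched.
    """

    def __init__(self, xs: List[Any], ys: List[Any], diff: Callable[[Any, Any], Any]) -> None:
        self._xs = xs
        self._ys = ys
        self._diff = diff  # hypothetical stand-in for an expensive recursive diff
        self._results: Dict[Tuple[int, int], Any] = {}

    def get(self, i: int, j: int) -> Any:
        key = (i, j)
        if key not in self._results:
            # First request for this pair: compute once and remember it.
            self._results[key] = self._diff(self._xs[i], self._ys[j])
        return self._results[key]

    def __len__(self) -> int:
        # Number of pairs that have been computed so far.
        return len(self._results)
```

Compared with a preallocated 2D array, this costs a hash lookup per access but uses memory proportional to the number of pairs actually compared, which may matter for long lists of cheap-to-diff items.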
The backtracking pass of the list diff algorithm in `_list_diff_0` was calling `_obj_diff` on pairs of objects that had already been diffed in the forward pass earlier in the `_list_diff` method.

When I create an array of `_obj_diff` results in the forward pass and reuse that array in the backtracking pass, I see about a 5x speedup on JSON files that contain lists of a few dozen large objects.