[GLUTEN-9456][VL] Add custom direct buffered input (#11452)
Conversation
namespace gluten {

class GlutenDirectBufferedInput : public facebook::velox::dwio::common::BufferedInput {
Can we extend from DirectBufferedInput and only override the functions we need?
We cannot do that if we want to clear the requests_, because it is private in the DirectBufferedInput.
~GlutenDirectBufferedInput() override {
  // Drop pending requests first so no new loads get planned.
  requests_.clear();
  for (auto& load : coalescedLoads_) {
    // If a load is already running on the I/O executor, wait for it to
    // finish before cancelling: tearing down state under an in-flight
    // load would be a use-after-free.
    if (load->state() == facebook::velox::cache::CoalescedLoad::State::kLoading) {
      folly::SemiFuture<bool> waitFuture(false);
      if (!load->loadOrFuture(&waitFuture)) {
        auto& exec = folly::QueuedImmediateExecutor::instance();
        std::move(waitFuture).via(&exec).wait();
      }
    }
    load->cancel();
  }
  coalescedLoads_.clear();
}
Let's merge this PR first, but submit a PR to Velox to change them to protected, along with other related changes.
The required change has been merged into Velox, so we can make a fairly clean update now.
Thank you, Rui. Let's remove IBM/velox@2c1c467 once the PR is merged.
@rui-mo Can there still be race condition scenarios after this change where we can hit this issue?
Hi @ayushi-agarwal, are you referring to scenarios like the one above, or is there another case you encountered?
@rui-mo @FelixYBW We see this issue again: a few log lines were added, so we know execution reaches the new destructor.
Hi @ayushi-agarwal, I’ve learned that one of our customers is still encountering this issue after the fix was applied. It turns out their environment has I/O throttling and the bandwidth is saturated, which causes the following code to never exit. Do you have any similar throttling or bandwidth limits in your environment? Can you please also confirm whether the program never exits from the above code?
Thanks @rui-mo for the quick reply. I had added a log line here, and it never got printed, so I think the program would have exited from the destructor.
@ayushi-agarwal Gluten has a configuration called
@rui-mo I had tried increasing that config to 75s, and for some cases I had to increase it to 120s. It helped, but we didn't want to increase it so much that it starts affecting query latency.
@ayushi-agarwal If increasing the wait time helps, the problem is still likely related to the
@rui-mo I added a log line at the end of the destructor as well, and it also got printed. I see the ref count of the pool also increases with time. I added these log lines in the Velox memory manager's tryDestructSafeLoop, and at the end you can see the ref count:

Memory Allocator[MALLOC capacity UNLIMITED allocated bytes 1048576 allocated pages 0 mapped pages 0]
Memory Allocator[MALLOC capacity UNLIMITED allocated bytes 1048576 allocated pages 0 mapped pages 0]
@rui-mo @FelixYBW The issue is in the sleep function used here: usleep is not working as expected and returns as soon as it is called, so there is no sleep at all. Changing it to std::this_thread::sleep_for(std::chrono::milliseconds(waitMs)); helped, and now the default timeout of 2-3 s is also enough.
@ayushi-agarwal Thanks for keeping us updated. May I confirm whether the actual wait time has changed, or is this only a config change because of using a larger wait unit? |
Thank you, @ayushi-agarwal, for your confirmation. In theory, with the
@rui-mo In our case the executor thread was getting interrupted by another monitoring tool; that's why, with usleep, the thread was not sleeping for the specified interval. We didn't change the wait time, we only changed the method from usleep to std::this_thread::sleep_for. @FelixYBW From our investigation, we also think the issue is with the I/O thread of DBI: in the loop we check for the loading state, but there can be a case where a task had already been submitted to the executor pool, was still in the planned state at the time we checked, and got executed later.
@ayushi-agarwal If that’s the case, could you please try #11697 to see whether it resolves the issue? Thanks! |
Record here, thanks to @rui-mo's confirmation.

tryDestrucSafe stack:
DBI destruct stack:
thread:

They are confirmed to be on the same thread, and DBI is released before tryDestrucSafe. If everything works correctly, we needn't wait in tryDestrucSafe for the DBI release.
@ayushi-agarwal Can you print the call stack and the thread id when you hit the issue? That can help us identify where the issue is.
What changes are proposed in this pull request?
Add custom direct buffered input and input stream to allow additional control over how input buffers are created and managed.
Follow-up for: #11249.
How was this patch tested?
under test
Related issue: #9456