[GLUTEN-11485][VL] Fix the race condition in ArrowMemoryPool#11493
[GLUTEN-11485][VL] Fix the race condition in ArrowMemoryPool#11493baibaichen merged 2 commits intoapache:mainfrom
Conversation
| if (auto pool = it->second.lock()) { | ||
| return pool; | ||
| } | ||
| arrowPools_.erase(name); | ||
| } | ||
| auto pool = std::make_shared<ArrowMemoryPool>( | ||
| blockListener_.get(), [this, name](arrow::MemoryPool* pool) { this->dropMemoryPool(name); }); | ||
|
|
||
| auto pool = std::make_shared<ArrowMemoryPool>(blockListener_.get()); |
There was a problem hiding this comment.
If we no longer remove the entries, will there be any memory leak risk caused by the expanding entry list in arrowPools_?
There was a problem hiding this comment.
It looks like not an issue as long as the number of keys is not too large.
There was a problem hiding this comment.
@baibaichen mentioned in the issue that we can use a background thread to clean up, but I wonder if that might be too heavy for the current use of the arrow pool, since only shuffle and parquet write create named arrow memory pool.
If necessary, perhaps looping over the weak_ptr each time getOrCreateArrowMemoryPool is called and remove the expired ones. @zhztheplayer What do you think?
There was a problem hiding this comment.
It looks like not an issue as long as the number of keys is not too large.
I thought it might be up to the usage. Hi @marin-ma, do you think we can skip cleaning up the Arrow pools? In that case, we can simply store shared pointer as the class member.
If necessary, perhaps looping over the weak_ptr each time getOrCreateArrowMemoryPool is called and remove the expired ones.
Otherwise, this might be helpful, but we should also figure out whether it will cause any instability regarding performance.
There was a problem hiding this comment.
If there aren’t many entries in arrowPools_, I think it’s fine to keep them via shared_ptr. A MemoryPool object itself shouldn’t take much memory anyway.
There was a problem hiding this comment.
Using weak_ptr can help to clarify the ownership of the arrow pool. For example if the arrow pool is used by shuffle A, it should be destroyed together with the shuffle writer A itself. And then the next shuffle B will create a new pool. In this way we can also track the memory allocation status for each shuffle separately.
There was a problem hiding this comment.
@marin-ma I meant, is it feasible to use std::unordered_map<std::string, std::shared_ptr<ArrowMemoryPool>> arrowPools_? Only by changing the map value type from std::weak_ptr to std::shared_ptr.
There was a problem hiding this comment.
Because in the latest PR's code, the expired map entries won't get removed from map until being replaced by new value of the same key. Is this an intentional change?
There was a problem hiding this comment.
Because in the latest PR's code, the expired map entries won't get removed from map until being replaced by new value of the same key. Is this an intentional change?
Yes. In this way it can create new arrow pools for each shuffle writer instance. If changing to std::unordered_map<std::string, std::shared_ptr<ArrowMemoryPool>> arrowPools_, the new shuffle writers will reuse the ones created by the previous shuffle.
|
@baibaichen Do you have any more comments? Thanks! |
No. Let's merge to see whether the issue is fixed or not. |
Related issue: #11485