Description
I ran into problems when trying to use two separate instances of the Benchmark class pointing to two different SQL warehouses. I could create the instances without issues, but whenever I executed a query on the second instance, it would throw a KeyError (for key "res") on this line while fetching the query history.
I found a potential explanation while digging through the source -- there is a global variable that is being used to store thread_local information that gets shared across all beaker.Benchmark instances:
beaker/src/beaker/benchmark.py
Lines 72 to 75 in a454d90
    def _get_thread_local_connection(self):
        if not hasattr(thread_local, "connection"):
            thread_local.connection = self._create_dbc()
        return thread_local.connection
It looks like the query gets executed using self.sql_warehouse (here) which in turn is derived from the thread_local.connection (here). On the other hand, the get_query_history method is passed self.warehouse_id (here) which in my case is derived from the user-specified http_path (here). This means that the warehouse on which the query is executed stays the same across Benchmark instances, while the warehouse from which we fetch query history can vary from instance to instance.
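To illustrate the mismatch described above, here is a minimal sketch that runs without a Databricks connection. `FakeBenchmark` and the string returned by its `_create_dbc` are hypothetical stand-ins, not beaker's actual code; only the caching logic mirrors the snippet quoted from `benchmark.py`.

```python
import threading

# Module-level storage, mirroring the global `thread_local` in benchmark.py.
thread_local = threading.local()


class FakeBenchmark:
    """Hypothetical stand-in for beaker.Benchmark; opens no real connection."""

    def __init__(self, warehouse_id):
        self.warehouse_id = warehouse_id

    def _create_dbc(self):
        # Assumption: a plain string stands in for a real connection object.
        return f"connection-to-{self.warehouse_id}"

    def _get_thread_local_connection(self):
        # Same caching logic as the quoted snippet: the first caller on a
        # thread creates the connection, and every later caller reuses it.
        if not hasattr(thread_local, "connection"):
            thread_local.connection = self._create_dbc()
        return thread_local.connection


first = FakeBenchmark("warehouse-a")
second = FakeBenchmark("warehouse-b")

conn_first = first._get_thread_local_connection()
conn_second = second._get_thread_local_connection()

# Both instances end up with the connection created by `first`, even though
# `second.warehouse_id` still names the other warehouse.
print(conn_first, conn_second, second.warehouse_id)
```

If beaker behaves like this sketch, the second instance would run its queries on the first warehouse while looking up query history on the second one, which would be consistent with the missing "res" key.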
Would it make sense for each instance of beaker.Benchmark to have its own thread_local connection (for example: self.thread_local = ... during __init__), and therefore its own value of self.sql_warehouse? I think this would solve the issue of self.warehouse_id getting out of sync with self.sql_warehouse. Or, if there's a technical reason why certain objects need to be shared across all Benchmark instances, could this limitation be better documented and/or caught earlier with a clear error message?
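A rough sketch of what the per-instance approach could look like, assuming nothing else in beaker relies on the module-level `thread_local`. `PerInstanceBenchmark` and the dict returned by its `_create_dbc` are hypothetical and only illustrate the idea; they are not a patch against the real class.

```python
import threading


class PerInstanceBenchmark:
    """Hypothetical sketch: each instance owns its thread-local storage."""

    def __init__(self, warehouse_id):
        self.warehouse_id = warehouse_id
        # One threading.local() per instance, so cached connections are
        # scoped to (instance, thread) rather than to the thread alone.
        self.thread_local = threading.local()

    def _create_dbc(self):
        # Assumption: a dict stands in for a real connection object.
        return {"warehouse_id": self.warehouse_id}

    def _get_thread_local_connection(self):
        if not hasattr(self.thread_local, "connection"):
            self.thread_local.connection = self._create_dbc()
        return self.thread_local.connection


bench_a = PerInstanceBenchmark("warehouse-a")
bench_b = PerInstanceBenchmark("warehouse-b")

conn_a = bench_a._get_thread_local_connection()
conn_b = bench_b._get_thread_local_connection()

# A worker thread using the same instance still gets its own connection.
worker_result = []
worker = threading.Thread(
    target=lambda: worker_result.append(bench_a._get_thread_local_connection())
)
worker.start()
worker.join()

print(conn_a["warehouse_id"], conn_b["warehouse_id"])
```

With this layout each instance's connection matches its own `warehouse_id`, while connections remain separate per thread, which I assume is the reason thread-local storage is used in the first place.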