Conversation
Co-authored-by: Nightknight3000 <alexander.roehl@uni-tuebingen.de>
Co-authored-by: antidodo <albin2993@gmail.com>
📝 Walkthrough

Replace hub_core_client with hub_client for hub interactions; add AnalysisStoppedLog and stream it when analyses stop; switch status-polling code from an async to a sync HTTP client and add partner-status retrieval/submission; add helper functions for node status retrieval; update the Kubernetes nginx service selector to use the storage service label.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Poller as StatusPoller
    participant Deploy as InternalDeployment
    participant Hub as HubClient
    participant Nginx as NginxEndpoint
    Poller->>Deploy: GET /analysis/healthz (sync)
    Deploy-->>Poller: deployment status
    alt node_analysis_id present
        Poller->>Hub: GET node analyses for analysis_id
        Hub-->>Poller: node statuses (map)
        Poller->>Nginx: POST /node/partner-status (partner statuses)
        Nginx-->>Poller: 200/error
    end
    alt analysis stopped
        Poller->>Hub: stream AnalysisStoppedLog(analysis_id, node_id)
        Hub-->>Poller: ack
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/api/api.py`:
- Around line 199-205: Before calling stop_analysis('all', ...), capture the set
of analyses that were actually running by calling database.get_analysis_ids()
and filtering with database.analysis_is_running(analysis_id) (e.g.,
running_ids_before = [id for id in database.get_analysis_ids() if
database.analysis_is_running(id)]); then call stop_analysis('all',
self.database) and iterate only over running_ids_before (optionally re-check
database.analysis_is_running(analysis_id) is now False) when creating and
streaming AnalysisStoppedLog via stream_logs to ensure you only emit stopped
logs for analyses that truly transitioned to stopped.
In `@src/status/status.py`:
- Around line 180-185: The branch checking was_stopped never triggers because
_get_internal_deployment_status currently normalizes non-finished/running/stuck
statuses to failed; update that normalization so a true internal "stopped"
status is preserved (i.e., return AnalysisStatus.STOPPED or its .value from
_get_internal_deployment_status instead of mapping it to failed), and keep the
existing check that uses was_stopped (int_status ==
AnalysisStatus.STOPPED.value) so the finalization path for stopped runs is
executed; ensure any callers of _get_internal_deployment_status still handle the
preserved STOPPED enum/value.
- Around line 136-143: The code calls get_partner_node_statuses(...) and then
unconditionally uses node_statuses (len(node_statuses) and as JSON), which can
be None; change status submission logic in the status loop to first check
node_statuses is not None (or coerce to an empty list/dict) before computing
len() or including it in the POST body; specifically update the block that calls
get_partner_node_statuses, the subsequent print and the Client.post JSON payload
so it skips the HTTP call or sends an empty safe structure when node_statuses is
None, ensuring no TypeError is raised and the broad except in status_loop does
not hide this case.
- Around line 140-143: The per-request httpx.Client invocations in
src/status/status.py (e.g., the Client(...) used to POST to
"/analysis/partner_status" with deployment_name/PORTS and the similar clients at
the other locations) must be used as context managers (with Client(...) as
client: client.post(...)) so transports are closed instead of leaked; update
those call sites (the Client(...) creation around the partner_status POST and
the other two places around lines ~195-197) to use with and close the client
after the request. Also extend _refresh_keycloak_token() error handling to catch
httpx.ConnectError and httpx.ConnectTimeout (in addition to
httpx.HTTPStatusError), log the failure and either return an appropriate
sentinel/falsy value or retry/backoff as the surrounding code expects so
network/connectivity failures don’t bubble out of the long-running status loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6794f777-8eee-40d6-97bb-22efea4c431a
📒 Files selected for processing (5)
* src/api/api.py
* src/k8s/kubernetes.py
* src/resources/log/entity.py
* src/status/status.py
* src/utils/hub_client.py
```python
response = stop_analysis('all', self.database)
for analysis_id in self.database.get_analysis_ids():
    stream_logs(AnalysisStoppedLog(analysis_id),
                self.node_id,
                self.enable_hub_logging,
                self.database,
                self.hub_client)
```
Only emit stop logs for analyses that actually transitioned to stopped.
database.get_analysis_ids() is treated elsewhere as the full set and filtered through database.analysis_is_running() in src/status/status.py Lines 55-56. Iterating it here after stop_analysis('all', ...) will also append AnalysisStoppedLog entries to finished/failed analyses, and stream_logs() can then propagate a bogus stopped lifecycle event.
Proposed fix

```diff
-        response = stop_analysis('all', self.database)
-        for analysis_id in self.database.get_analysis_ids():
+        analysis_ids = [
+            analysis_id
+            for analysis_id in self.database.get_analysis_ids()
+            if self.database.analysis_is_running(analysis_id)
+        ]
+        response = stop_analysis('all', self.database)
+        for analysis_id in analysis_ids:
             stream_logs(AnalysisStoppedLog(analysis_id),
                         self.node_id,
                         self.enable_hub_logging,
                         self.database,
                         self.hub_client)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
analysis_ids = [
    analysis_id
    for analysis_id in self.database.get_analysis_ids()
    if self.database.analysis_is_running(analysis_id)
]
response = stop_analysis('all', self.database)
for analysis_id in analysis_ids:
    stream_logs(AnalysisStoppedLog(analysis_id),
                self.node_id,
                self.enable_hub_logging,
                self.database,
                self.hub_client)
```
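The capture-before-stop pattern above can be sketched standalone. This is a minimal, hypothetical sketch: `FakeDatabase` and `stop_and_collect_stopped` are illustrative stand-ins, and only the method names `get_analysis_ids`/`analysis_is_running` mirror the review.

```python
class FakeDatabase:
    """Hypothetical stand-in for the real Database class."""
    def __init__(self, statuses):
        self._statuses = dict(statuses)  # analysis_id -> status string

    def get_analysis_ids(self):
        return list(self._statuses)

    def analysis_is_running(self, analysis_id):
        return self._statuses[analysis_id] == 'running'

    def stop_all(self):
        for analysis_id, status in self._statuses.items():
            if status == 'running':
                self._statuses[analysis_id] = 'stopped'


def stop_and_collect_stopped(database):
    # Snapshot BEFORE stopping, so finished/failed analyses are excluded
    # from the stopped-log emission.
    running_before = [a for a in database.get_analysis_ids()
                      if database.analysis_is_running(a)]
    database.stop_all()
    return running_before


db = FakeDatabase({'a1': 'running', 'a2': 'finished', 'a3': 'failed'})
stopped = stop_and_collect_stopped(db)
print(stopped)  # ['a1']: only the analysis that was actually running
```

Snapshotting first is what prevents `AnalysisStoppedLog` entries from being emitted for analyses that had already finished or failed.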
```python
node_statuses = get_partner_node_statuses(hub_client, analysis_id, node_analysis_id)
deployment_name = database.get_latest_deployment(analysis_id).deployment_name
try:  # try except, in case analysis api is not yet ready
    print(f"Node statuses to be submitted to {f'http://nginx-{deployment_name}:80'}: {node_statuses} (size={len(node_statuses)})")
    response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post(url="/analysis/partner_status",
                                                                                           headers=[('Connection', 'close')],
                                                                                           json={'partner_status': node_statuses})
    response.raise_for_status()
```
Handle the None fallback before touching node_statuses.
get_partner_node_statuses() returns Optional[...], but Line 139 calls len(node_statuses) unconditionally and the POST body also assumes a dict. When hub lookup fails, this turns the intended no-op path into a TypeError that gets swallowed by the broad except in status_loop().
Proposed fix

```diff
 def inform_analysis_of_partner_statuses(database: Database,
                                         hub_client: flame_hub.CoreClient,
                                         analysis_id: str,
                                         node_analysis_id: str) -> Optional[dict[str, str]]:
     node_statuses = get_partner_node_statuses(hub_client, analysis_id, node_analysis_id)
+    if node_statuses is None:
+        return None
     deployment_name = database.get_latest_deployment(analysis_id).deployment_name
     try:  # try except, in case analysis api is not yet ready
         print(f"Node statuses to be submitted to {f'http://nginx-{deployment_name}:80'}: {node_statuses} (size={len(node_statuses)})")
```

📝 Committable suggestion
```python
node_statuses = get_partner_node_statuses(hub_client, analysis_id, node_analysis_id)
if node_statuses is None:
    return None
deployment_name = database.get_latest_deployment(analysis_id).deployment_name
try:  # try except, in case analysis api is not yet ready
    print(f"Node statuses to be submitted to {f'http://nginx-{deployment_name}:80'}: {node_statuses} (size={len(node_statuses)})")
    response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post(url="/analysis/partner_status",
                                                                                           headers=[('Connection', 'close')],
                                                                                           json={'partner_status': node_statuses})
    response.raise_for_status()
```
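The guard-before-use pattern can be shown in isolation. A minimal sketch: `lookup_partner_statuses` is a hypothetical stub for the hub lookup, and only the `Optional` handling mirrors the review.

```python
from typing import Optional

def lookup_partner_statuses(ok: bool) -> Optional[dict[str, str]]:
    # Hypothetical stand-in for get_partner_node_statuses: returns None
    # when the hub lookup fails.
    return {'node-2': 'running'} if ok else None

def build_payload(ok: bool) -> Optional[dict]:
    node_statuses = lookup_partner_statuses(ok)
    if node_statuses is None:
        return None  # skip the POST entirely instead of sending null
    # len() and JSON embedding are now safe.
    return {'partner_status': node_statuses, 'size': len(node_statuses)}

print(build_payload(True))
print(build_payload(False))
```

Without the early return, `len(None)` raises a `TypeError` that the broad `except` in `status_loop()` would silently swallow.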
```python
response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post(url="/analysis/partner_status",
                                                                                       headers=[('Connection', 'close')],
                                                                                       json={'partner_status': node_statuses})
response.raise_for_status()
```
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "status.py" -path "*/status/*" | head -20
```

Repository: PrivateAIM/node-pod-orchestration. Length of output: 99

🏁 Script executed:

```shell
cat -n src/status/status.py | head -250
```

Repository: PrivateAIM/node-pod-orchestration. Length of output: 15453

🏁 Script executed:

```shell
# Check imports and the _refresh_keycloak_token function
rg "def _refresh_keycloak_token|from httpx|import httpx|ConnectError|ConnectTimeout" src/status/status.py -A 3 -B 1
```

Repository: PrivateAIM/node-pod-orchestration. Length of output: 1170
Close the per-request httpx.Clients and add error handling for network failures in token refresh.
These branches allocate fresh Clients without closing them. In the long-running status loop, this accumulates open transports. Additionally, _refresh_keycloak_token() at line 239 only handles HTTPStatusError but will let ConnectError/ConnectTimeout bubble out.
Suggested pattern
```diff
-    response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post(url="/analysis/partner_status",
-                                                                                           headers=[('Connection', 'close')],
-                                                                                           json={'partner_status': node_statuses})
+    with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+        response = client.post(url="/analysis/partner_status",
+                               headers=[('Connection', 'close')],
+                               json={'partner_status': node_statuses})
```

```diff
-    response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").get("/analysis/healthz",
-                                                                                          headers=[('Connection', 'close')])
+    with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+        response = client.get("/analysis/healthz",
+                              headers=[('Connection', 'close')])
```

```diff
     try:
-        response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post("/analysis/token_refresh",
-                                                                                               json={'token': keycloak_token},
-                                                                                               headers=[('Connection', 'close')])
+        with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+            response = client.post("/analysis/token_refresh",
+                                   json={'token': keycloak_token},
+                                   headers=[('Connection', 'close')])
         response.raise_for_status()
     except HTTPStatusError as e:
         print(f"Error: Failed to refresh keycloak token in deployment {deployment_name}.\n{e}")
+    except (ConnectError, ConnectTimeout) as e:
+        print(f"Error: Failed to reach deployment {deployment_name} for token refresh.\n{e}")
```

Also applies to: 195-197, 239-241
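The reason the `with` form fixes the leak can be demonstrated with a tiny stand-in client. `FakeClient` is hypothetical; it only mimics the context-manager protocol that `httpx.Client` implements, including close-on-exception.

```python
class FakeClient:
    """Hypothetical stand-in mimicking httpx.Client's context-manager protocol."""
    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()   # guaranteed even if the with-body raised
        return False   # do not swallow the exception

    def close(self):
        self.closed = True

    def post(self, url):
        raise ConnectionError("simulated network failure")


client = FakeClient()
try:
    with client:
        client.post("/analysis/partner_status")
except ConnectionError:
    pass

print(client.closed)  # True: the transport is released despite the error
```

The bare `Client(...).post(...)` call has no such guarantee: when the request raises, nothing closes the transport, and each retry in the long-running loop leaks another one.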
```diff
     was_stopped = int_status == AnalysisStatus.STOPPED.value
     if is_stuck or is_slow:
         return 'unstuck'
     elif newly_running:
         return 'running'
-    elif speedy_finished or newly_ended or firmly_stuck:
+    elif speedy_finished or newly_ended or firmly_stuck or was_stopped:
```
was_stopped is dead code until the health mapper returns stopped.
_get_internal_deployment_status() still normalizes every non-finished/running/stuck result to failed, so this branch never fires. A real internal stopped result will continue down the failure path instead of the intended finishing path.
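The mapping fix can be sketched with a minimal enum. The `AnalysisStatus` values below are assumed for illustration; only the FINISHED/RUNNING/STUCK/STOPPED/FAILED names mirror the review.

```python
from enum import Enum

class AnalysisStatus(Enum):
    FINISHED = 'finished'
    RUNNING = 'running'
    STUCK = 'stuck'
    STOPPED = 'stopped'
    FAILED = 'failed'

# Statuses the mapper passes through unchanged; everything else becomes FAILED.
# Adding STOPPED here is what makes the was_stopped branch reachable.
_PRESERVED = {AnalysisStatus.FINISHED.value,
              AnalysisStatus.RUNNING.value,
              AnalysisStatus.STUCK.value,
              AnalysisStatus.STOPPED.value}

def normalize(analysis_status: str) -> str:
    return analysis_status if analysis_status in _PRESERVED else AnalysisStatus.FAILED.value

print(normalize('stopped'))   # preserved instead of collapsing to 'failed'
print(normalize('crashing'))  # unknown statuses still map to 'failed'
```

With STOPPED collapsed into FAILED, `int_status == AnalysisStatus.STOPPED.value` can never hold, so the finalization path for stopped runs is dead code.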
♻️ Duplicate comments (4)
src/status/status.py (4)
**136-150**: ⚠️ Potential issue | 🟡 Minor

Handle `None` return from `get_partner_node_statuses` and close the HTTP client. Two issues remain:
- `get_partner_node_statuses()` returns `Optional[...]`, but the code unconditionally sends `node_statuses` in the POST body. When the hub lookup fails, this sends `{'partner_status': null}`, which may not be the intended behavior.
- The `Client` instance is not closed, leaking resources in the long-running status loop.

🔧 Proposed fix

```diff
 def inform_analysis_of_partner_statuses(database: Database,
                                         hub_client: flame_hub.CoreClient,
                                         analysis_id: str,
                                         node_analysis_id: str) -> Optional[dict[str, str]]:
     node_statuses = get_partner_node_statuses(hub_client, analysis_id, node_analysis_id)
+    if node_statuses is None:
+        return None
     deployment_name = database.get_latest_deployment(analysis_id).deployment_name
     try:  # try except, in case analysis api is not yet ready
-        response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post(url="/analysis/partner_status",
-                                                                                               headers=[('Connection', 'close')],
-                                                                                               json={'partner_status': node_statuses})
+        with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+            response = client.post(url="/analysis/partner_status",
+                                   headers=[('Connection', 'close')],
+                                   json={'partner_status': node_statuses})
         response.raise_for_status()
         return response.json()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/status/status.py` around lines 136-150, get_partner_node_statuses may return None and the Client is never closed; first check the result of get_partner_node_statuses(…) and avoid sending {'partner_status': None} (either return None/skip the POST or send an empty, validated payload), and ensure the HTTP client is closed by using it as a context manager (with Client(...) as client:) or by explicitly calling client.close() after use; update the code paths that reference node_statuses and the POST so they respect the None case and always close the Client.
**179-184**: ⚠️ Potential issue | 🟡 Minor

`was_stopped` is unreachable with the current status mapping. The `was_stopped` check will never evaluate to `True` because `_get_internal_deployment_status()` (lines 220-228) maps any status that isn't `FINISHED`, `RUNNING`, or `STUCK` to `FAILED`. The `STOPPED` status is never preserved.

To enable this code path, update the status mapping in `_get_internal_deployment_status`:

🔧 Proposed fix in _get_internal_deployment_status

```diff
     # Map status from response to preset values
     if analysis_status == AnalysisStatus.FINISHED.value:
         health_status = AnalysisStatus.FINISHED.value
     elif analysis_status == AnalysisStatus.RUNNING.value:
         health_status = AnalysisStatus.RUNNING.value
     elif analysis_status == AnalysisStatus.STUCK.value:
         health_status = AnalysisStatus.STUCK.value
+    elif analysis_status == AnalysisStatus.STOPPED.value:
+        health_status = AnalysisStatus.STOPPED.value
     else:
         health_status = AnalysisStatus.FAILED.value
     return health_status
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/status/status.py` around lines 179-184, the was_stopped branch is unreachable because _get_internal_deployment_status currently collapses any non-FINISHED/RUNNING/STUCK status into FAILED; update _get_internal_deployment_status so it preserves AnalysisStatus.STOPPED (i.e., detect AnalysisStatus.STOPPED and return the corresponding internal status/value rather than mapping it to FAILED) so that int_status can equal AnalysisStatus.STOPPED.value and the was_stopped check becomes reachable; verify the enum/value returned remains consistent with the was_stopped comparison.
**238-244**: ⚠️ Potential issue | 🟡 Minor

Close the client and handle network errors in token refresh. Two issues:
- The `Client` is not closed, leaking resources.
- Only `HTTPStatusError` is caught; `ConnectError` and `ConnectTimeout` will propagate up and could disrupt the status loop.

🔧 Proposed fix

```diff
     try:
-        response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").post("/analysis/token_refresh",
-                                                                                               json={'token': keycloak_token},
-                                                                                               headers=[('Connection', 'close')])
+        with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+            response = client.post("/analysis/token_refresh",
+                                   json={'token': keycloak_token},
+                                   headers=[('Connection', 'close')])
         response.raise_for_status()
     except HTTPStatusError as e:
         print(f"Error: Failed to refresh keycloak token in deployment {deployment_name}.\n{e}")
+    except (ConnectError, ConnectTimeout) as e:
+        print(f"Error: Failed to reach deployment {deployment_name} for token refresh.\n{e}")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/status/status.py` around lines 238-244, the token refresh code uses Client(...) without closing it and only catches HTTPStatusError; create the client with a context manager (with Client(base_url=...) as client:) or ensure it is closed in a finally block, and extend the except clauses to catch HTTPStatusError, ConnectError, and ConnectTimeout (importing these exceptions) so network/connect failures are handled and logged with deployment_name; keep the existing response.raise_for_status() call inside the try so HTTP errors are still raised and handled.
**195-197**: ⚠️ Potential issue | 🟡 Minor

Close the `httpx.Client` to prevent resource leaks. The `Client` instance is created but never closed. In the long-running status loop with retry logic, this accumulates open transports.

🔧 Proposed fix

```diff
     try:
-        response = Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}").get("/analysis/healthz",
-                                                                                              headers=[('Connection', 'close')])
+        with Client(base_url=f"http://nginx-{deployment_name}:{PORTS['nginx'][0]}") as client:
+            response = client.get("/analysis/healthz",
+                                  headers=[('Connection', 'close')])
         response.raise_for_status()
         break
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/status/status.py` around lines 195-197, the httpx.Client is created but never closed, causing transport/resource leaks; either use it as a context manager (with Client(base_url=...) as client: response = client.get(...); response.raise_for_status()) or call client.close() in a finally block after use, so the Client instance is properly disposed between retries.
🧹 Nitpick comments (1)

src/status/status.py (1)

**78-86**: Consider narrowing the exception catch or logging the exception type. The broad `Exception` catch prevents the status loop from crashing, which is appropriate for resilience. However, it may hide unexpected errors. Consider catching specific exceptions or, at minimum, logging the exception type for better debuggability.

♻️ Suggested improvement

```diff
     try:
         # Inform local analysis of partner node statuses
         _ = inform_analysis_of_partner_statuses(database, hub_client, analysis_id, node_analysis_id)
     except Exception as e:
-        print(f"\tPO STATUS LOOP - Error when attempting to access partner_status endpoint of {analysis_id} ({repr(e)})")
+        print(f"\tPO STATUS LOOP - Error when attempting to access partner_status endpoint of {analysis_id} ({type(e).__name__}: {e!r})")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/status/status.py` around lines 78-86, narrow the broad except around the call to inform_analysis_of_partner_statuses(database, hub_client, analysis_id, node_analysis_id) by catching the specific expected exceptions (e.g., ConnectionError, TimeoutError, or domain-specific errors raised by inform_analysis_of_partner_statuses), or, if a general fallback must remain, include the exception type in the log by printing type(e).__name__ along with repr(e), so the status loop stays resilient while errors become easier to diagnose.
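The logging tweak in the nitpick can be demonstrated standalone. The message format comes from the review; `flaky_partner_status_call` is a hypothetical stub for the failing HTTP call.

```python
def flaky_partner_status_call():
    # Hypothetical stand-in for the partner-status HTTP call.
    raise TimeoutError("analysis api not ready")

analysis_id = "analysis-123"
try:
    flaky_partner_status_call()
except Exception as e:
    # Including type(e).__name__ makes the swallowed error class
    # visible in the log line instead of hiding it.
    msg = (f"\tPO STATUS LOOP - Error when attempting to access "
           f"partner_status endpoint of {analysis_id} ({type(e).__name__}: {e!r})")
    print(msg)
```

A `TypeError` from `len(None)` and a genuine network timeout then produce visibly different log lines, even under the broad catch.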
Summary by CodeRabbit
New Features
Refactor