-
Notifications
You must be signed in to change notification settings - Fork 11
Update fetch strategy #473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughApp orchestration now separately fetches non‑IAEA (F4E) experimental data via Changes
Sequence Diagram(s)sequenceDiagram
participant App as App (update_inputs)
participant Fetch as fetch module
participant Remote as Remote (GitHub/RAW/IAEA)
participant FS as Filesystem
App->>Fetch: fetch_nonIAEA_exp_data(exp_data_root)
Fetch->>Remote: download non‑IAEA exp bundle (RAW/Git)
Remote-->>Fetch: bundle.zip
Fetch->>FS: install only_exp_data -> ROOT/_exp_-_exp_
FS-->>Fetch: install result
Fetch-->>App: success/failure
alt F4E token present
App->>Fetch: fetch_f4e_inputs(inputs_root, access_token)
Fetch->>Remote: access F4E inputs via API/GitLab
Remote-->>Fetch: inputs bundle
Fetch->>FS: install only_inputs -> inputs_root
FS-->>Fetch: install result
Fetch-->>App: success/failure
else no token
App-->>App: skip F4E inputs update (log)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/jade/app/fetch.py (1)
187-225: fetch_f4e_inputs now only installs inputs; update docstring to avoid confusionThe refactor to call
_install_standard_folder_structure(..., only_inputs=True)correctly limits this function to inputs, and the""placeholders are never used in that branch, so behavior is fine.However, the docstring still says “inputs and experimental data”, which is now misleading. Suggest tightening it, e.g.:
-def fetch_f4e_inputs(inputs_root: PathLike, access_token: str) -> bool: - """Fetch F4E benchmark inputs and experimental data and copy them to - the correct folder in jade structure. This will always override the available - data. +def fetch_f4e_inputs(inputs_root: PathLike, access_token: str) -> bool: + """Fetch F4E benchmark inputs and copy them to + the correct folder in the JADE structure. This will always override the available + inputs.
🧹 Nitpick comments (3)
src/jade/app/fetch.py (1)
125-147: Selective install flags look correct; consider guarding impossible combinationsThe
only_exp_data/only_inputsbranching preserves the previous default behavior and correctly limits installation to the requested subtree. Given current call sites you never pass both asTrue, but to harden this helper you might add a small guard:def _install_standard_folder_structure( extracted_folder: str | os.PathLike, inputs_root: PathLike, exp_data_root: PathLike, path_to_inputs: str | os.PathLike, path_to_exp_data: str | os.PathLike, only_exp_data: bool = False, only_inputs: bool = False, ) -> bool: if isinstance(extracted_folder, bool): return False + if only_exp_data and only_inputs: + raise ValueError("only_exp_data and only_inputs cannot both be True")Not required for correctness today, but it prevents accidental misuse later.
tests/app/test_fetch.py (2)
7-7: Tests correctly adapted to the new fetch_f4e_inputs API
- Importing
fetch_f4e_exp_dataalongsidefetch_f4e_inputsandfetch_iaea_inputsis consistent with the refactored fetch module.test_wrong_fetch_f4e_inputsnow callsfetch_f4e_inputs(inp_path, access_token="wrongtoken"), which is an appropriate negative test for bad credentials.test_fetch_f4e_inputscorrectly uses the new signature with justinp_pathandF4E_GITLAB_TOKEN, and re-calls the function to validate overwriting behavior.Only minor nit: the docstring of
test_wrong_fetch_f4e_inputsstill mentions “IAEA website”, which no longer matches what the test is exercising; consider updating it when convenient.Also applies to: 49-50, 53-69
71-81: New test_fetch_f4e_exp_data verifies the new RAW GitHub-based flow; fix stale wordingThe test does the right behavioral checks for
fetch_f4e_exp_data:
- Creates an
expdirectory, runs the fetch, asserts success and non-empty content.- Calls it again on a non-empty directory to ensure idempotent overwrite behavior.
However, the docstring still says “from the F4E GitLab”, while
fetch_f4e_exp_datanow pulls from theJADE-RAW-RESULTSGitHub repository viaRAW_DATA_GITHUB_URL. To avoid confusion for future maintainers, consider updating that wording to match the new source (e.g., “from the JADE-RAW-RESULTS GitHub repository”).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/jade/app/app.py(2 hunks)src/jade/app/fetch.py(4 hunks)src/jade/utilities.py(0 hunks)tests/app/test_fetch.py(2 hunks)
💤 Files with no reviewable changes (1)
- src/jade/utilities.py
🧰 Additional context used
🧬 Code graph analysis (2)
tests/app/test_fetch.py (1)
src/jade/app/fetch.py (3)
fetch_f4e_inputs(187-224)fetch_iaea_inputs(156-184)fetch_f4e_exp_data(227-254)
src/jade/app/app.py (1)
src/jade/app/fetch.py (3)
fetch_f4e_inputs(187-224)fetch_iaea_inputs(156-184)fetch_f4e_exp_data(227-254)
🪛 Ruff (0.14.7)
tests/app/test_fetch.py
49-49: Possible hardcoded password assigned to argument: "access_token"
(S106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (ubuntu-latest, 3.12, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.10, false)
- GitHub Check: test (windows-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.10, false)
- GitHub Check: test (windows-latest, 3.11, false)
🔇 Additional comments (2)
src/jade/app/fetch.py (1)
15-16: New URLs/constants align with the new data layoutUsing explicit archive URLs for IAEA and JADE-RAW-RESULTS looks correct and keeps endpoints centralized; no issues from this change alone.
src/jade/app/app.py (1)
15-15: update_inputs orchestration matches the new fetching strategy; clarify desired failure semanticsThe new flow:
- Always fetches IAEA inputs + experimental data.
- Always fetches F4E experimental data from GitHub (no token).
- Optionally fetches F4E inputs from GitLab when
F4E_GITLAB_TOKENis set, logging and skipping otherwise.This matches the PR goals (central experimental data on GitHub, optional F4E inputs via token). All three steps currently log on failure but allow
update_inputs()to return normally, so the app can proceed with a partially updated tree.If the intent is that missing experimental data should be a hard failure (e.g., to avoid running benchmarks with incomplete reference data), consider either:
- making
update_inputs()return a boolean and checking it at call sites, or- raising on failure of IAEA or F4E experimental data fetches.
If partial updates are acceptable, the current behavior is fine; just worth confirming this is intentional.
Also applies to: 89-118
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/jade/app/fetch.py (2)
173-173: Handle failed downloads instead of passingstr(False)through.If
_fetch_from_git(IAEA_URL)fails and returnsFalse, the current code wraps it instr(...), creating"False". This string bypasses theisinstance(extracted_folder, bool)check at line 134, leading toFileNotFoundErrorwhen_install_datatries to callos.listdir("False/jade_open_benchmarks/inputs")at line 112.Apply this fix:
- extracted_folder = str(_fetch_from_git(IAEA_URL)) # no token required anymore + extracted_folder = _fetch_from_git(IAEA_URL) # no token required anymore + if not extracted_folder: + return False path_to_inputs = Path(extracted_folder, "jade_open_benchmarks", "inputs")Based on past review comments, this same pattern should be applied to
fetch_f4e_exp_dataat line 242-244.
227-254: Handle failed downloads instead of passingstr(False)through.If
_fetch_from_git(RAW_DATA_GITHUB_URL)fails and returnsFalse, the current code wraps it instr(...), creating the string"False". This bypasses theisinstance(extracted_folder, bool)check at line 134, and_install_datawill raiseFileNotFoundErrorwhen tryingos.listdir("False/ROOT/_exp_-_exp_")at line 112.Apply this fix:
- extracted_folder = str( - _fetch_from_git(RAW_DATA_GITHUB_URL) - ) # no token required anymore + extracted_folder = _fetch_from_git(RAW_DATA_GITHUB_URL) # no token required anymore + if not extracted_folder: + return False path_to_exp_data = Path( extracted_folder,This maintains consistency with
fetch_from_gitlab(lines 210-211) which properly checks the result before proceeding.
🧹 Nitpick comments (1)
src/jade/app/fetch.py (1)
131-145: Consider validating mutual exclusivity of flags.The current logic prioritizes
only_exp_dataoveronly_inputswhen both are True, but this behavior isn't documented or validated. While current callers don't trigger this case, adding a guard would prevent future confusion.Optional: Add validation at the start of the function:
+ if only_exp_data and only_inputs: + raise ValueError("only_exp_data and only_inputs cannot both be True") + if only_exp_data: to_install = [(path_to_exp_data, exp_data_root)]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/jade/app/fetch.py(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (windows-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.12, false)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.11, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.10, false)
- GitHub Check: test (ubuntu-latest, 3.12, false)
🔇 Additional comments (2)
src/jade/app/fetch.py (2)
15-16: LGTM: Constants correctly point to GitHub archive URLs.The new constants are properly formatted and point to the correct repositories for fetching IAEA and RAW experimental data.
187-224: API change: All callers have been updated to the new signature.The function signature change from
fetch_f4e_inputs(inputs_root, exp_data_root, access_token)tofetch_f4e_inputs(inputs_root, access_token)is correctly implemented throughout the codebase. All call sites in tests and application code use the new two-parameter signature.
| if not isinstance(extracted_folder, PathLike): # anything else that went wrong | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
find . -name "aux_functions.py" -type fRepository: JADE-V-V/JADE
Length of output: 91
🏁 Script executed:
cat -n ./src/jade/helper/aux_functions.py | head -100Repository: JADE-V-V/JADE
Length of output: 3264
🏁 Script executed:
sed -n '205,220p' ./src/jade/app/fetch.pyRepository: JADE-V-V/JADE
Length of output: 577
🏁 Script executed:
head -50 ./src/jade/app/fetch.py | grep -E "import|PathLike"Repository: JADE-V-V/JADE
Length of output: 285
isinstance() check with PathLike will fail at runtime.
Line 212 uses isinstance(extracted_folder, PathLike) where PathLike is defined as Union[str, os.PathLike, Path] in jade.helper.aux_functions (line 16). Type aliases cannot be used with isinstance()—it requires a class or tuple of classes. At runtime, this will raise TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union.
Replace with a runtime check like: isinstance(extracted_folder, (str, os.PathLike, Path)) or use a helper function to validate the type.
🤖 Prompt for AI Agents
In src/jade/app/fetch.py around lines 212 to 213, the
isinstance(extracted_folder, PathLike) call uses a type-alias PathLike (defined
as Union[str, os.PathLike, Path]) which will raise TypeError at runtime; replace
that check with a runtime-valid type check such as isinstance(extracted_folder,
(str, os.PathLike, Path)) or call a small helper validator that explicitly
checks for str, os.PathLike, or pathlib.Path and return False when it doesn't
match.
alexvalentine94
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK to migrate the exp data to an open repo?
src/jade/app/app.py
Outdated
| logging.error("Failed to update the IAEA benchmark inputs.") | ||
|
|
||
| # Install F4E exp data | ||
| success = fetch_f4e_exp_data(self.tree.exp_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed as requested
src/jade/app/fetch.py
Outdated
| return success | ||
|
|
||
|
|
||
| def fetch_f4e_exp_data(exp_data_root: PathLike) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed as requested
|
Will the exp data be moved in separate PR? |
exp data was already public in the JADE tests and in the JADE/RAW_RESULTS repo since it was needed for the webapp plots. The rationale is that in theory such data can be retrieved on the internet and papers while it is the input files to reproduce the results the real asset to protect. |
Exp data is not distributed with JADE so no PR needed here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/jade/app/fetch.py (2)
204-223: Fixisinstance(extracted_folder, PathLike)– this will raiseTypeErrorat runtime
PathLikehere is a typing alias (Union[str, os.PathLike, Path]fromjade.helper.aux_functions), not a real runtime type. Using it inisinstance(extracted_folder, PathLike)will raiseTypeError: isinstance() arg 2 must be a type, a tuple of types, or a union.As soon as
fetch_from_gitlabreturns a validPath, this check will blow up instead of returningFalse, breaking both the app flow and the tests exercisingfetch_f4e_inputs.You can fix this by checking against the actual runtime types, e.g.:
@@ - if not extracted_folder: # authentication went wrong - return False - if not isinstance(extracted_folder, PathLike): # anything else that went wrong - return False + if not extracted_folder: # authentication went wrong or other failure + return False + if not isinstance(extracted_folder, (str, os.PathLike, Path)): + # Unexpected type from fetch_from_gitlab + return FalseThis keeps the existing behavior intent while avoiding the runtime exception.
227-253: Handle failed_fetch_from_gitinfetch_nonIAEA_exp_databefore building paths
extracted_folder = str(_fetch_from_git(RAW_DATA_GITHUB_URL))converts aFalsereturn into the string"False". When the download fails:
extracted_folderbecomes"False",path_to_exp_databecomesPath("False", "ROOT", "_exp_-_exp_"),_install_standard_folder_structuresees a non‑boolextracted_folderand proceeds,_install_datathen doesos.listdir("False/ROOT/_exp_-_exp_"), raisingFileNotFoundErrorinstead of a cleanFalse.This is the same failure mode already noted for the IAEA fetcher and introduces a new uncaught runtime path on network/HTTP errors.
A minimal, consistent fix is:
def fetch_nonIAEA_exp_data(exp_data_root: PathLike) -> bool: @@ - extracted_folder = str( - _fetch_from_git(RAW_DATA_GITHUB_URL) - ) # no token required anymore + extracted_folder = _fetch_from_git(RAW_DATA_GITHUB_URL) # no token required anymore + if not extracted_folder: + return False @@ - path_to_exp_data = Path( - extracted_folder, - "ROOT", - "_exp_-_exp_", - ) + path_to_exp_data = Path(extracted_folder, "ROOT", "_exp_-_exp_")For robustness and consistency, consider applying the same pattern to
fetch_iaea_inputs, which still stringifies_fetch_from_git(IAEA_URL)and is subject to the same issue.src/jade/app/app.py (1)
94-117: Clarify naming/logging for non‑IAEA vs F4E experimental dataThe new flow in
update_inputslooks correct (IAEA inputs+exp, then non‑IAEA exp, then F4E inputs with token guard), but the naming is a bit confusing:
- Function:
fetch_nonIAEA_exp_data(self.tree.exp_data)- Log message on failure:
"Failed to update the F4E experimental data."Given the function is explicitly “non‑IAEA” and is backed by
RAW_DATA_GITHUB_URL, consider aligning terminology (e.g. “non‑IAEA experimental data” or “RAW experimental data”) and/or renaming the helper to something likefetch_raw_exp_dataorfetch_f4e_exp_datadepending on what it actually fetches. This will make it easier for future maintainers to understand the data source.
🧹 Nitpick comments (2)
tests/app/test_fetch.py (2)
44-51: Address Ruff S106 (‘hardcoded password’) onaccess_token="wrongtoken"in testsRuff flags this line as S106 because a string literal is passed to a parameter named
access_token, even though this is clearly a dummy token in test code.If S106 is enforced in CI, you can silence it in a minimal way:
- success = fetch_f4e_inputs(inp_path, access_token="wrongtoken") + # Intentional invalid token for negative-path testing. + success = fetch_f4e_inputs( # noqa: S106 + inp_path, + access_token="wrongtoken", + )Alternatively, configure Ruff to ignore S106 in tests if that better matches your project policy.
71-81: Aligntest_fetch_f4e_exp_datadocstring/name with actual data source
test_fetch_f4e_exp_dataexercisesfetch_nonIAEA_exp_data, which internally pulls fromRAW_DATA_GITHUB_URL(GitHub JADE‑RAW‑RESULTS), not from the F4E GitLab as the docstring states:def test_fetch_f4e_exp_data(tmpdir): """ " Test that experimental data can be correctly fetched from the F4E GitLab.""" exp_path = tmpdir.mkdir("exp") success = fetch_nonIAEA_exp_data(exp_path)For clarity, consider renaming the test and updating the docstring to reflect the real source, e.g.:
-def test_fetch_f4e_exp_data(tmpdir): - """ " Test that experimental data can be correctly fetched from the F4E GitLab.""" +def test_fetch_non_iaea_exp_data(tmpdir): + """Test that non‑IAEA experimental data can be fetched from the RAW GitHub repo."""This will better document the new fetching strategy and avoid confusion between F4E GitLab inputs and non‑IAEA experimental data hosted on GitHub.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/jade/app/app.py(2 hunks)src/jade/app/fetch.py(4 hunks)tests/app/test_fetch.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/jade/app/app.py (1)
src/jade/app/fetch.py (3)
fetch_f4e_inputs(187-224)fetch_iaea_inputs(156-184)fetch_nonIAEA_exp_data(227-254)
tests/app/test_fetch.py (1)
src/jade/app/fetch.py (3)
fetch_f4e_inputs(187-224)fetch_iaea_inputs(156-184)fetch_nonIAEA_exp_data(227-254)
🪛 Ruff (0.14.8)
tests/app/test_fetch.py
49-49: Possible hardcoded password assigned to argument: "access_token"
(S106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (ubuntu-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (windows-latest, 3.10, false)
- GitHub Check: test (windows-latest, 3.11, false)
- GitHub Check: test (windows-latest, 3.12, false)
- GitHub Check: test (ubuntu-latest, 3.11, false)
- GitHub Check: test (ubuntu-latest, 3.10, false)
- GitHub Check: test (windows-latest, 3.11, false)
- GitHub Check: test (ubuntu-latest, 3.12, false)
- GitHub Check: test (windows-latest, 3.10, false)
Since now the JADE web app has been updated to work with JADE v4 there is now a duplication of the storage of experimental data both on the raw results repository and the various repositories in which inputs are stored. Since F4E GitLab has some problems interacting with GitHub and other online applications, it has been decided to:
This PR implements the new fetching strategy in JADE
Summary by CodeRabbit
New Features
Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.