Adds scripts for aggregating information on all supported datasets, tasks, and models into JSON files for web ingestion. #211

Draft
mmcdermott wants to merge 7 commits into dev from add_web_scripts

mmcdermott (Collaborator) commented May 19, 2025

Closes #186.
Closes #187.
Closes #188.

Need to:

  • Stabilize tests.
  • Update README documentation re task naming / file convention.
  • Consider converting all settings to use installed files, rather than re-cloning the repo?
  • Strip suffixes from JSON keys.

Copilot AI left a comment

Pull Request Overview

This PR introduces a new Python script to aggregate multiple JSON result files into a single output, updates the web README with instructions for its usage, adjusts project dependencies and linting options, and adds test fixture setup in conftest.py.

  • Adds aggregate_results.py to collate JSON result files.
  • Updates README.md for usage instructions.
  • Modifies pyproject.toml dependencies and configuration.
  • Adds a testing fixture in conftest.py.
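
As a rough illustration of what collating JSON result files can look like, here is a minimal sketch. It is not the PR's `aggregate_results.py`: the function name `aggregate_json_dir`, the choice to key each entry by file stem, and the example file names are all assumptions made for illustration (the keying is guessed from the sample output quoted in the review).

```python
import json
import tempfile
from pathlib import Path


def aggregate_json_dir(input_dir: Path) -> dict:
    """Collect every *.json file in input_dir into one dict keyed by file stem.

    Hypothetical helper; the real script may key and validate differently.
    """
    if not input_dir.exists():
        raise FileNotFoundError(f"Input directory '{input_dir.resolve()!s}' does not exist.")

    aggregated = {}
    # Sort paths so the result does not depend on filesystem listing order.
    for path in sorted(input_dir.glob("*.json")):
        try:
            aggregated[path.stem] = json.loads(path.read_text())
        except json.JSONDecodeError as e:
            # Surface which file was malformed instead of a bare decode error.
            raise ValueError(f"Could not parse '{path}': {e}") from e
    return aggregated


# Usage with throwaway files (file names are made up for the example).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "44.json").write_text('{"result": "data for 44"}')
    (root / "200.json").write_text('{"result": "data for 200"}')
    combined = aggregate_json_dir(root)
    print(json.dumps(combined, sort_keys=True))
```

Sorting the paths before reading is a deliberate choice here: it keeps the aggregated output stable across machines, which matters if the result is compared in tests.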

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/MEDS_DEV/web/aggregate_results.py | New aggregation script with JSON error handling |
| src/MEDS_DEV/web/README.md | Added documentation for aggregate_results.py |
| pyproject.toml | Updated dependencies and added doctest options |
| conftest.py | Introduced a fixture for test setup using tempfile |

{"44": {"result": "data for 44"}, "200": {"result": "data for 200"}}
"""
if not input_dir.exists():
    err_lines = ["Input directory '{input_dir.resolve()!s}' does not exist."]
Copilot AI May 19, 2025

The error message is intended to display the resolved input directory but is missing the 'f' prefix for f-string formatting. Consider changing it to: f"Input directory '{input_dir.resolve()!s}' does not exist."

Suggested change
- err_lines = ["Input directory '{input_dir.resolve()!s}' does not exist."]
+ err_lines = [f"Input directory '{input_dir.resolve()!s}' does not exist."]
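
To make the effect of the missing prefix concrete, the short sketch below compares the two string literals from the suggestion. The path `missing_dir` is a made-up placeholder, not something taken from the PR.

```python
from pathlib import Path

input_dir = Path("missing_dir")  # hypothetical path, used only for the demo

# Without the f prefix, the braces are kept literally in the message.
plain = "Input directory '{input_dir.resolve()!s}' does not exist."

# With the f prefix, the expression is evaluated and its str() is substituted.
formatted = f"Input directory '{input_dir.resolve()!s}' does not exist."

print(plain)
print(formatted)
```

The first message prints the placeholder text verbatim, so a user would never see which directory was missing; the second includes the resolved absolute path.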

codecov bot commented May 19, 2025

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 18 | 1 | 17 | 0 |
View the top 1 failed test(s) by shortest run time
src.MEDS_DEV.web.collate_entities::MEDS_DEV.web.collate_entities.parse_nested_tree
Stack Traces | 0.012s run time
157         ...         README.md: "This is a README for the readmission/30d task."
158         ...         task.yaml: {"task": "value"}
159         ...   models:
160         ...     cehrbert:
161         ...       README.md: "This is a README for the model."
162         ...       model.yaml: {"model": "value"}
163         ...       refs.bib: "@article{model, paper}"
164         ...       requirements.txt: "numpy==1.21.0"
165         ... '''
166         >>> with yaml_disk(test_disk) as root_dir:
Expected:
    {'datasets/MIMIC/III': Node(name='datasets/MIMIC/III',
                                data={'readme': 'This is a README.',
                                      'dataset': {'foo': 'bar'},
                                      'predicates': {'predicate': 'value'},
                                      'refs': '@article{foo, bar}',
                                      'requirements': ['numpy==1.21.0', 'pandas==1.3.0']},
                                children=[]),
     'datasets/MIMIC': Node(name='datasets/MIMIC',
                            data={'readme': 'This is a README for the category.'},
                            children=['datasets/MIMIC/III', 'datasets/MIMIC/IV']),
     'datasets/MIMIC/IV': Node(name='datasets/MIMIC/IV',
                               data={'readme': 'This is another README.',
                                     'dataset': {'foo': 'baz'},
                                     'predicates': {'predicate': 'alt_value'},
                                     'refs': '@article{baz, qux}',
                                     'requirements': ['numpy==1.21.0', 'pandas==1.3.0']},
                               children=[]),
     'datasets/eICU': Node(name='datasets/eICU',
                           data={'dataset': {'foo': 'quux'},
                           'predicates': {'predicate': 'quuz'}},
                           children=[])}
Got:
    {'datasets/eICU': Node(name='datasets/eICU', data={'dataset': {'foo': 'quux'}, 'predicates': {'predicate': 'quuz'}}, children=[]), 'datasets/MIMIC/III': Node(name='datasets/MIMIC/III', data={'readme': 'This is a README.', 'dataset': {'foo': 'bar'}, 'predicates': {'predicate': 'value'}, 'refs': '@article{foo, bar}', 'requirements': ['numpy==1.21.0', 'pandas==1.3.0']}, children=[]), 'datasets/MIMIC': Node(name='datasets/MIMIC', data={'readme': 'This is a README for the category.'}, children=['datasets/MIMIC/III', 'datasets/MIMIC/IV']), 'datasets/MIMIC/IV': Node(name='datasets/MIMIC/IV', data={'readme': 'This is another README.', 'dataset': {'foo': 'baz'}, 'predicates': {'predicate': 'alt_value'}, 'refs': '@article{baz, qux}', 'requirements': ['numpy==1.21.0', 'pandas==1.3.0']}, children=[])}

.../MEDS_DEV/web/collate_entities.py:166: DocTestFailure
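
The Expected and Got blocks above appear to contain the same entries in a different order, which is enough to fail a doctest because doctest compares printed text and dicts print in insertion order. The sketch below reproduces that failure mode in isolation and shows one possible way to make the comparison order-independent. The functions `unstable`, `stable`, and `count_failures` are hypothetical and are not part of `collate_entities.py`; whether the PR's ordering comes from directory traversal is an assumption.

```python
import doctest


def unstable(entries):
    """Return a dict in whatever order the entries arrive.

    >>> unstable([("b", 1), ("a", 2)])
    {'a': 2, 'b': 1}
    """
    return dict(entries)


def stable(entries):
    """Same data, but the doctest sorts the items before printing.

    >>> sorted(stable([("b", 1), ("a", 2)]).items())
    [('a', 2), ('b', 1)]
    """
    return dict(entries)


def count_failures(func):
    """Run the doctest in func's docstring and return the number of failed examples."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report text; only the counts are needed here.
    return runner.run(test, out=lambda text: None).failed


print("unstable failures:", count_failures(unstable))
print("stable failures:", count_failures(stable))
```

Another option, if the printed dict form should be kept in the docstring, is to build the mapping from a sorted sequence of paths so that insertion order is deterministic.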


mmcdermott added a commit that referenced this pull request May 19, 2025
These were produced via preliminary files in #211
@mmcdermott mmcdermott marked this pull request as draft May 19, 2025 20:10