Conversation
…pare-sample-database-script
…pare-sample-database-script
… for get_train_file method
…pare-sample-database-script
…pare-sample-database-script
There was a problem hiding this comment.
Pull Request Overview
This PR adds unit tests for the sample dataset preparation pipeline and updates the signature and in-call usage of the add_schema_used function.
- Added tests for functions such as get_train_file_path, create_train_file, copy_bird_train_file, get_train_data, and add_schema_used.
- Updated add_schema_used to accept an additional train_file parameter and modified calls accordingly.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| server/test/preprocess/test_prepare_sample_dataset.py | Adds comprehensive test cases for sample dataset functions. |
| server/preprocess/prepare_sample_dataset.py | Updates the add_schema_used function signature and call sites to pass train_file as a Path object; adjusts file path handling. |
| server/init.py | Introduces package documentation for the server package. |
|
|
||
| if train_data: | ||
| add_schema_used(train_data, dataset_type) | ||
| add_schema_used(train_data, dataset_type, Path(train_file)) |
There was a problem hiding this comment.
Both the public function and its caller share the same name 'add_schema_used' but now with an added parameter, which could lead to confusion or unintended recursion; consider renaming one of these to clearly differentiate their responsibilities.
| train_file = get_train_file_path() | ||
| dataset_type = PATH_CONFIG.sample_dataset_type | ||
| train_data = get_train_data(train_file) | ||
| train_data = get_train_data(Path(train_file)) |
There was a problem hiding this comment.
[nitpick] Consider using consistent types for file paths across your functions; either update get_train_data to accept a Path object or convert the Path to a string before passing it.
| train_data = get_train_data(Path(train_file)) | |
| train_data = get_train_data(Path(train_file)) # Ensure get_train_data supports Path objects |
| @patch('os.path.exists', return_value=True) | ||
| @patch('os.makedirs') | ||
| @patch('shutil.copyfile') | ||
| @patch('server.preprocess.prepare_sample_dataset.add_sequential_ids_to_questions') # Mocking API key during test |
There was a problem hiding this comment.
[nitpick] The comment 'Mocking API key during test' is misleading for add_sequential_ids_to_questions; please update it to accurately reflect the mocked functionality.
Description
This PR corresponds to the following Write-Test-Cases-For-Prepare-Sample-Dataset