refactor: prepare_sample_dataset updated#122
Merged
AwaisKamran merged 33 commits intomainfrom May 22, 2025
Merged
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the sample dataset preparation script for text-to-SQL tasks, aiming to reduce cyclomatic complexity and better adhere to single-responsibility principles. Key changes include enhanced error handling and logging, improved documentation, and reorganized functions for creating and processing the train file.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| server/utilities/constants/preprocess/prepare_sample_dataset/response_messages.py | Added standardized response message constants. |
| server/utilities/constants/preprocess/prepare_sample_dataset/indexing_constants.py | Introduced indexing constants for dataset preparation. |
| server/utilities/constants/common/error_messages.py | Provided common error message templates. |
| server/utilities/connections/sqlite.py | Implemented in-memory SQLite connection using backup. |
| server/utilities/connections/common.py | Added utility for safely closing connections. |
| server/scripts/train_test_split_bird.py | Enhanced functions and added detailed documentation for JSON processing and splitting. |
| server/scripts/mask_bird.py | Improved masking functionality with detailed documentation and refined file operations. |
| server/preprocess/prepare_sample_dataset.py | Refactored the main dataset preparation script with improved logging, error handling, and function reorganization. |
server/utilities/constants/preprocess/prepare_sample_dataset/indexing_constants.py
Outdated
Show resolved
Hide resolved
server/utilities/constants/preprocess/prepare_sample_dataset/response_messages.py
Outdated
Show resolved
Hide resolved
fi5421
reviewed
May 19, 2025
…pare-sample-database-script
…pare-sample-database-script
… for get_train_file method
fi5421
reviewed
May 22, 2025
fi5421
previously approved these changes
May 22, 2025
Mehak-Conrad
previously approved these changes
May 22, 2025
Mehak-Conrad
approved these changes
May 22, 2025
fi5421
approved these changes
May 22, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR corresponds to the following Refactor-Prepare-Sample-Database-Script
This PR refactors the existing script based on its cyclomatic complexity score and violation of SRP principles.
Before Refactor
After Refactor
After Review
For Reviewers
@fi5421 I you are requested to please run and test this script before merging