refactor: prepare_sample_dataset updated #122

Merged
AwaisKamran merged 33 commits into main from refactor-prepare-sample-database-script
May 22, 2025
@AwaisKamran AwaisKamran commented May 15, 2025

Description

This PR corresponds to Refactor-Prepare-Sample-Database-Script.
It refactors the existing script based on its cyclomatic complexity score and its violations of the single-responsibility principle (SRP).
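For readers unfamiliar with the metric, here is a minimal sketch of how a per-function cyclomatic complexity score can be approximated with Python's standard `ast` module. It is not the tool that produced the reports in this description, and its counting rules are a simplification; the function names `cyclomatic_complexity` and `complexity_report`, and the sample functions in `SAMPLE`, are hypothetical and do not come from the script under review.

```python
import ast

# Node types that each add one decision point. This is an approximation:
# dedicated tools apply more detailed rules (comprehensions, assert, etc.).
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of decision points found inside ``func``."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # ``a and b and c`` has three operands and adds two extra paths.
            score += len(node.values) - 1
    return score


def complexity_report(source: str) -> dict:
    """Map each function name in ``source`` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Hypothetical sample input, used only to exercise the functions above.
SAMPLE = '''
def get_path(base, name):
    if not base:
        return name
    return base + "/" + name

def load(rows):
    out = []
    for row in rows:
        if row and row.get("ok"):
            out.append(row)
    return out
'''

report = complexity_report(SAMPLE)
average = sum(report.values()) / len(report)
```

The average is the plain mean of the per-function scores, so splitting one function into several small ones can leave the average unchanged or even raise it while each individual block stays in the lowest band.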

Before Refactor

./preprocess/prepare_sample_dataset.py
    F 15:0 get_train_file_path - A
    F 47:0 add_schema_used - A
    F 39:0 make_sqlite_connection - A

3 blocks (classes, functions, methods) analyzed.
Average complexity: A (3.0)

After Refactor

./preprocess/prepare_sample_dataset.py
    F 34:0 create_train_file - A
    F 71:0 add_schema_used - A
    F 25:0 get_train_file_path - A
    F 58:0 add_question_id_for_bird_train - A
    F 53:0 copy_bird_train_file - A
    F 116:0 write_train_data_to_file - A

6 blocks (classes, functions, methods) analyzed.
Average complexity: A (2.6666666666666665)

After Review

./preprocess/prepare_sample_dataset.py
    F 56:0 create_train_file - A
    F 147:0 add_schema_used - A
    F 121:0 fetch_database - A
    F 43:0 get_train_file_path - A
    F 92:0 create_database_connection - A
    F 81:0 copy_bird_train_file - A

6 blocks (classes, functions, methods) analyzed.
Average complexity: A (3.0)

For Reviewers

@fi5421 You are requested to run and test this script before merging.

@AwaisKamran AwaisKamran self-assigned this May 15, 2025
Copilot AI review requested due to automatic review settings May 15, 2025 11:10
@AwaisKamran AwaisKamran added the documentation and enhancement labels May 15, 2025
Copilot AI left a comment

Pull Request Overview

This PR refactors the sample dataset preparation script for text-to-SQL tasks, aiming to reduce cyclomatic complexity and better adhere to single-responsibility principles. Key changes include enhanced error handling and logging, improved documentation, and reorganized functions for creating and processing the train file.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| server/utilities/constants/preprocess/prepare_sample_dataset/response_messages.py | Added standardized response message constants. |
| server/utilities/constants/preprocess/prepare_sample_dataset/indexing_constants.py | Introduced indexing constants for dataset preparation. |
| server/utilities/constants/common/error_messages.py | Provided common error message templates. |
| server/utilities/connections/sqlite.py | Implemented in-memory SQLite connection using backup. |
| server/utilities/connections/common.py | Added utility for safely closing connections. |
| server/scripts/train_test_split_bird.py | Enhanced functions and added detailed documentation for JSON processing and splitting. |
| server/scripts/mask_bird.py | Improved masking functionality with detailed documentation and refined file operations. |
| server/preprocess/prepare_sample_dataset.py | Refactored the main dataset preparation script with improved logging, error handling, and function reorganization. |
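
The summary mentions an in-memory SQLite connection created "using backup" and a utility for safely closing connections. The PR's actual code is not shown here, so the following is only a sketch of that general pattern using the standard library's `sqlite3.Connection.backup`. The helper names `load_into_memory` and `close_quietly`, the table `t`, and the file name `sample.sqlite` are all hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import suppress


def load_into_memory(db_path: str) -> sqlite3.Connection:
    """Copy an on-disk SQLite database into a new in-memory connection."""
    source = sqlite3.connect(db_path)
    memory = sqlite3.connect(":memory:")
    try:
        # Connection.backup copies every page of ``source`` into ``memory``.
        source.backup(memory)
    except sqlite3.Error:
        memory.close()
        raise
    finally:
        # The on-disk handle is no longer needed once the copy is done.
        source.close()
    return memory


def close_quietly(conn) -> None:
    """Close ``conn`` if it is set, ignoring errors raised while closing."""
    if conn is None:
        return
    with suppress(sqlite3.Error):
        conn.close()


# Usage: build a small throwaway database on disk, then query the copy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.sqlite")
    disk = sqlite3.connect(path)
    disk.execute("CREATE TABLE t (id INTEGER)")
    disk.execute("INSERT INTO t VALUES (1)")
    disk.commit()
    disk.close()

    mem = load_into_memory(path)
    rows = mem.execute("SELECT id FROM t").fetchall()
    close_quietly(mem)
```

Working on an in-memory copy keeps repeated reads off the disk and leaves the original file untouched, at the cost of holding the whole database in RAM.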

@AwaisKamran AwaisKamran requested a review from fi5421 May 20, 2025 07:15
fi5421 previously approved these changes May 22, 2025
Mehak-Conrad previously approved these changes May 22, 2025
@AwaisKamran AwaisKamran dismissed stale reviews from Mehak-Conrad and fi5421 via 0a5c46b May 22, 2025 15:29
@fi5421 fi5421 self-requested a review May 22, 2025 17:24
@AwaisKamran AwaisKamran merged commit a3f46ab into main May 22, 2025
2 checks passed
4 participants