Refactor Format Schema #123

Merged
fi5421 merged 14 commits into main from refactor_format_schema
May 21, 2025

fi5421 commented May 15, 2025

Description

This PR implements the refactoring described in "Refactor format_schema Method in utility_functions.py".

Summary of Changes

  • The format_schema function has been refactored and moved to a separate module.

  • The new flow is as follows:

    1. format_schema loads all table and column descriptions from CSV files into a description_dict.

    2. A schema_config_dict is constructed. This dictionary holds all schema-specific information, including:

      • Tables
      • Column names
      • Primary keys and foreign keys
      • Descriptions
      • Column examples
      • Column types
    3. The schema_config_dict is passed to each schema format function independently to generate the final schema output.
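The three steps above can be sketched roughly as follows. This is only an illustration of the flow, not the code in this PR: apart from format_schema, description_dict, and schema_config_dict, every name here (load_descriptions, build_schema_config, format_text, format_ddl, FORMATTERS), the CSV column names, the dictionary layout, and the "ddl" format are assumptions made for the example. Schema details that the real code reads from the database are hard-coded.

```python
import csv
import io

# Hypothetical stand-in for a description CSV file (column names are assumptions).
SAMPLE_CSV = """table_name,column_name,description
users,id,Unique user identifier
users,name,Display name
"""


def load_descriptions(csv_text):
    """Step 1: collect column descriptions, keyed by table and then column."""
    description_dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        table = description_dict.setdefault(row["table_name"], {})
        table[row["column_name"]] = row["description"]
    return description_dict


def build_schema_config(description_dict):
    """Step 2: gather all schema-specific information in one dictionary.

    Columns, types, keys, and examples are hard-coded here; in a real
    implementation they would come from the database.
    """
    return {
        "tables": {
            "users": {
                "columns": ["id", "name"],
                "types": {"id": "INTEGER", "name": "TEXT"},
                "primary_keys": ["id"],
                "foreign_keys": [],
                "examples": {"id": [1, 2], "name": ["Ada", "Linus"]},
                "descriptions": description_dict.get("users", {}),
            }
        }
    }


def format_text(schema_config_dict):
    """One formatter: plain text listing of columns with descriptions."""
    lines = []
    for table, info in schema_config_dict["tables"].items():
        lines.append(f"Table {table}:")
        for col in info["columns"]:
            desc = info["descriptions"].get(col, "")
            lines.append(f"  {col} ({info['types'][col]}): {desc}")
    return "\n".join(lines)


def format_ddl(schema_config_dict):
    """Another (hypothetical) formatter: CREATE TABLE statements."""
    statements = []
    for table, info in schema_config_dict["tables"].items():
        column_defs = []
        for col in info["columns"]:
            suffix = " PRIMARY KEY" if col in info["primary_keys"] else ""
            column_defs.append(f"  {col} {info['types'][col]}{suffix}")
        statements.append(
            f"CREATE TABLE {table} (\n" + ",\n".join(column_defs) + "\n);"
        )
    return "\n\n".join(statements)


# Step 3: every formatter receives the same dictionary, independently.
FORMATTERS = {"text": format_text, "ddl": format_ddl}


def format_schema(schema_format, csv_text=SAMPLE_CSV):
    description_dict = load_descriptions(csv_text)
    schema_config_dict = build_schema_config(description_dict)
    return FORMATTERS[schema_format](schema_config_dict)
```

Because each formatter only reads from the dictionary, adding a new schema format in this sketch means adding one function and one entry in FORMATTERS.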

Key Notes

  • Decouples DB Access from Schema Format Logic:
    Even schema formats that don’t require DB access (e.g., Text) still receive a fully populated schema_config_dict. This can make those formats slower than strictly necessary, but the design keeps a clear separation of concerns and paves the way for supporting multiple database connectors in the future.

  • Removes External Dependencies:
    Eliminates reliance on external packages like SQLAlchemy, reducing overhead and simplifying the dependency tree.
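The description does not say what replaces SQLAlchemy. As one possibility, assuming the databases are SQLite files, the schema metadata could be read with Python's standard-library sqlite3 module. The sketch below is an assumption-based illustration: read_table_schema and the returned dictionary layout are hypothetical and not taken from this PR.

```python
import sqlite3


def read_table_schema(conn, table):
    """Read column names, types, and keys for one table via SQLite PRAGMAs.

    Hypothetical helper: PRAGMA statements do not accept bound parameters,
    so the table name is interpolated and must come from a trusted source.
    """
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return {
        "columns": [c[1] for c in cols],
        "types": {c[1]: c[2] for c in cols},
        "primary_keys": [c[1] for c in cols if c[5] > 0],
        # (local column, referenced table, referenced column)
        "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
    }


# Small in-memory database used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id)
    );
    """
)
orders_schema = read_table_schema(conn, "orders")
```

A dictionary like orders_schema could then be merged into schema_config_dict, so the formatters never need a database connection themselves.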


Dependent PRs

This PR depends on updates made to utility_functions.py in the following PR:
Update Utility Functions


For Reviewers

Please review utility_functions.py only after Update Utility Functions is merged to avoid unrelated diffs.

refactor format_schema and moved it into another file
added separate constants for format schema
updated all imports for format schema
@fi5421 fi5421 added this to the Refactor Text2SQL Codebase milestone May 15, 2025
@fi5421 fi5421 self-assigned this May 15, 2025
@fi5421 fi5421 added the chore label May 15, 2025
fi5421 and others added 5 commits May 19, 2025 12:34
resolved PR Comments
Updated function and variable names
refactor examples_to_str in utility_functions
refactored m_schema function
minor pr comments resolved
fi5421 added 4 commits May 20, 2025 17:22
Updated mschema functions
moved read_csv logic to bird_utils with multiple encodings
Updated mschema functions
moved read_csv logic to bird_utils with multiple encodings
@fi5421 fi5421 force-pushed the refactor_format_schema branch from 790af53 to c2869ec Compare May 21, 2025 09:30
AwaisKamran previously approved these changes May 21, 2025
@AwaisKamran AwaisKamran self-requested a review May 21, 2025 09:45
@fi5421 fi5421 merged commit c914c64 into main May 21, 2025
2 checks passed
@fi5421 fi5421 deleted the refactor_format_schema branch May 21, 2025 09:47
3 participants