Feature/glob pattern support by TheLazzziest · Pull Request #1 · Houston56/paracelsus

TheLazzziest · 2025-12-13T19:34:38Z

No description provided.

TheLazzziest

So far it looks great, but there is some room for improvement.

tests/test_graph.py

TheLazzziest · 2025-12-13T20:18:34Z

Move the code-generated package structure from fixtures to assets
Add a mermaid diagram for a generic case
Refactor the fixtures so they provision the directories for the cases instead of creating them
Add E2E tests for different cases

Houston56 · 2025-12-16T13:09:26Z

Refactored test fixtures to use asset templates

Moved package structures to tests/assets/ (single_level, nested, multi_star, namespace)
Simplified fixtures to use shutil.copytree() instead of programmatic file creation
All fixtures now follow the same pattern as package_path fixture
Reduces code duplication and improves maintainability
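
For reference, a minimal sketch of the copy-based provisioning described above. The helper name `provision_asset` and the throwaway asset tree are hypothetical and only stand in for the real fixtures and tests/assets/; the one call taken from the comment is shutil.copytree().

```python
import shutil
import tempfile
from pathlib import Path


def provision_asset(assets_dir: Path, name: str, dest_root: Path) -> Path:
    """Copy the asset template `name` into `dest_root` and return the copy.

    Hypothetical helper: a real fixture would wrap this and hand the
    returned path to the test case.
    """
    target = dest_root / name
    shutil.copytree(assets_dir / name, target)
    return target


# Demo with a throwaway asset tree standing in for tests/assets/.
with tempfile.TemporaryDirectory() as tmp:
    assets = Path(tmp) / "assets"
    (assets / "single_level" / "example").mkdir(parents=True)
    (assets / "single_level" / "example" / "models.py").write_text("# models\n")

    work = Path(tmp) / "work"
    work.mkdir()
    copied = provision_asset(assets, "single_level", work)
    copied_files = sorted(
        p.relative_to(copied).as_posix() for p in copied.rglob("*.py")
    )
```

Because the template is copied rather than generated, the test only needs to know the asset name, and the template itself stays untouched between runs.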

Did not use mermaid_assert in test_get_graph_string_with_nested_glob_pattern because mermaid_assert validates the specific structure from package_path fixture (users/posts/comments tables), while this test uses nested_package_path which has a different structure (users/products/api_resources tables). The current assertion correctly verifies that glob pattern matching works by checking for expected tables.

Next steps if this looks good:

Implement module search functionality for remaining patterns (nested glob patterns like example.*.*.models and multiple stars like example.*.api.*.models)
Then add integration tests for these patterns

TheLazzziest · 2025-12-16T21:13:37Z

tests/transformers/test_find_modules.py

It looks much better now. Just don't forget about the glob functionality itself:

Matching any string: example.*.models

Matching a single character: example.fo?.models

Matching character groups:

Character classes: example.api.v[12].models

Character ranges: example.api.v[0-9].models

Complementation:

Character class: example.api.v[!1].models

Character range: example.api.v[!0-9].models

Pathnames:

Multiple packages deep: example.v?.*.*.models

Recursive packages lookup: example.**.api.*.models

Errors:

Missing rule: example.v?..models

Invalid delimiter: example.v?,,models
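
To make the single-level cases concrete, here is a rough sketch that matches a dotted pattern against a dotted module name one segment at a time, using the standard library's fnmatch.fnmatchcase. The function names `validate` and `matches` are hypothetical, the recursive `**` case is deliberately left out, and this is not the matcher implemented in the PR.

```python
from fnmatch import fnmatchcase


def validate(pattern: str) -> list:
    """Split a dotted pattern into segments, rejecting the two error cases."""
    segments = pattern.split(".")
    for segment in segments:
        if not segment:
            # e.g. example.v?..models (missing rule between two dots)
            raise ValueError(f"empty segment in pattern: {pattern!r}")
        if "," in segment:
            # e.g. example.v?,,models (comma used instead of a dot)
            raise ValueError(f"invalid delimiter in pattern: {pattern!r}")
    return segments


def matches(pattern: str, module: str) -> bool:
    """Segment-wise match of `module` against `pattern` ('**' not handled)."""
    pattern_segments = validate(pattern)
    module_segments = module.split(".")
    if len(pattern_segments) != len(module_segments):
        return False
    return all(
        fnmatchcase(name, mask)
        for name, mask in zip(module_segments, pattern_segments)
    )
```

fnmatchcase already understands `*`, `?`, `[seq]` and `[!seq]`, so splitting on dots first is what keeps a `*` from crossing package boundaries.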

As for tables, there is no need so far. It would be too time-consuming to implement, and we still don't know whether the maintainer will agree with this approach. So my suggestion would be the following:

Complete the test suite

Implement the first two cases (any string, a single character)

Open a draft PR asking whether this approach is OK, then continue the work if he says yes.

tests/conftest.py

Houston56 · 2025-12-26T14:57:15Z

Thanks for the feedback! I've implemented full support for all the requested glob pattern cases:

Basic wildcard (*) - matches any string at a single level
Multiple wildcards (*.*, *.*.*) - matches multiple package levels
Single character matching (?) - matches exactly one character
Character classes with prefix (v[12]) - matches exactly one character from the set
Character ranges (v[0-9]) - matches exactly one character in the range
Negation/complementation ([!1], [!0-9]) - matches any character except those specified
Recursive patterns (**) - matches zero or more package levels at any depth
Mixed wildcards (v?.*.*.models) - supports combinations of different pattern types
Namespace packages (PEP 420) - handles packages with multiple __path__ entries
Comprehensive error handling - validates patterns and provides clear error messages for invalid inputs

The code has been refactored into a dedicated finders module with full test coverage. All tests are passing.

TheLazzziest · 2026-01-01T22:06:19Z

Summary: An introduction of a state machine

The current implementation treats module discovery as a graph traversal problem solved with a Breadth-First Search (BFS) using a state machine.

The Mask (Pattern): Converted into a Linked List of GlobNode objects. This allows us to track our progress through the mask using object pointers rather than fragile integer indices.
Execution Flow: We move through the filesystem and the pattern independently. A standard match moves both cursors forward. A ** match moves the filesystem cursor forward while keeping the pattern cursor stationary (consuming directories).

Type-Aware Matching:
- Files: Matched by Stem (e.g., mask models matches models.py).
- Directories: Matched by Name (e.g., mask models matches models/).
- Namespace Support: Any directory with a valid Python identifier name (and not __pycache__) is treated as a potential package, complying with PEP 420.
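
The traversal described above can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: the filesystem is replaced by a nested dict (directories are dicts, files are None), `GlobNode` is reduced to a mask plus a next pointer, and `build_pattern` and `find_modules` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True, eq=False)  # eq=False keeps identity-based hashing
class GlobNode:
    mask: str
    next: Optional["GlobNode"] = None


def build_pattern(pattern: str) -> GlobNode:
    """Turn 'a.*.models' into a linked list of GlobNode objects."""
    head: Optional[GlobNode] = None
    for mask in reversed(pattern.split(".")):
        head = GlobNode(mask, head)
    assert head is not None  # split() always yields at least one segment
    return head


def find_modules(tree: dict, pattern: str) -> list:
    """BFS over (location, pattern node) states."""
    queue = deque([((), tree, build_pattern(pattern))])
    visited = set()
    found = set()
    while queue:
        path, children, node = queue.popleft()
        if (path, node) in visited:
            continue
        visited.add((path, node))
        if node.mask == "**":
            if node.next is not None:
                # Zero levels consumed: advance the pattern cursor only.
                queue.append((path, children, node.next))
            for name, sub in children.items():
                if sub is not None:
                    # One more level consumed: advance the filesystem cursor only.
                    queue.append((path + (name,), sub, node))
            continue
        for name, sub in children.items():
            is_dir = sub is not None
            # Directories match by name, files by stem.
            stem = name if is_dir else name.rsplit(".", 1)[0]
            if not fnmatchcase(stem, node.mask):
                continue
            if node.next is None:
                found.add(".".join(path + (stem,)))
            elif is_dir:
                queue.append((path + (name,), sub, node.next))
    return sorted(found)


tree = {
    "example": {
        "users": {"models.py": None},
        "api": {"v1": {"models.py": None}, "v2": {"views.py": None}},
        "models.py": None,
    }
}
```

The visited set of (location, pattern node) pairs is what stops a `**` node from re-expanding the same directory, and it is also the structure whose size grows on large trees.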

Tradeoffs

Advantages

Strict Correctness with **: The state machine approach is one of the few ways to correctly handle the non-determinism of ** (recursive wildcards) without getting stuck in infinite loops or missing overlapping paths.
Extensibility: Because the pattern is a Linked List, you can easily implement "Macros" later. For example, if you encounter a token @django_apps, you can dynamically inject a new chain of GlobNodes into the list at runtime without breaking the traverser.
OS Agnostic: Full use of pathlib ensures consistent behavior across Windows and POSIX systems.

Risks & Limitations

Memory Overhead (BFS): We store every unique state (Path, PatternNode) in the visited set. On massive filesystems with broad ** patterns, this set can grow significantly.
False Positives (Namespace Packages):
- Issue: PEP 420 says any directory can be a package.
- Risk: If your structure has a folder named media or templates (which are valid Python identifiers), the traverser will enter them and try to match modules inside.
- Mitigation: You must rely on your glob mask being specific enough (e.g., apps.**.models) to avoid wandering into asset directories.
Stem Matching Ambiguity:
- Issue: A mask of utils will match both utils.py and a package utils/.
- Risk: If you have both in the same folder (bad practice, but possible), both will be returned. The consumer of this generator must decide which one to prioritize.
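
As a small illustration of the namespace false-positive risk, the directory check implied by the rule above might look like this. `is_candidate_package` is a hypothetical name; the sketch only assumes the rule as stated (valid identifier, not __pycache__).

```python
def is_candidate_package(directory_name: str) -> bool:
    """Return True if a directory could be treated as a (namespace) package.

    "__pycache__" is itself a valid identifier, so it has to be excluded
    explicitly.
    """
    return directory_name.isidentifier() and directory_name != "__pycache__"


candidates = {
    name: is_candidate_package(name)
    for name in ("models", "media", "templates", "__pycache__", "my-assets", "2024")
}
```

Note that `media` and `templates` pass the check, which is exactly why the mask has to be specific enough to steer the traverser away from asset directories.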

TheLazzziest · 2026-01-02T20:14:03Z

A small recap of what has been done:

Implement a state machine for searching and glob pattern matching
Implement glob pattern validation rules
Improve tests for the state machine
Decouple searching logic from import execution
Add tests for get_graph_string

Remaining scope of the work:

Add mermaid chart parser
Perform chart comparison between the expected chart and the one produced by the library
Update mermaid_assert function to perform dynamic comparison
Define what to do with namespace packages when we need to select a base model (both packages must define a base model)

Houston56 · 2026-01-15T09:32:10Z

Separate graph building from serialization and add dynamic graph comparison

This commit refactors the graph generation logic to separate graph building
from serialization, enabling direct MetaData comparison in tests without
parsing strings. It also adds support for namespace packages with multiple
base classes.

Changes

Core Refactoring

Separated graph building and serialization
- Added get_graph_metadata(): Returns MetaData object instead of string
- Added serialize_metadata(): Serializes MetaData to string format
- Refactored get_graph_string(): Now a thin wrapper combining the above
This separation allows tests to compare graph structures directly using
MetaData objects, making tests more robust and independent of serialization
format.
Added graph comparison functionality
- Added compare_metadata(): Structurally compares two MetaData objects
- Compares tables, columns, types, constraints, and foreign key relationships
- Provides detailed error messages on mismatch
This enables dynamic testing of graphs generated from glob patterns where
the expected structure is not known in advance.
Updated mermaid_assert() for dynamic comparison
- Supports both legacy mode (string assertions) and dynamic mode (MetaData comparison)
- Maintains backward compatibility with existing tests
- Allows tests to work with dynamically discovered models
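
To illustrate what a structural comparison with detailed mismatch messages can look like, here is a simplified sketch. It does not use SQLAlchemy: a plain dict of table name to {column name: type name} stands in for MetaData, and `compare_schemas` is a hypothetical stand-in rather than the compare_metadata() added in this commit.

```python
def compare_schemas(expected: dict, actual: dict) -> list:
    """Return a list of human-readable differences (empty if identical)."""
    problems = []
    for table in sorted(expected.keys() | actual.keys()):
        if table not in actual:
            problems.append(f"missing table: {table}")
        elif table not in expected:
            problems.append(f"unexpected table: {table}")
        else:
            columns = expected[table].keys() | actual[table].keys()
            for column in sorted(columns):
                want = expected[table].get(column)
                got = actual[table].get(column)
                if want != got:
                    problems.append(
                        f"{table}.{column}: expected {want}, got {got}"
                    )
    return problems


expected = {
    "users": {"id": "INTEGER", "name": "TEXT"},
    "posts": {"id": "INTEGER"},
}
actual = {"users": {"id": "INTEGER", "name": "VARCHAR"}}
differences = compare_schemas(expected, actual)
```

Collecting every difference instead of failing on the first one keeps the assertion message useful when a glob pattern pulls in more or fewer models than expected.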

Namespace Packages Support

Added support for multiple base classes in namespace packages
- Added _find_base_classes_by_pattern(): Finds all base classes matching glob pattern
- Added _merge_metadata(): Merges MetaData from multiple base classes
- Handles table name conflicts by adding namespace prefixes
- Re-merges metadata after model imports to capture all registered tables
When using glob patterns like project*.example.base:Base, the system now:
- Finds all matching base classes (e.g., project1.example.base:Base, project2.example.base:Base)
- Merges their MetaData into a single unified graph
- Resolves naming conflicts automatically
- Ensures all models from all namespace packages are included
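
A rough sketch of the merge step, again with plain dicts standing in for MetaData. The conflict rule used here (the first table keeps its name, a later table with the same name is stored under "<namespace>_<table>") is an assumption made for the example; the commit message only says that namespace prefixes are added. `merge_schemas` is a hypothetical name.

```python
def merge_schemas(schemas: dict) -> dict:
    """Merge {namespace: {table: columns}} into a single {table: columns}.

    Assumed conflict rule: a table name that is already taken is stored
    under "<namespace>_<table>" instead of overwriting the earlier one.
    """
    merged = {}
    for namespace, tables in schemas.items():
        for name, columns in tables.items():
            key = name if name not in merged else f"{namespace}_{name}"
            merged[key] = dict(columns)
    return merged


merged = merge_schemas(
    {
        "project1": {"users": {"id": "INTEGER"}, "posts": {"id": "INTEGER"}},
        "project2": {"users": {"id": "INTEGER"}, "orders": {"id": "INTEGER"}},
    }
)
```

Running the same merge again after the model modules have been imported is the "re-merge" idea: tables registered late are simply picked up on the second pass.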

Test Fixes

Fixed test_find_modules tests
- Changed to_module_name() calls to use Path.cwd() instead of package_path
- Reason: Test fixtures call os.chdir(), so Path.cwd() correctly reflects
  the working directory after the change. This matches the behavior in
  get_graph_metadata() which also uses Path.cwd().
- ModuleFinder returns absolute paths, and Path.cwd() ensures correct
  relative path calculation
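
A minimal sketch of why the root matters for this calculation. The signature of `to_module_name` below is an assumption made for the example and may differ from the helper in the repository.

```python
from pathlib import Path


def to_module_name(path: Path, root: Path) -> str:
    """Convert an absolute file path into a dotted module name relative to root."""
    relative = path.resolve().relative_to(root.resolve())
    return ".".join(relative.with_suffix("").parts)


# The finder returns absolute paths, so the root must be the directory
# the process is actually running in.
module_file = Path.cwd() / "example" / "users" / "models.py"
module_name = to_module_name(module_file, Path.cwd())

# A root that is not a parent of the path makes relative_to() raise ValueError.
try:
    to_module_name(module_file, Path.cwd() / "unrelated")
    wrong_root_failed = False
except ValueError:
    wrong_root_failed = True
```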

Technical Details

Why separate serialization?

Previously, get_graph_string() did everything in one step:

Import modules
Build graph (get MetaData)
Filter tables
Serialize to string

This made it impossible to:

Compare two graphs structurally
Test with dynamic glob patterns
Reuse graph building logic

Now the flow is:

get_graph_metadata() → Returns MetaData
compare_metadata() → Compares MetaData objects
serialize_metadata() → Converts MetaData to string

Why re-merge metadata after imports?

When merging metadata from multiple namespace packages:

Initial merge happens before model imports (empty metadata)
Models register themselves in their original base class metadata
After imports, we re-merge to capture all newly registered tables

This ensures the final merged metadata contains all tables from all
namespace packages.

Testing

All 66 tests pass
Code coverage: 86%
Backward compatibility maintained
New tests added for namespace packages and wildcard patterns

All changes are backward compatible. get_graph_string() continues
to work as before, now implemented as a wrapper around the new functions.

TheLazzziest · 2026-01-17T18:40:44Z

LGTM!

Apti added 2 commits December 11, 2025 18:19

WIP: add glob pattern support for --import-module

952199d

Implemented tests

b8d6f62

TheLazzziest commented Dec 13, 2025

Refactor fixtures to use asset templates

e1d0d17

TheLazzziest commented Dec 16, 2025

Apti added 2 commits December 19, 2025 19:22

refactor: use singledispatch for test path utilities

ba08efe

refactor and extend glob pattern matching with advanced features

fbf6cb0

TheLazzziest added 5 commits January 1, 2026 22:28

Feature, Add Pattern model

559b393

Feature, Add ModuleFinder

7f35f75

Feature, Fix validation issues

e6afae3

Feature, Reimplment module lookup block. Update tests

8c3a769

Feature, Add pre-commit to optional dependencies

2a0fc66

TheLazzziest added 5 commits January 2, 2026 22:59

Feature, Improve the method description for state processing

eabad47

Feature, Replace pool executor with a separate thread

408019b

Feature, Add validation rules for pattern masks

de0a598

Feature, Add base.py module to the namespace case

8eb7a61

Feature, Fix validation tests for patterns. Add tests for get_graph_string

2156ce1

TheLazzziest and others added 5 commits January 2, 2026 23:18

Feature, Remove do_import

c5d99f2

Merge branch 'main' into feature/glob-pattern-support

7d0716e

Separate graph building from serialization and add dynamic comparison

ca98f29

fix format

26a903e

Merge remote-tracking branch 'origin/feature/glob-pattern-support' into feature/glob-pattern-support

04c83e8

Merge remote-tracking branch 'upstream/main' into feature/glob-pattern-support

ce11e2d

Houston56 added 2 commits January 18, 2026 19:40

Fix mypy type errors

c26352d

Refactor: make module imports single-threaded

85da90a

2 participants