
Feature: Add glob pattern support, fixes #49 #55

Open
Houston56 wants to merge 23 commits into tedivm:main from Houston56:feature/glob-pattern-support

Houston56 commented Jan 18, 2026

Summary

Hi! This PR adds glob pattern support to --import-module, addressing #49. Users can now use glob patterns to discover and import modules instead of listing each path.

Changes

Core Features

  • Glob pattern support in --import-module:

    • * - matches any string at one level (e.g., example.*.models)
    • ? - matches a single character (e.g., example.fo?.models)
    • ** - recursive wildcard matching zero or more levels (e.g., example.**.api.*.models)
    • [abc] - character class (e.g., example.api.v[12].models)
    • [a-z] - character range (e.g., example.api.v[0-9].models)
    • [!a] - negated character class (e.g., example.api.v[!1].models)
  • Wildcard import support: Patterns ending with :* perform from module import * for each matching module

    • Example: "src.domains.*.models:*" finds all matching modules and imports each with from <module> import *
  • Glob pattern support for base class path: The base_class_path argument now also supports glob patterns for finding multiple base classes

    • Example: "project*.example.base:Base" will find and merge metadata from all matching base classes
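A minimal sketch of the matching semantics listed above, assuming patterns are compared segment by segment against dotted module names. The helpers matches_module_pattern and _match_parts are hypothetical and not taken from the PR: the standard library's fnmatch.fnmatchcase handles *, ?, [abc], [a-z] and [!a] inside a single segment, and ** is handled by letting it absorb zero or more whole segments.

```python
from fnmatch import fnmatchcase
from typing import List


def _match_parts(pattern_parts: List[str], name_parts: List[str]) -> bool:
    """Match dotted segments; '**' consumes zero or more whole segments."""
    if not pattern_parts:
        return not name_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # Let '**' absorb 0, 1, 2, ... leading segments and try the rest.
        return any(_match_parts(rest, name_parts[i:]) for i in range(len(name_parts) + 1))
    if not name_parts:
        return False
    # fnmatchcase handles *, ?, [abc], [a-z] and [!a] within one segment.
    return fnmatchcase(name_parts[0], head) and _match_parts(rest, name_parts[1:])


def matches_module_pattern(pattern: str, module_name: str) -> bool:
    """Return True if a dotted module name matches a dotted glob pattern."""
    # Drop an optional ':attr' or ':*' suffix; only the module part is matched.
    module_pattern = pattern.split(":", 1)[0]
    return _match_parts(module_pattern.split("."), module_name.split("."))
```

Because each segment is matched on its own, * can never cross a dot, which is what keeps example.*.models to exactly one level. A real finder would walk the filesystem rather than test a known list of names.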

Implementation Details

  1. Pattern validation (paracelsus/models/pattern.py):

    • New Pattern class with grammar validation
    • Prevents ambiguous patterns where ** is followed by *
    • Validates pattern syntax before processing
  2. Module finder (paracelsus/finders.py):

    • New ModuleFinder class using BFS traversal
    • Supports namespace packages (PEP 420)
    • Prevents infinite loops with symlinks and redundant paths
  3. Graph building refactor (paracelsus/graph.py):

    • New function: get_graph_metadata() - separates graph building logic from serialization
    • New function: serialize_metadata() - handles metadata serialization
    • Refactored: get_graph_string() - now a convenience wrapper combining get_graph_metadata() and serialize_metadata() for backward compatibility
    • New function: _find_base_classes_by_pattern() - finds multiple base classes by glob pattern
    • New function: _merge_metadata() - merges MetaData from multiple base classes with conflict resolution using path-based prefixes
    • New function: to_module_name() - converts filesystem paths to Python module names
    • New function: consume_import_tasks() - threaded module import worker
    • Updated import logic: Replaced simple loop with pattern-aware ModuleFinder and threaded import queue
    • Updated base class logic: Added support for glob patterns in base_class_path with automatic metadata merging
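The finder described in item 2 can be approximated with a breadth-first walk. This is a sketch under assumptions, not the PR's ModuleFinder: iter_modules_bfs and this version of to_module_name are hypothetical, every directory with an identifier-like name is treated as a possible PEP 420 namespace package, and symlink loops are avoided by remembering resolved directory paths.

```python
from collections import deque
from pathlib import Path
from typing import Deque, Iterator, Set


def to_module_name(path: Path, root: Path) -> str:
    """Convert a .py file or package directory under root to a dotted module name."""
    relative = path.relative_to(root)
    if relative.suffix == ".py":
        relative = relative.with_suffix("")
    parts = list(relative.parts)
    if parts and parts[-1] == "__init__":
        parts = parts[:-1]  # pkg/__init__.py names the package itself
    return ".".join(parts)


def iter_modules_bfs(root: Path) -> Iterator[str]:
    """Yield module names under root, one directory level at a time (BFS)."""
    queue: Deque[Path] = deque([root])
    seen: Set[Path] = set()
    while queue:
        directory = queue.popleft()
        real = directory.resolve()
        if real in seen:  # symlink cycle or a path already visited
            continue
        seen.add(real)
        for entry in sorted(directory.iterdir()):
            if entry.is_dir():
                # No __init__.py check: PEP 420 namespace packages count too.
                if entry.name.isidentifier() and entry.name != "__pycache__":
                    yield to_module_name(entry, root)
                    queue.append(entry)
            elif entry.suffix == ".py" and entry.stem.isidentifier() and entry.stem != "__init__":
                yield to_module_name(entry, root)
```

This sketch walks the whole tree; pairing it with a pattern matcher would let a finder skip directories whose dotted prefix can no longer match.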

Examples

# Basic usage from Issue #49
paracelsus inject docs/database.md src.infra.orm:Base \
  --import-module "src.domains.*.models:*"

# Recursive lookup
paracelsus graph example.base:Base \
  --import-module "example.**.api.*.models"

# Multiple patterns
paracelsus graph example.base:Base \
  --import-module "example.domain.*.models" \
  --import-module "example.api.v[0-9].models"

# Base class pattern (namespace packages)
paracelsus graph "project*.example.base:Base" \
  --import-module "project*.example.*.models"
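For each module matched by a pattern ending in :*, the effect described above is Python's own from <module> import *. The sketch below approximates that with importlib, assuming the usual rules apply (honor __all__ if present, otherwise take names without a leading underscore); star_import is a hypothetical helper, not part of paracelsus. For SQLAlchemy declarative models, importing the module is what registers its tables on the base class's MetaData; the star form additionally binds the names.

```python
import importlib
from typing import Any, Dict, List


def star_import(module_name: str, namespace: Dict[str, Any]) -> List[str]:
    """Approximate 'from module_name import *', binding names into namespace."""
    module = importlib.import_module(module_name)
    public = getattr(module, "__all__", None)
    if public is None:
        # Without __all__, the statement imports every name not starting with "_".
        public = [name for name in vars(module) if not name.startswith("_")]
    for name in public:
        namespace[name] = getattr(module, name)
    return list(public)
```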

Testing

  • Tests covering all pattern types
  • Tests for nested package structures
  • Tests for namespace packages (PEP 420)
  • Integration tests with SQLAlchemy models
  • Validation error tests for invalid patterns

Related Issues

Closes #49

if any(pattern.errors):
    raise ValueError(pattern.serialized_errors)

current_root = Path.cwd()

This variable can be dropped, as it's already defined on line 194.
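The guard in the excerpt above collects problems on a pattern object and raises ValueError with a serialized message. A hypothetical version of such a validator is sketched below; PatternSketch, validate, and the specific rules (empty segments, ** mixed with other characters, unbalanced brackets, and ** followed by *) are illustrative assumptions, and only the last of them is stated in the PR description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatternSketch:
    """Hypothetical stand-in for the PR's Pattern class: validates on creation."""

    raw: str
    errors: List[str] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        module_part = self.raw.split(":", 1)[0]  # ignore ':Base' or ':*'
        segments = module_part.split(".")
        for index, segment in enumerate(segments):
            if not segment:
                self.errors.append(f"empty segment at position {index}")
            elif "**" in segment and segment != "**":
                self.errors.append(f"'**' must be a whole segment, got {segment!r}")
            if segment.count("[") != segment.count("]"):
                self.errors.append(f"unbalanced brackets in {segment!r}")
        # '**.*' leaves it unclear how many levels each wildcard should take.
        for current, following in zip(segments, segments[1:]):
            if current == "**" and following == "*":
                self.errors.append("'**' followed by '*' is ambiguous")

    @property
    def serialized_errors(self) -> str:
        return "; ".join(self.errors)


def validate(raw: str) -> PatternSketch:
    pattern = PatternSketch(raw)
    # Same check as the reviewed excerpt: fail fast before any filesystem work.
    if any(pattern.errors):
        raise ValueError(pattern.serialized_errors)
    return pattern
```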

tedivm (Owner) commented Jan 18, 2026

Before this can be merged, can you please do some small cleanup:

  1. Resolve the conflicts with the main branch so that this PR can be "squash merged".
  2. Fix the mypy issues.

Thanks!

Houston56 (Author)

Done!

  1. Updated branch with upstream/main (no conflicts)
  2. Fixed mypy errors by adding explicit type hints

TheLazzziest

Hi @tedivm! A quick question about the docs: do we need to update them in this PR, or will a separate PR do?

Comment on lines +218 to +221
import_queue_sentinel = object()
import_queue: Queue[Union[Dict[str, str], object]] = Queue()
import_worker = Thread(target=consume_import_tasks, args=(import_queue, import_queue_sentinel), daemon=True)
import_worker.start()

tedivm (Owner)

I have concerns about making this application threaded. Some platforms do not support threading, and it adds complexity to the application. I'm also not sure it will bring much of a performance gain. Can you make this single-threaded for now, so we can discuss introducing threading in a separate PR? Or can you add some benchmarks showing that the complexity is worth it from a performance standpoint?

Hey @tedivm. Thanks for the question. Unfortunately, I don't have any benchmarks to provide, but I can share the reasoning behind this decision.

Since importing is thread-safe, a background worker can 'warm up' sys.modules by handling the I/O-heavy operations in parallel (especially for deep dependency trees, such as those of Django applications). This hides disk latency from the main thread and lets module discovery run concurrently: while the finder keeps walking the project, the import cache warms up progressively, which should noticeably reduce the time needed to initialize the context.

That said, simplicity and portability are probably the better bet here, so I'd rather not over-engineer this if you prefer a leaner codebase.

Either way, we can revisit the threading/concurrency logic later as a separate, data-driven improvement.
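One way to keep the queue-and-sentinel structure from the excerpt while honoring the request for single-threaded execution is to make threading optional: the same consumer can run on a worker thread, or inline once every task has been enqueued. In this sketch consume_tasks and run_imports are hypothetical names, the task dictionaries are assumed to be opaque Dict[str, str] payloads, and handle stands in for whatever actually performs the import.

```python
from queue import Queue
from threading import Thread
from typing import Callable, Dict, List, Optional, Union

Task = Dict[str, str]  # assumed payload shape, e.g. {"module": "pkg.models"}


def consume_tasks(queue: "Queue[Union[Task, object]]", sentinel: object,
                  handle: Callable[[Task], None]) -> None:
    """Drain the queue in FIFO order until the sentinel object is seen."""
    while True:
        item = queue.get()
        if item is sentinel:
            return
        handle(item)  # type: ignore[arg-type]


def run_imports(tasks: List[Task], handle: Callable[[Task], None], threaded: bool = False) -> None:
    """Feed tasks through a queue; threaded=False keeps everything on the caller's thread."""
    queue: "Queue[Union[Task, object]]" = Queue()
    sentinel = object()
    worker: Optional[Thread] = None
    if threaded:
        worker = Thread(target=consume_tasks, args=(queue, sentinel, handle), daemon=True)
        worker.start()
    for task in tasks:  # a finder could enqueue here as it discovers modules
        queue.put(task)
    queue.put(sentinel)
    if worker is not None:
        worker.join()
    else:
        consume_tasks(queue, sentinel, handle)  # same code path, no extra thread
```

One trade-off relevant to the complexity concern: in threaded mode an exception raised by handle ends the worker thread without reaching the caller (join() simply returns), whereas inline mode propagates it directly.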

@tedivm, can we close this thread and move on, or are there still some questions left?

tedivm (Owner)

Sorry, busy week at work, but I'm planning to review this over the weekend.

Houston56 requested a review from tedivm on January 24, 2026 10:05

Houston56 (Author)

Hi @tedivm ! Just a quick bump on this PR. Thanks!


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support globbing syntax

3 participants