Feature: Add glob pattern support, fixes #49 #55
Houston56 wants to merge 23 commits into tedivm:main
…to feature/glob-pattern-support
paracelsus/graph.py
Outdated
if any(pattern.errors):
    raise ValueError(pattern.serialized_errors)

current_root = Path.cwd()
This variable can be dropped; it is already defined on line 194.
Before this can be merged, can you please do some small cleanup:
Thanks!
Done!
Hi @tedivm! A quick question about the docs: do we need to update them in this PR, or will a separate PR do?
paracelsus/graph.py
Outdated
import_queue_sentinel = object()
import_queue: Queue[Union[Dict[str, str], object]] = Queue()
import_worker = Thread(target=consume_import_tasks, args=(import_queue, import_queue_sentinel), daemon=True)
import_worker.start()
I have concerns about making this application threaded. Some platforms do not support threading, and it adds complexity to the application. I'm also not sure there will be much of a performance increase. Can you make this single-threaded for now, and then we can talk about introducing threading in a separate PR? Or can you add some benchmarks showing that the complexity is worth it from a performance standpoint?
Hey @tedivm, thanks for the question. Unfortunately, I don't have any benchmarks to provide, but I can share the logic behind this decision.
Since importing is thread-safe, a background worker can 'warm up' sys.modules by handling the I/O-heavy operations in parallel, which matters most for deep dependency trees such as those of Django applications. This hides disk latency from the main execution thread and lets the exploration process run concurrently: while the finder keeps walking the project, the import cache warms up progressively, which should reduce the initialization time of the context.
However, simplicity and portability are a better bet in this case, so I'd rather not over-engineer this if you prefer a leaner codebase.
Anyway, we can revisit the threading/concurrency logic later as a separate, data-driven improvement.
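For anyone following the thread, the warm-up idea described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: `import_worker`, `warm_up`, and `_SENTINEL` are hypothetical names, and the list of module names stands in for whatever the finder would discover.

```python
import importlib
import sys
from queue import Queue
from threading import Thread

_SENTINEL = object()  # hypothetical marker telling the worker to stop


def import_worker(tasks: Queue) -> None:
    """Import module names taken from the queue until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is _SENTINEL:
            break
        try:
            importlib.import_module(item)
        except ImportError:
            # A failed warm-up import is ignored in this sketch; the main
            # thread would see the same error when it imports the module.
            pass


def warm_up(module_names: list[str]) -> None:
    """Start the worker, feed it names as they are discovered, then wait."""
    tasks: Queue = Queue()
    worker = Thread(target=import_worker, args=(tasks,), daemon=True)
    worker.start()
    for name in module_names:  # stands in for the finder walking the project
        tasks.put(name)
    tasks.put(_SENTINEL)
    worker.join()


warm_up(["json", "csv", "no_such_module_for_this_sketch"])
print("json" in sys.modules, "csv" in sys.modules)
```

Whether this actually saves time would still need to be measured, for example by timing the same module list with and without the worker.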
@tedivm, can we close this thread and move on, or are there still some questions left?
sorry, busy week at work, but I'm planning on reviewing this over the weekend.
Hi @tedivm! Just a quick bump on this PR. Thanks!
Summary
Hi! This PR adds glob pattern support to `--import-module`, addressing #49. Users can now use glob patterns to discover and import modules instead of listing each path.

Changes

Core Features

Glob pattern support in `--import-module`:
- `*` - matches any string at one level (e.g., `example.*.models`)
- `?` - matches a single character (e.g., `example.fo?.models`)
- `**` - recursive wildcard matching zero or more levels (e.g., `example.**.api.*.models`)
- `[abc]` - character class (e.g., `example.api.v[12].models`)
- `[a-z]` - character range (e.g., `example.api.v[0-9].models`)
- `[!a]` - negated character class (e.g., `example.api.v[!1].models`)

Wildcard import support: Patterns ending with `:*` perform `from module import *` for each matching module
- `"src.domains.*.models:*"` finds all matching modules and imports each with `from <module> import *`

Glob pattern support for base class path: The `base_class_path` argument now also supports glob patterns for finding multiple base classes
- `"project*.example.base:Base"` will find and merge metadata from all matching base classes

Implementation Details

Pattern validation (`paracelsus/models/pattern.py`):
- `Pattern` class with grammar validation
- `**` is followed by `*`

Module finder (`paracelsus/finders.py`):
- `ModuleFinder` class using BFS traversal

Graph building refactor (`paracelsus/graph.py`):
- `get_graph_metadata()` - separates graph building logic from serialization
- `serialize_metadata()` - handles metadata serialization
- `get_graph_string()` - now a convenience wrapper combining `get_graph_metadata()` and `serialize_metadata()` for backward compatibility
- `_find_base_classes_by_pattern()` - finds multiple base classes by glob pattern
- `_merge_metadata()` - merges MetaData from multiple base classes with conflict resolution using path-based prefixes
- `to_module_name()` - converts filesystem paths to Python module names
- `consume_import_tasks()` - threaded module import worker
- `ModuleFinder` and threaded import queue
- `base_class_path` with automatic metadata merging

Examples
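To make the pattern semantics listed above concrete, here is a rough sketch of how a dotted glob could be matched against a dotted module name. It is an illustration built on the standard library's `fnmatch`, not the `Pattern` or `ModuleFinder` implementation from this PR; `matches`, `_match`, and `split_spec` are hypothetical helper names.

```python
from fnmatch import fnmatchcase


def split_spec(spec: str) -> tuple[str, str]:
    """Split 'pkg.*.models:*' into the module pattern and the part after ':'."""
    pattern, _, target = spec.partition(":")
    return pattern, target


def matches(pattern: str, module: str) -> bool:
    """Return True if a dotted module name matches a dotted glob pattern."""
    return _match(pattern.split("."), module.split("."))


def _match(pats: list[str], parts: list[str]) -> bool:
    if not pats:
        return not parts
    head, rest = pats[0], pats[1:]
    if head == "**":
        # '**' may absorb zero or more whole segments.
        return any(_match(rest, parts[i:]) for i in range(len(parts) + 1))
    if not parts:
        return False
    # fnmatchcase handles *, ?, [abc], [a-z] and [!a] within one segment.
    return fnmatchcase(parts[0], head) and _match(rest, parts[1:])


pattern, target = split_spec("src.domains.*.models:*")
print(pattern, target)
print(matches("example.**.api.*.models", "example.api.v1.models"))
```

Because the name is split on dots before matching, `*` stays confined to one level, while `**` is handled separately so it can span several levels or none.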
Testing
Related Issues
Closes #49