👽 Update MelusinePipeline for sklearn compatibility #194

Open
HugoPerrier wants to merge 12 commits into master from update_sklearn_pipeline

@HugoPerrier
Contributor

Python version and environment updates:

  • Updated the minimum required Python version to 3.10 in pyproject.toml and removed references to Python 3.8 and 3.9 from both pyproject.toml and tox.ini. Test environments for Python 3.10 and 3.12 are now specified. [1] [2] [3]

Pipeline compatibility and type simplification:

  • Added a no-op fit method and a __sklearn_is_fitted__ property to MelusinePipeline in melusine/pipeline.py to ensure compatibility with scikit-learn >=1.2, which requires pipelines to be fitted before calling transform.
  • Simplified type annotations throughout melusine/pipeline.py by removing explicit type hints from variable assignments, making the code more concise and compatible with Python 3.10+. [1] [2] [3] [4] [5] [6]
  • Changed default arguments and return type annotations in pipeline configuration methods for better Python 3.10+ compatibility. [1] [2]
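
The no-op `fit` and fitted-state hook described above can be illustrated with a minimal, dependency-free sketch. `StatelessPipeline` and `looks_fitted` are hypothetical names, not code from this PR, and `looks_fitted` only imitates the hook lookup that scikit-learn's `check_is_fitted` performs. The hook is written here as a plain method, which is the form `check_is_fitted` calls; the implementation in melusine/pipeline.py may differ.

```python
class StatelessPipeline:
    """Hypothetical stand-in for a pipeline whose steps need no training."""

    def __init__(self, steps):
        # steps: list of (name, callable) pairs applied in order
        self.steps = steps

    def fit(self, X=None, y=None):
        # No-op: there is nothing to learn, so just return self
        # (returning self follows the scikit-learn estimator convention).
        return self

    def __sklearn_is_fitted__(self):
        # Always report the object as fitted, since no training is needed.
        return True

    def transform(self, X):
        # Apply each step to the output of the previous one.
        for _name, step in self.steps:
            X = step(X)
        return X


def looks_fitted(estimator):
    # Simplified imitation of scikit-learn's fitted check:
    # call the hook if the estimator defines one.
    hook = getattr(estimator, "__sklearn_is_fitted__", None)
    return bool(hook()) if callable(hook) else False


pipe = StatelessPipeline([("strip", str.strip), ("lower", str.lower)])
result = pipe.fit().transform("  Hello World  ")
```

Because `fit` returns `self`, callers that follow the usual `fit(...).transform(...)` chain keep working even though no state is learned.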

Regex and segmentation improvements:

  • Improved email segmentation regex patterns in melusine/processors.py to better handle cases where meta keywords are followed by a newline instead of a colon, increasing robustness in email parsing. [1] [2]
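
As a rough illustration of the idea, the sketch below accepts either a colon or a newline after a meta keyword. The pattern, the keyword list, and the sample text are assumptions made for this example; they are not the patterns used in melusine/processors.py.

```python
import re

# Hypothetical pattern: a meta keyword at the start of a line, followed by
# either ":" or a line break, then the value on the remainder of that line.
META_PATTERN = re.compile(
    r"^(From|Sent|To|Subject)[ \t]*(?::[ \t]*|\n[ \t]*)(.+)$",
    re.MULTILINE,
)

sample = "From: alice@example.com\nSubject\nQuarterly report\n"

# findall returns one (keyword, value) tuple per match.
fields = META_PATTERN.findall(sample)
```

Here `Subject` is followed by a newline rather than a colon, and the value on the next line is still captured.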

Minor bugfixes:

  • Fixed a minor typo in a regex pattern for matching addresses in melusine/processors.py ({,3} to {0,3} for clarity and correctness).
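
For context, Python's `re` module treats an omitted lower bound as zero, so `{,3}` and `{0,3}` match the same strings; the explicit form is easier to read and does not rely on that default. The short check below uses a made-up address pattern, not the one from melusine/processors.py.

```python
import re

# Hypothetical address fragments: up to three digits followed by " rue".
implicit = re.compile(r"\d{,3} rue")
explicit = re.compile(r"\d{0,3} rue")

samples = ["12 rue", "123 rue", " rue", "1234 rue"]

# Compare both spellings of the quantifier on every sample.
implicit_results = [bool(implicit.fullmatch(s)) for s in samples]
explicit_results = [bool(explicit.fullmatch(s)) for s in samples]
```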
