Support Newer Hugging Face Model for Text Vectorization

**Problem**: iQual’s text vectorization uses `sentence-transformers` models (e.g., older ones like `all-MiniLM-L6-v1`). Newer models like `all-MiniLM-L12-v2` produce embeddings of the same size (384 dimensions) and can offer better accuracy, though with twice as many transformer layers (12 vs. 6) they encode more slowly.

**Proposed Solution**: Update `src/iqual/text_features.py` to support `all-MiniLM-L12-v2` as an option in `add_text_features`. This would:
- Add a parameter to select the model, defaulting to the current one so existing behavior is unchanged.
- Update the notebook examples (`Basic Modelling`) to demonstrate the new model.
- Include performance benchmarks (e.g., accuracy on the politeness dataset) comparing the current and new models.
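
A minimal sketch of how the model-selection parameter could look. The names `DEFAULT_MODEL`, `SUPPORTED_MODELS`, `resolve_model_name`, `embed_texts`, and `encoder_factory` are hypothetical and do not reflect iQual's actual `add_text_features` signature; the default shown here is only the example model named above and should be replaced with whatever iQual currently loads.

```python
from typing import Callable, Optional, Sequence

# Assumption: the current default is the example model cited in this issue.
DEFAULT_MODEL = "all-MiniLM-L6-v1"
# Hypothetical allow-list; extend as more models are validated.
SUPPORTED_MODELS = ("all-MiniLM-L6-v1", "all-MiniLM-L12-v2")


def resolve_model_name(model_name: Optional[str] = None) -> str:
    """Return the model to load, falling back to the current default."""
    if model_name is None:
        return DEFAULT_MODEL
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model_name!r}; choose one of {SUPPORTED_MODELS}"
        )
    return model_name


def embed_texts(
    texts: Sequence[str],
    model_name: Optional[str] = None,
    encoder_factory: Optional[Callable] = None,
):
    """Encode texts with the selected sentence-transformers model.

    `encoder_factory` is a hypothetical hook so tests can inject a stub
    instead of downloading a real model.
    """
    name = resolve_model_name(model_name)
    if encoder_factory is None:
        # Imported lazily so the module loads without the dependency.
        from sentence_transformers import SentenceTransformer
        encoder_factory = SentenceTransformer
    encoder = encoder_factory(name)
    return encoder.encode(list(texts))
```

Calling `embed_texts(texts)` keeps the existing default, while `embed_texts(texts, model_name="all-MiniLM-L12-v2")` opts in to the newer model. An explicit allow-list is one possible design; accepting any Hugging Face model name would be more flexible but harder to test.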

**Steps**:
1. Add a model option in `text_features.py`.
2. Test on sample data (the politeness dataset).
3. Update `notebooks/Basic_Modelling.ipynb` with an example.
4. Add tests for the vectorization output.
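
As a rough illustration of step 4, the checks on vectorization output could look something like the sketch below. The helper name `validate_embeddings` is hypothetical, and the default of 384 dimensions is the embedding size of the `all-MiniLM` models; it would need adjusting for other models.

```python
import math
from typing import Iterable, Sequence


def validate_embeddings(
    embeddings: Iterable[Sequence[float]],
    n_texts: int,
    expected_dim: int = 384,
) -> None:
    """Raise ValueError if the embeddings look malformed.

    Accepts anything iterable row by row, such as a list of lists or a
    2-D NumPy array returned by an encoder.
    """
    rows = [[float(v) for v in row] for row in embeddings]
    if len(rows) != n_texts:
        raise ValueError(f"expected {n_texts} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row) != expected_dim:
            raise ValueError(
                f"row {i}: expected {expected_dim} values, got {len(row)}"
            )
        if not all(math.isfinite(v) for v in row):
            raise ValueError(f"row {i}: contains NaN or infinite values")


def test_validate_embeddings_accepts_well_formed_output():
    # Stand-in data; a real test would pass the encoder's actual output.
    fake = [[0.1] * 384, [0.2] * 384]
    validate_embeddings(fake, n_texts=2)
```

A real test in the PR would run the encoder on a handful of sentences and pass the result to a check like this, ideally once per supported model.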

**Impact**: Should improve iQual’s NLP accuracy, subject to the benchmarks above, aligning with World Bank’s AI-for-data goals.

**Willing to Implement**: I can submit a PR with the code changes and the updated notebook.

@addypy @g4brielvs, seeking your thoughts on adding `all-MiniLM-L12-v2` to iQual’s text vectorization to boost NLP accuracy for SDG analysis. Happy to refine benchmarks or model choices per your guidance!
