Problem: iQual’s text vectorization uses sentence-transformers (e.g., older models like all-MiniLM-L6-v1). Newer models like all-MiniLM-L12-v2 typically offer better accuracy, though the 12-layer model encodes more slowly than 6-layer variants.
Proposed Solution: Update src/iqual/text_features.py to support all-MiniLM-L12-v2 as an option in add_text_features. This would:
- Add a parameter to select the model (default to current).
- Update notebook examples (Basic Modelling) to demo the new model.
- Include performance benchmarks (e.g., accuracy on politeness dataset).
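A minimal sketch of what the model-selection parameter could look like. The names `vectorize_texts`, `load_sentence_transformer`, and `DEFAULT_MODEL` are hypothetical and are not taken from iQual's code; the actual signature of add_text_features may differ. Only `SentenceTransformer(model_name)` and `.encode(...)` are real sentence-transformers calls.

```python
from typing import Callable, List, Sequence

# Assumption: stands in for whatever model iQual currently uses by default.
DEFAULT_MODEL = "all-MiniLM-L6-v1"

Encoder = Callable[[Sequence[str]], List[List[float]]]


def load_sentence_transformer(model_name: str) -> Encoder:
    """Load a sentence-transformers model and return an encode function."""
    # Imported lazily so the dependency is only needed when this loader runs.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() returns a NumPy array by default; convert to nested lists.
    return lambda texts: model.encode(list(texts)).tolist()


def vectorize_texts(
    texts: Sequence[str],
    model_name: str = DEFAULT_MODEL,
    loader: Callable[[str], Encoder] = load_sentence_transformer,
) -> List[List[float]]:
    """Hypothetical helper: embed texts with the selected model.

    Existing callers keep the default; new callers opt in by name,
    e.g. model_name="all-MiniLM-L12-v2".
    """
    encode = loader(model_name)
    return encode(texts)
```

Keeping the default unchanged means existing pipelines produce the same vectors, and passing `model_name="all-MiniLM-L12-v2"` is the only change needed to try the new model. The `loader` argument is there so tests can swap in a stub instead of downloading weights.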
Steps:
- Add model option in text_features.py.
- Test on sample data (politeness dataset).
- Update notebooks/Basic_Modelling.ipynb with an example.
- Add tests for vectorization output.
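For the last step, a rough sketch of the kind of check the tests could make on vectorization output. `check_embeddings` is a hypothetical helper, and the stand-in vectors below would be replaced by a call to the real vectorizer; the expected dimension of 384 is the embedding size listed for all-MiniLM-L12-v2 and should be adjusted for other models.

```python
import math
from typing import Sequence


def check_embeddings(
    vectors: Sequence[Sequence[float]], n_texts: int, expected_dim: int
) -> None:
    """Raise AssertionError if the embedding output is malformed."""
    # One vector per input text.
    assert len(vectors) == n_texts, f"expected {n_texts} rows, got {len(vectors)}"
    for i, row in enumerate(vectors):
        # Every vector has the model's embedding dimension.
        assert len(row) == expected_dim, f"row {i} has dim {len(row)}"
        # No NaN or infinite values slipped through.
        assert all(math.isfinite(x) for x in row), f"row {i} has non-finite values"


def test_vectorization_output_shape() -> None:
    texts = ["Thank you so much!", "Fix this now."]
    # Stand-in vectors; a real test would call the vectorizer under test here.
    vectors = [[0.0] * 384 for _ in texts]
    check_embeddings(vectors, n_texts=len(texts), expected_dim=384)
```

Checking row count, dimension, and finiteness catches the most likely regressions when swapping models without pinning exact embedding values, which can vary across library versions.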
Impact: Improves iQual’s NLP accuracy, aligning with World Bank’s AI-for-data goals.
Willing to Implement: I can submit a PR with code and updated notebook.
@addypy @g4brielvs, seeking your thoughts on adding all-MiniLM-L12-v2 to iQual’s text vectorization to boost NLP accuracy for SDG analysis. Happy to refine benchmarks or model choices per your guidance!