Issue Description
There's a mismatch between the CLIP implementation used in our research Jupyter notebook, validate_clip_rankings.ipynb, and the one used in our production code, which may lead to unexpected differences in scoring behavior.
Details
Current Implementation
- The research notebook (validate_clip_rankings.ipynb) calls CLIP directly to compute scores.
- The production ScoreValidator goes through the ClipEmbedder wrapper instead.
Impact
This inconsistency means that:
- The baseline adjustment behavior might differ between research and production
- Scoring thresholds determined in the notebook might not directly transfer to production
- Research findings might not fully apply to the deployed system
Potential Solutions
- Update ScoreValidator to use the same direct CLIP approach as the notebook
- Modify the notebook to use ClipEmbedder for consistency
- Perform a comparative analysis to determine which approach performs better
- Standardize on a single CLIP implementation across all code
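One way to realize the last option is to extract the scoring math into a single shared helper that both the notebook and ScoreValidator call, so that only the embedding source differs. This is a minimal sketch under assumptions: the helper names (`clip_score`, `adjusted_score`) are hypothetical, and the baseline-adjustment formula (raw similarity minus the image's similarity to a neutral baseline prompt) is an assumed form of what the notebook and ScoreValidator actually compute.

```python
import numpy as np

def clip_score(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized embeddings.

    Keeping this math in one shared function means both the notebook
    and ScoreValidator score identically, regardless of whether the
    embeddings came from a direct CLIP call or from ClipEmbedder.
    """
    t = text_emb / np.linalg.norm(text_emb)
    i = image_emb / np.linalg.norm(image_emb)
    return float(t @ i)

def adjusted_score(text_emb: np.ndarray,
                   image_emb: np.ndarray,
                   baseline_emb: np.ndarray) -> float:
    """Baseline-adjusted score: raw similarity minus the image's
    similarity to a baseline prompt. (Assumed form of the adjustment;
    the authoritative formula lives in the notebook / ScoreValidator.)
    """
    return clip_score(text_emb, image_emb) - clip_score(baseline_emb, image_emb)
```

Because the helpers accept plain embedding arrays, either code path can adopt them without changing how it loads or runs the CLIP model.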
Reproduction Steps
- Run baseline adjustment tests in the notebook
- Run the same tests with the ScoreValidator implementation
- Compare the results for the same input texts and images
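The comparison step can be automated with a small harness that takes the per-example scores from both code paths and reports the maximum divergence and whether the induced rankings agree. The function name and tolerance below are illustrative, not part of the existing codebase.

```python
import numpy as np

def compare_scorers(notebook_scores, validator_scores, atol=1e-4):
    """Compare scores from the notebook path and the ScoreValidator path
    for the same inputs; report max divergence and ranking agreement."""
    a = np.asarray(notebook_scores, dtype=float)
    b = np.asarray(validator_scores, dtype=float)
    max_diff = float(np.max(np.abs(a - b)))
    # Rankings agree if sorting examples by score (descending) gives the
    # same order under both implementations.
    same_ranking = bool(np.array_equal(np.argsort(-a), np.argsort(-b)))
    return {
        "max_abs_diff": max_diff,
        "within_tolerance": max_diff <= atol,
        "same_ranking": same_ranking,
    }
```

A ranking mismatch would be the strongest signal that thresholds tuned in the notebook won't transfer to production; small absolute differences with identical rankings are more tolerable.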
Priority
Medium - This won't break the system but should be addressed for consistent behavior between research and production.