Research Documentation for Academic Publication
Generated from experimental analysis using sentence-transformers/all-MiniLM-L6-v2
This analysis demonstrates three limitations of cosine similarity over sentence transformer embeddings, each observed with a single model (all-MiniLM-L6-v2) on small, hand-built corpora. The experiments point to issues that affect real-world applications including search systems, recommendation engines, and semantic matching tasks.
Key Findings:
- Paraphrase variants maintain similar pairwise cosine similarity but produce inconsistent neighbor rankings
- Semantically unrelated pairs can exhibit near-identical cosine similarities (within a ±0.02 tolerance)
- Mathematical transformations can preserve some relationships while destroying others unpredictably
Semantically equivalent paraphrased sentences maintain similar pairwise cosine similarity but show different neighbor rankings in a fixed corpus, affecting practical search results.
- Model: sentence-transformers/all-MiniLM-L6-v2
- Corpus Size: 31 documents across 6 domains (safety, finance, sports, AI, commerce, health)
- Top-K Neighbors: 10
- Evaluation Metrics: Jaccard similarity, Rank-Biased Overlap (RBO)
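The two overlap metrics named above can be sketched as follows. This is a minimal illustration, not the code used for the reported numbers: the function names are hypothetical, the RBO variant is the extrapolated form for equal-length rankings, and the persistence parameter `p=0.9` is an assumption, since the setup does not state which value was used.

```python
def jaccard(a, b):
    """Jaccard index of two result lists, treated as unordered sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap of two rankings without duplicates.

    Both rankings are compared down to the depth of the shorter one.
    p is an assumed persistence value; smaller p weights top ranks more.
    """
    k = min(len(s), len(t))
    if k == 0:
        return 1.0
    seen_s, seen_t = set(), set()
    overlap = 0        # size of the intersection of the two prefixes
    weighted = 0.0     # running sum of (overlap / depth) * p**depth
    for depth in range(1, k + 1):
        x, y = s[depth - 1], t[depth - 1]
        if x == y:
            overlap += 1
        else:
            overlap += (x in seen_t) + (y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        weighted += (overlap / depth) * p ** depth
    return (overlap / k) * p ** k + ((1 - p) / p) * weighted
```

With top-10 lists, a Jaccard value of 0.818 corresponds to 9 shared documents out of a union of 11 (9/11 ≈ 0.818), which is how the A2 vs A2' figure in the results can be read.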
Original Pair:
- A1: "Safety fences reduce the risk of worker injury around machines."
- A2: "Protective barriers shield employees from accidents near industrial robots."
Paraphrased Pair:
- A1': "By installing safety fences around machines, the risk of worker injury is reduced."
- A2': "Employees are protected from accidents around industrial robots by protective barriers."
| Metric | Original Pair | Paraphrased Pair | Difference |
|---|---|---|---|
| Pairwise Cosine Similarity | 0.6485 | 0.6107 | 0.0377 |

| Top-10 Neighbor Comparison | Jaccard Overlap | RBO Score |
|---|---|---|
| A1 vs A1' | 1.000 | 0.949 |
| A2 vs A2' | 0.818 | 0.990 |
Top 5 Neighbors for A1 (Original):
- [1.000] Safety fences reduce the risk of worker injury around machines...
- [0.578] Protective barriers around robot cells prevent accidents...
- [0.464] OSHA 1910.212 requires machine guarding to protect operators...
- [0.307] Light curtains can stop hazardous motion when a person enters...
- [0.271] CE marking indicates conformity with health and safety standards...
Top 5 Neighbors for A1' (Paraphrased):
- [0.947] Safety fences reduce the risk of worker injury around machines...
- [0.529] Protective barriers around robot cells prevent accidents...
- [0.460] OSHA 1910.212 requires machine guarding to protect operators...
- [0.286] Light curtains can stop hazardous motion when a person enters...
- [0.250] Regular exercise reduces the risk of cardiovascular disease...
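Neighbor lists like the two above can be produced by ranking corpus embeddings by cosine similarity to a query embedding. The sketch below assumes the embeddings are already available as NumPy arrays; `top_k_neighbors` is a hypothetical helper name, and the toy vectors stand in for real model output.

```python
import numpy as np


def top_k_neighbors(query_vec, corpus_vecs, k=10):
    """Return (row index, cosine) pairs for the k most similar corpus rows."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of every corpus row to the query
    order = np.argsort(-sims, kind="stable")[:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy 3-dimensional vectors, for illustration only.
corpus = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.5, 0.0])
for idx, score in top_k_neighbors(query, corpus, k=3):
    print(f"[{score:.3f}] document {idx}")
```

In the setting of this report, the arrays would presumably come from `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)`, with one call for the corpus and one for each query variant.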
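- Search Inconsistency: Equivalent queries produce different result rankings; for A1, the fifth neighbor changes from the CE marking document to an unrelated health document
- User Experience Impact: Users who rephrase a query can see different results for the same information need
- System Reliability: Semantic search systems can behave unstably under rephrasing, even when the top-ranked results are unchanged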
Semantically unrelated pairs can exhibit near-identical cosine similarities (within the ±0.02 tolerance, and equal to four decimal places in the examples below), demonstrating that cosine similarity alone is insufficient for semantic understanding.
- Candidate Pool: 30 diverse phrases (words, short sentences)
- Total Comparisons: 435 pairwise similarities computed
- Cosine Tolerance: ±0.02
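- Lexical Overlap Filter: <0.3, to exclude pairs that are similar only through shared words
- Total Similar Pairs Found: 7,307 pairs of pairs (two phrase pairs whose cosine similarities agree within the tolerance), out of at most 94,395 possible pairings of the 435 similarities
- Semantic Diversity: Matched pairs span different domains and concepts
- Lexical Independence: Low lexical overlap indicates that the matches are not driven by shared vocabulary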
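The search described above can be sketched as a scan over all pairs of phrase pairs. The helper names are hypothetical, and the lexical overlap definition (Jaccard overlap of lowercase whitespace tokens, applied within each phrase pair) is an assumption. It does reproduce the Max Lexical Overlap values of 0.100 and 0.111 reported for Examples 2 and 3.

```python
from itertools import combinations
from math import comb


def lexical_overlap(a, b):
    """Jaccard overlap of lowercase whitespace tokens (assumed definition)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def matching_pairs(phrases, cosine, tol=0.02, max_overlap=0.3):
    """Find pairs of phrase pairs whose cosines differ by at most tol.

    cosine maps frozenset({i, j}) of phrase indices to a similarity score.
    Phrase pairs with lexical overlap >= max_overlap are dropped first.
    """
    pairs = [
        (i, j)
        for i, j in combinations(range(len(phrases)), 2)
        if lexical_overlap(phrases[i], phrases[j]) < max_overlap
    ]
    hits = []
    for a, b in combinations(pairs, 2):
        if abs(cosine[frozenset(a)] - cosine[frozenset(b)]) <= tol:
            hits.append((a, b))
    return hits


print(comb(30, 2))   # 435 pairwise similarities for 30 phrases
print(comb(435, 2))  # 94395 candidate pairs of pairs
```

Because the scan is quadratic in the number of phrase pairs, it is only practical for small candidate pools such as the 30 phrases used here.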
Example 1:
- Pair A: 'automobile' ↔ 'An eagle soared over the valley.'
- Pair B: 'couch' ↔ 'A sparrow perched on the fence.'
- Cosine A: 0.0642
- Cosine B: 0.0642
- Difference: 0.0000
- Max Lexical Overlap: 0.000
Example 2:
- Pair A: 'couch' ↔ 'The dog barked at the mailman.'
- Pair B: 'He tightened the screw with a screwdriver.' ↔ 'The physician assessed the symptoms.'
- Cosine A: 0.0015
- Cosine B: 0.0015
- Difference: 0.0000
- Max Lexical Overlap: 0.100
Example 3:
- Pair A: 'An eagle soared over the valley.' ↔ 'The automobile was parked in the garage.'
- Pair B: 'The physician assessed the symptoms.' ↔ 'He was joyful during the celebration.'
- Cosine A: 0.0448
- Cosine B: 0.0448
- Difference: 0.0000
- Max Lexical Overlap: 0.111
- Semantic Ambiguity: A cosine score alone does not identify the kind of semantic relationship between two texts
- False Matches: Pairs from unrelated domains become indistinguishable when compared or thresholded by score alone
- Evaluation Limitations: Single-metric evaluation is insufficient for semantic tasks
Mathematical transformations can preserve query-document dot-product rankings while measurably altering document-document cosine similarities (by up to 0.0426 in this experiment), showing that cosine similarity is not invariant under transformations that leave retrieval scores unchanged.
- Queries: 4 diverse search queries
- Documents: 8 documents across multiple domains
- Transformation: Diagonal scaling with compensation (U→U·D, V→V·D⁻¹)
- Embedding Dimension: 384
- Scaling Range: [0.673, 1.480]
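The compensated scaling in this setup works because (U·D)(V·D⁻¹)ᵀ = U·D·D⁻¹·Vᵀ = U·Vᵀ for any invertible diagonal D, so every query-document dot product is unchanged, while the row norms of V are not. The sketch below illustrates this with random stand-in embeddings rather than model output. Sampling the diagonal uniformly from the reported range is an assumption, as are all variable names.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


rng = np.random.default_rng(42)
dim = 384
U = rng.normal(size=(4, dim))  # stand-in for 4 query embeddings
V = rng.normal(size=(8, dim))  # stand-in for 8 document embeddings

# Positive diagonal of D, drawn from the reported scaling range
# (the sampling distribution is an assumption).
d = rng.uniform(0.673, 1.480, size=dim)
U_scaled = U * d  # U -> U·D
V_scaled = V / d  # V -> V·D^-1

scores = U @ V.T
scores_scaled = U_scaled @ V_scaled.T

# Dot products, and therefore query->document rankings, are unchanged.
print(np.allclose(scores, scores_scaled))
# Document-document cosines change: D^-1 reweights every dimension.
print(np.allclose(cosine_matrix(V), cosine_matrix(V_scaled)))
```

The first check prints `True` and the second `False`: the retrieval scores are identical up to floating-point error, while the document-document cosine matrix is not.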
| Metric | Value |
|---|---|
| Query→Document Rankings Preserved | True (100%) |
| Top-K Overlap (All Queries) | 8/8 (Perfect) |
| Document-Document Cosine Change (Frobenius Norm) | 0.1687 |
| Maximum Cosine Similarity Change | 0.0426 |
| Mean Cosine Similarity Change | 0.0166 |
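The three change metrics in the table can be computed from the document-document cosine matrices before and after the transformation. The sketch below is one plausible reading, not a confirmed reconstruction: it assumes the Frobenius norm is taken over the full difference matrix and that the maximum and mean are taken over absolute differences of distinct document pairs. The function names are hypothetical.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


def cosine_change_stats(V_before, V_after):
    """Summarize how document-document cosines differ between two embeddings."""
    diff = cosine_matrix(V_after) - cosine_matrix(V_before)
    upper = np.abs(diff[np.triu_indices_from(diff, k=1)])  # distinct pairs only
    return {
        "frobenius": float(np.linalg.norm(diff)),  # default matrix norm is Frobenius
        "max_abs": float(upper.max()),
        "mean_abs": float(upper.mean()),
    }


# Toy example: two orthogonal documents become partially aligned.
before = np.array([[1.0, 0.0], [0.0, 1.0]])
after = np.array([[1.0, 0.0], [1.0, 1.0]])
print(cosine_change_stats(before, after))
```

Whether the mean includes the diagonal (which is always zero, since every document has cosine 1 with itself) changes the reported value, so that choice should be stated alongside the numbers.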
Query 1: "what are machine safety fences used for"
- Top Result: [0.814] Safety fences reduce the risk of worker injury around machines...
- Ranking Stability: Perfect preservation across transformation
Query 2: "how to protect employees near industrial robots"
- Top Result: [0.547] Safety fences reduce the risk of worker injury around machines...
- Second Result: [0.531] Protective barriers around robot cells prevent accidents...
Query 4: "vector search and semantic similarity"
- Top Result: [0.732] Vector databases accelerate semantic search over large corpora...
- Second Result: [0.429] Contrastive learning aligns similar sentences in embedding space...
- Ranking Fragility: Document relationships unstable under transformations
- Evaluation Robustness: Need for transformation-invariant evaluation metrics
- System Reliability: Embedding modifications can have unpredictable effects
- Paraphrase Sensitivity
  - Semantically equivalent expressions produce inconsistent neighbor rankings
  - Impact: Unreliable search and retrieval systems
- Cosine Ambiguity
  - Unrelated semantic pairs exhibit identical similarity scores
  - Impact: False semantic matches in applications
- Transformation Instability
  - Mathematical operations preserve some relationships while destroying others
  - Impact: Fragile similarity measures under system modifications
- Multi-Metric Evaluation
  - Combine cosine similarity with additional semantic measures
  - Develop robust evaluation frameworks beyond pairwise comparisons
- Paraphrase Robustness Testing
  - Systematic evaluation across paraphrase variants
  - Development of paraphrase-invariant similarity measures
- Transformation Stability Analysis
  - Test embedding stability under various mathematical transformations
  - Design transformation-robust similarity metrics
- Semantic Evaluation Beyond Cosine
  - Explore alternative similarity measures (e.g., Earth Mover's Distance, also known as the Wasserstein-1 distance)
  - Investigate contextual and compositional similarity approaches
- Model: sentence-transformers/all-MiniLM-L6-v2
- Framework: Sentence Transformers, scikit-learn
- Evaluation Metrics: Cosine similarity, Jaccard index, Rank-Biased Overlap
- Random Seed: 42 (for reproducibility)
- Corpus Domains: Safety/Industry, Finance, Sports, Web/AI, Commerce, Health
This analysis provides empirical evidence for fundamental limitations in current sentence transformer approaches to semantic similarity, offering concrete examples for academic research and system development considerations.