Description
Validate HNSW algorithm performance and memory usage with enterprise-scale datasets of 100,000+ vectors to ensure it can handle real-world workloads.
Phase
Phase 2: Large-Scale Stress Testing
Epic
Related to #202
Acceptance Criteria
- Test HNSW build time with 100K, 500K, and 1M vector datasets
- Validate memory usage stays within reasonable bounds (< 8GB for 1M vectors)
- Verify search accuracy (precision/recall measured against exact search results) remains high with large datasets
- Test index serialization/deserialization performance at scale
- Benchmark against other algorithms (KD-Tree, Linear) at scale
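The accuracy criterion is easier to check once "high" is tied to a concrete metric. The following is a minimal sketch of recall@k, comparing approximate (HNSW) results against an exact linear scan. The names `RecallMetrics`, `RecallAtK`, and `BruteForceTopK` are hypothetical and not part of the project. It also assumes vectors are `float[]`, results are identified by integer index, and distance is squared Euclidean. The project's actual types and distance metric may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecallMetrics
{
    // Fraction of the exact top-k ids that also appear in the approximate top-k.
    public static double RecallAtK(IReadOnlyList<int> exactIds, IReadOnlyList<int> approxIds, int k)
    {
        if (k <= 0) throw new ArgumentOutOfRangeException(nameof(k));

        var truth = new HashSet<int>(exactIds.Take(k));
        if (truth.Count == 0) return 1.0; // nothing to find, so nothing was missed

        int hits = approxIds.Take(k).Distinct().Count(truth.Contains);
        return (double)hits / truth.Count;
    }

    // Exact nearest neighbours by linear scan (assumed metric: squared Euclidean).
    public static int[] BruteForceTopK(float[][] data, float[] query, int k)
    {
        return Enumerable.Range(0, data.Length)
            .OrderBy(i => SquaredDistance(data[i], query))
            .Take(k)
            .ToArray();
    }

    private static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Because the linear scan costs O(n) per query, it is probably only practical to compute ground truth for a small sample of queries at the 1M scale.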
Test Scenarios
- Build Performance - Time to build HNSW index for large datasets
- Memory Usage - Peak memory consumption during build and search
- Search Accuracy - Precision/recall metrics with large datasets
- Concurrent Operations - Multiple searches during large index builds
- Persistence - Save/load times for large HNSW indexes
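For the memory usage scenario, peak consumption has to be captured while the build is running, not just read once at the end. Below is a minimal sketch of a sampling monitor. The class name `MemoryUsageMonitor` and the `PeakMemoryMB` property match the names used in the test in this issue, but the implementation is an assumption rather than the project's actual helper. It samples the process working set on a timer, so it measures the whole test process and can miss spikes shorter than the sampling interval.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class MemoryUsageMonitor : IDisposable
{
    private readonly Timer _timer;
    private long _peakBytes;

    public MemoryUsageMonitor(int sampleIntervalMs = 100)
    {
        Sample(); // record a baseline immediately
        _timer = new Timer(_ => Sample(), null, sampleIntervalMs, sampleIntervalMs);
    }

    // Highest working set observed so far, in MiB. Takes one more sample on read.
    public double PeakMemoryMB
    {
        get
        {
            Sample();
            return Interlocked.Read(ref _peakBytes) / (1024.0 * 1024.0);
        }
    }

    private void Sample()
    {
        using var process = Process.GetCurrentProcess();
        long current = process.WorkingSet64;

        // Lock-free "store if larger" so timer callbacks and readers can overlap.
        long seen;
        do
        {
            seen = Interlocked.Read(ref _peakBytes);
            if (current <= seen) return;
        }
        while (Interlocked.CompareExchange(ref _peakBytes, current, seen) != seen);
    }

    public void Dispose() => _timer.Dispose();
}
```

Working set includes everything in the process (test runner, generated dataset, index), so a managed-heap figure such as `GC.GetTotalMemory` may be a useful second measurement when isolating index overhead.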
Test Structure
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using NUnit.Framework;

[Test]
[Category("Stress")]
[Explicit("Large dataset test - run manually")]
public async Task HNSW_Build_100KVectors_CompletesWithinTimeLimit()
{
    // Arrange
    const int VectorCount = 100_000;
    const int Dimensions = 384; // Common embedding dimension
    const int MaxBuildTimeMinutes = 10;
    const int MaxPeakMemoryMB = 4000; // ~4 GB limit

    var database = new VectorDatabase();
    var vectors = GenerateLargeTestDataset(VectorCount, Dimensions); // test helper
    using var memoryMonitor = new MemoryUsageMonitor(); // test helper
    var stopwatch = Stopwatch.StartNew();

    // Act (the timed region covers both insertion and the index build)
    foreach (var vector in vectors)
    {
        database.Vectors.Add(vector);
    }
    await database.RebuildSearchIndexAsync(SearchAlgorithm.HNSW);
    stopwatch.Stop();

    // Assert
    Assert.That(stopwatch.Elapsed, Is.LessThan(TimeSpan.FromMinutes(MaxBuildTimeMinutes)));
    Assert.That(memoryMonitor.PeakMemoryMB, Is.LessThan(MaxPeakMemoryMB));
    Assert.That(database.Count, Is.EqualTo(VectorCount));

    // Verify search functionality
    var query = vectors.First();
    var results = database.Search(query, 10, SearchAlgorithm.HNSW);
    Assert.That(results.Count, Is.EqualTo(10));
}
Performance Metrics
- Build time per vector (target: < 1ms average)
- Memory efficiency (target: < 50 bytes per vector overhead)
- Search latency with large indexes (target: < 100ms for k=10)
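The per-vector targets above can be derived from the raw measurements a stress test already collects. The sketch below shows one way to compute them. The `IndexMetrics` class and its method names are hypothetical, and the overhead figure assumes vectors are stored as 32-bit floats, which the issue does not state.

```csharp
using System;

public static class IndexMetrics
{
    // Average build cost per vector, in milliseconds.
    public static double BuildMsPerVector(TimeSpan buildTime, long vectorCount) =>
        buildTime.TotalMilliseconds / vectorCount;

    // Raw payload size, assuming float32 components (4 bytes each).
    public static long RawVectorBytes(long vectorCount, int dimensions) =>
        vectorCount * dimensions * sizeof(float);

    // Index overhead per vector: measured bytes minus raw payload, divided by count.
    public static double OverheadBytesPerVector(long measuredBytes, long vectorCount, int dimensions) =>
        (measuredBytes - RawVectorBytes(vectorCount, dimensions)) / (double)vectorCount;
}
```

Under these assumptions, 100,000 vectors of 384 dimensions occupy 153,600,000 bytes of raw payload, and 1M vectors occupy 1,536,000,000 bytes. A 100K build that only just meets the 10-minute limit averages 6 ms per vector, so the < 1 ms target is the stricter of the two checks.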