Performance issue: O(n³) complexity in cluster assignment #1

## Summary

I'm using hdbscan-ts to cluster 6,000+ data points, and it runs far slower than the Python HDBSCAN implementation (65+ minutes versus ~17 seconds).

## Environment

- hdbscan-ts version: 1.0.0
- Node.js version: 20.x
- Data: 6,000+ 512-dimensional embeddings

## Performance Comparison

| Implementation | Time | Speedup |
|----------------|------|---------|
| Python (hdbscan + numba) | ~17 seconds | 220x faster |
| hdbscan-ts | 65+ minutes | baseline |

## Suspected Cause

The performance issue appears to be in the cluster assignment/lookup logic. Based on profiling, the bottleneck seems to be related to repeated linear searches that result in O(n³) complexity.

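Specifically, membership checks that could be O(1) Set/Map lookups may instead be using Array.includes(), which is O(n) per call. Inside nested loops over points and clusters, that extra linear factor would be enough to push the total cost toward O(n³).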
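To make the suspected pattern concrete, here is a minimal sketch of how a linear membership check inside a per-point loop compounds, next to an indexed alternative. I have not confirmed that hdbscan-ts is structured this way: the function names and the `clusters` layout (one array of point indices per cluster, `-1` for noise) are assumptions for illustration only.

```typescript
// Hypothetical sketch: these functions are NOT taken from hdbscan-ts.
// `clusters[c]` is assumed to hold the point indices belonging to cluster c.

// Slow shape: for every point, scan every cluster's member array.
// Cost is roughly points x clusters x cluster size.
function assignLabelsLinear(clusters: number[][], pointCount: number): number[] {
  const labels: number[] = [];
  for (let point = 0; point < pointCount; point++) {
    // findIndex returns -1 when no cluster contains the point (noise).
    labels.push(clusters.findIndex((members) => members.includes(point)));
  }
  return labels;
}

// Indexed shape: visit each member exactly once and write its label directly.
// Cost is proportional to the total number of members plus pointCount.
function assignLabelsIndexed(clusters: number[][], pointCount: number): number[] {
  const labels: number[] = new Array<number>(pointCount).fill(-1);
  clusters.forEach((members, c) => {
    for (const point of members) {
      // Keep the first cluster that claims a point, matching the linear version.
      if (point >= 0 && point < pointCount && labels[point] === -1) {
        labels[point] = c;
      }
    }
  });
  return labels;
}
```

Both functions return the same labels for the same input, so if the library does contain a loop of the first shape, replacing it with the second should not change clustering results.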

## Impact

This makes hdbscan-ts impractical for datasets above ~1,000 points in production scenarios where clustering needs to complete within reasonable timeframes.

## Workaround

I currently use a Python bridge to call the Cython-accelerated Python HDBSCAN implementation, which works well but adds a Python dependency.

## Potential Fix

If the issue is indeed Array.includes() vs Set.has():

```typescript
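// Before: O(n) scan on every check
if (array.includes(item)) { ... }

// After: build the Set once, outside any loop (O(n) one-time cost),
// then each check is O(1) on average
const set = new Set(array);
if (set.has(item)) { ... }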
```
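Before opening a PR, it may be worth checking how much this swap matters at the dataset size reported here. The following is a rough micro-benchmark sketch, not a measurement of hdbscan-ts itself: the helper names and the synthetic data are made up for illustration, and timings will vary by machine and Node.js version.

```typescript
// Hypothetical micro-benchmark; none of these names come from hdbscan-ts.

function countHitsWithIncludes(candidates: number[], members: number[]): number {
  let hits = 0;
  for (const candidate of candidates) {
    if (members.includes(candidate)) hits++; // linear scan per candidate
  }
  return hits;
}

function countHitsWithSet(candidates: number[], members: number[]): number {
  const memberSet = new Set(members); // built once, before the loop
  let hits = 0;
  for (const candidate of candidates) {
    if (memberSet.has(candidate)) hits++; // hash lookup per candidate
  }
  return hits;
}

// Runs fn once and reports its result with the elapsed wall-clock time.
function timeMs<T>(fn: () => T): { result: T; ms: number } {
  const start = Date.now();
  const result = fn();
  return { result, ms: Date.now() - start };
}

const n = 6000; // same order of magnitude as the dataset in this issue
const members = Array.from({ length: n }, (_, i) => i * 2); // even numbers
const candidates = Array.from({ length: n }, (_, i) => i); // 0..n-1

const viaIncludes = timeMs(() => countHitsWithIncludes(candidates, members));
const viaSet = timeMs(() => countHitsWithSet(candidates, members));
console.log(`includes: ${viaIncludes.result} hits in ${viaIncludes.ms} ms`);
console.log(`Set.has:  ${viaSet.result} hits in ${viaSet.ms} ms`);
```

A single pass like this only shows the per-loop difference. If the library wraps such a check in a further loop over points or clusters, the gap would multiply accordingly.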

## Request

Would you be open to:
1. Investigating the performance bottleneck?
2. Accepting a PR if I identify and fix the issue?

I can provide more detailed profiling data if helpful.

Thank you for creating this library - HDBSCAN in TypeScript is valuable for the ecosystem!
