Description
This can work after k-means has been implemented (see #1), but can be very slow for large datasets.
The idea is the following:
- We run k-means summarization and build the null models as usual.
- We repeatedly subsample the centroids: each round takes only a subset of centroids to estimate the actual pHD between batches. Doing this several times (100? 1000?) gives each centroid a probability of being a Hausdorff centroid (conditioned on the probability of being sampled, which is uniform).
- We then measure the distance from each cell to its closest Hausdorff centroid.
- Each cell is assigned a "batch effect value": the distance to its closest Hausdorff centroid * the probability of that particular centroid.
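
A rough sketch of the steps above, to make the idea concrete. Everything here is an assumption for illustration, not the planned implementation: the function names (`hausdorff_pair`, `hausdorff_centroid_probs`, `batch_effect_values`) and the `frac` parameter are hypothetical, the plain symmetric Hausdorff distance stands in for pHD, a "Hausdorff centroid" is taken to be a centroid of the pair that realizes that distance, and the null models are left out.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance matrix between the rows of a and the rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def hausdorff_pair(a, b):
    """Indices (i, j) of the pair realizing the symmetric Hausdorff distance.

    Assumption: plain Hausdorff is used here in place of pHD.
    """
    d = pairwise_dist(a, b)
    i_ab = d.min(axis=1).argmax()   # point of a farthest from b
    j_ab = d[i_ab].argmin()         # its nearest neighbour in b
    j_ba = d.min(axis=0).argmax()   # point of b farthest from a
    i_ba = d[:, j_ba].argmin()      # its nearest neighbour in a
    if d[i_ab, j_ab] >= d[i_ba, j_ba]:
        return i_ab, j_ab
    return i_ba, j_ba


def hausdorff_centroid_probs(cent_a, cent_b, n_rounds=100, frac=0.5, seed=0):
    """Estimate P(centroid is a Hausdorff centroid | centroid was sampled)."""
    rng = np.random.default_rng(seed)
    n_a, n_b = len(cent_a), len(cent_b)
    hits_a, hits_b = np.zeros(n_a), np.zeros(n_b)
    draws_a, draws_b = np.zeros(n_a), np.zeros(n_b)
    for _ in range(n_rounds):
        # Uniform subsample of centroids from each batch.
        ia = rng.choice(n_a, size=max(1, int(frac * n_a)), replace=False)
        ib = rng.choice(n_b, size=max(1, int(frac * n_b)), replace=False)
        draws_a[ia] += 1
        draws_b[ib] += 1
        i, j = hausdorff_pair(cent_a[ia], cent_b[ib])
        hits_a[ia[i]] += 1
        hits_b[ib[j]] += 1
    # Condition on having been sampled; never-sampled centroids get 0.
    return hits_a / np.maximum(draws_a, 1), hits_b / np.maximum(draws_b, 1)


def batch_effect_values(cells, centroids, probs):
    """Distance to the closest Hausdorff centroid times that centroid's probability."""
    is_hd = probs > 0
    if not is_hd.any():
        return np.zeros(len(cells))
    d = pairwise_dist(cells, centroids[is_hd])
    nearest = d.argmin(axis=1)
    return d[np.arange(len(cells)), nearest] * probs[is_hd][nearest]
```

In this sketch the centroids of both batches would be pooled before scoring cells, e.g. `batch_effect_values(cells, np.vstack([cent_a, cent_b]), np.concatenate([p_a, p_b]))`. Whether cells should instead be scored only against centroids of their own batch is an open design choice.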
TODO
- code the thing: ⌛️
- check it makes sense (comparison with CellMixS?): ⌛️