
feat: support custom projector callback in compute_text_projection #170

Closed

peter-gy wants to merge 1 commit into apple:main from peter-gy:feat/custom-text-projector-callback

Conversation

@peter-gy
Contributor

compute_text_projection currently handles the full text pipeline: text -> embeddings -> UMAP projection. That works well for the built-in litellm and sentence_transformer providers, but it makes it difficult to reuse embeddings that were already computed elsewhere. A common case is storing embeddings in systems like ChromaDB or LanceDB and wanting to explore them with Embedding Atlas.

Today, doing that requires users to either

  • recompute embeddings inside compute_text_projection, or
  • reimplement the UMAP and nearest-neighbor computation themselves.

This PR allows the text_projector arg of compute_text_projection to be a TextProjectorCallback in addition to the built-in string options. The callback can return precomputed embeddings or use any custom embedding implementation. Embedding Atlas still handles the rest of the projection flow, so users can plug in their own embeddings without giving up the existing projection setup and caching behavior.
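For illustration, here is a minimal sketch of what such a callback could look like. The callback signature (texts plus `model`/`batch_size` keyword options, returning one vector per text) is assumed from this PR, and `precomputed` is a stand-in for embeddings fetched from a store like ChromaDB or LanceDB:

```python
import numpy as np

# Stand-in for embeddings computed elsewhere (e.g. ChromaDB, LanceDB).
precomputed = {
    "hello world": np.array([0.1, 0.2, 0.3]),
    "embedding atlas": np.array([0.4, 0.5, 0.6]),
}

def reuse_precomputed(texts, *, model=None, batch_size=None):
    """A TextProjectorCallback sketch that returns already-computed embeddings.

    Instead of re-embedding the texts, look each one up in the external
    store and stack the vectors into a (len(texts), dim) array.
    """
    return np.stack([precomputed[t] for t in texts])

vectors = reuse_precomputed(["hello world", "embedding atlas"])
print(vectors.shape)  # (2, 3)
```

With this PR, passing `text_projector=reuse_precomputed` to compute_text_projection would hand those vectors to the existing UMAP, nearest-neighbor, and caching machinery.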

@donghaoren
Collaborator

Umm, have you tried the compute_vector_projection function? That takes pre-computed vectors as input.

@peter-gy
Contributor Author

Oh wow, I have no idea how I missed that. I've been using compute_text_projection all along and totally ignored compute_vector_projection. Thanks!

I still see value in providing users the flexibility to specify a custom projector callback through this PR, but if that's not something you would like to support for now, please feel free to close this.

@domoritz
Member

What use case would a custom callback have? We probably want to avoid having two ways to do the same thing (https://peps.python.org/pep-0020/).

@peter-gy
Contributor Author

peter-gy commented Mar 12, 2026

> What use case would a custom callback have? We probably want to avoid having two ways to do the same thing (https://peps.python.org/pep-0020/).

I see two main use cases:

  • using embedding models that are not available through sentence_transformers or litellm
  • invoking embedding models through a specific library not supported by Embedding Atlas, such as Pydantic AI, or ChromaDB, which supports instantiating embedding functions from a JSON manifest across Python, TypeScript, and Rust

@domoritz
Member

Ah, I see. So instead of computing everything in compute_vector_projection, you would provide a callback function from some library/service to process a single projection.

@peter-gy
Contributor Author

peter-gy commented Mar 12, 2026

> Ah, I see. So instead of computing everything in compute_vector_projection, you would provide a callback function from some library/service to process a single projection.

Exactly. The TextProjectorCallback also takes batch_size as an argument, so it can compute projections in batches and does not need to process a single projection at a time. This can also be useful when a lab releases a new model (as Google did with Gemini Embedding 2.0 with multimodal inputs) and some features are only fully supported through the vendor's own SDK. In that case, users might want to quickly wrap a google genai client call in a TextProjectorCallback. This is loosely similar to how LLooM supports custom LLM and embedding APIs (https://stanfordhci.github.io/lloom/about/custom-models.html#ex-setup-functions-for-gemini).
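A hedged sketch of that batching pattern, with the vendor SDK call stubbed out as `embed_batch` (both that helper and the callback signature are illustrative assumptions, not part of Embedding Atlas):

```python
import numpy as np

def embed_batch(batch):
    # Stand-in for a vendor SDK call (e.g. a google genai client).
    # Here it just returns a deterministic dummy vector per text,
    # filled with that text's length, so the example is runnable.
    return [np.full(4, float(len(text))) for text in batch]

def batched_projector(texts, *, model=None, batch_size=2):
    """TextProjectorCallback sketch: embed batch_size texts per SDK call."""
    out = []
    for i in range(0, len(texts), batch_size):
        out.extend(embed_batch(texts[i : i + batch_size]))
    return np.stack(out)

vectors = batched_projector(["a", "bb", "ccc"], batch_size=2)
print(vectors.shape)  # (3, 4)
```

Swapping `embed_batch` for a real client call is the only change needed to target a new model through its own SDK.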

@donghaoren
Collaborator

Thanks for the explanation!

It looks like the preprocessing in _projection_for_texts is minimal, and everything is passed almost directly into the TextProjectorCallback. There's a caching mechanism which could be useful, but as the logger.warning added in this PR notes, the cache cannot guarantee that a custom TextProjectorCallback is hashed correctly (I think even with a known function name, the user may still change the function's contents, e.g., in a notebook), so it may return stale cached results. Given that the cache won't be very reliable, and that it's not much more work to write:

df["vector"] = custom_text_projector(df["text"], batch_size=batch_size, model=model)
compute_vector_projection(df, vector="vector", ...)

than

compute_text_projection(df, text="text", text_projector=custom_text_projector)

(unless the above doesn't work and it's tricky to pass the vectors around in the current API). I think it's probably better to not add this option?

btw, I think multimodal inputs are interesting to support; maybe we should just have one compute_projection function that can take multiple modalities (or perhaps even data with mixed modalities).

@peter-gy
Contributor Author

Thanks, that makes sense. Given the caching caveats around arbitrary callbacks, I agree this would probably add more confusion than value right now, so I'm closing the PR.

> btw, I think multimodal inputs are interesting to support; maybe we should just have one compute_projection function that can take multiple modalities (or perhaps even data with mixed modalities).

I'm experimenting with multimodal embeddings at the moment, so I'd be happy to circle back once I have a better sense of what kind of support would actually be useful in Embedding Atlas.

@peter-gy peter-gy closed this Mar 13, 2026
@peter-gy peter-gy deleted the feat/custom-text-projector-callback branch March 13, 2026 07:26