feat: support custom projector callback in compute_text_projection #170
peter-gy wants to merge 1 commit into apple:main
|
Umm, have you tried the |
|
Oh wow, I have no idea how I missed that. I've been using I still see value in providing users the flexibility to specify a custom projector callback through this PR, but if that's not something you would like to support for now, please feel free to close this. |
|
What use case would a custom callback have? We probably want to avoid having two ways to do the same thing (https://peps.python.org/pep-0020/).
I see two main use cases:
|
|
Ah, I see. So instead of computing everything in |
Exactly. The |
|
Thanks for the explanation! It looks like in

```python
df["vector"] = custom_text_projector(df["texts"], batch_size=batch_size, model=model)
compute_vector_projection(df, vector="vector", ...)
```

than

```python
compute_text_projection(df, text="text", text_projector=custom_text_projector)
```

(unless the above doesn't work and it's tricky to pass the vectors around in the current API). I think it's probably better not to add this option.

By the way, I think multimodal inputs are interesting to support, maybe we should just have one |
|
Thanks, that makes sense. Given the caching caveats around arbitrary callbacks, I agree this would probably add more confusion than value right now, so I'm closing the PR.
I'm experimenting with multimodal embeddings at the moment, so I'd be happy to circle back once I have a better sense of what kind of support would actually be useful in Embedding Atlas.
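To make the caching caveat mentioned above concrete, here is a minimal, generic sketch. It does not reflect how Embedding Atlas actually derives its cache keys; `cache_key` and `make_projector` are hypothetical helpers written only for illustration. It shows that a key built from serializable settings (texts, a projector name, a model name) cannot tell apart two callbacks that share a name but produce different embeddings.

```python
import hashlib
import json


def cache_key(texts, projector_name, model):
    """Hypothetical helper: derive a deterministic key from serializable inputs."""
    payload = json.dumps(
        {"texts": list(texts), "projector": projector_name, "model": model},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_projector(scale):
    """Hypothetical factory: returns a toy callback whose output depends on `scale`."""

    def projector(texts):
        # Toy "embedding": one number per text, scaled by the captured value.
        return [[scale * len(t)] for t in texts]

    return projector


a = make_projector(1.0)
b = make_projector(2.0)

# Both callbacks report the same qualified name, so a name-based key collides...
key_a = cache_key(["abc"], a.__qualname__, None)
key_b = cache_key(["abc"], b.__qualname__, None)

# ...even though the embeddings they return differ.
out_a = a(["abc"])
out_b = b(["abc"])
```

With built-in string options, the option name and model identify the computation; with an arbitrary callable, whatever it captures (here `scale`) is invisible to a key of this kind, so a stale cached result could be returned.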
`compute_text_projection` currently handles the full text pipeline: text -> embeddings -> UMAP projection. That works well for the built-in `litellm` and `sentence_transformer` providers, but it makes it difficult to reuse embeddings that were already computed elsewhere. A common case is storing embeddings in systems like ChromaDB or LanceDB and wanting to explore them with Embedding Atlas.

Today, doing that requires users either
`compute_text_projection` or

This PR allows the `text_projector` arg of `compute_text_projection` to be a `TextProjectorCallback` in addition to the built-in string options. The callback can return precomputed embeddings or use any custom embedding implementation. Embedding Atlas still handles the rest of the projection flow, so users can plug in their own embeddings without giving up the existing projection setup and caching behavior.
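As a rough illustration of the idea, a callback that returns precomputed embeddings might look like the sketch below. The exact definition of `TextProjectorCallback` is not shown in this PR page, so the alias here is a stand-in; the `texts`, `batch_size`, and `model` parameters are assumed from the call shape quoted in the conversation, and `make_lookup_projector` is a hypothetical helper. A plain dict stands in for an external store such as ChromaDB or LanceDB.

```python
from typing import Callable, Dict, List, Sequence

# Stand-in alias: the PR's real TextProjectorCallback type may differ.
Embedding = List[float]
TextProjectorCallback = Callable[..., List[Embedding]]


def make_lookup_projector(store: Dict[str, Embedding]) -> TextProjectorCallback:
    """Hypothetical helper: build a callback that serves embeddings computed elsewhere."""

    def projector(
        texts: Sequence[str], batch_size: int = 32, model: str = ""
    ) -> List[Embedding]:
        # batch_size and model are accepted to match the assumed call shape
        # but are unused, because nothing is computed here.
        missing = [t for t in texts if t not in store]
        if missing:
            raise KeyError(f"no precomputed embedding for {len(missing)} text(s)")
        # Return vectors in the same order as the input texts.
        return [store[t] for t in texts]

    return projector


# A dict stands in for the external embedding store.
store = {"hello": [0.1, 0.2], "world": [0.3, 0.4]}
projector = make_lookup_projector(store)
vectors = projector(["world", "hello"], batch_size=2)
```

Under the proposal, a callable like `projector` would be passed as `text_projector`; the output order matching the input order is an assumption of this sketch rather than something the PR states.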