Conversation

@wirthual
Contributor

Added OpenAI variables which are required to run the example with locally hosted models.

The example was tested with a locally hosted embedding model:

docker run -it -p 7997:7997 michaelf34/infinity:latest v2 --model-id mixedbread-ai/mxbai-embed-large-v1 --device cpu --engine optimum

Together with a chat completion model:

ollama run llama3.2:1b

I was able to execute the Getting Started guide with the following command:

OPENAI_BASE_URL='http://localhost:7997' OPENAI_ENDPOINT='http://localhost:11434/v1/chat/completions' OPENAI_API_KEY='does-not-matter' OPENAI_MODEL="llama3.2:1b" uv run python code.py

This resulted in the following output:

0.044s -- Using OpenAI
Indexing 3 messages...
Indexed 3 messages.
Got 15 semantic refs.

The model produces 1024-dimensional embeddings, which I accounted for by changing DEFAULT_EMBEDDING_SIZE.
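
That is presumably a one-line edit wherever DEFAULT_EMBEDDING_SIZE is defined (likely typeagent/aitools/embeddings.py, given the review comment below), roughly:

DEFAULT_EMBEDDING_SIZE = 1024  # mxbai-embed-large-v1 emits 1024-dimensional vectors

The maintainer suggests a cleaner route in a comment further down.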

Collaborator

@gvanrossum left a comment

Thanks! I have some suggestions. Once you've applied and pushed those, I'll merge your PR.

## OPENAI environment variables

The (public) OpenAI environment variables include:
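
Presumably (inferring from the run command earlier in this thread) the variables in play here are:

- OPENAI_BASE_URL: base URL of the embedding endpoint (the Infinity server above)
- OPENAI_ENDPOINT: full URL of the chat-completions endpoint (the Ollama server above)
- OPENAI_API_KEY: API key; any placeholder value works for local servers
- OPENAI_MODEL: name of the chat model (llama3.2:1b above)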

Collaborator

Can I get this blank line back? I like the extra bit of whitespace in the rendered page.

Contributor Author

Done.

wirthual and others added 2 commits October 19, 2025 13:08
Reverted deletion of new line
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
@gvanrossum-ms merged commit 6dc7044 into microsoft:main Oct 19, 2025
3 checks passed
@gvanrossum-ms
Collaborator

Thanks for your contribution! Really cool you got it to work with a local model.

Regarding the embedding width, I think the correct way currently is to update the dict model_to_embedding_size_and_envvar in typeagent/aitools/embeddings.py with an appropriate entry (you could patch it in from the outside; that variable is not private) and then use code like this:

from typeagent.aitools.embeddings import AsyncEmbeddingModel
from typeagent.knowpro.convsettings import ConversationSettings

# create_conversation and TranscriptMessage are the ones used in the
# original Getting Started example.

async def main():
    embedding_model = AsyncEmbeddingModel(embedding_size=1024, model_name="your-name")
    settings = ConversationSettings(model=embedding_model)
    settings.semantic_ref_index_settings.auto_extract_knowledge = True  # DON'T FORGET THIS
    conversation = await create_conversation("demo.db", TranscriptMessage, settings=settings)
    # ... original code ...

You could argue that it should be possible to construct an AsyncEmbeddingModel instance that doesn't correspond to a table entry. It looks like there's already an endpoint_var argument, but no logic that uses it. If you're interested, I encourage you to submit a PR.
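
For the "patch it in from the outside" route, here is a minimal sketch. It assumes the dict maps a model name to an (embedding_size, env_var_name) tuple, as the variable name suggests; check embeddings.py for the actual value shape before relying on it:

import typeagent.aitools.embeddings as embeddings

# Hypothetical entry for the locally served model; the tuple layout is
# inferred from the variable name, not confirmed against the source.
embeddings.model_to_embedding_size_and_envvar["mixedbread-ai/mxbai-embed-large-v1"] = (
    1024,
    "OPENAI_API_KEY",
)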
