Storebrand have moved on from Meltano, and we're therefore no longer maintaining this repository.
Inline mapper for splitting documents and calculating OpenAI embeddings, for the purpose of building a vectorstore knowledge base usable by GPT and ChatGPT. It splits documents into segments, then vectorizes the segments.
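As a rough illustration of the splitting step, the sketch below cuts a document into overlapping character-based segments. The function name `split_text` and its `chunk_size` and `overlap` parameters are hypothetical and not taken from this mapper, whose actual splitter may work differently. The vectorize step (the embeddings API call) is left out.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that context
    is not lost at chunk boundaries. (Hypothetical helper.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


segments = split_text("abcdefghij", chunk_size=4, overlap=1)
print(segments)
```

Each resulting segment would then be sent to the embeddings endpoint as a separate input.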
This version of the mapper is a small modification of the original map-gpt-embeddings mapper, modified to work with the Azure OpenAI API.
Authentication can be via an API key or via native Azure authentication, using either a managed identity or DefaultAzureCredential. If no authentication settings are provided, the mapper attempts to authenticate with Azure using DefaultAzureCredential. Otherwise, specify either openai_api_key to authenticate with a Bearer token, or msi_client_id to authenticate with a specific managed identity.
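The selection between these authentication modes could be sketched as follows. This is not the mapper's actual code: `choose_auth_method` is a hypothetical helper, and the precedence of openai_api_key over msi_client_id when both are set is an assumption. The use_msi setting is ignored here.

```python
def choose_auth_method(config: dict) -> str:
    """Return which authentication mode a given config would select.

    Hypothetical helper. Assumes an API key takes precedence over a
    managed identity client ID when both are present.
    """
    if config.get("openai_api_key"):
        # Authenticate with the API key as a Bearer token.
        return "api_key"
    if config.get("msi_client_id"):
        # Authenticate with the specified managed identity.
        return "managed_identity"
    # No authentication settings given: fall back to DefaultAzureCredential.
    return "default_azure_credential"


print(choose_auth_method({}))
print(choose_auth_method({"msi_client_id": "00000000-0000-0000-0000-000000000000"}))
```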
Built with the Meltano Singer SDK.
Capabilities: stream-maps
| Setting | Required | Default | Description |
|---|---|---|---|
| document_text_property | False | page_content | The name of the property containing the document text. |
| document_metadata_property | False | None | The name of the property containing the document metadata. |
| openai_api_key | False | None | OpenAI API key. Optional if OPENAI_API_KEY env var is set. |
| msi_client_id | False | None | Client ID of the Azure managed identity to use for authentication. |
| use_msi | False | 0 | Whether to use an Azure managed identity for authentication. |
| api_endpoint | False | https://api.openai.com | Azure OpenAI API Endpoint |
| deployment_name | False | None | Azure OpenAI Deployment Name |
| stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
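For illustration, a configuration using the settings above might look like the sketch below, built as a Python dict and serialized to JSON. The endpoint, deployment name, metadata property name, and client ID are placeholder values, not real resources.

```python
import json

# Example configuration; every value other than the documented default
# for document_text_property is a placeholder.
config = {
    "document_text_property": "page_content",
    "document_metadata_property": "metadata",  # hypothetical property name
    "api_endpoint": "https://example-resource.openai.azure.com",  # hypothetical endpoint
    "deployment_name": "example-embeddings-deployment",  # hypothetical deployment
    "msi_client_id": "00000000-0000-0000-0000-000000000000",  # placeholder client ID
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

The resulting JSON could be saved to a file and passed to the mapper with the --config option.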
A full list of supported settings and capabilities is available by running: map-openai-embeddings --about
The demo project that originally used this mapper is available at https://github.com/MeltanoLabs/gpt-meltano-demo.
A full list of supported settings and capabilities for this tap is available by running:
map-gpt-embeddings --about
This Singer tap will automatically import any environment variables within the working directory's .env if --config=ENV is provided, such that config values will be considered if a matching environment variable is set either in the terminal context or in the .env file.
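The idea of matching environment variables to settings can be sketched as below. The helper `config_from_env` is hypothetical, and the prefix `MAP_GPT_EMBEDDINGS_` is an assumption based on the common convention of upper-casing the plugin name. The sketch reads only from an environment mapping and does not parse a .env file.

```python
import os


def config_from_env(prefix, setting_names, environ=None):
    """Collect settings from environment variables named PREFIX + SETTING.

    Hypothetical helper: only settings with a matching variable are
    included, and all values are returned as strings.
    """
    env = os.environ if environ is None else environ
    config = {}
    for name in setting_names:
        value = env.get(f"{prefix}{name.upper()}")
        if value is not None:
            config[name] = value
    return config


# Example with an explicit mapping instead of the real environment.
example_env = {"MAP_GPT_EMBEDDINGS_DEPLOYMENT_NAME": "example-deployment"}
print(config_from_env("MAP_GPT_EMBEDDINGS_", ["deployment_name", "api_endpoint"], example_env))
```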
You will need an OpenAI API key to calculate embeddings using OpenAI's models. Free accounts are rate limited to 60 calls per minute. API access is separate from a ChatGPT Plus account and requires a per-API-call billing method established with OpenAI.
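One simple way to stay under a limit such as 60 calls per minute is to space calls evenly, as in the sketch below. The `RateLimiter` class is hypothetical; whether and how this mapper throttles its own requests is not stated here.

```python
import time


class RateLimiter:
    """Space calls at least 60 / max_calls_per_minute seconds apart.

    Hypothetical helper. The clock and sleep functions are injectable
    so the behaviour can be checked without real waiting.
    """

    def __init__(self, max_calls_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        waited = 0.0
        if self._last_call is not None:
            remaining = self.min_interval - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last_call = self._clock()
        return waited


limiter = RateLimiter(max_calls_per_minute=60)
limiter.wait()  # the first call never waits
# ... make one API call here, then call limiter.wait() before the next one.
```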
You can easily run map-gpt-embeddings by itself or in a pipeline using Meltano.
map-gpt-embeddings --version
map-gpt-embeddings --help
map-gpt-embeddings --config CONFIG --discover > ./catalog.json
Follow these instructions to contribute to this project.
pipx install poetry
poetry install
Create tests within the map_gpt_embeddings/tests subfolder and then run:
poetry run pytest
You can also test the map-gpt-embeddings CLI interface directly using poetry run:
poetry run map-gpt-embeddings --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd map-gpt-embeddings
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke map-gpt-embeddings --version
# OR run a test `elt` pipeline:
meltano run tap-smoke-test map-gpt-embeddings target-jsonl
See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.