GitHub Action to regenerate OpenAI word embeddings and store them in a Supabase vector store via LangChain. Useful if you have a retrieval-augmented generation (RAG) system and want to update the word embeddings automatically when the knowledge base changes.
- github-personal-access-token (required): GitHub personal access token
- openai-api-key (required): OpenAI API key
- supabase-anon-key (required): Supabase anon key
- supabase-url (required): Supabase URL
- repository-owner-username (required): GitHub username of the repository owner
- repository-name (required): Name of the repository
- path-to-contents (required): Path to the directory containing the notes content, relative to the root path
- directory-structure (required): Either nested or flat
  - nested: path-to-contents points to a list of directories
  - flat: path-to-contents points to a list of files
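To illustrate the two directory-structure values, here is a sketch of a step for the nested case. The directory name notes_nested and the file names in the comments are hypothetical, and the layout assumes that each subdirectory holds the note files directly, which the description above does not spell out.

```yaml
# Hypothetical repository layouts (all names are illustrative):
#
#   flat                      nested
#   notes_flat/               notes_nested/
#     intro.md                  topic-a/
#     setup.md                    intro.md
#                               topic-b/
#                                 setup.md
#
# Step fragment for the nested layout; it belongs under jobs.<job_id>.steps.
- name: Regenerate embeddings (nested notes)
  uses: K02D/regenerate-embeddings@v2.3
  with:
    repository-owner-username: "K02D"
    repository-name: "retrieval-augmented-generation"
    path-to-contents: "notes_nested"   # hypothetical directory name
    directory-structure: "nested"
    github-personal-access-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    supabase-anon-key: ${{ secrets.SUPABASE_ANON_KEY }}
    supabase-url: ${{ secrets.SUPABASE_URL }}
```

Only path-to-contents and directory-structure differ from the flat case; the remaining inputs stay the same.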
Note: github-personal-access-token, openai-api-key, supabase-anon-key and supabase-url should be supplied from environment secrets rather than written into the workflow file. See the section below
1. On the GitHub repository you're adding this action to, go to Settings > Environments and create a new environment called Dev
2. Add the token, keys, and URL to the Dev environment as environment secrets
3. Create a .github/workflows directory in the root of the project
4. In .github/workflows, create a file called regenerate-embeddings.yml
5. Copy the following YAML into regenerate-embeddings.yml
    name: Regenerate embeddings
    run-name: Regenerate embeddings and store in Supabase
    on: [push]
    jobs:
      regenerate-embeddings:
        runs-on: ubuntu-latest
        environment: Dev
        steps:
          - name: Regenerate embeddings (flat notes)
            uses: K02D/regenerate-embeddings@v2.3
            with:
              repository-owner-username: "K02D"
              repository-name: "retrieval-augmented-generation"
              path-to-contents: "notes_flat"
              directory-structure: "flat"
              github-personal-access-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
              openai-api-key: ${{ secrets.OPENAI_API_KEY }}
              supabase-anon-key: ${{ secrets.SUPABASE_ANON_KEY }}
              supabase-url: ${{ secrets.SUPABASE_URL }}

This YAML
- Assumes the environment secrets added in step 2 are named GH_PERSONAL_ACCESS_TOKEN, OPENAI_API_KEY, SUPABASE_ANON_KEY, and SUPABASE_URL
- Triggers the action on every push to any branch, since on: [push] sets no branch filter
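Because on: [push] carries no branch filter, it fires for pushes to every branch. If the workflow should run only when notes change on main, the trigger could be narrowed as in the sketch below. The notes_flat/** path pattern is an assumption based on the path-to-contents value used in the workflow above.

```yaml
# Replaces the `on: [push]` line in regenerate-embeddings.yml.
on:
  push:
    branches:
      - main             # run only for pushes to main
    paths:
      - "notes_flat/**"  # assumed notes location; match your path-to-contents
```

With the paths filter, pushes that do not touch the notes directory skip the workflow, which avoids unnecessary calls to the OpenAI API.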
- Create an OpenAI API key if you don't have one. Use this for OPENAI_API_KEY
  - OpenAI's API is used to generate the word embeddings
- Create a Supabase project if you don't have one. Once created, go to Project Settings > API to get the project URL and anon API key. Use these for SUPABASE_URL and SUPABASE_ANON_KEY
  - Supabase stores the word embeddings in a Postgres vector database so that content relevant to a user's prompt can be retrieved. The retrieved content augments the LLM's response
- Initialize the database in your Supabase project using LangChain's template. On your project dashboard, go to SQL Editor > Quickstarts > LangChain and click RUN