This project integrates ChromaDB, Google Gemini AI, and YouTube Transcript API to handle vector embeddings, text processing, and CRUD operations. It supports semantic search, text embeddings, and AI-generated keynotes from video transcripts.
- ChromaDB integration for vector storage
- Google Gemini AI for text embeddings and summarization
- YouTube Transcript API to fetch video transcripts
- CRUD operations (Create, Read, Update, Delete) on embeddings
- Persistent and ephemeral database modes using chromadb
- File handling for storing transcripts and notes
Ensure you have Python installed, then run:

    pip install -r requirements.txt

Create a .env file with:

    GOOGLE_API_KEY=your_gemini_api_key

Run the application:

    python main.py

Defined in utils.py, the get_client() function initializes a ChromaDB client:
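
    import chromadb

    def get_client(client_type=None, path=None):
        # client_type may be None, so normalize it before calling .lower()
        if (client_type or '').lower() != 'persistent' and path is None:
            return chromadb.EphemeralClient()
        # Without a path, let chromadb use its own default storage location
        return chromadb.PersistentClient(path=path) if path else chromadb.PersistentClient()

- Persistent Mode: Saves embeddings for future retrieval.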
- Ephemeral Mode: Stores data only for the current session.
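
The choice between the two modes can be sketched as a small decision helper. This is an illustrative stand-in, not part of utils.py: the name `resolve_mode` is hypothetical, it does not import chromadb, and it assumes a missing `client_type` should count as ephemeral unless a path is supplied. It mirrors the condition in get_client() with a None-safe comparison.

```python
def resolve_mode(client_type=None, path=None):
    """Return 'ephemeral' or 'persistent' following the get_client() condition.

    Hypothetical helper for illustration only; it never creates a client.
    """
    # Treat a missing client_type as an empty string so .lower() is safe.
    requested = (client_type or "").lower()
    if requested != "persistent" and path is None:
        return "ephemeral"
    return "persistent"


if __name__ == "__main__":
    print(resolve_mode())                          # no arguments
    print(resolve_mode("Persistent"))              # explicit request, any casing
    print(resolve_mode(path="./db"))               # a path alone selects persistence
    print(resolve_mode("ephemeral", path="./db"))  # a path overrides the requested type
```

One consequence of this rule is that passing a path always selects persistent mode, even when `client_type` says otherwise. Callers who want an in-memory client should leave `path` unset.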
Collections store vector embeddings and text data.

    def get_or_create_collection(client, name='my_collection', embedding_function=None, data_loader=None):
        return client.get_or_create_collection(
            name=name,
            embedding_function=embedding_function,
            data_loader=data_loader
        )

Extracts keynotes from a transcript using Gemini AI:

    response = genai_model.generate_content(prompt + transcript, stream=False)
    with open(notes_path, "w") as file:
        file.write(response.text)

Fetches and formats a YouTube transcript:

    transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'en-US'])
    formatted_transcript = TextFormatter().format_transcript(transcript)

Generates keynotes and stores them in a collection:

    response = genai_model.generate_content("Extract keynotes: " + transcript)
    collection.upsert(ids=[video_id], documents=[response.text])

- Insert data: collection.upsert()
- Retrieve data: collection.get()
- Update existing data: collection.update()
- Delete data: collection.delete()
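
To make the four operations above concrete without an API key or a running database, here is a minimal in-memory sketch. `ToyCollection` is a hypothetical stand-in written for this example, not ChromaDB's implementation. It stores plain documents, ignores embeddings, and its handling of unknown ids (skipped in `get`, rejected in `update`) is a design choice of this toy that may differ from a real collection.

```python
class ToyCollection:
    """In-memory stand-in that mimics the four calls listed above.

    Illustration only: documents are kept in a dict and no embeddings are computed.
    """

    def __init__(self):
        self._docs = {}

    def upsert(self, ids, documents):
        # Insert new ids and overwrite the documents of existing ones.
        for doc_id, doc in zip(ids, documents):
            self._docs[doc_id] = doc

    def get(self, ids=None):
        # Return everything when no ids are given; unknown ids are skipped.
        if ids is None:
            wanted = list(self._docs)
        else:
            wanted = [doc_id for doc_id in ids if doc_id in self._docs]
        return {"ids": wanted, "documents": [self._docs[i] for i in wanted]}

    def update(self, ids, documents):
        # Toy design choice: refuse the whole call if any id was never stored.
        missing = [doc_id for doc_id in ids if doc_id not in self._docs]
        if missing:
            raise KeyError(missing)
        for doc_id, doc in zip(ids, documents):
            self._docs[doc_id] = doc

    def delete(self, ids):
        # Deleting an id that is not stored is a no-op.
        for doc_id in ids:
            self._docs.pop(doc_id, None)


if __name__ == "__main__":
    collection = ToyCollection()
    collection.upsert(ids=["video-1"], documents=["first draft of the keynotes"])
    collection.update(ids=["video-1"], documents=["revised keynotes"])
    print(collection.get(ids=["video-1"]))
    collection.delete(ids=["video-1"])
    print(collection.get())
```

The sketch separates `upsert` (create or overwrite) from `update` (modify only what already exists), which is why the project can call `upsert` with a video id whether or not that video was processed before.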
- ChromaDB Docs: https://github.com/chroma-core/chroma
- Google Gemini AI: https://ai.google.dev/
- YouTube Transcript API: https://pypi.org/project/youtube-transcript-api/
This project is licensed under the GNU GENERAL PUBLIC LICENSE - see the LICENSE file for details.
For any inquiries or contributions, please feel free to reach out.
- GitHub Profile: kivanc57
- Email: kivancgordu@hotmail.com