Professor Starstuff is a multimodal AI chatbot that makes learning about space fun and interactive for children. This chatbot leverages natural language processing, vector-based retrieval, and podcast-style responses to engage young minds with fascinating space facts.
- Natural Language Processing (NLP): Understands and responds to kids' astronomy questions.
- Vector-based Knowledge System: Retrieves accurate space facts from YouTube video transcripts.
- NASA Image API: Fetches real images of celestial objects for better visualization.
- Podcast-Style Responses: Generates engaging storytelling audio from text-based answers.
- ChromaDB Integration: Efficient search and retrieval of astronomy knowledge.
- OpenAI TTS: Converts text responses into audio format.
- Deployable on Heroku: Django-based backend with an HTML/CSS/JavaScript frontend.
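As a rough illustration of the image lookup, the sketch below assumes that "NASA Image API" refers to the public NASA Image and Video Library search endpoint (`https://images-api.nasa.gov/search`). The helper names `build_search_url` and `extract_image_urls` are hypothetical and are not taken from this repository; the code only builds the request URL and reads preview links out of a response that has already been parsed from JSON.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

# Assumption: the public NASA Image and Video Library search endpoint.
NASA_SEARCH_URL = "https://images-api.nasa.gov/search"


def build_search_url(query: str) -> str:
    """Build a search URL restricted to images (hypothetical helper)."""
    params = urlencode({"q": query, "media_type": "image"})
    return f"{NASA_SEARCH_URL}?{params}"


def extract_image_urls(payload: Dict[str, Any], limit: int = 3) -> List[str]:
    """Collect at most `limit` preview image URLs from a parsed response."""
    urls: List[str] = []
    for item in payload.get("collection", {}).get("items", []):
        for link in item.get("links", []):
            if link.get("render") == "image" and "href" in link:
                urls.append(link["href"])
                break  # keep one preview per search result
        if len(urls) >= limit:
            break
    return urls


search_url = build_search_url("Saturn rings")
```

Fetching `search_url` with any HTTP client and passing the decoded JSON to `extract_image_urls` would yield a short list of image links to show next to the answer.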
- Django - Main backend framework
- SQLite (ChromaDB) - Vector database for storing astronomy facts
- Redis - Cloud memory storage for conversation context
- Heroku - Deployment platform
- GPT-4 & GPT-3.5 Turbo - Language models for chatbot responses
- ChromaDB - Vector storage for RAG (Retrieval-Augmented Generation)
- NASA API - Fetches real space images
- OpenAI TTS - Text-to-speech for podcast-style responses
- HTML, CSS, JavaScript - Simple, interactive UI
- Bootstrap - Styling framework
Professor Starstuff is built on a dataset extracted from YouTube astronomy video transcripts:
- Transcript Extraction: Uses `youtube_transcript_api` to fetch video transcripts (~8 hours of content).
- Chunking Strategy:
  - Chunk size: 500 tokens
  - Overlap: 100 tokens for better context retention
- Vector Embeddings:
  - Uses `text-embedding-3-large` from OpenAI for high-quality embeddings.
- Storage:
  - Stored in ChromaDB with metadata (e.g., video titles) for efficient retrieval.
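The chunking strategy above can be sketched as a sliding window. This is a minimal illustration, not the project's actual code: the function name `chunk_tokens` is hypothetical, and whitespace splitting stands in for whatever tokenizer the project really uses.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 500, overlap: int = 100) -> List[List[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, each chunk starts 400 tokens after the last
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the transcript
    return chunks


# Assumption: whitespace splitting as a stand-in for a real tokenizer.
transcript = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_tokens(transcript.split())
```

With a 1,200-token transcript this produces three chunks, and the last 100 tokens of each chunk reappear at the start of the next one, which is what preserves context across chunk boundaries.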
- User Input: Professor Starstuff processes questions and determines if they are related to astronomy.
- Decision Making (GPT-4):
  - If the question is astronomical, it proceeds to retrieval.
  - If general, it provides a direct response.
- Retrieval & Response Generation:
  - ChromaDB fetches relevant facts.
  - NASA Image API retrieves space-related images.
  - OpenAI TTS converts responses into audio.
- Final Output:
  - Provides a text response, space images, and an audio podcast snippet.
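The flow above can be sketched as a single routing function. Everything here is illustrative: `BotReply`, `answer_question`, and the callables passed in are hypothetical names, and the lambdas are stubs standing in for the GPT-4 classifier, the ChromaDB query, the NASA lookup, and OpenAI TTS. The sketch also assumes that general questions receive a text-only reply, which the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BotReply:
    text: str
    image_urls: List[str] = field(default_factory=list)
    audio: Optional[bytes] = None


def answer_question(
    question: str,
    is_astronomy: Callable[[str], bool],
    retrieve_facts: Callable[[str], List[str]],
    fetch_images: Callable[[str], List[str]],
    generate_text: Callable[[str, List[str]], str],
    synthesize_audio: Callable[[str], bytes],
) -> BotReply:
    """Route a question: general ones get a direct reply, astronomy ones go through retrieval."""
    if not is_astronomy(question):
        # Assumption: general questions skip retrieval, images, and audio.
        return BotReply(text=generate_text(question, []))
    facts = retrieve_facts(question)
    text = generate_text(question, facts)
    return BotReply(text=text, image_urls=fetch_images(question), audio=synthesize_audio(text))


# Stub implementations so the sketch runs without any external service.
stubs = {
    "is_astronomy": lambda q: "mars" in q.lower(),
    "retrieve_facts": lambda q: ["Mars soil contains iron oxide."],
    "fetch_images": lambda q: ["https://example.com/mars.jpg"],
    "generate_text": lambda q, facts: " ".join(facts) or "Let's talk about space!",
    "synthesize_audio": lambda text: text.encode("utf-8"),
}

space_reply = answer_question("Why is Mars red?", **stubs)
general_reply = answer_question("What is your favourite colour?", **stubs)
```

Passing the steps in as callables keeps the routing logic testable on its own; swapping a stub for a real client does not change `answer_question`.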
- Clone the repository:

      git clone https://github.com/Senimtra/astronomy-bot.git
      cd astronomy-bot

- Install dependencies:

      pip install -r requirements.txt

- Run the application locally:

      python manage.py runserver

- Deploy to Heroku:

      heroku create professor-starstuff
      git push heroku main
Professor Starstuff is continuously evaluated using LangSmith:
- Inference Time: Measures response speed.
- Retrieval Efficiency: Checks that retrieval returns accurate facts.
- Tool Efficiency: Tracks API calls (NASA, ChromaDB, OpenAI TTS).
- Model Selection:
  - GPT-4: Best for decision-making.
  - GPT-3.5 Turbo: Faster for general responses.
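To make the inference-time metric concrete, here is a small standard-library sketch of per-step latency recording. It is a stand-in for illustration only and does not use the LangSmith API; the `timed` decorator, the `latencies` store, and the placeholder `retrieve_facts` function are all hypothetical names.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, List

# Collected durations in seconds, keyed by pipeline step name.
latencies: Dict[str, List[float]] = {}


def timed(step: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Record how long each call to the decorated function takes."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the call raises, so failed calls are timed too.
                latencies.setdefault(step, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("retrieval")
def retrieve_facts(question: str) -> List[str]:
    # Placeholder for a real vector-store query.
    return ["placeholder fact"]


facts = retrieve_facts("How far away is the Moon?")
```

Wrapping the retrieval, image, and TTS calls the same way would give one latency list per tool, which is roughly the breakdown the metrics above describe.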
- Live Space Event Integration: Fetch real-time astronomy news.
- Voice Interaction: Enable full voice-based conversation.
- Streaming Responses: Faster and smoother podcast delivery.
- Educational Quizzes: Make learning more interactive.
- User Profiles: Personalize experience based on learning history.
Made for young space explorers!