This project demonstrates how to use LangGraph with tool integration for document and image analysis, using Google Gemini models via LangChain. It is designed for educational purposes and showcases how to build a simple agent that can:
- Extract text from images using a vision-capable LLM
- Perform basic computations (e.g., division)
- Route tasks between an assistant and tool nodes in a graph structure
- **Vision Tool**: Extracts text from images (e.g., meal plans, notes)
- **Computation Tool**: Performs division operations
- **Graph-based Agent**: Uses LangGraph to manage control flow between the assistant and tool nodes (see the sketch after this list)
- **Notebook-based**: All code and examples are in a Jupyter notebook for easy experimentation
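The full implementation lives in the notebook; the sketch below is a minimal, hypothetical reconstruction of the pattern it follows, not a copy of the notebook code. The model name (`gemini-2.0-flash`), the prompt wording inside `extract_text`, and the wiring through LangGraph's prebuilt `ToolNode` and `tools_condition` are assumptions here:

```python
import base64

from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import START, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

# Vision-capable Gemini model; the exact model name is an assumption.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")


@tool
def extract_text(img_path: str) -> str:
    """Extract all text from the image at img_path."""
    with open(img_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    message = HumanMessage(
        content=[
            {"type": "text", "text": "Extract all the text from this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
        ]
    )
    return llm.invoke([message]).content


@tool
def divide(a: float, b: float) -> float:
    """Divide a by b."""
    return a / b


tools = [extract_text, divide]
llm_with_tools = llm.bind_tools(tools)


def assistant(state: MessagesState):
    # Assistant node: the model either answers directly or emits a tool call.
    return {"messages": [llm_with_tools.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("assistant", assistant)
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "assistant")
# tools_condition routes to "tools" if the last message contains tool calls,
# otherwise the graph ends.
builder.add_conditional_edges("assistant", tools_condition)
builder.add_edge("tools", "assistant")
graph = builder.compile()
```

The conditional edge is what makes the routing work: after each assistant turn, the graph either executes the requested tool and loops back to the assistant, or terminates with the final answer.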
- **Extract Text from an Image**
  - Place an image (e.g., `Batman_training_and_meals.png`) in the project directory.
  - Run the notebook cell that extracts text from the image using the `extract_text` tool.
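For a quick test outside the graph, a tool defined with the `@tool` decorator (like the `extract_text` sketch above) can also be invoked directly:

```python
# Invoke the tool directly; .invoke takes a dict keyed by the argument names.
text = extract_text.invoke({"img_path": "Batman_training_and_meals.png"})
print(text)
```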
- **Ask Questions or Perform Calculations**
  - Example: "Divide 6790 by 5" or "What should I buy for the dinner menu according to the note?"
  - The agent will use the appropriate tool and return the answer.
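Sent through the compiled graph, a request like these is handled end to end; a short sketch assuming the `graph` object built above:

```python
from langchain_core.messages import HumanMessage

# The assistant emits a tool call for divide(6790, 5); tools_condition
# routes it to the tool node, and the result flows back to the assistant.
result = graph.invoke({"messages": [HumanMessage(content="Divide 6790 by 5")]})
print(result["messages"][-1].content)
```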
- Clone the repository and install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up your Google API key (and Langfuse keys if using tracing): create a `.env` file with:

  ```
  GOOGLE_API_KEY=your_google_api_key
  LANGFUSE_PUBLIC_KEY=your_langfuse_public_key
  LANGFUSE_SECRET_KEY=your_langfuse_secret_key
  ```
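If the notebook loads these values with `python-dotenv` (an assumption; the package would need to be in `requirements.txt`), the setup looks like:

```python
import os

from dotenv import load_dotenv

# Read KEY=value pairs from .env into the process environment.
load_dotenv()
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY not set; check your .env file"
```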
- Open `doc-analysis-graph.ipynb` in Jupyter and follow the cells.
- The project is structured to show how to combine LLMs, tool use, and graph-based control flow.
- The agent ("Alfred") is designed to be extendable with more tools or logic.
- The code is commented for clarity and learning.
- LangGraph Documentation
- LangChain Google GenAI
- Google Gemini API
- Hugging Face Agents Course – educational material
For educational use only.