This is a starter project for developing a GLiNER-based, metadata-filtered RAG research agent using LangGraph in LangSmith Studio.
GLiNER is an efficient model for Named Entity Recognition (NER), classification, and extraction, and it runs well on CPU.

Since using LLMs to filter unstructured data (articles, legal documents, reports, etc.) can be very costly, a GLiNER-based filtered RAG pipeline provides an efficient and robust alternative.
In `ingestor.py`, the data is first chunked into LangChain Documents. These documents are then classified with GLiNER, the predicted labels are stored in each document's metadata, and finally the documents are indexed in the vector database.

At retrieval time, the LLM generates multiple queries (default 3) with corresponding metadata filters (if any), which are then used to retrieve documents from the vector database.
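The ingestion flow above can be sketched as follows. This is a simplified illustration, not the project's actual code: `Document` is a minimal stand-in for a LangChain Document, and `classify` is a stub standing in for a GLiNER classifier call.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def classify(text: str, labels: list[str]) -> list[str]:
    """Stub standing in for GLiNER: in the real pipeline this would be
    a GLiNER model predicting which candidate labels apply to the text."""
    return [label for label in labels if label in text.lower()]

def ingest(chunks: list[str], labels: list[str]) -> list[Document]:
    """Chunk -> classify -> store labels in metadata."""
    docs = []
    for chunk in chunks:
        doc = Document(page_content=chunk)
        doc.metadata["labels"] = classify(chunk, labels)
        docs.append(doc)
    return docs  # in the real pipeline, these are then indexed in the vector DB

docs = ingest(["Wheat exports rose sharply.", "The court issued a ruling."],
              labels=["wheat", "court"])
print(docs[0].metadata["labels"])  # -> ['wheat']
```

The key point is that classification happens once, at ingestion time, so no LLM call is needed to filter documents later.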
This project has two graphs:

- a "retrieval" graph (`src/retrieval_graph/graph.py`)
- a "researcher" subgraph, part of the retrieval graph (`src/retrieval_graph/researcher_graph/graph.py`)
The retrieval graph manages a chat history and responds based on the fetched documents. Specifically, it:
- Takes a user query as input
- Runs the researcher subgraph, which:
  - first generates a list of queries (default 3) along with metadata filters (if any)
  - then retrieves the relevant documents for all queries and filters in parallel and returns them to the LLM
- Finally, the LLM generates a response based on the retrieved documents and the conversation context.
- Create a `.env` file:

```shell
cp .env.example .env
```

- Set up Qdrant:

```shell
docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_storage:/qdrant/storage:z qdrant/qdrant
```
Qdrant is a fast vector database with extensive support for metadata filtering.
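For illustration, a Qdrant payload filter that restricts search to documents labeled `wheat` might look like the fragment below (the `metadata.labels` key is a hypothetical payload path, not necessarily what this project uses):

```json
{
  "must": [
    { "key": "metadata.labels", "match": { "value": "wheat" } }
  ]
}
```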
- Install dependencies:

```shell
uv sync
```

- Ingest documents from `./docs`:

```shell
uv run python ingestor.py
```
The documents in `./docs` are processed versions of the `.sgm` files from the Reuters-21578 text categorization collection. The processing is done by `doc_processor.py`. To regenerate them, run:

```shell
uv run python doc_processor.py
```

Make sure to point it at the right `.sgm` file.
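For context, a Reuters-21578 record can be parsed roughly as sketched below. The sample is an abridged record in the collection's format; the regex approach is an illustration only (the actual `doc_processor.py` may use a proper SGML/HTML parser instead).

```python
import re

# Abridged record in the shape of a Reuters-21578 .sgm entry.
sample = """<REUTERS TOPICS="YES" NEWID="1">
<DATE>26-FEB-1987</DATE>
<TOPICS><D>cocoa</D></TOPICS>
<TEXT>
<TITLE>BAHIA COCOA REVIEW</TITLE>
<BODY>Showers continued throughout the week...</BODY>
</TEXT>
</REUTERS>"""

def parse_record(sgm: str) -> dict:
    """Extract title, body, and topic labels with simple regexes."""
    title = re.search(r"<TITLE>(.*?)</TITLE>", sgm, re.S)
    body = re.search(r"<BODY>(.*?)</BODY>", sgm, re.S)
    topics = re.findall(r"<D>(.*?)</D>", sgm)
    return {
        "title": title.group(1).strip() if title else "",
        "body": body.group(1).strip() if body else "",
        "topics": topics,
    }

record = parse_record(sample)
print(record["title"], record["topics"])  # BAHIA COCOA REVIEW ['cocoa']
```

The `<TOPICS>` labels are what make this corpus a natural fit for metadata-filtered retrieval.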
- Start LangSmith Studio:

```shell
uv run langgraph dev --allow-blocking
```

- Next, open the `retrieval_graph` using the dropdown in the top-left. Ask it questions about the ingested documents to confirm it can fetch the required information!
The default values for `response_model` and `query_model` are shown below:

```yaml
response_model: google_genai/gemini-2.0-flash-lite
query_model: google_genai/gemini-2.0-flash-lite
```

To use Google Gemini's chat models:

- Sign up for a Google AI Studio API key.
- Once you have your API key, add it to your `.env` file:

```shell
GOOGLE_API_KEY=your-api-key
```
The default value for `embedding_model` is shown below:

```yaml
embedding_model: fastembed/BAAI/bge-base-en-v1.5
```

You can customize this retrieval agent template in several ways:
- Modify the embedding model: Change the embedding model used for document indexing and query embedding by updating `embedding_model` in the configuration. Options include various fastembed models.

- Customize the response generation: Modify `response_system_prompt` to change how the agent formulates its responses. This allows you to adjust the agent's personality or add specific instructions for answer generation.

- Change the language model: Update `response_model` in the configuration to use different language models for response generation. Options include various Claude models from Anthropic, as well as models from other providers like Fireworks AI.

- Extend the graph: Add new nodes or modify existing ones in `src/retrieval_graph/graph.py` to introduce additional processing steps or decision points in the agent's workflow.

- Add tools: Implement tools to expand the researcher agent's capabilities beyond simple retrieval.
