A minimal agentic search and knowledge retrieval system that can be scheduled with cron jobs and write its output to a knowledge management system like Notion, Obsidian, or other note-taking apps.
If you read online content for personal or professional reasons, it can be difficult to keep up with all of it. Blogs, articles, newsletters, publications, websites, and countless other sources publish new content every day, and there is simply not enough time to read and filter through it all.
This project is a minimal agentic search and knowledge retrieval system using Agno, Pydantic, and Exa. The goal is to provide a simple, yet effective, way to search for, fetch and summarize relevant content based on user interests and context - automatically.
This is for anyone who wants to build a custom search and knowledge retrieval system for their specific needs. It is not a general-purpose search "engine". It can be used for personal knowledge management, professional research, or any other use case where you need to stay up-to-date on specific topics. It can run automatically on a daily or weekly schedule based on your interests and context, saving you a ton of time.
General-purpose search engines are great for finding information using keywords, but they are not very good at understanding the user's context and personal interests. This leads to a lot of irrelevant results. This project combines semantic search and agentic summarization to provide an automatic, fully tailored "RSS feed".
This project uses Exa, a semantic search engine that accepts long query descriptions to find relevant content. In essence, the more descriptive and exhaustive your descriptions are, the more "semantically" relevant the search results will be. The search results are filtered and ranked based on the user's interests and context, then fetched and summarized using Agno, a framework for building AI agents. The summarized content can then be written to a file or integrated with a knowledge management system like Notion, Obsidian, or other note-taking apps.
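The flow described above (search, filter and rank, summarize, output) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `Result`, `run_pipeline`, `fake_search`, and `fake_summarize` are hypothetical names, and the search and summarization steps are passed in as plain functions so the sketch runs without API keys. In the real project those two steps would be backed by Exa and an Agno agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """One search hit; the field names are illustrative assumptions."""
    url: str
    text: str
    score: float


def run_pipeline(
    description: str,
    search: Callable[[str], List[Result]],
    summarize: Callable[[str], str],
    min_score: float = 0.0,
) -> str:
    """Search, filter and rank, summarize, then render Markdown notes."""
    hits = search(description)  # 1. semantic search with the long description
    kept = sorted(
        (h for h in hits if h.score >= min_score),  # 2. filter out weak hits
        key=lambda h: h.score,
        reverse=True,  # rank best first
    )
    # 3. summarize each remaining hit under its own heading
    sections = [f"## {h.url}\n\n{summarize(h.text)}" for h in kept]
    return "\n\n".join(sections)  # 4. text ready to write to a note file


# Stand-ins so the sketch runs without network access.
def fake_search(query: str) -> List[Result]:
    return [
        Result("https://example.com/a", "Long article about agents.", 0.4),
        Result("https://example.com/b", "Deep dive on retrieval.", 0.9),
        Result("https://example.com/c", "Unrelated advert.", 0.1),
    ]


def fake_summarize(text: str) -> str:
    # Keep only the first sentence as a placeholder "summary".
    return text.split(".")[0] + "."


notes = run_pipeline("agentic AI research", fake_search, fake_summarize, min_score=0.3)
print(notes)
```

Injecting the search and summarize steps as callables keeps the orchestration testable offline and makes it easy to swap either step later.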
- Python 3.8+
- OpenAI API Key
- Exa API Key
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export OPENAI_API_KEY=your_openai_api_key
export EXA_API_KEY=your_exa_api_key
The project uses the user_config.yaml file (rename the user_config_example.yaml file) to instantiate the user's interests, context, and filters. These properties are used to generate the search query and to filter and rank the search results.
The user's interests are defined as a list of topics, each with a description and a list of domains to search, as well as an optional example response. The description is used to generate a more personalized and relevant search query through Exa. The domains are used to limit the search to specific sources.
- topic: The topic of the interest. This is primarily used to categorize the interests in the output.
- description: The description of the interest. This should be a long-form description of the interest. It is recommended to include specific keywords and phrases that are relevant to the interest. This will be used to generate the search query through Exa.
- domain: A list of domains to search. This is used to limit the search to specific sources. It is not required, but will help narrow down the search results. This will only be used if the --no-domain flag is not passed to the CLI.
- example_response: An example response that the user would like to see. For example, if you have two interests, like "Nutritional Supplement Research" and "Agentic AI", you may want to provide an example response for each interest to help the summarization agent understand the context of each interest. This is not necessary, but it is recommended to provide more personalized and relevant output.
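As a rough illustration of how one interest might map onto a search request, here is a small sketch. The project itself uses Pydantic models. This version uses a standard-library dataclass so it runs without extra dependencies. The `Interest` class, the `build_search_params` helper, and the `query` and `include_domains` key names are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Interest:
    """Mirrors the four fields described above; the class itself is hypothetical."""
    topic: str
    description: str
    domain: List[str] = field(default_factory=list)
    example_response: Optional[str] = None


def build_search_params(interest: Interest, no_domain: bool = False) -> Dict[str, Any]:
    """Turn one interest into keyword arguments for a search call.

    The key names ("query", "include_domains") are illustrative assumptions.
    """
    params: Dict[str, Any] = {"query": interest.description.strip()}
    # Domains narrow the search only when they are set and --no-domain is absent.
    if interest.domain and not no_domain:
        params["include_domains"] = list(interest.domain)
    return params


interest = Interest(
    topic="Agentic AI",
    description="Practical write-ups on building and evaluating LLM agents.",
    domain=["example.com"],
)
restricted = build_search_params(interest)
unrestricted = build_search_params(interest, no_domain=True)
print(restricted)
print(unrestricted)
```

Note that `topic` is deliberately left out of the search parameters, since it is described above as a label for grouping the output rather than as part of the query.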
- exclude_keywords: A list of keywords to exclude from the search results. This is useful for excluding sponsored content or advertisements. This will be used to filter the search results through Exa.
- min_relevance_score: The minimum relevance score for a search result to be included. This will be used to filter the search results through Exa. Generally it is not recommended to change this value.
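The two filters can be pictured with a small sketch. The text above says the filtering happens through Exa. The version below instead applies the same two rules client-side to results that have already been fetched, purely to show the logic. The `filter_results` function, the result dictionary keys (`title`, `text`, `score`), and the threshold of 0.5 are all assumptions for illustration.

```python
from typing import Dict, List


def filter_results(
    results: List[Dict],
    exclude_keywords: List[str],
    min_relevance_score: float,
) -> List[Dict]:
    """Drop low-scoring results and results that mention an excluded keyword."""
    lowered = [kw.lower() for kw in exclude_keywords]
    kept = []
    for item in results:
        # Rule 1: the score must reach the configured minimum.
        if item.get("score", 0.0) < min_relevance_score:
            continue
        # Rule 2: case-insensitive keyword match against title and body text.
        haystack = f"{item.get('title', '')} {item.get('text', '')}".lower()
        if any(kw in haystack for kw in lowered):
            continue
        kept.append(item)
    return kept


results = [
    {"title": "Agent evaluation in practice", "text": "A field report.", "score": 0.82},
    {"title": "Sponsored: buy our tool", "text": "Limited offer.", "score": 0.91},
    {"title": "Loosely related post", "text": "Misc notes.", "score": 0.12},
]
kept = filter_results(results, exclude_keywords=["sponsored"], min_relevance_score=0.5)
print([item["title"] for item in kept])
```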
user_context: A description of the user's context. This should include information about the user's role, goals, and interests. This will be used to generate the summary prompt through Agno that will then be compared to the content fetched through Exa.
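One way to picture how user_context and an interest's optional example_response could feed the summary prompt is shown below. This is only a sketch: `build_summary_prompt` and the prompt wording are hypothetical, and the actual prompt the project sends through Agno may look quite different.

```python
from typing import Optional


def build_summary_prompt(
    user_context: str,
    topic: str,
    content: str,
    example_response: Optional[str] = None,
) -> str:
    """Assemble a plain-text prompt from the config values and fetched content."""
    parts = [
        "You summarize articles for the reader described below.",
        f"Reader context: {user_context.strip()}",
        f"Topic: {topic}",
    ]
    # The example response is optional, so include it only when provided.
    if example_response:
        parts.append(f"Example of the desired output:\n{example_response.strip()}")
    parts.append(f"Article:\n{content.strip()}")
    return "\n\n".join(parts)


prompt = build_summary_prompt(
    user_context="Software engineer tracking agent frameworks.",
    topic="Agentic AI",
    content="Full article text goes here.",
)
print(prompt)
```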
python scripts/main.py
- --no-domain: Ignore domain filtering and search all sources
- --limit <number>: Limit the number of content items processed (for testing)
- --weekly: Filter results to only include content published in the last 7 days
- --daily: Filter results to only include content published in the last 24 hours
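The flags above could be parsed along these lines. This is a hedged sketch using the standard-library `argparse` module, not the contents of scripts/main.py. The `build_parser` and `start_date` helpers are hypothetical, and making --weekly and --daily mutually exclusive is an assumption.

```python
import argparse
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Agentic search and summarization")
    parser.add_argument("--no-domain", action="store_true",
                        help="Ignore domain filtering and search all sources")
    parser.add_argument("--limit", type=int, default=None,
                        help="Limit the number of content items processed")
    # Assumption: only one date window can be active at a time.
    window = parser.add_mutually_exclusive_group()
    window.add_argument("--weekly", action="store_true",
                        help="Only content published in the last 7 days")
    window.add_argument("--daily", action="store_true",
                        help="Only content published in the last 24 hours")
    return parser


def start_date(args: argparse.Namespace, now: Optional[datetime] = None) -> Optional[str]:
    """Return an ISO 8601 cutoff for the chosen window, or None for no cutoff."""
    now = now or datetime.now(timezone.utc)
    if args.daily:
        return (now - timedelta(hours=24)).isoformat()
    if args.weekly:
        return (now - timedelta(days=7)).isoformat()
    return None


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--weekly", "--limit", "5"])
fixed_now = datetime(2025, 2, 8, tzinfo=timezone.utc)
print(args.no_domain, args.limit, start_date(args, now=fixed_now))
```

The resulting cutoff string could then be passed to the search step as a published-date lower bound.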
- Add Exa search query to include daily or weekly results (Added 02/08/2025)
- Add arguments for filtering by date (Added 02/08/2025)
- Add arguments for verbose logging
- Add documentation for Pydantic models
- Add options to ExaSearchTool to make it more configurable
- Improve summarization using DSPy
- Implement Notion integration
- Implement Trend detection
Contributions are welcome! Please feel free to submit a Pull Request. The project is meant to be a starting point for building a custom search system, so feel free to modify it to your needs in any way you see fit.
If you have any questions or feedback, please feel free to contact me at kallemickelborg@gmail.com