
kallemickelborg/knowledge-agent


Knowledge Agent

A minimal agentic search and knowledge retrieval system that can be automated with cron jobs and whose output can be sent to a knowledge management system such as Notion, Obsidian, or another note-taking app.

Introduction

Aim

If you read online content for personal or professional reasons, it can be difficult to keep up with all of it. Blogs, articles, newsletters, publications, websites, and countless other sources publish new content every day, and there is simply not enough time to read and filter through everything.

This project is a minimal agentic search and knowledge retrieval system built with Agno, Pydantic, and Exa. The goal is to provide a simple yet effective way to automatically search for, fetch, and summarize relevant content based on the user's interests and context.

Who is this for?

This is for anyone who wants to build a custom search and knowledge retrieval system for their specific needs. It is not a general-purpose search "engine". It can be used for personal knowledge management, professional research, or any other use case where you need to stay up-to-date on specific topics. Runs can be automated daily or weekly based on your interests and context, which saves the time you would otherwise spend searching and filtering by hand.

Why not just use a general-purpose search engine?

General-purpose search engines are great for finding information by keyword, but they are not very good at understanding the user's context and personal interests, which leads to a lot of irrelevant results. This project combines semantic search and agentic summarization to provide an automatic, fully personalized "RSS feed".

How does it work, technically?

This project uses Exa, a semantic search engine that accepts long query descriptions to find relevant content. In essence, the more descriptive and exhaustive your descriptions are, the more "semantically" relevant the search results will be. The search results are filtered and ranked based on the user's interests and context. The filtered results are then fetched and summarized using Agno, a framework for building AI agents. The summarized content can be written to a file or integrated with a knowledge management system like Notion, Obsidian, or other note-taking apps.
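
The flow described above (search, filter and rank, then summarize) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Result`, `run_pipeline`, `fake_search`, and `fake_summarize` are hypothetical names, and the two `fake_*` functions stand in for the real Exa search and Agno agent so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """One search hit; the field names are illustrative, not Exa's schema."""
    url: str
    text: str
    score: float


def run_pipeline(
    description: str,
    search: Callable[[str], List[Result]],
    summarize: Callable[[str], str],
    min_score: float,
) -> List[str]:
    """Search with a long description, keep relevant hits, summarize each."""
    hits = search(description)
    relevant = [h for h in hits if h.score >= min_score]
    # Highest-scoring content first.
    relevant.sort(key=lambda h: h.score, reverse=True)
    return [f"{h.url}: {summarize(h.text)}" for h in relevant]


# Stand-ins for the real search engine and summarization agent (assumptions).
def fake_search(query: str) -> List[Result]:
    return [
        Result("https://example.com/a", "Agents that plan and call tools.", 0.9),
        Result("https://example.com/b", "Unrelated celebrity news.", 0.2),
    ]


def fake_summarize(text: str) -> str:
    # Keep only the first sentence as a crude "summary".
    return text.split(".")[0]


summaries = run_pipeline(
    "Research on agentic AI systems that plan and use tools",
    fake_search,
    fake_summarize,
    min_score=0.5,
)
```

Passing the search and summarization steps in as callables keeps the sketch independent of any particular API; in a real run they would wrap the Exa client and the Agno agent.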

Tech Stack

  • Agno for agentic summarization
  • Pydantic for data modeling
  • Exa for semantic search

Requirements

  • Python 3.8+
  • OpenAI API Key
  • Exa API Key

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export OPENAI_API_KEY=your_openai_api_key
export EXA_API_KEY=your_exa_api_key

Usage

Prerequisites

The project reads the user's interests, context, and filters from the user_config.yaml file (create it by renaming user_config_example.yaml). These properties are used to generate the search query and to filter and rank the search results.

The user's interests are defined as a list of topics, each with a description and a list of domains to search, as well as an optional example response. The description is used to generate a more personalized and relevant search query through Exa. The domains are used to limit the search to specific sources.

Configuring the user's interests

Interests object

  • topic: The topic of the interest. This is primarily used to categorize the interests in the output.

  • description: The description of the interest. This should be a long-form description of the interest. It is recommended to include specific keywords and phrases that are relevant to the interest. This will be used to generate the search query through Exa.

  • domain: A list of domains to search. This is used to limit the search to specific sources. It is not required, but will help narrow down the search results. The list is ignored when the --no-domain flag is passed to the CLI.

  • example_response: An example response that the user would like to see. For example, if you have two interests, like "Nutritional Supplement Research" and "Agentic AI", you may want to provide an example response for each one to help the summarization agent understand its context. This field is optional, but providing it is recommended because it leads to more personalized and relevant output.
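
As a rough illustration of how these four fields might feed a search request, here is a sketch using plain dataclasses in place of the project's Pydantic models. `Interest`, `build_search_request`, and the `query` / `include_domains` keys are assumptions made for this example, not the project's actual names or the Exa API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Interest:
    """Mirrors the four documented fields of an interest entry."""
    topic: str
    description: str
    domain: List[str] = field(default_factory=list)
    example_response: Optional[str] = None


def build_search_request(interest: Interest, no_domain: bool = False) -> Dict[str, Any]:
    """Turn an interest into search parameters (hypothetical key names)."""
    # The long-form description is the query; the topic only labels the output.
    request: Dict[str, Any] = {"query": interest.description}
    # Domains are optional, and --no-domain disables them entirely.
    if interest.domain and not no_domain:
        request["include_domains"] = list(interest.domain)
    return request


interest = Interest(
    topic="Agentic AI",
    description=(
        "New frameworks and research on autonomous LLM agents that plan, "
        "call tools, and evaluate their own output"
    ),
    domain=["arxiv.org"],
)
restricted = build_search_request(interest)
unrestricted = build_search_request(interest, no_domain=True)
```

Here `restricted` carries the domain list, while `unrestricted` contains only the query, matching the documented effect of --no-domain.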

Filters object

  • exclude_keywords: A list of keywords to exclude from the search results. This is useful for excluding sponsored content or advertisements. This will be used to filter the search results through Exa.

  • min_relevance_score: The minimum relevance score for a search result to be included. This will be used to filter the search results through Exa. Generally it is not recommended to change this value.
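
To make the effect of the two filters concrete, the sketch below applies them on the client side to a list of already-fetched hits. This is an assumption for illustration: the project applies its filters through Exa, and `Hit`, `apply_filters`, and the field names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """A search result with a relevance score (illustrative fields)."""
    title: str
    text: str
    score: float


def apply_filters(
    hits: List[Hit],
    exclude_keywords: List[str],
    min_relevance_score: float,
) -> List[Hit]:
    """Drop hits below the score threshold or containing an excluded keyword."""
    lowered = [keyword.lower() for keyword in exclude_keywords]
    kept = []
    for hit in hits:
        if hit.score < min_relevance_score:
            continue
        # Case-insensitive substring match against title and body.
        haystack = f"{hit.title} {hit.text}".lower()
        if any(keyword in haystack for keyword in lowered):
            continue
        kept.append(hit)
    return kept


hits = [
    Hit("Agent benchmarks", "A new evaluation suite for tool use.", 0.82),
    Hit("Sponsored: AI course", "Sign up today.", 0.91),
    Hit("Loosely related post", "Mentions agents once.", 0.10),
]
kept = apply_filters(hits, exclude_keywords=["sponsored"], min_relevance_score=0.5)
```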

Context object

  • user_context: A description of the user's context. This should include information about the user's role, goals, and interests. This will be used to generate the summary prompt through Agno that will then be compared to the content fetched through Exa.

Execution

python scripts/main.py

CLI Arguments

  • --no-domain: Ignore domain filtering and search all sources
  • --limit <number>: Limit the number of content items processed (for testing)
  • --weekly: Filter results to only include content published in the last 7 days
  • --daily: Filter results to only include content published in the last 24 hours
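
For readers adapting the CLI, the four flags above could be parsed along these lines with `argparse`. This is a sketch, not the contents of scripts/main.py: `build_parser` and `published_after` are hypothetical helpers, and treating --weekly and --daily as mutually exclusive is an assumption.

```python
import argparse
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Knowledge Agent (sketch)")
    parser.add_argument("--no-domain", action="store_true",
                        help="Ignore domain filtering and search all sources")
    parser.add_argument("--limit", type=int, default=None,
                        help="Limit the number of content items processed")
    # Assumption: only one time window may be selected per run.
    window = parser.add_mutually_exclusive_group()
    window.add_argument("--weekly", action="store_true",
                        help="Only content published in the last 7 days")
    window.add_argument("--daily", action="store_true",
                        help="Only content published in the last 24 hours")
    return parser


def published_after(args: argparse.Namespace, now: datetime) -> Optional[datetime]:
    """Return the earliest allowed publication time, or None for no limit."""
    if args.daily:
        return now - timedelta(hours=24)
    if args.weekly:
        return now - timedelta(days=7)
    return None


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(["--weekly", "--limit", "5"])
now = datetime(2025, 2, 8, tzinfo=timezone.utc)
cutoff = published_after(args, now)
```

The resulting `cutoff` can then be passed to whatever date filter the search call supports.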

To-dos

  • Add Exa search query to include daily or weekly results (Added 02/08/2025)
  • Add arguments for filtering by date (Added 02/08/2025)
  • Add arguments for verbose logging
  • Add documentation for Pydantic models
  • Add options to ExaSearchTool to make it more configurable
  • Improve summarization using DSPy
  • Implement Notion integration
  • Implement trend detection

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. The project is meant to be a starting point for building a custom search system, so feel free to modify it to your needs in any way you see fit.

Contact

If you have any questions or feedback, please feel free to contact me at kallemickelborg@gmail.com

About

AI Agent for information retrieval and knowledge consolidation.
