
Ollama-Agent-Kit

A simple memory-enabled LLM Agent capable of handling tool calls. It communicates with large language models through the Ollama API and can use predefined tools (weather, time, reservation, etc.).


Requirements

This project is designed to run on Linux systems (tested on Ubuntu 24.04.3 LTS) and requires the following dependencies:

  • ✅ Node.js v18.19.1 – Installed and available in your PATH
  • 📦 npm v9.2.0 – Comes bundled with Node.js, required for package management
  • 🐳 Docker v28.3.3 (Docker Desktop) – Container runtime environment

Installation

1- Clone the repository:

git clone https://github.com/EmreMutlu99/Llama-Based-Agent.git 
cd Llama-Based-Agent
npm install

2- Start Ollama with Docker Compose

cd ollama
docker compose up -d
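Before moving on, you can quickly check that the Ollama server is reachable. A minimal sketch, assuming the compose file exposes Ollama on its default port 11434 (the file name check-ollama.js is just for illustration):

// check-ollama.js — verify the Ollama HTTP API is up and list the pulled models
(async () => {
  try {
    const res = await fetch('http://localhost:11434/api/tags'); // Ollama's "list models" endpoint
    const data = await res.json();
    console.log('Ollama is up. Models:', data.models.map(m => m.name));
  } catch (err) {
    console.error('Ollama is not reachable:', err.message);
  }
})();

Run it with node check-ollama.js; if no models are listed yet, pull one inside the container first.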

3- Run it

  • You can run one of the following demo scripts depending on what you want to test:
# General demo
node src/main.js

# Direct tool call tests (weather, time, reservation)
node src/tools/tools-test.js

# Conversation threads + memory test
node src/threads-test.js

Usage in an External Project

1- Create a new project

mkdir demo-app && cd demo-app
npm init -y

2- Add this library as a local dependency

npm i ../Llama-Based-Agent
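npm records a local path install as a file: dependency, so package.json should end up with an entry like the one below (the package name matches the require() call used in the next step):

"dependencies": {
  "llama-based-agent": "file:../Llama-Based-Agent"
}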

3- Create index.js

const { Agent, SimpleMemory } = require('llama-based-agent');

(async () => {
  // Initialize the agent
  const agent = new Agent(); 

  // Create a new thread
  const { thread_id } = await agent.new_thread({ source: 'cli-demo', label: 'full-flow' });
  console.log('THREAD:', thread_id);

  // --- (Memory) Introduce yourself and give your name
  let q = "Hi! My name is Omer. Keep replies short.";
  let r = await agent.generate({ input: q, thread_id });
  console.log('Q1:', q);
  console.log('A1:', r.text);

  // --- (Memory) Does it remember the name?
  q = "What is my name?";
  r = await agent.generate({ input: q, thread_id });
  console.log('Q2:', q);
  console.log('A2:', r.text);

  // --- Tool: get_weather
  q = "What's the weather in Paris?";
  r = await agent.generate({ input: q, thread_id, tool_choice: 'auto' });
  console.log('Q3:', q);
  console.log('A3:', r.text);

  // --- Tool: reserve_table (async)
  q = "Book a table for 3 tomorrow at 20:00 under Omer, phone 05001234567.";
  r = await agent.generate({ input: q, thread_id, tool_choice: 'auto' });
  console.log('Q4:', q);
  console.log('A4:', r.text);

  // --- General knowledge question (no tool)
  q = "Explain AI in two short sentences.";
  r = await agent.generate({ input: q, thread_id });
  console.log('Q5:', q);
  console.log('A5:', r.text);

  // --- (Memory) Ask for a summary of the conversation
  q = "Summarize our conversation so far in 3 short bullet points.";
  r = await agent.generate({ input: q, thread_id });
  console.log('Q6:', q);
  console.log('A6:', r.text);

})();

4- Run it

node index.js

Features

Agent class:

  • Communicates with the LLM, handles tool calls, manages memory.

SimpleMemory:

  • JSONL-based lightweight memory layer.
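Conceptually, a JSONL memory is just one JSON record appended per line to a file. A rough illustration of the pattern (not the library's actual schema):

const fs = require('fs');

// Append one conversation turn as a single JSON line
function remember(file, record) {
  fs.appendFileSync(file, JSON.stringify(record) + '\n');
}

// Load all turns back by parsing the file line by line
function recall(file) {
  if (!fs.existsSync(file)) return [];
  return fs.readFileSync(file, 'utf8')
    .split('\n')
    .filter(Boolean)
    .map(line => JSON.parse(line));
}

remember('memory.jsonl', { role: 'user', content: 'Hi! My name is Omer.' });
console.log(recall('memory.jsonl'));

This is also why rm memory*.jsonl (see Extra Notes below) resets the conversation history.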

Tools:

  • get_weather: returns weather from a static lookup table (LUT).

  • get_time: returns local time from a static lookup table (LUT).

  • reserve_table: logs a reservation request to an API or a queue.
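The Ollama chat API describes tools with JSON-schema function definitions. Purely for illustration, a sketch of what a get_weather definition typically looks like in that format (how the kit defines its tools internally may differ):

const getWeatherTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Return the current weather for a given city',
    parameters: {
      type: 'object',
      properties: {
        city: { type: 'string', description: 'City name, e.g. Paris' },
      },
      required: ['city'],
    },
  },
};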

Easy integration:

  • Usable in external projects with require('llama-based-agent').

📊 Benchmark Chart (English Response Times)

LLM Model Comparison (Short / Medium / Long Responses)

This benchmark compares the English response times of different Large Language Models (LLMs) across short, medium, and long response scenarios.

Execution Flow
  • Warmup runs: A few warmup requests (default: 2) are made before measurement to avoid cold-start latency.

  • Main measurement: Each scenario is executed with multiple runs (default: RUNS=20).

  • Prompt randomization: Each prompt is appended with a nonce (random token) and timestamp to avoid caching effects.

  • Timing: Response time is measured with process.hrtime.bigint() (nanosecond precision); see the sketch after this list.

  • Before switching scenarios or models: docker restart is used. This ensures each test runs under fresh conditions without cached context.
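For reference, here is a minimal sketch of how such a measurement loop can be structured, reusing the Agent API from the example above; the actual benchmark script in this repo may differ in its prompt sets and model handling:

const { Agent } = require('llama-based-agent');
const crypto = require('crypto');

const WARMUP = 2;   // unmeasured warmup requests (avoid cold-start latency)
const RUNS = 20;    // measured runs per scenario

async function benchmarkScenario(agent, basePrompt) {
  const { thread_id } = await agent.new_thread({ source: 'benchmark' });

  for (let i = 0; i < WARMUP; i++) {
    await agent.generate({ input: basePrompt, thread_id });
  }

  const seconds = [];
  for (let i = 0; i < RUNS; i++) {
    // Nonce + timestamp defeat prompt/response caching
    const prompt = `${basePrompt} [nonce:${crypto.randomUUID()} ts:${Date.now()}]`;

    const start = process.hrtime.bigint();               // nanosecond precision
    await agent.generate({ input: prompt, thread_id });
    const end = process.hrtime.bigint();

    seconds.push(Number(end - start) / 1e9);             // ns -> s
  }

  return {
    avg_second: seconds.reduce((a, b) => a + b, 0) / seconds.length,
    min_second: Math.min(...seconds),
    max_second: Math.max(...seconds),
  };
}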

Chart legend:
  • Blue bars (avg_second): average response time
  • Red bars (min_second): fastest response time
  • Yellow bars (max_second): slowest response time

This visualization shows how models perform when generating English responses with different input lengths. Some models are more consistent (narrow min–max range), while others show higher variability (large gap between fastest and slowest times).


LLM Model Comparison (Turkish / Instruction / Memory)

  • I built and ran a 10-task, 31-point Turkish benchmark across 13 local models (0.5B–8B).
  • I evaluated language fidelity, instruction following (1–2 sentence limits), memory recall (KV/facts: name=Ömer, language=Türkçe, codename=Atlas, short-reply preference), context use, translation, and simple reasoning.
  • I used automated checks per task, summed the scores, and normalized them to percentages for ranking. Mid-size instruct models (e.g., mistral:7b-instruct, llama3.1:8b) were consistently stronger, while very small models struggled with strict Turkish adherence and brevity constraints.
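For clarity, the ranking step boils down to summing the per-task points and dividing by the 31-point maximum; a minimal sketch with made-up scores:

const MAX_POINTS = 31;

// taskScores: automated per-task points for one model (values here are arbitrary examples)
function normalizeToPercent(taskScores) {
  const total = taskScores.reduce((a, b) => a + b, 0);
  return (total / MAX_POINTS) * 100;
}

console.log(normalizeToPercent([3, 2, 3, 4, 2, 3, 3, 4, 3, 2]).toFixed(1) + '%'); // 93.5%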

Extra Notes & Useful Commands

Docker

docker compose up -d     # start container from compose file (returns terminal immediately)
docker compose up        # run in the foreground; useful during development or to follow logs in real time
docker compose down      # stops + removes the container
docker compose stop      # only stops, does not remove

docker compose logs -f   # follow logs (e.g., model download progress)
docker ps                # list running containers
docker ps -a             # list all containers

docker exec -it ollama bash   # enter the container shell
ollama list                   # list available models inside Ollama

Node.js & Scripts

npm install      # Installs all dependencies from package.json into node_modules/
node src/main.js # Runs the example runner directly with Node (bypasses npm scripts)

rm memory*.jsonl          # Deletes persisted memory files (e.g., memory.jsonl) to reset conversation/history
npm start                 # Runs the project’s default start script (aliased to node src/main.js in package.json)
