A simple memory-enabled LLM Agent capable of handling tool calls. It communicates with large language models through the Ollama API and can use predefined tools (weather, time, reservation, etc.).
This project is designed to run on Linux systems (tested on Ubuntu 24.04.3 LTS) and requires the following dependencies:
- ✅ Node.js v18.19.1 – Installed and available in your PATH
- 📦 npm v9.2.0 – Comes bundled with Node.js, required for package management
- 🐳 Docker v28.3.3 (Docker Desktop) – Container runtime environment
git clone https://github.com/EmreMutlu99/Llama-Based-Agent.git
cd Llama-Based-Agent
npm install
cd ollama
docker compose up -d
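To confirm the Ollama container is reachable before running the demos, a quick sanity check against Ollama's HTTP API can help. The script below is a small sketch (it is not part of the repository) and assumes Ollama's default port 11434; adjust it if your compose file maps a different port.
// check-ollama.js – hypothetical sanity check that the Ollama API is reachable
(async () => {
  try {
    const res = await fetch('http://localhost:11434/api/tags'); // lists locally pulled models
    const { models } = await res.json();
    console.log('Ollama is up. Models:', models.map((m) => m.name).join(', ') || '(none pulled yet)');
  } catch (err) {
    console.error('Ollama is not reachable:', err.message);
    process.exit(1);
  }
})();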
- You can run one of the following demo scripts depending on what you want to test:
# General demo
node src/main.js
# Direct tool call tests (weather, time, reservation)
node src/tools/tools-test.js
# Conversation threads + memory test
node src/threads-test.js
mkdir demo-app && cd demo-app
npm init -y
npm i ../Llama-Based-Agent
// index.js
const { Agent, SimpleMemory } = require('llama-based-agent');
(async () => {
// Initialize the agent
const agent = new Agent();
// Create a new thread
const { thread_id } = await agent.new_thread({ source: 'cli-demo', label: 'full-flow' });
console.log('THREAD:', thread_id);
// --- (Memory) Introduce yourself and state your name
let q = "Hi! My name is Omer. Keep replies short.";
let r = await agent.generate({ input: q, thread_id });
console.log('Q1:', q);
console.log('A1:', r.text);
// --- (Memory) Does it remember the name?
q = "What is my name?";
r = await agent.generate({ input: q, thread_id });
console.log('Q2:', q);
console.log('A2:', r.text);
// --- Tool: get_weather
q = "What's the weather in Paris?";
r = await agent.generate({ input: q, thread_id, tool_choice: 'auto' });
console.log('Q3:', q);
console.log('A3:', r.text);
// --- Tool: reserve_table (async)
q = "Book a table for 3 tomorrow at 20:00 under Omer, phone 05001234567.";
r = await agent.generate({ input: q, thread_id, tool_choice: 'auto' });
console.log('Q4:', q);
console.log('A4:', r.text);
// --- General knowledge question (no tool)
q = "Explain AI in two short sentences.";
r = await agent.generate({ input: q, thread_id });
console.log('Q5:', q);
console.log('A5:', r.text);
// --- (Memory) Ask for a summary of the conversation
q = "Summarize our conversation so far in 3 short bullet points.";
r = await agent.generate({ input: q, thread_id });
console.log('Q6:', q);
console.log('A6:', r.text);
})();
node index.js
- Communicates with the LLM, handles tool calls, manages memory.
- JSONL-based lightweight memory layer (a minimal sketch of the idea appears after this list).
- get_weather: returns weather from a static LUT.
- get_time: returns local time from a static LUT.
- reserve_table: logs a reservation request to an API or a queue.
- Usable in external projects with require('llama-based-agent').
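As a rough illustration of the JSONL idea mentioned above: each conversation turn is stored as one JSON object per line and appended as the thread grows. The class below is a hypothetical sketch, not the repository's actual SimpleMemory implementation; the file name and field names are illustrative.
const fs = require('fs');

// Hypothetical JSONL memory store: one JSON object per line, append-only.
class JsonlMemorySketch {
  constructor(file = 'memory.jsonl') {
    this.file = file;
  }

  // Append one conversation turn for a given thread.
  append(thread_id, role, content) {
    const entry = { thread_id, role, content, ts: new Date().toISOString() };
    fs.appendFileSync(this.file, JSON.stringify(entry) + '\n');
  }

  // Load all turns belonging to a thread, in insertion order.
  load(thread_id) {
    if (!fs.existsSync(this.file)) return [];
    return fs
      .readFileSync(this.file, 'utf8')
      .split('\n')
      .filter(Boolean)
      .map((line) => JSON.parse(line))
      .filter((entry) => entry.thread_id === thread_id);
  }
}
This is also why deleting memory*.jsonl (see the command reference below) resets the conversation history: removing the file removes every persisted turn.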
The chart above compares the English response times of different Large Language Models (LLMs) across short, medium, and long scenarios.
- Warmup runs: A few warmup requests (default: 2) are made before measurement to avoid cold-start latency.
- Main measurement: Each scenario is executed with multiple runs (default: RUNS=20).
- Prompt randomization: Each prompt is appended with a nonce (random token) and a timestamp to avoid caching effects.
- Timing: Response time is measured with process.hrtime.bigint() (nanosecond precision).
- Before switching scenarios or models: docker restart is used. This ensures each test runs under fresh conditions without cached context.
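Put together, the measurement loop can be pictured roughly as in the sketch below. The endpoint, model name, and helper names are assumptions for illustration, not the benchmark's exact code.
const crypto = require('crypto');

// Hypothetical helper: one non-streamed generation call to the Ollama API.
async function callModel(prompt, model = 'llama3.1:8b') {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  return res.json();
}

// Sketch of the loop: warmups, nonce-randomized prompts, nanosecond timing.
async function benchmark(basePrompt, runs = 20, warmups = 2) {
  for (let i = 0; i < warmups; i++) await callModel(basePrompt); // discard cold-start runs

  const seconds = [];
  for (let i = 0; i < runs; i++) {
    // Nonce + timestamp make each prompt unique and defeat caching.
    const prompt = `${basePrompt} [nonce:${crypto.randomBytes(4).toString('hex')} ts:${Date.now()}]`;
    const start = process.hrtime.bigint();
    await callModel(prompt);
    seconds.push(Number(process.hrtime.bigint() - start) / 1e9);
  }
  return {
    avg_second: seconds.reduce((a, b) => a + b, 0) / seconds.length,
    min_second: Math.min(...seconds),
    max_second: Math.max(...seconds),
  };
}
The avg_second, min_second, and max_second values returned here correspond to the three bar series in the chart: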
- Blue bars (avg_second): Average response time
- Red bars (min_second): Fastest response time
- Yellow bars (max_second): Slowest response time
This visualization shows how models perform when generating English responses with different input lengths. Some models are more consistent (narrow min–max range), while others show higher variability (large gap between fastest and slowest times).
- I built and ran a 10-task, 31-point Turkish benchmark across 13 local models (0.5B–8B).
- I evaluated language fidelity, instruction following (1–2 sentence limits), memory recall (KV/facts: name=Ömer, language=Türkçe, codename=Atlas, short-reply preference), context use, translation, and simple reasoning.
- I used automated checks per task, summed the scores, and normalized them to percentages for ranking. Mid-size instruct models (e.g., mistral:7b-instruct, llama3.1:8b) were consistently stronger, while very small models struggled with strict Turkish adherence and brevity constraints.
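The ranking step is plain arithmetic: per-task points from the automated checks are summed and divided by the 31-point maximum. A minimal sketch, with made-up task names and point values:
// Illustrative scoring only: task names and point values below are invented.
const MAX_POINTS = 31;

function normalize(taskScores) {
  const total = Object.values(taskScores).reduce((a, b) => a + b, 0);
  return { total, percent: (total / MAX_POINTS) * 100 };
}

// Example: a hypothetical model's per-task results.
console.log(normalize({
  language_fidelity: 3,
  instruction_following: 2,
  memory_recall: 4,
  translation: 3,
  reasoning: 2,
})); // { total: 14, percent: 45.16... }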
docker compose up -d # start container from compose file (returns terminal immediately)
docker compose up # useful during development or when you want to follow logs in real time
docker compose down # stops + removes the container
docker compose stop # only stops, does not remove
docker compose logs -f # follow logs (e.g., model download progress)
docker ps # list running containers
docker ps -a # list all containers
docker exec -it ollama bash # enter the container shell
ollama list # list available models inside Ollama
npm install # Installs all dependencies from package.json into node_modules/
node main.js # Runs the example runner directly with Node (bypasses npm scripts)
rm memory*.jsonl # Deletes persisted memory files (e.g., memory.jsonl) to reset conversation/history
npm start # Runs the project’s default start script (aliased to node src/main.js in package.json)