LogRAG is a Retrieval-Augmented Generation (RAG) framework that uses large language models (LLMs) to analyze log data, identifying and explaining system anomalies and potential cyberattacks. Security teams can ask high-level questions about their logs and receive context-aware summaries of suspicious behavior.
Key features:

- 🚨 Detects anomalies such as:
  - Repeated failed logins
  - Suspicious IP access patterns
  - Port scans, probing attempts, and DDoS spikes
  - Abnormal command executions or privilege escalations
- 🤖 Uses LLMs to explain log entries in plain, human-readable language
- 🔍 Retrieves logs semantically, based on the meaning of the query rather than exact keyword matches
- 📂 Supports multi-source logs (e.g., auth logs, system logs, network traffic logs)
- 🛠 Easily extended with custom threat patterns
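Custom threat patterns could, for example, be expressed as named regular expressions matched against raw log lines. The sketch below assumes that rule format. `ThreatPattern`, `register_pattern`, and `match_patterns` are hypothetical names, not a documented LogRAG API, and the starter rules are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatPattern:
    """A named regular expression that flags a suspicious log line (hypothetical rule format)."""
    name: str
    regex: re.Pattern


# Illustrative starter rules; these are assumptions, not LogRAG's shipped rule set.
PATTERNS: list[ThreatPattern] = [
    ThreatPattern("failed_login", re.compile(
        r"Failed password for (?:invalid user )?\S+ from \d{1,3}(?:\.\d{1,3}){3}")),
    ThreatPattern("sudo_command", re.compile(r"sudo: .*COMMAND=\S+")),
]


def register_pattern(name: str, regex: str) -> None:
    """Add a custom threat pattern at runtime."""
    PATTERNS.append(ThreatPattern(name, re.compile(regex)))


def match_patterns(line: str) -> list[str]:
    """Return the names of every registered pattern that matches the line."""
    return [p.name for p in PATTERNS if p.regex.search(line)]


# Example custom rule: firewall-blocked probes of Telnet (23) or RDP (3389).
register_pattern("port_probe", r"UFW BLOCK\].*\bDPT=(?:23|3389)\b")
```

One possible design choice: regex rules are cheap and deterministic, so they could tag lines before embedding, leaving the LLM to explain what the tags mean in context.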
How it works:

Log Ingestion: Raw logs are collected from various sources (e.g., syslog, firewall, SSH, and nginx logs).
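A minimal ingestion sketch, assuming each source is simply an iterable of text lines (a list, or an open file handle). Every line is tagged with a source name and line number so later stages can trace a finding back to where it came from. `RawLog` and `ingest` are hypothetical names.

```python
from typing import Iterable, Iterator, Mapping, NamedTuple


class RawLog(NamedTuple):
    source: str  # logical source name, e.g. "auth" or "nginx"
    lineno: int  # 1-based line number within that source
    text: str    # the raw, unparsed log line


def ingest(sources: Mapping[str, Iterable[str]]) -> Iterator[RawLog]:
    """Yield every non-blank line from each source, tagged with where it came from."""
    for name, lines in sources.items():
        for lineno, line in enumerate(lines, start=1):
            line = line.rstrip("\r\n")  # drop the line terminator only
            if line.strip():            # skip empty or whitespace-only lines
                yield RawLog(name, lineno, line)
```

For example, `ingest({"auth": open("/var/log/auth.log", errors="replace")})` would read an SSH auth log on a Debian/Ubuntu-style system; the path varies by distribution.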
Preprocessing: Logs are parsed and normalized into a consistent structure.
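The normalized schema is not specified here. As a sketch under stated assumptions, the function below parses classic BSD-syslog lines (the layout used by many `auth.log` and `syslog` files) into a flat record with timestamp, host, process, PID, message, and the first IPv4 address in the message. `normalize` and the field names are hypothetical; other sources, such as nginx access logs, would need their own parsers.

```python
import re
from datetime import datetime
from typing import Optional

# Classic BSD-syslog layout: "Mon DD HH:MM:SS host process[pid]: message".
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<proc>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def normalize(source: str, line: str, year: int) -> Optional[dict]:
    """Parse one syslog-style line into a flat record, or return None if it doesn't match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    # Syslog timestamps omit the year, so the caller must supply it.
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    ip = IPV4_RE.search(m["msg"])
    return {
        "source": source,
        "timestamp": ts,
        "host": m["host"],
        "process": m["proc"],
        "pid": int(m["pid"]) if m["pid"] else None,
        "message": m["msg"],
        "ip": ip.group(0) if ip else None,  # first IPv4 address in the message, if any
    }
```

Because syslog timestamps carry no year, the caller passes one in; a real deployment would also need to handle time zones and year rollover.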
Embedding: Each log entry is converted into a vector embedding using Sentence Transformers or OpenAI embedding models.
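Sentence Transformers and OpenAI embeddings need a model download or an API key, so the sketch below uses a dependency-free stand-in: a hashed bag-of-words vector (the "hashing trick"). It is not semantic and only rewards shared tokens, but it shows the contract the rest of the pipeline relies on: text in, fixed-length L2-normalized vector out. With the `sentence-transformers` package, the equivalent step would be along the lines of `SentenceTransformer("all-MiniLM-L6-v2").encode(texts, normalize_embeddings=True)`, where the model name is only an example. `embed` and `cosine` are hypothetical names.

```python
import hashlib
import math
import re

TOKEN_RE = re.compile(r"[a-z0-9_.]+")  # keeps IPs like 203.0.113.7 as one token


def embed(text: str, dim: int = 256) -> list[float]:
    """Hashed bag-of-words embedding, L2-normalized (a stand-in for a real model)."""
    vec = [0.0] * dim
    for token in TOKEN_RE.findall(text.lower()):
        # md5 gives a hash that is stable across processes, unlike the built-in hash().
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "little")
        sign = -1.0 if (h >> 63) & 1 else 1.0  # signed hashing reduces collision bias
        vec[h % dim] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors such as embed() returns."""
    return sum(x * y for x, y in zip(a, b))
```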
Vector Indexing: The embeddings are stored in a vector index such as FAISS for fast similarity search.
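FAISS's simplest index, `IndexFlatIP`, performs exact inner-product search, which equals cosine similarity when the embeddings are unit-length. In FAISS the flow is roughly `index = faiss.IndexFlatIP(d)`, `index.add(xb)`, then `D, I = index.search(xq, k)` on float32 NumPy arrays. To keep the example dependency-free, the sketch below implements the same brute-force search in pure Python for a single query; `FlatIPIndex` is a hypothetical stand-in, not the FAISS API.

```python
import heapq
from typing import Sequence


class FlatIPIndex:
    """Exact (brute-force) inner-product search, loosely mirroring faiss.IndexFlatIP."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: list[Sequence[float]] = []

    @property
    def ntotal(self) -> int:
        """Number of stored vectors; ids are assigned in insertion order."""
        return len(self._vectors)

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self._vectors.append(v)

    def search(self, query: Sequence[float], k: int) -> tuple[list[float], list[int]]:
        """Return (scores, ids) of the k highest inner products, best first."""
        scored = (
            (sum(a * b for a, b in zip(query, v)), i)
            for i, v in enumerate(self._vectors)
        )
        top = heapq.nlargest(k, scored)
        return [s for s, _ in top], [i for _, i in top]
```

This scan is linear in the number of stored logs, which is fine for a sketch; the point of FAISS is doing the same search much faster, including with approximate indexes for large collections.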
Query + RAG:
- The user asks a question, e.g., "Show unusual access patterns in the last 24 hours"
- The top-k most relevant log entries are retrieved from the index
- An LLM (e.g., GPT) summarizes potential anomalies in the retrieved logs and flags risks
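The query step can be sketched end to end: filter to the requested time window, rank the remaining lines against the question, and hand the top-k lines to an LLM inside a prompt. To stay self-contained, this sketch ranks by token overlap (Jaccard similarity) instead of an embedding index, and the time filter is an assumption suggested by the "last 24 hours" example query. The `llm` argument is any caller-supplied function from prompt string to response text; no particular provider's API is assumed, and the prompt wording and function names are illustrative.

```python
import re
from datetime import datetime, timedelta
from typing import Callable, Optional, Sequence

TOKEN_RE = re.compile(r"[a-z0-9_.]+")


def _tokens(text: str) -> set[str]:
    return set(TOKEN_RE.findall(text.lower()))


def _score(query_tokens: set[str], text: str) -> float:
    """Jaccard overlap; a stand-in for embedding similarity from the vector index."""
    doc = _tokens(text)
    union = query_tokens | doc
    return len(query_tokens & doc) / len(union) if union else 0.0


def retrieve(query: str, logs: Sequence[tuple[datetime, str]], k: int = 5,
             since: Optional[datetime] = None) -> list[str]:
    """Return the k lines most similar to the query, keeping only those at or after `since`."""
    q = _tokens(query)
    candidates = [text for ts, text in logs if since is None or ts >= since]
    return sorted(candidates, key=lambda t: _score(q, t), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Number the retrieved lines so the model can cite them in its answer."""
    numbered = "\n".join(f"[{i}] {line}" for i, line in enumerate(context, start=1))
    return (
        "You are a security analyst. Using only the log lines below, list potential "
        "anomalies, cite the supporting line numbers, and rate each finding low, "
        "medium, or high risk.\n\n"
        f"Logs:\n{numbered}\n\nQuestion: {question}"
    )


def answer(question: str, logs: Sequence[tuple[datetime, str]], llm: Callable[[str], str],
           k: int = 5, window: timedelta = timedelta(hours=24),
           now: Optional[datetime] = None) -> str:
    """Retrieve the top-k recent logs, then ask the LLM to summarize and flag risks."""
    now = now or datetime.now()
    context = retrieve(question, logs, k=k, since=now - window)
    return llm(build_prompt(question, context))
```

Passing `llm=lambda prompt: prompt` returns the prompt itself, which is a convenient way to inspect exactly what the model would see.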
Typical use cases:

- 🔓 Detect brute-force login attempts
- 🌐 Spot sudden traffic spikes from unknown IPs
- 🐚 Find unauthorized shell commands
- ⚙️ Analyze logs during incident response
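For the brute-force use case, one option is a deterministic pre-filter that flags candidate IPs before, or alongside, the LLM step. This sketch counts failed SSH password attempts per source IP in a sliding time window. `detect_bruteforce` is a hypothetical name, and the threshold of 5 attempts per 60 seconds is an arbitrary example, not a LogRAG default.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, Tuple

FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def detect_bruteforce(
    events: Iterable[Tuple[datetime, str]],
    threshold: int = 5,
    window: timedelta = timedelta(seconds=60),
) -> Dict[str, datetime]:
    """Flag IPs with at least `threshold` failed logins inside a sliding window.

    `events` must be in chronological order. Returns {ip: time the threshold was first reached}.
    """
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    flagged: Dict[str, datetime] = {}
    for ts, message in events:
        m = FAILED_RE.search(message)
        if m is None:
            continue
        ip = m.group(1)
        attempts = recent[ip]
        attempts.append(ts)
        # Drop attempts older than `window` relative to the current event.
        while ts - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold and ip not in flagged:
            flagged[ip] = ts
    return flagged
```

The flagged IPs and their surrounding log lines could then be passed to the LLM for an explanation, keeping the model's context small.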