An advanced log classification system that combines three complementary approaches to handle log patterns of varying complexity. It provides a flexible solution for processing predictable, complex, and poorly labeled log data, with real-time classification and analytics.
- Regular Expression (Regex): Handles simplified and predictable patterns using predefined rules
- Sentence Transformer + Logistic Regression: Manages complex patterns with sufficient training data using embeddings
- LLM (Large Language Models): Handles complex patterns when labeled training data is insufficient using Groq API
- Real-time Log Streaming: WebSocket support for live log classification as logs arrive
- Analytics Dashboard: Comprehensive statistics, trends, and insights about classified logs
- Confidence Scores: Get classification confidence levels for better decision-making
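The tiered approach described above can be sketched as a simple cascade: try the regex rules first, fall back to the embedding-based classifier, and only then call the LLM. This is a minimal illustration, not the project's actual code. The rule patterns, labels, the `classify` function, the `ml_threshold` parameter, and the fixed confidence of 1.0 for regex matches are all assumptions made for the example.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rules for illustration; the project's real patterns may differ.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed)", re.IGNORECASE), "System Notification"),
]

# A fallback classifier takes a log message and returns (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]
Result = Tuple[str, float, str]  # (label, confidence, method used)


def classify_with_regex(message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(message):
            return label
    return None


def classify(
    message: str,
    ml_classifier: Optional[Classifier] = None,
    llm_classifier: Optional[Classifier] = None,
    ml_threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> Result:
    """Try the cheapest method first and escalate only when needed."""
    label = classify_with_regex(message)
    if label is not None:
        # A rule match is treated as fully confident in this sketch.
        return label, 1.0, "regex"

    if ml_classifier is not None:
        label, confidence = ml_classifier(message)
        if confidence >= ml_threshold:
            return label, confidence, "ml"

    if llm_classifier is not None:
        label, confidence = llm_classifier(message)
        return label, confidence, "llm"

    return "Unclassified", 0.0, "none"
```

In this sketch the two fallbacks are passed in as plain callables, so a Sentence Transformer plus Logistic Regression model or a Groq-backed function could be plugged in without changing the cascade itself.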
- Python 3.8 or higher
- pip package manager
- Clone the repository:
      git clone https://github.com/labdhiongithub7/log_classification.git
      cd log_classification
- Create a virtual environment (recommended):
      python -m venv venv
      # On Windows
      venv\Scripts\activate
      # On Linux/Mac
      source venv/bin/activate
- Install dependencies:
      pip install -r requirements.txt
- Set up environment variables: Create a .env file in the root directory:
      GROQ_API_KEY=your_groq_api_key_here
  Get your API key from the Groq Console.
- Download models (if needed): The BERT model will be downloaded automatically on first use. Ensure the models/ directory contains log_classifier.joblib.
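Before starting the server, it can help to confirm that the setup steps above actually took effect. The following is a small pre-flight sketch using only the standard library. The helper names `read_env_file` and `check_setup` are hypothetical and not part of the project; the sketch assumes the key lives either in the process environment or in a plain `KEY=VALUE` style `.env` file.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

PLACEHOLDER_KEY = "your_groq_api_key_here"


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; comments and blank lines are skipped."""
    values: Dict[str, str] = {}
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    return values


def check_setup(root: Path, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems: List[str] = []

    # Prefer the process environment, then fall back to the .env file.
    key = env.get("GROQ_API_KEY") or read_env_file(root / ".env").get("GROQ_API_KEY")
    if not key or key == PLACEHOLDER_KEY:
        problems.append("GROQ_API_KEY is missing or still set to the placeholder")

    model_path = root / "models" / "log_classifier.joblib"
    if not model_path.is_file():
        problems.append(f"missing model file: {model_path}")

    return problems
```

Running `check_setup(Path("."))` from the repository root returns an empty list when both the API key and the model file are in place.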
Start the FastAPI server:
    uvicorn server:app --reload
The server will be available at:
- Main endpoint: http://127.0.0.1:8000
- Interactive API docs: http://127.0.0.1:8000/docs
- Alternative docs: http://127.0.0.1:8000/redoc
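The available routes can also be inspected programmatically. FastAPI serves its OpenAPI schema at `/openapi.json` by default, so a short script can list every endpoint once the server is running. This is a sketch that assumes the app has not changed that default path; `fetch_openapi_schema` and `list_routes` are hypothetical helper names.

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_openapi_schema(base_url: str = BASE_URL) -> Dict:
    """Download the schema; assumes FastAPI's default /openapi.json path."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(schema: Dict) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-method keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)
```

With the server running, `for method, path in list_routes(fetch_openapi_schema()): print(method, path)` prints the same routes that the interactive docs page shows.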