[ Note: This project has ongoing updates: a more robust system built on RabbitMQ and ML-based log anomaly detection. Stay tuned! ]
This repository contains a set of microservices for inventory management, order processing, payment, and shipping, along with a Pub-Sub system for communication and monitoring. It also includes an alerting system and Kibana for log visualization.
Follow these steps to set up and run the project locally.
Ensure you have the following installed (installation guides are included later in this README):
- Python 3.12 or higher
- pip (Python package manager)
- Kafka
- Fluentd
- ElasticSearch
- Kibana
Clone the repository:
git clone https://github.com/Cloud-Computing-Big-Data/RR-Team-48-distributed-logging-system.git
cd RR-Team-48-distributed-logging-system
Install required Python packages for all services and the Pub-Sub system:
pip install -r requirements.txt
[ If an error occurs, debug it or retry with the --break-system-packages flag. ]
From the parent directory, you can start each service and the Pub-Sub system.
Inventory Service:
python inventory_service/inventory_service.py
Order Service:
python order_service/order_service.py
Payment Service:
python payment_service/payment_service.py
Shipping Service:
python shipping_service/shipping_service.py
Pub-Sub System (handles communication and logging):
python PUB-SUB/app.py
- Open your browser and navigate to http://localhost:5000 for the alerting system.
- For log visualization with Kibana, ensure your Elasticsearch and Kibana services are running, then open http://localhost:5601.
- Configuration files for Fluentd and Elasticsearch are included in the root directory. You may need to update `fluentd.conf` and `update_conf.sh` to match your environment.
- Log data is managed using `log_accumulator.py` in each service and aggregated in the Pub-Sub system.
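The exact schema used by `log_accumulator.py` is not shown here, so as an illustration, a structured log entry carrying the metadata described in this README (service name, timestamp, severity) might be built like this (the field names are assumptions, not the project's actual schema):

```python
import json
import time
import uuid

def build_log_entry(service_name: str, level: str, message: str) -> dict:
    """Structure a raw message the way a log accumulator might,
    attaching the metadata the Pub-Sub layer needs downstream."""
    return {
        "log_id": str(uuid.uuid4()),              # unique id for deduplication
        "service_name": service_name,             # which microservice produced it
        "level": level,                           # severity: INFO, WARN, ERROR, FATAL
        "message": message,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

entry = build_log_entry("order_service", "INFO", "Order 42 created")
print(json.dumps(entry))
```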
To install Fluentd and set it up for use with Python (via the fluent-logger library), follow the steps below.
Install Fluentd using the package manager for your operating system.
On Ubuntu/Debian:
```sh
# Install dependencies
sudo apt-get update
sudo apt-get install -y sudo gnupg2 curl

# Add the Fluentd APT repository
curl -fsSL https://packages.fluentd.org/fluentd-apt-source.sh | sudo bash

# Install Fluentd
sudo apt-get install -y fluentd
```
On CentOS/RHEL:
```sh
# Install Fluentd
sudo yum install -y https://packages.fluentd.org/fluentd.rpm
```
On macOS (via Homebrew):
```sh
brew install fluentd
```
Verify Fluentd installation:
After installation, you can verify that Fluentd is installed correctly by checking the version:
fluentd --version
You should see the Fluentd version displayed in the terminal.
Fluentd Configuration File:
Fluentd’s configuration is typically stored in `/etc/fluent/fluentd.conf`. You'll need to define input sources, output destinations, and possibly filters for your use case.

Example `fluentd.conf` for a simple logging setup:

```
# fluentd.conf

# Input plugin (e.g., listening on a TCP port for logs)
<source>
  @type tcp
  port 24224
  bind 0.0.0.0
  tag fluentd.test
</source>

# Output plugin (e.g., sending logs to Elasticsearch)
<match fluentd.test>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
  index_name fluentd-test
</match>
```
In this configuration:
- Input Plugin: Fluentd listens for logs over TCP on port 24224.
- Output Plugin: Logs are sent to an Elasticsearch instance running locally on port 9200.
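To see this flow end to end without a running Fluentd, here is a self-contained sketch in which a local TCP listener stands in for Fluentd's `@type tcp` source and receives one newline-delimited JSON log record (an ephemeral port replaces 24224 so the example runs anywhere):

```python
import json
import socket
import threading

received = []

def tcp_sink(server_sock):
    """Accept one connection and parse the newline-delimited JSON it sends,
    standing in for Fluentd's TCP input plugin."""
    conn, _ = server_sock.accept()
    with conn:
        line = conn.makefile().readline()
        received.append(json.loads(line))

# Bind an ephemeral port instead of Fluentd's 24224 so the sketch is self-contained.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=tcp_sink, args=(server,))
t.start()

# Emit one log record the way a service might forward it to Fluentd.
with socket.create_connection(("127.0.0.1", port)) as client:
    record = {"level": "INFO", "message": "ping"}
    client.sendall((json.dumps(record) + "\n").encode())

t.join()
server.close()
print(received[0])
```

In the real setup the services point at host `localhost`, port `24224` from the configuration above.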
Restart Fluentd to apply the configuration:
After updating the configuration, restart Fluentd to apply changes.
sudo service fluentd restart
To connect Fluentd with your Python application, you'll need the fluent-logger library. If it is already listed in requirements.txt, it was installed in the earlier pip step; otherwise install it with:

```sh
pip install fluent-logger
```

After installing Fluentd, run the provided bash script to update its configuration:

```sh
chmod +x update_conf.sh
./update_conf.sh
```

Follow the steps below to install and set up Kafka on your system.
- Java 8 or later must be installed on your system.
- Zookeeper: Kafka requires Zookeeper to manage its cluster. The `kafka.sh` script will handle the setup of Zookeeper if it's not already installed.
Download the Kafka binary from the official website:
Choose the appropriate version for your system and download the .tgz file.
After downloading the .tgz file, extract it to a directory of your choice. For example:
```sh
tar -xvf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0
```

In /usr/local/kafka/config/server.properties, add:

```
advertised.listeners=PLAINTEXT://<hostname>:9092
listeners=PLAINTEXT://0.0.0.0:9092
```

To install Elasticsearch:

```sh
sudo su
apt update
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install elasticsearch
sudo update-rc.d elasticsearch defaults 95 10
service elasticsearch restart
service elasticsearch status
tail -f /var/log/elasticsearch/elasticsearch.log
curl localhost:9200
```

To install Kibana:

```sh
sudo apt-get update && sudo apt-get install kibana
sudo update-rc.d kibana defaults 95 10
sudo -i service kibana start
service kibana status
```

[Refer to other sources for the proper configuration of the Kafka service file.]

Finally, make sure all services are running:

```sh
service elasticsearch restart
service kibana status
sudo systemctl start fluentd
sudo systemctl start kafka
```

In a microservices architecture, effective log management is crucial for operational excellence. This project aims to streamline the collection, storage, and analysis of logs generated by various services, enhancing the ability to track application behavior and identify errors quickly. By capturing relevant metadata alongside each log entry and enabling real-time ingestion and querying, the system improves operational visibility and facilitates proactive responses to potential issues. Ultimately, this distributed logging framework enhances resilience and maintainability in a dynamic application landscape.
Microservices (Processes):
- Independent nodes (services) that generate logs and send heartbeat signals to monitor their health and status.
Log Accumulator:
- Collects logs from each node, structures the log data, and forwards it to the Pub-Sub model for centralized log management.
Pub-Sub Model:
- A communication layer that ensures the reliable and asynchronous distribution of log data from the accumulator to the storage system.
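Conceptually, the Pub-Sub layer decouples log producers from consumers. The project uses Kafka for this; the following toy in-process broker (class and method names are illustrative, not the project's API) shows the idea:

```python
from collections import defaultdict
from typing import Callable

class PubSub:
    """Toy in-process Pub-Sub broker; the real system uses Kafka topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        """Fan a message out to every subscriber of the topic."""
        for handler in self._subscribers[topic]:
            handler(message)

broker = PubSub()
stored = []                              # stands in for the log storage consumer
broker.subscribe("logs", stored.append)
broker.publish("logs", {"level": "ERROR", "message": "payment failed"})
```

Unlike this synchronous sketch, Kafka buffers messages durably, so consumers can fall behind or restart without losing logs.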
Log Storage:
- A centralized system for indexing, storing, and making logs searchable. The logs are stored in a format that is optimized for querying, providing easy access for monitoring and analysis.
Alerting System:
- Monitors logs for specific events, such as ERROR or FATAL log levels. When such events are detected, the system generates real-time alerts to facilitate quick response.
Heartbeat Mechanism:
- Detects failures by monitoring the heartbeat signals from each node. If a node stops sending heartbeats, an alert is triggered, indicating a potential failure.
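A minimal sketch of heartbeat-based failure detection, assuming the monitor tracks each node's last heartbeat timestamp (the timeout value and names are illustrative):

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before a node is flagged

def find_failed_nodes(last_seen: dict, now: float,
                      timeout: float = HEARTBEAT_TIMEOUT) -> list:
    """Return nodes whose most recent heartbeat is older than the timeout."""
    return sorted(node for node, ts in last_seen.items() if now - ts > timeout)

now = time.time()
last_seen = {
    "inventory_service": now - 1.0,   # heartbeat 1s ago -> healthy
    "shipping_service": now - 30.0,   # silent for 30s  -> flagged
}
failed = find_failed_nodes(last_seen, now)
```

In the real system each flagged node would be published as an alert through the Pub-Sub layer rather than just returned.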
- Log Generation: Each microservice generates logs along with relevant metadata (e.g., service name, timestamp, severity).
- Log Accumulation: Logs are collected by the log accumulator.
- Log Distribution: The Pub-Sub model forwards logs to the storage and alerting system.
- Log Storage and Querying: Logs are indexed and stored in a centralized database, making it easy to query and analyze them.
- Real-time Alerting: The alerting system watches for critical errors and generates notifications for the relevant teams.
- Failure Detection: The heartbeat mechanism detects service failures and triggers alerts.
- Centralized Log Management: Collect and manage logs from all microservices in one place.
- Real-Time Log Processing: Log data is ingested and processed in real-time.
- Queryable Log Storage: Indexed logs allow fast and efficient querying.
- Real-Time Alerting: Alerts on critical log entries (e.g., ERROR, FATAL).
- Failure Detection via Heartbeats: Automatically detect when a service fails to send a heartbeat.
- Scalable: Easily scale to support an increasing number of microservices and logs.
- Asynchronous Communication: Uses a Pub-Sub model for non-blocking log processing.
- Microservices: Various backend services that produce logs.
- Kafka: For Pub-Sub communication and real-time log streaming.
- Elasticsearch: For indexing and storing logs in a searchable format.
- Distributed Tracing: Integrate distributed tracing tools like Jaeger for better visibility across microservices.
- Data Anomaly Detection: Implement anomaly detection to identify unexpected patterns in logs.
- Log Retention and Archiving: Implement log retention policies for managing the size of log data over time.
To better understand and configure the components of the Distributed Logging System, here are the official documentation links for the key technologies used in this project:
Fluentd is a powerful open-source data collector for unified logging layers. It allows you to collect logs, parse them, and route them to various destinations like Elasticsearch.
- Fluentd Documentation: https://docs.fluentd.org/
Kibana is a data visualization and exploration tool used for visualizing log data stored in Elasticsearch. It offers a powerful interface for querying and analyzing logs in real-time.
- Kibana Documentation: https://www.elastic.co/guide/en/kibana/current/index.html
Elasticsearch is a distributed, RESTful search and analytics engine that is designed to store and index large volumes of data. It powers the log storage and searching in this system.
- Elasticsearch Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. Kafka enables the Pub-Sub communication model in this system, allowing logs to be asynchronously distributed.
- Kafka Documentation: https://kafka.apache.org/documentation/
Feel free to explore the documentation of these tools for a deeper understanding of how they work and how to configure them effectively for your logging system. These resources will help you troubleshoot, extend functionality, and leverage advanced features of each tool as needed.
Feel free to reach out with any questions, suggestions, or collaborations!
- Email: ashleshat5@gmail.com
Thank you for checking out our project! If you find it helpful, please give it a ⭐️ and share your feedback. 😊