[ Note: This project has ongoing updates: a more robust system built on RabbitMQ and ML-based log anomaly detection. Stay tuned! ]
This repository contains a set of microservices for inventory management, order processing, payment, and shipping, along with a Pub-Sub system for communication and monitoring. It also includes an alerting system and Kibana for log visualization.
Follow these steps to set up and run the project locally.
Ensure you have the following installed (installation guides are included later in this README):
- Python 3.12 or higher
- pip (Python package manager)
- Kafka
- Fluentd
- ElasticSearch
- Kibana
Clone the repository:
git clone https://github.com/Cloud-Computing-Big-Data/RR-Team-48-distributed-logging-system.git
cd RR-Team-48-distributed-logging-system
Install required Python packages for all services and the Pub-Sub system:
pip install -r requirements.txt
[ If an error occurs, debug it or retry with the --break-system-packages flag. ]
From the parent directory, you can start each service and the Pub-Sub system.
Inventory Service:
python inventory_service/inventory_service.py
Order Service:
python order_service/order_service.py
Payment Service:
python payment_service/payment_service.py
Shipping Service:
python shipping_service/shipping_service.py
Pub-Sub System (handles communication and logging):
python PUB-SUB/app.py
- Open your browser and navigate to http://localhost:5000 for the alerting system.
- For log visualization with Kibana, ensure your Elasticsearch and Kibana services are running, then open http://localhost:5601.
- Configuration files for Fluentd and Elasticsearch are included in the root directory. You may need to update `fluentd.conf` and `update_conf.sh` to match your environment.
- Log data is managed using `log_accumulator.py` in each service and aggregated in the Pub-Sub system.
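The exact schema used by `log_accumulator.py` is not shown here, so as an illustration, a structured log entry carrying the metadata described in this README (service name, timestamp, severity) might be built like this (the field names are assumptions, not the project's actual schema):

```python
import json
import time
import uuid

def build_log_entry(service_name: str, level: str, message: str) -> dict:
    """Structure a raw message the way a log accumulator might,
    attaching the metadata the Pub-Sub layer needs downstream."""
    return {
        "log_id": str(uuid.uuid4()),              # unique id for deduplication
        "service_name": service_name,             # which microservice produced it
        "level": level,                           # severity: INFO, WARN, ERROR, FATAL
        "message": message,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

entry = build_log_entry("order_service", "INFO", "Order 42 created")
print(json.dumps(entry))
```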
To install Fluentd and set it up for use with Python (via the fluent-logger library), follow the steps below.
Install Fluentd using the package manager for your operating system.
On Ubuntu/Debian:
```sh
# Install dependencies
sudo apt-get update
sudo apt-get install -y sudo gnupg2 curl

# Add the Fluentd APT repository
curl -fsSL https://packages.fluentd.org/fluentd-apt-source.sh | sudo bash

# Install Fluentd
sudo apt-get install -y fluentd
```
On CentOS/RHEL:
```sh
# Install Fluentd
sudo yum install -y https://packages.fluentd.org/fluentd.rpm
```
On macOS (via Homebrew):
```sh
brew install fluentd
```
Verify Fluentd installation:
After installation, you can verify that Fluentd is installed correctly by checking the version:
fluentd --version
You should see the Fluentd version displayed in the terminal.
Fluentd Configuration File:
Fluentd’s configuration is typically stored in `/etc/fluent/fluentd.conf`. You'll need to define input sources, output destinations, and possibly filters for your use case.

Example `fluentd.conf` for a simple logging setup:

```
# fluentd.conf

# Input plugin (e.g., listening on a TCP port for logs)
<source>
  @type tcp
  port 24224
  bind 0.0.0.0
  tag fluentd.test
</source>

# Output plugin (e.g., sending logs to Elasticsearch)
<match fluentd.test>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
  index_name fluentd-test
</match>
```
In this configuration:
- Input Plugin: Fluentd listens for logs over TCP on port 24224.
- Output Plugin: Logs are sent to an Elasticsearch instance running locally on port 9200.
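To see this flow end to end without a running Fluentd, here is a self-contained sketch in which a local TCP listener stands in for Fluentd's `@type tcp` source and receives one newline-delimited JSON log record (an ephemeral port replaces 24224 so the example runs anywhere):

```python
import json
import socket
import threading

received = []

def tcp_sink(server_sock):
    """Accept one connection and parse the newline-delimited JSON it sends,
    standing in for Fluentd's TCP input plugin."""
    conn, _ = server_sock.accept()
    with conn:
        line = conn.makefile().readline()
        received.append(json.loads(line))

# Bind an ephemeral port instead of Fluentd's 24224 so the sketch is self-contained.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=tcp_sink, args=(server,))
t.start()

# Emit one log record the way a service might forward it to Fluentd.
with socket.create_connection(("127.0.0.1", port)) as client:
    record = {"level": "INFO", "message": "ping"}
    client.sendall((json.dumps(record) + "\n").encode())

t.join()
server.close()
print(received[0])
```

In the real setup the services point at host `localhost`, port `24224` from the configuration above.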
Restart Fluentd to apply the configuration:
After updating the configuration, restart Fluentd to apply changes.
sudo service fluentd restart
To connect Fluentd with your Python application, you'll need the fluent-logger library. If it is already listed in requirements.txt, it was installed in the earlier pip step; otherwise install it with:

```sh
pip install fluent-logger
```

After installing Fluentd, run the provided bash script to update its configuration:

```sh
chmod +x update_conf.sh
./update_conf.sh
```

Follow the steps below to install and set up Kafka on your system.
- Java 8 or later must be installed on your system.
- Zookeeper: Kafka requires Zookeeper to manage its cluster. The `kafka.sh` script will handle the setup of Zookeeper if it's not already installed.
Download the Kafka binary from the official website:
Choose the appropriate version for your system and download the .tgz file.
After downloading the .tgz file, extract it to a directory of your choice. For example:
```sh
tar -xvf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0
```

In /usr/local/kafka/config/server.properties, add:

```
advertised.listeners=PLAINTEXT://<hostname>:9092
listeners=PLAINTEXT://0.0.0.0:9092
```

To install Elasticsearch:

```sh
sudo su
apt update
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install elasticsearch
sudo update-rc.d elasticsearch defaults 95 10
service elasticsearch restart
service elasticsearch status
tail -f /var/log/elasticsearch/elasticsearch.log
curl localhost:9200
```

To install Kibana:

```sh
sudo apt-get update && sudo apt-get install kibana
sudo update-rc.d kibana defaults 95 10
sudo -i service kibana start
service kibana status
```

[Refer to other sources for the proper configuration of the Kafka service file.]

Finally, make sure all services are running:

```sh
service elasticsearch restart
service kibana status
sudo systemctl start fluentd
sudo systemctl start kafka
```

In a microservices architecture, effective log management is crucial for operational excellence. This project aims to streamline the collection, storage, and analysis of logs generated by various services, enhancing the ability to track application behavior and identify errors quickly. By capturing relevant metadata alongside each log entry and enabling real-time ingestion and querying, the system improves operational visibility and facilitates proactive responses to potential issues. Ultimately, this distributed logging framework enhances resilience and maintainability in a dynamic application landscape.
Microservices (Processes):
- Independent nodes (services) that generate logs and send heartbeat signals to monitor their health and status.
Log Accumulator:
- Collects logs from each node, structures the log data, and forwards it to the Pub-Sub model for centralized log management.
Pub-Sub Model:
- A communication layer that ensures the reliable and asynchronous distribution of log data from the accumulator to the storage system.
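Conceptually, the Pub-Sub layer decouples log producers from consumers. The project uses Kafka for this; the following toy in-process broker (class and method names are illustrative, not the project's API) shows the idea:

```python
from collections import defaultdict
from typing import Callable

class PubSub:
    """Toy in-process Pub-Sub broker; the real system uses Kafka topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        """Fan a message out to every subscriber of the topic."""
        for handler in self._subscribers[topic]:
            handler(message)

broker = PubSub()
stored = []                              # stands in for the log storage consumer
broker.subscribe("logs", stored.append)
broker.publish("logs", {"level": "ERROR", "message": "payment failed"})
```

Unlike this synchronous sketch, Kafka buffers messages durably, so consumers can fall behind or restart without losing logs.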
Log Storage:
- A centralized system for indexing, storing, and making logs searchable. The logs are stored in a format that is optimized for querying, providing easy access for monitoring and analysis.
Alerting System:
- Monitors logs for specific events, such as ERROR or FATAL log levels. When such events are detected, the system generates real-time alerts to facilitate quick response.
Heartbeat Mechanism:
- Detects failures by monitoring the heartbeat signals from each node. If a node stops sending heartbeats, an alert is triggered, indicating a potential failure.
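A minimal sketch of heartbeat-based failure detection, assuming the monitor tracks each node's last heartbeat timestamp (the timeout value and names are illustrative):

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds without a heartbeat before a node is flagged

def find_failed_nodes(last_seen: dict, now: float,
                      timeout: float = HEARTBEAT_TIMEOUT) -> list:
    """Return nodes whose most recent heartbeat is older than the timeout."""
    return sorted(node for node, ts in last_seen.items() if now - ts > timeout)

now = time.time()
last_seen = {
    "inventory_service": now - 1.0,   # heartbeat 1s ago -> healthy
    "shipping_service": now - 30.0,   # silent for 30s  -> flagged
}
failed = find_failed_nodes(last_seen, now)
```

In the real system each flagged node would be published as an alert through the Pub-Sub layer rather than just returned.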
- Log Generation: Each microservice generates logs along with relevant metadata (e.g., service name, timestamp, severity).
- Log Accumulation: Logs are collected by the log accumulator.
- Log Distribution: The Pub-Sub model forwards logs to the storage and alerting system.
- Log Storage and Querying: Logs are indexed and stored in a centralized database, making it easy to query and analyze them.
- Real-time Alerting: The alerting system watches for critical errors and generates notifications for the relevant teams.
- Failure Detection: The heartbeat mechanism detects service failures and triggers alerts.
- Centralized Log Management: Collect and manage logs from all microservices in one place.
- Real-Time Log Processing: Log data is ingested and processed in real-time.
- Queryable Log Storage: Indexed logs allow fast and efficient querying.
- Real-Time Alerting: Alerts on critical log entries (e.g., ERROR, FATAL).
- Failure Detection via Heartbeats: Automatically detect when a service fails to send a heartbeat.
- Scalable: Easily scale to support an increasing number of microservices and logs.
- Asynchronous Communication: Uses a Pub-Sub model for non-blocking log processing.
- Microservices: Various backend services that produce logs.
- Kafka: For Pub-Sub communication and real-time log streaming.
- Elasticsearch: For indexing and storing logs in a searchable format.
- Distributed Tracing: Integrate distributed tracing tools like Jaeger for better visibility across microservices.
- Data Anomaly Detection: Implement anomaly detection to identify unexpected patterns in logs.
- Log Retention and Archiving: Implement log retention policies for managing the size of log data over time.
To better understand and configure the components of the Distributed Logging System, here are the official documentation links for the key technologies used in this project:
Fluentd is a powerful open-source data collector for unified logging layers. It allows you to collect logs, parse them, and route them to various destinations like Elasticsearch.
- Fluentd Documentation: https://docs.fluentd.org/
Kibana is a data visualization and exploration tool used for visualizing log data stored in Elasticsearch. It offers a powerful interface for querying and analyzing logs in real-time.
- Kibana Documentation: https://www.elastic.co/guide/en/kibana/current/index.html
Elasticsearch is a distributed, RESTful search and analytics engine that is designed to store and index large volumes of data. It powers the log storage and searching in this system.
- Elasticsearch Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. Kafka enables the Pub-Sub communication model in this system, allowing logs to be asynchronously distributed.
- Kafka Documentation: https://kafka.apache.org/documentation/
Feel free to explore the documentation of these tools for a deeper understanding of how they work and how to configure them effectively for your logging system. These resources will help you troubleshoot, extend functionality, and leverage advanced features of each tool as needed.
Feel free to reach out with any questions, suggestions, or collaborations!
- Email: ashleshat5@gmail.com
Thank you for checking out our project! If you find it helpful, please give it a ⭐️ and share your feedback. 😊