A distributed file management system for seamless file synchronization, processing, and storage across multiple devices. Data Orchestrate is designed for environments where files must be uploaded, processed (including text extraction from PDFs), and synchronized in real time across several devices, with robust metadata tracking and notification support.
- Java 17 or higher
- Maven 3.8 or higher
- Kafka
- MongoDB Atlas account (cloud-hosted)
- (Optional) Docker & Docker Compose (for containerized deployments)
- (Optional) Kubernetes & Minikube (for local or cloud orchestration)
Note: Docker, Kubernetes, and Minikube deployment files are available in the `containerized` branch of this repository. Switch to that branch to access and use them.
Data Orchestrate supports containerized deployment using Docker and orchestration via Docker Compose and Kubernetes (including Minikube for local development).
- All core services (Kafka, Zookeeper, file-upload, processing, storage, notification, orchestrator) are containerized.
- Use the provided `deployment/docker-compose.yml` to spin up the full stack for local or development use.
To start everything with Docker Compose:
```sh
docker-compose -f deployment/docker-compose.yml up --build -d
```
- This builds images (if needed), starts all services, and creates a bridge network.
- Service healthchecks are included for robust startup.
- Data volumes are mapped for persistence.
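A service definition in the Compose file might look roughly like the following. This is a hypothetical excerpt to illustrate the healthcheck, volume, and network setup described above; the service names, paths, and health endpoint are assumptions — see `deployment/docker-compose.yml` on the `containerized` branch for the actual definitions.

```yaml
# Hypothetical excerpt; not the actual deployment/docker-compose.yml.
services:
  file-upload-service:
    build: ./backend/file-upload-service
    ports:
      - "8081:8081"
    environment:
      - KAFKA_BOOTSTRAP_SERVERS=kafka:9092
    volumes:
      - ./data/uploads:/app/data/uploads      # mapped for persistence
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8081/actuator/health"]
      interval: 30s
    networks:
      - orchestrate-net

networks:
  orchestrate-net:
    driver: bridge
```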
To stop and remove containers:
```sh
docker-compose -f deployment/docker-compose.yml down
```
- Kubernetes manifests for each service are provided in `deployment/kubernetes/`.
- Supports scaling, rolling updates, resource limits, and liveness/readiness probes.
- Persistent volumes and secrets (e.g., MongoDB URI, SMTP credentials) are managed via YAML.
To deploy on Minikube:
- Start Minikube:
```sh
minikube start
```
- Apply the persistent volume claim (PVC) and secrets:
```sh
kubectl apply -f deployment/kubernetes/data-pvc.yaml
kubectl apply -f deployment/kubernetes/mongodb-secret.yaml
```
- Deploy all services:
```sh
kubectl apply -f deployment/kubernetes/
```
- Expose services as needed (e.g., NodePort or Ingress for accessing from host).
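A NodePort Service of roughly the following shape can expose a service to the host. This is an illustrative sketch — the labels, service name, and `nodePort` value are assumptions; adjust them to match the manifests in `deployment/kubernetes/`.

```yaml
# Hypothetical NodePort Service for the file-upload-service.
apiVersion: v1
kind: Service
metadata:
  name: file-upload-service
spec:
  type: NodePort
  selector:
    app: file-upload-service
  ports:
    - port: 8081        # cluster-internal port
      targetPort: 8081  # container port
      nodePort: 30081   # host-reachable port (30000-32767)
```

After applying, `minikube service file-upload-service --url` prints the URL reachable from the host.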
Notes:
- Images are referenced as `deployment-<service>:latest` and must be built and loaded into your Minikube Docker environment:
```sh
eval $(minikube docker-env)
# Then build each image, e.g.:
docker build -t deployment-file-upload-service backend/file-upload-service
# Repeat for other services
```
- Update secrets and environment variables to match your environment.
- Prometheus annotations are included for monitoring.
Data Orchestrate is composed of multiple microservices, each responsible for a specific aspect of the system. Services communicate via Kafka and persist metadata in MongoDB Atlas (cloud-hosted). The frontend is built with JavaFX, providing a user-friendly GUI for uploading and monitoring files.
```
.
├── backend/
│   ├── common-utils/           # Shared utilities and components (e.g., DeviceIdentifier)
│   ├── file-upload-service/    # Handles file uploads and metadata storage
│   ├── storage-service/        # Manages file storage and retrieval
│   ├── processing-service/     # Processes uploaded files (e.g., PDF text extraction)
│   ├── notification-service/   # Handles real-time notifications (WebSocket, etc.)
│   └── orchestrator-service/   # Coordinates file replication and device sync
└── frontend/
    └── gui-app/                # JavaFX desktop interface for users
```
- file-upload-service: Accepts file uploads, stores metadata in MongoDB Atlas and triggers downstream processing.
- processing-service: Processes incoming files (e.g., extracts text from PDFs), updates metadata and forwards processed files for storage.
- storage-service: Manages persistent storage of files in the data directory, handles retrieval requests and ensures files are available for sync.
- orchestrator-service: Coordinates file replication and synchronization between devices, ensuring consistency across the distributed network.
- notification-service: Sends real-time notifications to the frontend and other services (e.g., WebSocket updates for file status).
- common-utils: Shared codebase for utilities such as device identification and common configuration.
Devices participating in synchronization are described in backend/common-utils/src/main/resources/devices.json:
```json
[
  {
    "name": "Kushagra",
    "ip": "10.23.78.68",
    "port": "8081",
    "file_upload_port": "8081",
    "processing_port": "8083",
    "notification_port": "8084",
    "storage_port": "8085"
  },
  {
    "name": "Anil Cerejo",
    "ip": "10.23.48.160",
    "port": "8081",
    "file_upload_port": "8081",
    "processing_port": "8083",
    "notification_port": "8084",
    "storage_port": "8085"
  },
  {
    "name": "Third",
    "ip": "192.168.1.9",
    "port": "8081",
    "file_upload_port": "8081",
    "processing_port": "8083",
    "notification_port": "8084",
    "storage_port": "8085"
  }
]
```
Each device entry contains the device name, IP, and service ports. Update this file to add or remove devices in your distributed network.
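A consumer of this file can derive per-service base URLs from the `*_port` entries. The sketch below is illustrative only — `service_urls` is a hypothetical helper, not the actual `DeviceIdentifier` code in `common-utils`:

```python
import json

# Sample entry in the devices.json format shown above.
SAMPLE = '''
[
  {"name": "Kushagra", "ip": "10.23.78.68", "port": "8081",
   "file_upload_port": "8081", "processing_port": "8083",
   "notification_port": "8084", "storage_port": "8085"}
]
'''

def service_urls(device: dict) -> dict:
    """Map each *_port entry to an http base URL on the device's IP."""
    return {
        key.removesuffix("_port"): f"http://{device['ip']}:{value}"
        for key, value in device.items()
        if key.endswith("_port")
    }

devices = json.loads(SAMPLE)
urls = service_urls(devices[0])
print(urls["storage"])  # http://10.23.78.68:8085
```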
- File Upload: Users upload files via the JavaFX frontend. The file-upload-service receives the file and stores metadata in MongoDB Atlas.
- Processing: The processing-service picks up new files, performs necessary processing (e.g., text extraction) and updates the database.
- Storage: Processed files are stored in the storage-service, making them available for retrieval and synchronization.
- Synchronization: The orchestrator-service ensures files are replicated to all connected devices, maintaining consistency.
- Notifications: Users receive real-time status updates via the notification-service and frontend.
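The lifecycle above can be sketched as a sequence of transformations on a metadata record. This is a minimal illustration, assuming a simple `status` field per stage; the actual field names stored in MongoDB may differ:

```python
# Illustrative sketch of the upload -> process -> store -> sync lifecycle.
def upload(filename: str) -> dict:
    return {"filename": filename, "status": "UPLOADED"}

def process(meta: dict) -> dict:
    # e.g., PDF text extraction would happen at this stage
    return {**meta, "status": "PROCESSED",
            "text_extracted": meta["filename"].endswith(".pdf")}

def store(meta: dict) -> dict:
    return {**meta, "status": "STORED"}

def synchronize(meta: dict, devices: list) -> dict:
    # orchestrator replicates the file to each connected device
    return {**meta, "status": "SYNCED", "replicas": list(devices)}

meta = synchronize(store(process(upload("report.pdf"))),
                   ["Kushagra", "Anil Cerejo"])
print(meta["status"], meta["text_extracted"])  # SYNCED True
```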
Sample .env:
```properties
MONGODB_URI=mongodb+srv://username:password@cluster.example.mongodb.net/?retryWrites=true&w=majority&appName=example-cluster
SPRING_PROFILES_ACTIVE=prod
SPRING_KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
UPLOAD_DIR=./data/uploads
STORAGE_DIR=./data/storage
PROCESSED_DIR=./data/processed
SYNC_DIR=./data/sync
FILE_UPLOAD_PORT=8081
PROCESSING_PORT=8082
ORCHESTRATOR_PORT=8083
NOTIFICATION_PORT=8084
STORAGE_PORT=8085
spring.data.mongodb.uri=mongodb+srv://username:password@cluster.example.mongodb.net/?retryWrites=true&w=majority&appName=example-cluster
spring.data.mongodb.database=file-orchestrator
MONGODB_DATABASE=file-orchestrator
DEVICE_NAME=Kushagra
```
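Code that consumes these variables would typically fall back to the sample defaults when a variable is unset. A minimal sketch (`load_config` is a hypothetical helper, not part of the project):

```python
import os

# Read the environment variables from the sample .env above, with the
# sample values as defaults when a variable is unset.
def load_config(env=os.environ) -> dict:
    return {
        "kafka_bootstrap": env.get("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092"),
        "upload_dir": env.get("UPLOAD_DIR", "./data/uploads"),
        "file_upload_port": int(env.get("FILE_UPLOAD_PORT", "8081")),
        "mongodb_database": env.get("MONGODB_DATABASE", "file-orchestrator"),
    }

cfg = load_config({})           # empty environment -> all defaults
print(cfg["file_upload_port"])  # 8081
```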
- Ensure you have a running Kafka instance and a MongoDB Atlas cluster.
- Switch to the `containerized` branch to use Docker Compose files.
- Start all services:
```sh
docker-compose -f deployment/docker-compose.yml up --build -d
```
- Stop and remove containers:
```sh
docker-compose -f deployment/docker-compose.yml down
```
- Switch to the `containerized` branch to use Kubernetes manifests.
- Start Minikube:
```sh
minikube start
```
- Apply persistent volume and secrets:
```sh
kubectl apply -f deployment/kubernetes/data-pvc.yaml
kubectl apply -f deployment/kubernetes/mongodb-secret.yaml
# (and smtp-secret.yaml if needed)
```
- Build and load images into Minikube:
```sh
eval $(minikube docker-env)
docker build -t deployment-file-upload-service backend/file-upload-service
# Repeat for other services
```
- Deploy all services:
```sh
kubectl apply -f deployment/kubernetes/
```
- Expose services as needed (NodePort/Ingress).
- Start Kafka:
```sh
# Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
# Start Kafka
bin/kafka-server-start.sh config/server.properties
```
- (Optional) List Kafka topics:
```sh
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```
(On Windows, use `bin\windows\kafka-topics.bat`.)
- Build the backend:
```sh
cd backend
mvn clean install
```
- Start the services (in separate terminals):
```sh
cd backend/file-upload-service && mvn spring-boot:run
cd backend/storage-service && mvn spring-boot:run
cd backend/processing-service && mvn spring-boot:run
cd backend/notification-service && mvn spring-boot:run
cd backend/orchestrator-service && mvn spring-boot:run
```
- Start the JavaFX frontend:
```sh
cd frontend/gui-app
mvn javafx:run
```
- Automatic device identification and registration
- File upload with metadata tracking
- File processing (including PDF text extraction)
- Real-time notifications and status updates
- Cross-device file synchronization and replication
- Robust logging and monitoring
Each service has its own application.properties file where you can configure:
- Server ports
- Kafka topics
- MongoDB Atlas connections
- File storage directories
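A per-service `application.properties` might look roughly like the following. This is a hypothetical excerpt for the file-upload-service — the `file.upload.dir` key in particular is an assumed custom property; check each service's actual file for the real keys:

```properties
# Hypothetical application.properties excerpt; actual keys may differ.
server.port=8081
spring.kafka.bootstrap-servers=localhost:9092
spring.data.mongodb.uri=${MONGODB_URI}
spring.data.mongodb.database=file-orchestrator
file.upload.dir=./data/uploads
```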
- Upload a file through the JavaFX GUI
- Check the logs to see the file being processed
- The file should be available on all connected devices
- Check service logs for errors
- Verify Kafka is running
- Ensure all services can connect to Kafka and MongoDB Atlas
- Check file permissions for upload and storage directories


