
A Go daemon that syncs MongoDB to Elasticsearch and Milvus in real time. You know, for search. Based on the original Monstache by Ryan Wynn.


doing-cr7/monstache-milvus


Monstache-Milvus


A Go daemon that syncs MongoDB to Elasticsearch and Milvus in real time. It is aimed at hybrid search systems that combine traditional text search with vector similarity search.

Based on Monstache by Ryan Wynn, with added support for Milvus vector database.


✨ Features

🎯 Core Capabilities

  • Dual-Engine Sync: Simultaneously sync MongoDB to both Elasticsearch and Milvus
  • Real-time Streaming: Uses MongoDB Change Streams for instant data updates
  • Vector Search Ready: Native support for Milvus/Zilliz Cloud vector database
  • High Availability: Cluster mode with automatic failover
  • Direct Reads: Bulk load existing data with parallel processing

🚀 Advanced Features

  • Flexible Mapping: Custom field mappings and transformations via JavaScript or Go plugins
  • Data Relations: Support for document relationships across collections
  • GridFS Support: Index file content from MongoDB GridFS
  • Time Machine: Historical data indexing with timestamps
  • Alerting: Built-in alert system (Feishu/Lark integration, customizable)
  • Monitoring: HTTP endpoints for health checks and statistics

🔧 Production Ready

  • Resume from last position (timestamp or token-based)
  • Configurable batch sizes and concurrency
  • Comprehensive logging and metrics
  • Docker and Kubernetes support
  • Automatic reconnection and error handling



🎯 Why Monstache-Milvus?

Use Cases

Perfect for building:

  • 🔍 Hybrid Search Systems: Combine keyword search (Elasticsearch) with semantic search (Milvus)
  • 🤖 AI/ML Applications: Sync embeddings from MongoDB to Milvus for similarity search
  • 📊 Real-time Analytics: Keep your search and vector databases in sync with MongoDB
  • 🔄 Data Migration: Migrate large MongoDB datasets to Elasticsearch and Milvus efficiently

Comparison with Original Monstache

| Feature | Original Monstache | Monstache-Milvus |
|---|---|---|
| Elasticsearch Sync | ✅ | ✅ |
| Milvus/Zilliz Sync | ❌ | ✅ |
| Dual Engine Writes | ❌ | ✅ |

🏗️ Architecture

graph TB
    MongoDB[(MongoDB)] -->|Change Streams| Monstache[Monstache-Milvus]
    MongoDB -->|Direct Read| Monstache
    
    Monstache -->|Text Data| Elasticsearch[(Elasticsearch)]
    Monstache -->|Vector Data| Milvus[(Milvus/Zilliz)]
    
    Monstache -->|GridFS| Files[File Processing]
    Monstache -->|Scripts| Transform[JS/Go Plugins]
    
    style Monstache fill:#4CAF50
    style MongoDB fill:#47A248
    style Elasticsearch fill:#005571
    style Milvus fill:#00ADD8

Detailed architecture diagram: architecture.mermaid

Data Flow

  1. Change Detection: Monitors MongoDB using Change Streams or Oplog
  2. Transformation: Apply custom mappings, filters, and transformations
  3. Dual Write:
    • Milvus receives vector data for similarity search
    • Elasticsearch receives full documents for text search
  4. Progress Tracking: Save resume tokens for fault tolerance
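The four steps above can be sketched in a few lines of Go. This is only an illustration of the ordering (transform, write to every sink, then record the resume token), not code taken from the project: `ChangeEvent`, `Sink`, `Pipeline`, and the in-memory sinks are hypothetical names, and for brevity the sketch sends the same document to both sinks, whereas the daemon sends vector data to Milvus and full documents to Elasticsearch.

```go
package main

import (
	"errors"
	"fmt"
)

// ChangeEvent is a simplified, hypothetical stand-in for a change stream event.
type ChangeEvent struct {
	ResumeToken string
	Doc         map[string]interface{}
}

// Sink is any destination that accepts a transformed document.
type Sink interface {
	Write(doc map[string]interface{}) error
}

// memorySink records documents in memory; it stands in for Elasticsearch or Milvus.
type memorySink struct {
	docs []map[string]interface{}
}

func (s *memorySink) Write(doc map[string]interface{}) error {
	s.docs = append(s.docs, doc)
	return nil
}

// failingSink always errors; it shows that a failed write leaves the token untouched.
type failingSink struct{}

func (failingSink) Write(map[string]interface{}) error {
	return errors.New("sink unavailable")
}

// Pipeline transforms an event, writes it to every sink, and only then
// records the resume token, so a crash replays the event instead of losing it.
type Pipeline struct {
	Transform func(map[string]interface{}) (map[string]interface{}, bool)
	Sinks     []Sink
	LastToken string
}

// Handle processes one event. Events rejected by Transform are skipped,
// but their token is still recorded so they are not replayed.
func (p *Pipeline) Handle(ev ChangeEvent) error {
	doc, keep := p.Transform(ev.Doc)
	if keep {
		for _, s := range p.Sinks {
			if err := s.Write(doc); err != nil {
				return fmt.Errorf("write failed, token %q not saved: %w", ev.ResumeToken, err)
			}
		}
	}
	p.LastToken = ev.ResumeToken
	return nil
}

func main() {
	es, milvus := &memorySink{}, &memorySink{}
	p := &Pipeline{
		Transform: func(d map[string]interface{}) (map[string]interface{}, bool) {
			_, hasVector := d["embedding"] // "embedding" is an assumed field name
			return d, hasVector
		},
		Sinks: []Sink{es, milvus},
	}
	err := p.Handle(ChangeEvent{
		ResumeToken: "token-1",
		Doc:         map[string]interface{}{"_id": 1, "embedding": []float32{0.1, 0.2}},
	})
	fmt.Println(err, len(es.docs), len(milvus.docs), p.LastToken)
}
```

Saving the token last trades possible duplicate writes after a restart for never skipping an event, which is usually acceptable when writes are keyed by document ID.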

🚀 Quick Start

Prerequisites

  • Go 1.21+ (for building from source)
  • MongoDB 3.6+ (4.0+ recommended for Change Streams)
  • Elasticsearch 7.0+ (optional)
  • Milvus 2.0+ or Zilliz Cloud account (optional)

One-Minute Setup

# 1. Clone the repository
git clone https://github.com/doing-cr7/monstache-milvus.git
cd monstache-milvus

# 2. Copy and configure
cp config.example.toml config.toml
vim config.toml  # Edit with your MongoDB, ES, and Milvus credentials

# 3. Build and run
make build
./bin/monstache -f config.toml

Basic Configuration

# MongoDB connection
mongo-url = "mongodb://user:pass@localhost:27017"

# Elasticsearch (optional)
elasticsearch-urls = ["http://localhost:9200"]

# Milvus/Zilliz (optional)
zilliz-enabled = true
zilliz-addr = "https://your-cluster.zillizcloud.com:19530"
zilliz-api-key = "your-api-key"
zilliz-collection-name = "your_collection"

# What to sync
change-stream-namespaces = ["mydb.mycollection"]

Verify It's Working

# Check health
curl http://localhost:8080/healthz

# Check statistics
curl http://localhost:8080/stats

📦 Installation

Option 1: Build from Source

# Clone repository
git clone https://github.com/doing-cr7/monstache-milvus.git
cd monstache-milvus

# Build binary
go build -o bin/monstache monstache.go

# Or use Makefile
make build

Option 2: Docker

# Using Docker
docker pull doing-cr7/monstache-milvus:latest

docker run -d \
  -v /path/to/config.toml:/config.toml \
  doing-cr7/monstache-milvus:latest \
  -f /config.toml

Option 3: Docker Compose

version: '3.8'
services:
  monstache:
    image: doing-cr7/monstache-milvus:latest
    volumes:
      - ./config.toml:/config.toml
    command: -f /config.toml
    environment:
      - MONSTACHE_MONGO_URL=${MONGO_URL}
      - MONSTACHE_ZILLIZ_API_KEY=${ZILLIZ_API_KEY}
    restart: unless-stopped

Option 4: Kubernetes

See docker/release/README.md for Kubernetes deployment examples.


⚙️ Configuration

Minimal Configuration

mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]
change-stream-namespaces = [""]  # Empty string: watch the whole deployment, all databases (needs MongoDB 4.0+)

Production Configuration

# MongoDB
mongo-url = "mongodb://user:pass@mongo1:27017,mongo2:27017/admin?replicaSet=rs0"

# Elasticsearch
elasticsearch-urls = ["http://es1:9200", "http://es2:9200"]
elasticsearch-max-conns = 10

# Milvus
zilliz-enabled = true
zilliz-addr = "your-milvus-endpoint:19530"
zilliz-api-key = "your-api-key"
zilliz-collection-name = "embeddings"
zilliz-max-conns = 4
zilliz-max-docs = 256

# High Availability
cluster-name = "prod-sync-cluster"
resume = true
resume-strategy = 1  # Token-based

# Performance
direct-read-concur = 4
elasticsearch-max-docs = 1000

# Monitoring
enable-http-server = true
http-server-addr = ":8080"

Configuration File Examples


💡 Usage Examples

Example 1: Sync MongoDB to Elasticsearch

mongo-url = "mongodb://localhost:27017"
elasticsearch-urls = ["http://localhost:9200"]
change-stream-namespaces = ["mydb.products"]

[[mapping]]
namespace = "mydb.products"
index = "products_index"

Example 2: Sync Vectors to Milvus

mongo-url = "mongodb://localhost:27017"

# Enable Milvus sync
zilliz-enabled = true
zilliz-addr = "localhost:19530"
zilliz-api-key = "your-key"
zilliz-collection-name = "document_embeddings"

# Sync specific collection with embeddings
change-stream-namespaces = ["mydb.documents"]
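Milvus float vectors are 32-bit, while numbers in a decoded MongoDB document usually arrive as 64-bit floats, so a conversion and a dimension check are needed somewhere along the way. The sketch below shows one way to do that in Go. It is not the project's code: the field name `embedding`, the helper `toFloat32Vector`, and the assumption that the array has already been decoded to a plain `[]interface{}` are all illustrative (the official driver's array type is a named type and would need converting first).

```go
package main

import "fmt"

// toFloat32Vector converts a decoded array of numbers into a []float32
// and checks that it has exactly dim elements.
func toFloat32Vector(v interface{}, dim int) ([]float32, error) {
	arr, ok := v.([]interface{})
	if !ok {
		return nil, fmt.Errorf("embedding is %T, want an array", v)
	}
	if len(arr) != dim {
		return nil, fmt.Errorf("embedding has %d values, want %d", len(arr), dim)
	}
	out := make([]float32, len(arr))
	for i, x := range arr {
		switch n := x.(type) {
		case float64:
			out[i] = float32(n)
		case float32:
			out[i] = n
		case int32:
			out[i] = float32(n)
		case int64:
			out[i] = float32(n)
		default:
			return nil, fmt.Errorf("element %d is %T, want a number", i, x)
		}
	}
	return out, nil
}

func main() {
	// A hypothetical document from mydb.documents.
	doc := map[string]interface{}{
		"_id":       "doc-1",
		"embedding": []interface{}{0.25, 0.5, int32(1)},
	}
	vec, err := toFloat32Vector(doc["embedding"], 3)
	fmt.Println(vec, err)
}
```

Rejecting vectors of the wrong length early is useful because a Milvus collection fixes the vector dimension in its schema.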

Example 3: Dual Engine Sync (Hybrid Search)

# Sync to both Elasticsearch and Milvus
mongo-url = "mongodb://localhost:27017"

# Text search in Elasticsearch
elasticsearch-urls = ["http://localhost:9200"]

# Vector search in Milvus
zilliz-enabled = true
zilliz-addr = "localhost:19530"
zilliz-api-key = "your-key"
zilliz-collection-name = "vectors"

# Watch same collection
change-stream-namespaces = ["mydb.articles"]

Example 4: Custom Transformations

Create a JavaScript transformation:

// transform.js
module.exports = function(doc) {
  // Add computed field
  doc.fullName = doc.firstName + " " + doc.lastName;
  
  // Filter out sensitive data
  delete doc.password;
  
  return doc;
}

Configure it:

[[script]]
namespace = "mydb.users"
path = "./transform.js"
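The same transformation can be written in Go, which the feature list mentions as an alternative to JavaScript. The sketch below is plain Go and deliberately does not use the plugin entry point from `monstachemap/`, whose signature is not shown in this README; `transformUser` is a hypothetical helper. Unlike the JavaScript version, it copies the document and treats missing name fields as empty strings.

```go
package main

import (
	"fmt"
	"strings"
)

// transformUser mirrors transform.js: it adds fullName and removes password.
// It works on a copy so the caller's document is left untouched.
func transformUser(doc map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(doc)+1)
	for k, v := range doc {
		out[k] = v
	}

	// Missing or non-string names become empty strings.
	first, _ := out["firstName"].(string)
	last, _ := out["lastName"].(string)
	out["fullName"] = strings.TrimSpace(first + " " + last)

	// Filter out sensitive data.
	delete(out, "password")
	return out
}

func main() {
	user := map[string]interface{}{
		"firstName": "Ada",
		"lastName":  "Lovelace",
		"password":  "secret",
	}
	fmt.Println(transformUser(user))
}
```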

Example 5: Direct Read (Initial Sync)

# Bulk load existing data
direct-read-namespaces = ["mydb.products"]
direct-read-concur = 4  # Parallel workers
direct-read-split-max = 4  # Split large collections

# Exit after initial sync (optional)
exit-after-direct-reads = true
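The general idea behind splitting a collection and reading the pieces in parallel can be sketched as follows. This illustrates the technique only: `splitRange`, `readAll`, and `segment` are hypothetical names, the segments are index ranges rather than real MongoDB cursors, and how `direct-read-split-max` and `direct-read-concur` interact inside the daemon is not described in this README.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a half-open range [Start, End) of document positions.
type segment struct{ Start, End int }

// splitRange divides n documents into at most maxSplits contiguous segments.
func splitRange(n, maxSplits int) []segment {
	if n <= 0 || maxSplits <= 0 {
		return nil
	}
	if maxSplits > n {
		maxSplits = n
	}
	size := (n + maxSplits - 1) / maxSplits // ceiling division
	segs := make([]segment, 0, maxSplits)
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		segs = append(segs, segment{start, end})
	}
	return segs
}

// readAll processes every segment with at most concur goroutines running
// at once and returns the total number of documents handled.
func readAll(segs []segment, concur int, handle func(segment) int) int {
	if concur < 1 {
		concur = 1
	}
	sem := make(chan struct{}, concur) // counting semaphore
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, s := range segs {
		wg.Add(1)
		sem <- struct{}{} // blocks while concur workers are busy
		go func(s segment) {
			defer wg.Done()
			defer func() { <-sem }()
			n := handle(s)
			mu.Lock()
			total += n
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return total
}

func main() {
	segs := splitRange(1000, 4)
	total := readAll(segs, 4, func(s segment) int {
		return s.End - s.Start // pretend every position holds one document
	})
	fmt.Println(len(segs), total)
}
```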

🔐 MongoDB Setup

Required Permissions

Monstache requires specific MongoDB permissions to function properly.

Option 1: Minimal Permissions (Recommended)

// Connect to MongoDB
use admin

// Create dedicated user
db.createUser({
  user: "<username>",
  pwd: "<password>",
  roles: [
    { role: "readWrite", db: "admin" },
    { role: "readWrite", db: "<logic db>" },
    { role: "readWrite", db: "monstache" },
    { role: "clusterMonitor", db: "admin" }
  ]
})

Connection String

# Replica Set (recommended)
mongodb://monstache:password@mongo1:27017,mongo2:27017/?replicaSet=rs0

# Standalone (for development only)
mongodb://monstache:password@localhost:27017

# With authentication database
mongodb://monstache:password@localhost:27017/admin?authSource=admin

👨‍💻 Development

Building from Source

# Install dependencies
go mod download

# Build
make build

# Build for specific platform
GOOS=linux GOARCH=amd64 make build

Project Structure

monstache-milvus/
├── monstache.go           # Main application
├── monstache_test.go      # Tests
├── dao/
│   └── milvus/           # Milvus integration
├── pkg/
│   └── oplog/            # Oplog processing
├── monstachemap/         # Plugin system
├── docker/               # Docker configurations
└── config.example.toml   # Example configuration

Steps to contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

❓ FAQ

General Questions

Q: What's the difference from original Monstache?
A: We've added native Milvus/Zilliz support for vector search, enabling hybrid search systems that combine text and semantic search.

Q: Can I sync to only Milvus (without Elasticsearch)?
A: Yes! Set zilliz-enabled = true and omit elasticsearch-urls. You can use either or both.

Q: Does it support MongoDB standalone?
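A: Change Streams need a replica set (or sharded cluster), so production deployments must use one. For development, a single-node replica set is enough; a plain standalone server does not provide Change Streams.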

Q: What happens if Monstache crashes?
A: It resumes from the last saved position (timestamp or token) when resume = true is configured.

Performance

Q: How fast is the sync?
A: Depends on your setup. Typically processes 1000-5000 docs/sec. Use elasticsearch-max-conns and zilliz-max-conns to tune.

Q: How to handle large existing datasets?
A: Use direct-read-namespaces with direct-read-concur for parallel bulk loading.

Troubleshooting

Q: "Unable to connect to MongoDB"
A: Check the connection string, replica set name, and network connectivity. Verify the connection with mongosh (or the legacy mongo shell) first.

Q: "Change streams are not supported"
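A: Change Streams require MongoDB 3.6+ running as a replica set or sharded cluster. A standalone server has neither Change Streams nor an oplog, so convert it to a single-node replica set. On pre-3.6 replica sets, enable-oplog = true (legacy) tails the oplog instead.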

Q: "Zilliz collection not found"
A: Create the collection in Milvus/Zilliz first. Monstache doesn't auto-create collections.

Q: Performance is slow
A: Tune batch sizes (elasticsearch-max-docs, zilliz-max-docs), increase workers (elasticsearch-max-conns), or check network latency.

For more issues: GitHub Issues

Monitoring & Alerting

# Enable HTTP server
enable-http-server = true
http-server-addr = ":8080"

# Feishu/Lark alerts (customize for your system)
is-feishu = true
alert-api-url = "https://your-webhook-url"
alert-robot-key = "your-key"

Endpoints:

  • GET /healthz - Health check
  • GET /stats - Sync statistics
  • GET /instance - Instance information
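A small Go probe can check these endpoints from a deployment script or a sidecar. The sketch below assumes the daemon listens on `localhost:8080` as configured above and that a healthy endpoint answers with HTTP 200; `probe`, `httpStatus`, and `statusFunc` are hypothetical helpers, not part of the project.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// statusFunc fetches a URL and returns its HTTP status code.
type statusFunc func(url string) (int, error)

// httpStatus is a statusFunc backed by net/http with a short timeout.
func httpStatus(url string) (int, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probe requests each path under base and reports which ones answered 200 OK.
func probe(base string, paths []string, get statusFunc) map[string]bool {
	result := make(map[string]bool, len(paths))
	for _, p := range paths {
		code, err := get(base + p)
		result[p] = err == nil && code == http.StatusOK
	}
	return result
}

func main() {
	// Assumes http-server-addr = ":8080" on the local machine.
	paths := []string{"/healthz", "/stats", "/instance"}
	for path, ok := range probe("http://localhost:8080", paths, httpStatus) {
		fmt.Printf("%-10s ok=%v\n", path, ok)
	}
}
```

Passing the fetch function as a parameter keeps `probe` testable without a running daemon.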

📊 Performance Tuning

Elasticsearch Optimization

elasticsearch-max-conns = 10      # Concurrent workers
elasticsearch-max-docs = 1000     # Batch size
elasticsearch-max-bytes = 8388608 # 8MB batch size
elasticsearch-max-seconds = 1     # Flush interval

Milvus Optimization

zilliz-max-conns = 4         # Concurrent workers
zilliz-max-docs = 256        # Batch size
zilliz-max-bytes = 2097152   # 2MB batch size
zilliz-max-seconds = 500     # 0.5s flush interval (in ms)
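The three limits (documents, bytes, and age) typically combine as "flush when any one of them is reached". A minimal Go sketch of that rule, assuming this is how the settings interact, is shown below; `Batcher` and its fields are hypothetical and time is passed in as milliseconds to keep the example deterministic. A real implementation would also flush on a timer when no new documents arrive.

```go
package main

import "fmt"

// Batcher buffers encoded documents and hands back a batch when any limit is hit.
type Batcher struct {
	MaxDocs  int   // flush after this many documents
	MaxBytes int   // flush after this many buffered bytes
	MaxAgeMs int64 // flush when the oldest buffered document is this old

	docs      [][]byte
	bytes     int
	firstSeen int64
}

// Add buffers one document at time nowMs. It returns the full batch if a
// limit was reached, or nil if the document was only buffered.
func (b *Batcher) Add(doc []byte, nowMs int64) [][]byte {
	if len(b.docs) == 0 {
		b.firstSeen = nowMs
	}
	b.docs = append(b.docs, doc)
	b.bytes += len(doc)
	if len(b.docs) >= b.MaxDocs || b.bytes >= b.MaxBytes || nowMs-b.firstSeen >= b.MaxAgeMs {
		return b.Flush()
	}
	return nil
}

// Flush returns the buffered documents and resets the buffer.
func (b *Batcher) Flush() [][]byte {
	out := b.docs
	b.docs, b.bytes = nil, 0
	return out
}

func main() {
	// Limits taken from the Milvus settings above.
	b := &Batcher{MaxDocs: 256, MaxBytes: 2097152, MaxAgeMs: 500}
	for i := 0; i < 600; i++ {
		if batch := b.Add([]byte("doc"), int64(i)); batch != nil {
			fmt.Printf("flushed %d docs at t=%dms\n", len(batch), i)
		}
	}
	fmt.Printf("%d docs still buffered\n", len(b.Flush()))
}
```

Larger batches reduce per-request overhead, while the age limit bounds how long a document can wait before it becomes searchable.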

Direct Read Performance

direct-read-concur = 4      # Parallel workers
direct-read-split-max = 4   # Split large collections
direct-read-no-timeout = true  # No cursor timeout

🙏 Acknowledgments

This project is built upon the excellent work of:

Special thanks to all contributors who help improve this project!


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Third-Party Licenses

This project uses:


📬 Contact & Support


⭐ Star History


If you find this project helpful, please consider giving it a ⭐!

Made with ❤️ by the Monstache-Milvus community

Documentation • Issues • Contributing
