A high-performance, privacy-focused microservice for redacting Personally Identifiable Information (PII) from text using BERT-based Named Entity Recognition (NER). Built with Rust and Actix Web for maximum performance and reliability.
Planned enhancements for upcoming versions:
- Fine-tuned model for better entity recognition
- Support for additional PII types (emails, phone numbers, etc.)
- Custom entity recognition patterns

- 🔍 PII Detection: Uses BERT-based NER to identify various types of PII including:
  - Person names
  - Locations
  - Organizations
- ⚡ High Performance: Built with Rust and Actix-Web for exceptional throughput and low latency
- 🎯 Simple API: Easy-to-use HTTP endpoints for integration with any application
- 🔄 Concurrent Processing: Handles multiple requests efficiently with async/await
- 🏗️ Production-Ready: Includes proper error handling and health checks
- 🚀 Efficient Model Loading: Lazy initialization of the BERT model for faster startup
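
As a rough illustration of the redaction step behind the PII Detection feature, the sketch below replaces detected entity spans with bracketed labels such as `[PER]` and `[LOC]`. The `Entity` struct and the `redact` function are hypothetical names chosen for this example, not the project's actual code, and the entity offsets are supplied by hand rather than produced by a BERT model.

```rust
/// A detected entity: a byte range in the input plus a label such as "PER" or "LOC".
/// Hypothetical type for illustration; not taken from the project's source.
struct Entity {
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
    label: String,
}

/// Replace every entity span with a `[LABEL]` placeholder.
/// Assumes spans do not overlap and fall on UTF-8 character boundaries.
fn redact(text: &str, entities: &[Entity]) -> String {
    // Process spans left to right regardless of the order they were reported in.
    let mut sorted: Vec<&Entity> = entities.iter().collect();
    sorted.sort_by_key(|e| e.start);

    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for e in sorted {
        out.push_str(&text[cursor..e.start]); // copy the text before the entity
        out.push('[');
        out.push_str(&e.label);
        out.push(']');
        cursor = e.end; // skip over the original entity text
    }
    out.push_str(&text[cursor..]); // copy whatever follows the last entity
    out
}

fn main() {
    let text = "Your text containing PII like John Smith and New York";
    let entities = vec![
        // Byte offsets of "John Smith" in `text`.
        Entity { start: 30, end: 40, label: "PER".to_string() },
        // Byte offsets of "New York" in `text`.
        Entity { start: 45, end: 53, label: "LOC".to_string() },
    ];
    // Prints: Your text containing PII like [PER] and [LOC]
    println!("{}", redact(text, &entities));
}
```

Working from sorted, non-overlapping spans keeps the replacement a single left-to-right pass; a real NER pipeline would also need to merge sub-word tokens into spans before this step.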

- Rust (latest stable version)

- Clone the repository:

      git clone https://github.com/k5602/textminer-rs.git
      cd textminer-rs

- Build the project:

      cargo build --release

- Run the service:

      cargo run --release

The service will start on http://0.0.0.0:8080

POST /api/redact - Redact PII from a single text

Request:

{
"text": "Your text containing PII like John Smith and New York"
}

Response:

{
"redacted_text": "Your text containing PII like [PER] and [LOC]",
"processing_time_ms": 42,
"entities_found": 2,
"entity_types": ["PER", "LOC"],
"confidence_scores": [0.98, 0.92]
}

POST /api/redact/batch - Redact PII from multiple texts in a single request

Request:

{
"texts": [
"First text with PII like John Smith",
"Second text with locations like New York and London"
],
"options": {
"include_confidence": true
}
}

Response:

{
"results": [
{
"redacted_text": "First text with PII like [PER]",
"processing_time_ms": 0,
"entities_found": 1,
"entity_types": ["PER"],
"confidence_scores": [0.98]
},
{
"redacted_text": "Second text with locations like [LOC] and [LOC]",
"processing_time_ms": 0,
"entities_found": 2,
"entity_types": ["LOC", "LOC"],
"confidence_scores": [0.95, 0.92]
}
],
"total_processing_time_ms": 65,
"total_entities_found": 3
}

GET /api/health - Health check endpoint

Response:

{
"status": "ok",
"model_loaded": true
}

The service is optimized for high throughput and low latency: the NER model is loaded only once, on first use, and then shared across requests using thread-safe reference counting. On average, it processes text in under 100 ms, though the first request may take longer as it loads the BERT model into memory.
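
The load-once, share-everywhere pattern described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the service's actual code: `NerModel`, `load_model`, `shared_model`, and `LOAD_COUNT` are hypothetical names, the "model" is a placeholder struct, and plain threads stand in for concurrent requests. It requires Rust 1.70 or later for `std::sync::OnceLock`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;

/// Stand-in for the BERT NER model (hypothetical; the real type is not shown here).
struct NerModel {
    name: String,
}

/// Counts how many times the expensive load actually runs.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Global slot that is filled on first use and reused afterwards.
static MODEL: OnceLock<Arc<NerModel>> = OnceLock::new();

/// Hypothetical loader; a real one would read model weights from disk.
fn load_model() -> Arc<NerModel> {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Arc::new(NerModel { name: "bert-ner".to_string() })
}

/// Returns a cheap clone of the shared handle, loading the model on the first call only.
fn shared_model() -> Arc<NerModel> {
    Arc::clone(MODEL.get_or_init(load_model))
}

fn main() {
    // Four threads simulate four concurrent requests.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::spawn(move || {
                let model = shared_model();
                println!("request {i} uses {}", model.name);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // The loader ran exactly once even though four threads asked for the model.
    println!("model loads: {}", LOAD_COUNT.load(Ordering::SeqCst));
}
```

`OnceLock::get_or_init` guarantees that only one initializer runs even when several threads race to it, which is why the first caller pays the load cost and later callers only clone an `Arc`.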
Access the web interface at http://localhost:8080 to test the redaction service interactively.
Contributions are welcome! Please feel free to submit a Pull Request.
- Actix-Web - Powerful, pragmatic, and extremely fast web framework for Rust
Made with ❤️ in Rust by Khaled