"Any document, any platform, in milliseconds."
Prism is a next-generation document processing SDK built in Rust, designed to view, convert, and extract content from 600+ file formats. It's the modern, developer-friendly alternative to Oracle Outside In.
- Comprehensive Format Support: 600+ document formats (200+ in Phase 1)
  - Office: DOCX, XLSX, PPTX, DOC, XLS, PPT, RTF, ONE, VSDX, MPP, XPS, EPUB
  - PDF: PDF 1.x-2.0, PDF/A
  - Email: MSG, EML, PST, MHT
  - Images: JPEG, PNG, TIFF, GIF, BMP, WebP, HEIC
  - Archives: ZIP, RAR, 7z, TAR, GZIP
  - CAD: DWG, DXF
  - And many more...
- Modern Architecture: Built with Rust for memory safety, performance, and reliability
- Cloud-Native: Designed for containerization, horizontal scaling, and serverless deployment
- Secure by Default: WebAssembly sandboxing for parser isolation
- Developer-Friendly: Clean APIs with SDKs for 10+ languages
- High Performance: Parallel processing, streaming support, and optimized rendering

| Component | Description | Status |
|---|---|---|
| prism-core | Core engine, Unified Document Model (UDM), parser/renderer traits | ✅ Foundation complete |
| prism-parsers | Format parser implementations | 🚧 In development |
| prism-render | Rendering engine (HTML, PDF, Image output) | 🚧 Basic HTML renderer |
| prism-sandbox | WebAssembly sandboxing for secure parser execution | 🚧 Framework ready |
| prism-server | REST API server (Axum-based) | 🚧 Basic endpoints |
| prism-cli | Command-line interface | 🚧 Structure ready |
- Rust 1.75 or later
- Cargo (comes with Rust)
# Clone the repository
git clone https://github.com/abahjat/prism.git
cd prism
# Build all crates
cargo build --release
# Run tests
cargo test
# Build optimized binaries
cargo build --release

After building, you'll find the binaries in target/release/:
- prism: CLI tool
- prism-server: REST API server

# Detect document format
prism detect document.pdf
# Convert a document
prism convert input.docx --output output.pdf
# Extract text
prism extract-text document.pdf --output text.txt
# Extract metadata
prism metadata document.pdf

# Start the server (default: 127.0.0.1:8080)
cargo run --bin prism-server
# Custom host and port
cargo run --bin prism-server -- --host 0.0.0.0 --port 3000
# Or use environment variables
PRISM_HOST=0.0.0.0 PRISM_PORT=3000 cargo run --bin prism-server
# Health check
curl http://localhost:8080/api/health
# Version information
curl http://localhost:8080/api/version

Add Prism to your Cargo.toml:
[dependencies]
prism-core = "0.1.0"
prism-parsers = "0.1.0"
prism-render = "0.1.0"

Example usage:
use prism_core::format::detect_format;
use prism_core::Document;

#[tokio::main]
async fn main() -> prism_core::Result<()> {
    // Initialize Prism
    prism_core::init();

    // Read a document
    let data = std::fs::read("document.pdf")?;

    // Detect the format
    let format_result = detect_format(&data, Some("document.pdf"))
        .ok_or_else(|| prism_core::Error::DetectionFailed("Unknown format".to_string()))?;

    println!("Detected format: {}", format_result.format.name);
    println!("MIME type: {}", format_result.format.mime_type);
    println!("Confidence: {:.2}%", format_result.confidence * 100.0);

    Ok(())
}

use prism_core::format::detect_format;
// Detect from bytes
let data = std::fs::read("document.pdf")?;
let result = detect_format(&data, Some("document.pdf"));
if let Some(detection) = result {
    println!("Format: {}", detection.format.name);
    println!("MIME: {}", detection.format.mime_type);
    println!("Confidence: {:.2}%", detection.confidence * 100.0);
}

use prism_core::Document;
use prism_render::html::HtmlRenderer;
use prism_core::render::{Renderer, RenderContext};

async fn render_to_html(document: &Document) -> prism_core::Result<String> {
    let renderer = HtmlRenderer::new();
    let context = RenderContext {
        options: Default::default(),
        filename: Some("output.html".to_string()),
    };
    let html_bytes = renderer.render(document, context).await?;
    Ok(String::from_utf8(html_bytes.to_vec())?)
}

Prism can be integrated into .NET applications (Windows Forms, WPF, MAUI) via the prism-bindings crate, which exposes a standard C API.
- Build the DLL:
  cargo build -p prism-bindings --release
  This produces target/release/prism_bindings.dll.
- Add to your C# project:
  - Copy the DLL to your project output directory.
  - Use [DllImport] to call the functions.

See examples/dotnet/ for a complete working example.

All document formats are parsed into a common intermediate representation:
Document
├── Metadata (title, author, dates, custom properties)
├── Pages[]
│   ├── Dimensions
│   ├── Content Blocks[]
│   │   ├── Text (runs, styles, positions)
│   │   ├── Images (embedded, linked)
│   │   ├── Tables (rows, cols, cells)
│   │   └── Vectors (paths, shapes)
│   └── Annotations
├── Styles (fonts, colors, paragraph styles)
├── Resources (fonts, images, embeddings)
└── Structure (headings, TOC, bookmarks)

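As a rough illustration of how this tree could map onto Rust types, the sketch below models a small subset of it: metadata, pages, and two kinds of content block. Every type, field, and function name here is a hypothetical stand-in chosen for the example; none of it is taken from prism-core's actual API.

```rust
// Hypothetical sketch of a small slice of the Unified Document Model.
// All names are illustrative assumptions, not prism-core's real types.

#[allow(dead_code)]
#[derive(Debug, Default)]
struct Metadata {
    title: Option<String>,
    author: Option<String>,
}

#[allow(dead_code)]
#[derive(Debug)]
enum ContentBlock {
    /// A block of text made of consecutive runs.
    Text { runs: Vec<String> },
    /// A table described only by its shape in this sketch.
    Table { rows: usize, cols: usize },
}

#[derive(Debug, Default)]
struct Page {
    blocks: Vec<ContentBlock>,
}

#[derive(Debug, Default)]
struct Document {
    metadata: Metadata,
    pages: Vec<Page>,
}

impl Document {
    /// Concatenate the runs of every text block, one line per block,
    /// walking pages in order. Non-text blocks are skipped.
    fn plain_text(&self) -> String {
        let mut out = String::new();
        for page in &self.pages {
            for block in &page.blocks {
                if let ContentBlock::Text { runs } = block {
                    out.push_str(&runs.concat());
                    out.push('\n');
                }
            }
        }
        out
    }
}

/// Build a one-page document with a text block and a table.
fn sample_document() -> Document {
    Document {
        metadata: Metadata {
            title: Some("Sample".to_string()),
            author: None,
        },
        pages: vec![Page {
            blocks: vec![
                ContentBlock::Text {
                    runs: vec!["Hello, ".to_string(), "Prism".to_string()],
                },
                ContentBlock::Table { rows: 2, cols: 3 },
            ],
        }],
    }
}

fn main() {
    let doc = sample_document();
    println!("title: {:?}", doc.metadata.title);
    print!("{}", doc.plain_text());
}
```

Because every parser would produce the same tree, an operation such as text extraction can be written once against the model instead of once per input format.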
Each format parser implements the Parser trait:
#[async_trait]
pub trait Parser: Send + Sync {
    fn format(&self) -> Format;
    fn can_parse(&self, data: &[u8]) -> bool;
    async fn parse(&self, data: Bytes, context: ParseContext) -> Result<Document>;
}

Renderers implement the Renderer trait to produce output in various formats:
#[async_trait]
pub trait Renderer: Send + Sync {
    fn output_format(&self) -> Format;
    async fn render(&self, document: &Document, context: RenderContext) -> Result<Bytes>;
}

prism/
├── Cargo.toml              # Workspace root
├── crates/
│   ├── prism-core/         # Core engine, UDM, traits
│   ├── prism-parsers/      # Format parser implementations
│   ├── prism-render/       # Rendering engine
│   ├── prism-sandbox/      # WASM sandboxing
│   ├── prism-server/       # REST API server
│   └── prism-cli/          # Command-line interface
├── tests/                  # Integration tests
└── docs/                   # Documentation

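To illustrate how the traits in prism-core and the implementations in prism-parsers might fit together, here is a minimal sketch of parser dispatch. It assumes a simplified, synchronous stand-in for the Parser trait (only a name and can_parse) and a hypothetical ParserRegistry type. The PDF and ZIP magic bytes are the standard file signatures, but none of the names below are taken from Prism's code.

```rust
/// Simplified stand-in for the Parser trait: the trait shown earlier is
/// async and returns a Document; this sketch only covers dispatch.
trait Parser {
    fn format_name(&self) -> &'static str;
    fn can_parse(&self, data: &[u8]) -> bool;
}

struct PdfParser;
struct ZipParser;

impl Parser for PdfParser {
    fn format_name(&self) -> &'static str {
        "PDF"
    }
    fn can_parse(&self, data: &[u8]) -> bool {
        // PDF files start with the header "%PDF-".
        data.starts_with(b"%PDF-")
    }
}

impl Parser for ZipParser {
    fn format_name(&self) -> &'static str {
        "ZIP container"
    }
    fn can_parse(&self, data: &[u8]) -> bool {
        // ZIP local file header signature; DOCX, XLSX, and PPTX are ZIP
        // containers, so a real parser would inspect the archive further.
        data.starts_with(b"PK\x03\x04")
    }
}

/// Hypothetical registry that tries each parser in registration order.
struct ParserRegistry {
    parsers: Vec<Box<dyn Parser>>,
}

impl ParserRegistry {
    fn with_defaults() -> Self {
        ParserRegistry {
            parsers: vec![Box::new(PdfParser), Box::new(ZipParser)],
        }
    }

    /// Return the first registered parser that accepts the data, if any.
    fn find(&self, data: &[u8]) -> Option<&dyn Parser> {
        for parser in &self.parsers {
            if parser.can_parse(data) {
                return Some(&**parser);
            }
        }
        None
    }
}

fn main() {
    let registry = ParserRegistry::with_defaults();
    let samples: [&[u8]; 3] = [b"%PDF-1.7\n", b"PK\x03\x04rest", b"plain text"];
    for sample in samples.iter() {
        match registry.find(sample) {
            Some(parser) => println!("matched: {}", parser.format_name()),
            None => println!("no parser found"),
        }
    }
}
```

Keeping can_parse cheap (a prefix check here) lets the registry probe many parsers before committing to a full parse.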
# Run all tests
cargo test
# Run tests for a specific crate
cargo test --package prism-core
# Run tests with output
cargo test -- --nocapture
# Run only unit tests
cargo test --lib
# Run only documentation tests
cargo test --doc

# Check code without building
cargo check
# Run Clippy linter
cargo clippy --all-targets --all-features
# Format code
cargo fmt
# Check formatting
cargo fmt -- --check

Install cargo-watch:
cargo install cargo-watch
# Watch and run checks
cargo watch -x check
# Watch and run tests
cargo watch -x test

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /version | GET | Version information |
| /detect | POST | Detect document format |
| /convert | POST | Convert document to another format |
| /extract/text | POST | Extract text from document |
| /extract/metadata | POST | Extract metadata from document |
| /render | POST | Render document to output format |

# Health check
curl http://localhost:8080/health
# Get version information
curl http://localhost:8080/version
# Detect format (planned)
curl -X POST http://localhost:8080/detect \
-F "file=@document.pdf"
# Convert document (planned)
curl -X POST http://localhost:8080/convert \
-F "file=@document.docx" \
-F "output_format=pdf" \
-o output.pdf

FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release --bin prism-server

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/prism-server /usr/local/bin/
EXPOSE 8080
CMD ["prism-server", "--host", "0.0.0.0"]

Build and run:
# Build image
docker build -t prism-server .
# Run container
docker run -p 8080:8080 prism-server
# Custom port
docker run -p 3000:3000 -e PRISM_PORT=3000 prism-server

version: '3.8'
services:
  prism:
    image: prism/server:latest
    ports:
      - "8080:8080"
    environment:
      - PRISM_HOST=0.0.0.0
      - PRISM_PORT=8080
    volumes:
      - ./data:/data
      - ./cache:/cache

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prism-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prism
  template:
    metadata:
      labels:
        app: prism
    spec:
      containers:
        - name: prism
          image: prism/server:latest
          ports:
            - containerPort: 8080
          env:
            - name: PRISM_HOST
              value: "0.0.0.0"
            - name: PRISM_PORT
              value: "8080"
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"

Current performance targets:
| Operation | Target (p95) | Status |
|---|---|---|
| Format Detection | <10ms | ✅ Achieved |
| Simple Conversion (10 pages) | <500ms | 🚧 In progress |
| Text Extraction | <100ms | 🚧 In progress |
| Thumbnail Generation | <200ms | 🚧 In progress |
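
The targets above are p95 figures: 95% of measured operations should finish within the stated time. One way to check such a target is to collect per-call latencies and take a percentile, as in the std-only sketch below. The percentile and measure helpers are hypothetical, the nearest-rank method is only one of several percentile definitions, and the timed workload is a stand-in prefix check rather than Prism's real format detection.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty input or a `pct` outside 1..=100.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Integer ceiling of pct * n / 100 gives the 1-based rank.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Call `op` `runs` times and record the wall-clock latency of each call.
fn measure<F: FnMut()>(runs: usize, mut op: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

fn main() {
    let data = b"%PDF-1.7 example bytes".to_vec();

    // Stand-in workload: a magic-byte prefix check.
    let samples = measure(1_000, || {
        std::hint::black_box(data.starts_with(b"%PDF-"));
    });

    let target = Duration::from_millis(10);
    let p95 = percentile(&samples, 95).expect("at least one sample");
    println!("p95 = {:?}, within {:?} target: {}", p95, target, p95 < target);
}
```

A percentile is used instead of a mean because a few slow outliers can hide behind a good average while still affecting a noticeable share of requests.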
- Parser Sandboxing: All parsers run in WebAssembly sandboxes with strict memory/CPU limits
- No Code Execution: Documents cannot execute code; macros are parsed but not run
- Memory Limits: Configurable memory limits per parser instance
- Timeout Protection: Execution time limits prevent infinite loops
- No I/O Access: Sandboxed parsers cannot access filesystem or network
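
The timeout protection listed above can be sketched in plain Rust by running the work on a separate thread and waiting for its result with a deadline. This is not WebAssembly sandboxing and not Prism's implementation: run_with_timeout and ParseOutcome are hypothetical names, and the closure stands in for a parser call that returns a block count.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ParseOutcome {
    /// The work finished in time and produced this value.
    Finished(usize),
    /// The deadline passed before a result arrived.
    TimedOut,
    /// The worker ended without sending a result (for example, it panicked).
    Crashed,
}

/// Run `work` on a worker thread and wait at most `limit` for its result.
fn run_with_timeout<F>(limit: Duration, work: F) -> ParseOutcome
where
    F: FnOnce() -> usize + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if the caller gave up.
        let _ = tx.send(work());
    });
    match rx.recv_timeout(limit) {
        Ok(blocks) => ParseOutcome::Finished(blocks),
        Err(RecvTimeoutError::Timeout) => ParseOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => ParseOutcome::Crashed,
    }
}

fn main() {
    // A fast stand-in parser finishes well inside the limit.
    let fast = run_with_timeout(Duration::from_secs(5), || 42);
    println!("fast parser: {:?}", fast);

    // A slow stand-in parser exceeds a 20 ms limit.
    let slow = run_with_timeout(Duration::from_millis(20), || {
        thread::sleep(Duration::from_millis(500));
        0
    });
    println!("slow parser: {:?}", slow);
}
```

Note the limitation of this approach: the caller stops waiting, but the worker thread keeps running until it finishes on its own, because a plain thread cannot be forcibly stopped. Running parsers inside a sandbox is what makes it possible to actually terminate runaway work and to cap its memory use.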
- ✅ Core architecture and UDM
- ✅ Basic format detection
- ✅ HTML renderer
- 🚧 200 format support
- 🚧 REST API
- 🚧 CLI tool
- 400 format support
- AI-powered features (classification, summarization)
- SOC 2 Type II compliance
- Enterprise features
- 600+ format support
- FedRAMP certification
- Format parity with Oracle Outside In
We welcome contributions! Please see CONTRIBUTING.md for details.
- Follow Rust best practices and idioms
- Write tests for new functionality
- Document public APIs with rustdoc comments
- Run cargo clippy before submitting
- Ensure cargo test passes
- Update documentation as needed
Prism is dual-licensed under:
- AGPL-3.0: For open source projects and internal use where source availability is acceptable (LICENSE).
- Commercial: For proprietary/closed-source applications (LICENSE_COMMERCIAL).
See DUAL_LICENSING.md for a detailed guide on which license you need.
Prism is inspired by and aims to be a modern alternative to:
- Oracle Outside In
- Apache POI
- LibreOffice
- Various document processing libraries
- Documentation: docs.prism.dev (planned)
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Discord: Join our community
Current Status: Early Development (v0.1.0)
- ✅ Core architecture complete
- ✅ Format detection working
- ✅ Basic HTML renderer
- 🚧 Parser implementations in progress
- 🚧 Additional renderers in development
- 🚧 REST API under construction
Built with ❤️ in Rust


