Bliq

Bliq is a lightweight dataset catalog that provides versioning, efficient storage, and easy querying.

Bliq stores data on the local filesystem, S3, or Azure Blob Storage, and stores metadata in SQLite or PostgreSQL.

Bliq consists of three principal components:

  • client - lightweight CLI and a Python library
  • API - a backend server in Python
  • UI - a frontend in Node.js

Run in Docker

The simplest way to run the API and UI is with Docker Compose.

git clone https://github.com/ridcl/bliq.git
cd bliq

docker compose up

By default, this command creates a local datastore in ./data/datastore and a SQLite metastore in ./data/metastore/bliq.db. To override these locations, set the corresponding environment variables. For example:

export METASTORE_URL="postgresql://user:pass@localhost:5432/bliq"
export DATASTORE_URL="azure://my-bucket/datasets"

docker compose up

See or modify docker-compose.yml for additional settings and credentials.

Installation

Client Only (Default)

For users who want to connect to existing Bliq servers:

pip install bliq

This installs only the client library with minimal dependencies (requests, pandas, pyarrow).

With Server

To run your own Bliq server:

# Quote the extras so shells such as zsh do not expand the brackets
pip install "bliq[server]"

This includes FastAPI, uvicorn, and database management tools.

With Storage Backends

# Quote the extras so shells such as zsh do not expand the brackets

# PostgreSQL support
pip install "bliq[server,postgresql]"

# S3 support
pip install "bliq[server,s3]"

# Azure Blob Storage support
pip install "bliq[server,azure]"

# Everything
pip install "bliq[all]"

Quick Start

Using the Client

from bliq import BliqClient
import pandas as pd

# Connect to server
client = BliqClient("http://localhost:8000")

# Create a dataset
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
})

result = client.create("team/users", "User data", df)
print(f"Created: {result}")  # team/users/v1

# Load the dataset
df = client.load("team/users/v1")
print(df)

# Add more data (creates new version)
new_data = pd.DataFrame({
    "id": [4],
    "name": ["David"],
    "age": [40]
})

result = client.extend("team/users/v1", new_data)
print(f"Extended: {result}")  # team/users/v2

# Query with filtering
df = client.load("team/users/v2",
                 columns=["name", "age"],
                 filter="age > 30",
                 limit=10)

# List all datasets
datasets = client.list(namespace="team")
for ds in datasets:
    # Single quotes inside the f-string keep this valid on Python < 3.12
    print(f"{ds['name']}: {ds['row_count']} rows")

# Get detailed info
info = client.describe("team/users/v2")
print(info)

# Delete when done
client.erase("team/users/v1")  # Delete specific version
client.erase("team/users")     # Delete all versions
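
The references used above follow a `namespace/name/version` pattern (`team/users/v1`), and the version is left off to address every version of a dataset. The sketch below is a hypothetical helper, not part of the Bliq client, that splits such a reference into its parts. It assumes references always have two or three segments and that versions are written as `v` followed by an integer, which is inferred only from the examples in this README.

```python
from typing import NamedTuple, Optional


class DatasetRef(NamedTuple):
    """Hypothetical parsed form of a reference such as "team/users/v1"."""
    namespace: str
    name: str
    version: Optional[int]  # None means "all versions"


def parse_ref(ref: str) -> DatasetRef:
    """Split "namespace/name" or "namespace/name/vN" into its parts.

    Assumption: references look exactly like the ones shown in this README.
    """
    parts = ref.strip("/").split("/")
    if len(parts) == 2:
        return DatasetRef(parts[0], parts[1], None)
    if len(parts) == 3 and parts[2].startswith("v") and parts[2][1:].isdecimal():
        return DatasetRef(parts[0], parts[1], int(parts[2][1:]))
    raise ValueError(f"unrecognized dataset reference: {ref!r}")


print(parse_ref("team/users/v2"))
print(parse_ref("team/users"))
```

A helper like this can be handy for validating user input before calling `client.load` or `client.erase`, since erasing `team/users` removes all versions while `team/users/v1` removes only one.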

Running the API

# Start server (runs migrations automatically)
bliq serve

# Custom port
bliq serve --port 9000

# Development mode with auto-reload
bliq serve --reload

The server will be available at http://localhost:8000 by default.

Configure via environment variables:

# Metadata database (default: SQLite)
export METASTORE_URL="postgresql://user:pass@localhost:5432/bliq"

# Dataset storage (default: local filesystem)
export DATASTORE_URL="s3://my-bucket/datasets"

# For S3
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# For Azure
export AZURE_STORAGE_CONNECTION_STRING="..."

# Start server
bliq serve
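
The URL schemes in these variables line up with the optional extras listed under With Storage Backends. As a rough illustration, the sketch below reads the two variables and prints the matching `pip install` command. The `required_extras` function and the scheme-to-extra mapping are assumptions made for this example and are not part of Bliq; in particular, it assumes that the URL scheme alone determines which extra is needed and that an unset variable means the default backend.

```python
import os
from urllib.parse import urlparse

# Assumed mapping from URL scheme to the pip extra named in the
# installation section; SQLite and local storage need no extra.
EXTRA_FOR_SCHEME = {"postgresql": "postgresql", "s3": "s3", "azure": "azure"}


def required_extras(metastore_url: str = "", datastore_url: str = "") -> list:
    """Return the extras a server with these URLs would likely need."""
    extras = ["server"]
    for url in (metastore_url, datastore_url):
        extra = EXTRA_FOR_SCHEME.get(urlparse(url).scheme)
        if extra and extra not in extras:
            extras.append(extra)
    return extras


extras = required_extras(
    os.environ.get("METASTORE_URL", ""),
    os.environ.get("DATASTORE_URL", ""),
)
joined = ",".join(extras)
print(f'pip install "bliq[{joined}]"')
```

With neither variable set, this prints the plain server install; with the PostgreSQL and S3 URLs from the example above, it lists both extras.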

Running the UI

cd frontend
# Install dependencies (needed on the first run)
npm install
npm run dev

The UI will be available at http://localhost:5173
