DataLine

A lightweight Python library for building data transformation pipelines. Define reusable operations, chain them together, and get structured reports on each transformation step.

Features

Zero dependencies -- pure Python, works with any data type
Chain of Responsibility pattern -- compose operations into sequential pipelines
Built-in reporting -- each operation can track metadata (e.g. shape changes, statistics)
Verbose mode -- optional logging for debugging pipeline execution

Installation

pip install dataline

Or install from source:

git clone https://github.com/tiagobotari/dataline.git
cd dataline
pip install .

Quick Start

Create custom operations by subclassing Operation and implementing the process method:

import numpy as np
import dataline as dl


class SumColumns(dl.Operation):
    """Sum columns at index 1 and 2 into a new column."""

    def process(self, data):
        self.report["shape_before"] = data.shape
        data = np.c_[data, data[:, 1] + data[:, 2]]
        self.report["shape_after"] = data.shape
        return data


data = np.array([[0, 1, 2], [1, 2, 2]])

pipe = dl.Pipeline()
pipe.add(SumColumns())
result, report = pipe.process(data)

print(result)
# [[0 1 2 3]
#  [1 2 2 4]]

print(report)
# [{'shape_before': (2, 3), 'shape_after': (2, 4), 'operation_name': 'SumColumns', ...}]
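
Because the report is a list of dictionaries, it can be post-processed with ordinary Python. The sketch below is illustrative only: `summarize` is a hypothetical helper, not part of dataline, and the sample report copies just the keys shown in the output above (the elided ones are left out rather than guessed).

```python
# Sample report copied from the documented output above; keys elided there
# ("...") are left out here rather than guessed.
report = [
    {"shape_before": (2, 3), "shape_after": (2, 4), "operation_name": "SumColumns"},
]


def summarize(report):
    """Return one human-readable line per report entry (hypothetical helper)."""
    lines = []
    for entry in report:
        name = entry.get("operation_name", "<unnamed>")
        before = entry.get("shape_before")
        after = entry.get("shape_after")
        if before is not None and after is not None:
            lines.append(f"{name}: {before} -> {after}")
        else:
            # Operations that record no shapes still get a line.
            lines.append(f"{name}: no shape information recorded")
    return lines


for line in summarize(report):
    print(line)
# SumColumns: (2, 3) -> (2, 4)
```

Using `dict.get` keeps the helper tolerant of operations that record nothing in their report.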

Chaining Operations

Add multiple operations to process data in sequence:

class DropFirstColumn(dl.Operation):
    """Remove the first column."""

    def process(self, data):
        return data[:, 1:]


pipe = dl.Pipeline(verbose=True)
pipe.add(SumColumns())
pipe.add(DropFirstColumn())
result, report = pipe.process(data)

Each operation receives the output of the previous one, and the final report contains one entry per operation.
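
To make the data flow concrete, here is a minimal, self-contained sketch of the chaining idea in plain Python. `MiniOperation` and `MiniPipeline` are stand-ins written for this illustration and are not dataline's actual implementation; they only mirror the `add` / `process` / `report` behaviour described above, using lists of rows instead of NumPy arrays.

```python
# Stand-in classes for illustration only; NOT dataline's implementation.
class MiniOperation:
    """Base class: subclasses override process() and may fill self.report."""

    def __init__(self):
        self.report = {}

    def process(self, data):
        raise NotImplementedError


class MiniPipeline:
    """Runs operations in insertion order, collecting one report per operation."""

    def __init__(self):
        self.operations = []

    def add(self, operation):
        self.operations.append(operation)

    def process(self, data):
        reports = []
        for op in self.operations:
            op.report = {}           # start each run with a fresh report
            data = op.process(data)  # output of one step feeds the next
            reports.append({**op.report, "operation_name": type(op).__name__})
        return data, reports


class SumColumns(MiniOperation):
    """Append the sum of columns 1 and 2 to every row."""

    def process(self, data):
        self.report["columns_before"] = len(data[0])
        data = [row + [row[1] + row[2]] for row in data]
        self.report["columns_after"] = len(data[0])
        return data


class DropFirstColumn(MiniOperation):
    """Remove the first column from every row."""

    def process(self, data):
        return [row[1:] for row in data]


pipe = MiniPipeline()
pipe.add(SumColumns())
pipe.add(DropFirstColumn())
result, report = pipe.process([[0, 1, 2], [1, 2, 2]])

print(result)       # [[1, 2, 3], [2, 2, 4]]
print(len(report))  # 2
```

The order of `add` calls matters: `DropFirstColumn` sees the four-column output of `SumColumns`, not the original input.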

Running Tests

pytest

License

MIT
