tiagobotari/dataline

DataLine

A lightweight Python library for building data transformation pipelines. Define reusable operations, chain them together, and get structured reports on each transformation step.

Features

  • Zero dependencies -- pure Python, works with any data type
  • Chain of Responsibility pattern -- compose operations into sequential pipelines
  • Built-in reporting -- each operation can track metadata (e.g. shape changes, statistics)
  • Verbose mode -- optional logging for debugging pipeline execution

Installation

pip install dataline

Or install from source:

git clone https://github.com/tiagobotari/dataline.git
cd dataline
pip install .

Quick Start

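Create a custom operation by subclassing Operation and implementing its process method. The example below uses NumPy only for the sample data; per the feature list, DataLine itself has no dependencies: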

import numpy as np
import dataline as dl


class SumColumns(dl.Operation):
    """Sum columns at index 1 and 2 into a new column."""

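    def process(self, data):
        # Record the shape before and after so the change shows up in the report.
        self.report["shape_before"] = data.shape
        # np.c_ appends the sum of columns 1 and 2 as a new last column.
        data = np.c_[data, data[:, 1] + data[:, 2]]
        self.report["shape_after"] = data.shape
        return data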


data = np.array([[0, 1, 2], [1, 2, 2]])

pipe = dl.Pipeline()
pipe.add(SumColumns())
result, report = pipe.process(data)

print(result)
# [[0 1 2 3]
#  [1 2 2 4]]

print(report)
# [{'shape_before': (2, 3), 'shape_after': (2, 4), 'operation_name': 'SumColumns', ...}]

Chaining Operations

Add multiple operations to process data in sequence:

class DropFirstColumn(dl.Operation):
    """Remove the first column."""

    def process(self, data):
        return data[:, 1:]


# verbose=True turns on the optional logging described under Features.
pipe = dl.Pipeline(verbose=True)
pipe.add(SumColumns())
pipe.add(DropFirstColumn())
result, report = pipe.process(data)

Each operation receives the output of the previous one, and the final report contains one entry per operation.
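
To make the chaining and reporting behaviour concrete, here is a minimal, self-contained sketch of the pattern on plain Python lists. The Operation and Pipeline classes below are hypothetical stand-ins written for this illustration, not DataLine's actual implementation; they only mirror the add, process, and report names used above. StripWhitespace and DropEmpty are likewise invented for the example.

```python
class Operation:
    """Hypothetical stand-in for dl.Operation (not the library's code)."""

    def __init__(self):
        self.report = {}

    def process(self, data):
        raise NotImplementedError


class Pipeline:
    """Hypothetical stand-in for dl.Pipeline (not the library's code)."""

    def __init__(self, verbose=False):
        self.verbose = verbose
        self.operations = []

    def add(self, operation):
        self.operations.append(operation)

    def process(self, data):
        reports = []
        for op in self.operations:
            op.report = {}  # start each run with a fresh report
            data = op.process(data)  # output of one step feeds the next
            op.report["operation_name"] = type(op).__name__
            if self.verbose:
                print(f"ran {type(op).__name__}")
            reports.append(op.report)
        return data, reports


class StripWhitespace(Operation):
    """Trim leading and trailing whitespace from every string."""

    def process(self, data):
        return [s.strip() for s in data]


class DropEmpty(Operation):
    """Remove empty strings and record how many items were dropped."""

    def process(self, data):
        self.report["count_before"] = len(data)
        data = [s for s in data if s]
        self.report["count_after"] = len(data)
        return data


pipe = Pipeline()
pipe.add(StripWhitespace())
pipe.add(DropEmpty())
result, report = pipe.process(["  a ", "   ", "b"])

print(result)
# ['a', 'b']
print(len(report))
# 2
```

DropEmpty only sees the whitespace-only entry as empty because StripWhitespace ran first, which is why the order of add calls matters. The report list has one dictionary per operation, in the order the operations ran.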

Running Tests

pytest

License

MIT
