Andar Package

Andar is a Python library that provides an abstraction layer for managing path structures, helping to create and parse paths programmatically via templated file paths.

Install Package

With pip:

pip install andar

Key features

Clean code

Andar promotes clean code by favoring composition over deep inheritance hierarchies. It also lets you define a path convention in a single place using a clear and intuitive syntax. Templated path strings with field definitions help avoid error-prone split/index code.
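To make the split/index problem concrete, here is a minimal sketch that uses only the standard library and does not use Andar at all. The file name and the regular expression are hypothetical; the point is only to contrast positional indexing with named fields.

```python
import re

file_name = "sales_orders_20251201.csv"  # hypothetical file name

# Split/index style: the meaning of each position is implicit, and the
# code breaks silently if the convention gains or loses a part.
stem, ext = file_name.rsplit(".", 1)
parts = stem.split("_")
by_index = {"category": parts[0], "name": parts[1], "date": parts[2], "ext": ext}

# Named-field style: the convention is written once and every value has a name.
convention = re.compile(
    r"(?P<category>[a-z]+)_(?P<name>[a-z]+)_(?P<date>\d{8})\.(?P<ext>[a-z]+)"
)
match = convention.fullmatch(file_name)
by_name = match.groupdict() if match else None

print(by_index)
print(by_name)
```

Both approaches return the same fields here, but only the second one states the convention explicitly and rejects names that do not follow it.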

Reusability

Andar allows using a single path convention via a PathModel for both generating and parsing paths. PathModels can be reused to create new path conventions with minimal effort without modifying the parent PathModel.

Separation of Concerns

Andar helps separate the I/O layer from the path generation layer, which makes the code easier to maintain.

Predictability

Andar provides field name checking via regular expressions, and functions to assert a bijection between path generation and path parsing, so that a generated path parses back to the same field values.
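The idea of a bijection between generation and parsing can be sketched with the standard library alone. This is not Andar's implementation or API: `TEMPLATE`, `PATTERN`, `build`, `parse` and `is_round_trip` are hypothetical names chosen for the illustration.

```python
import re

# Hypothetical convention, written once as a template and once as a regex.
TEMPLATE = "/{folder}/{name}_{version}.{ext}"
PATTERN = re.compile(
    r"/(?P<folder>[a-z]+)/(?P<name>[a-z]+)_(?P<version>\d+)\.(?P<ext>[a-z]+)"
)


def build(fields):
    """Generate a path from a dict of field values."""
    return TEMPLATE.format(**fields)


def parse(path):
    """Parse a path back into its fields, or raise if it does not match."""
    match = PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"path does not follow the convention: {path}")
    return match.groupdict()


def is_round_trip(fields):
    """Return True when parse(build(fields)) gives back the same fields."""
    try:
        return parse(build(fields)) == fields
    except ValueError:
        return False


good = {"folder": "data", "name": "orders", "version": "3", "ext": "csv"}
# The underscore in the name collides with the field separator.
bad = {"folder": "data", "name": "daily_orders", "version": "3", "ext": "csv"}

print(is_round_trip(good))
print(is_round_trip(bad))
```

The second case shows why such a check is useful: a value that contains the separator produces a path that can no longer be parsed back unambiguously.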

Flexibility

Andar allows a quick start: thanks to its predefined fields and patterns, defining a path template is enough. It also includes more advanced capabilities for customizing field parsing and generation via regular expressions and string converters, while keeping the syntax simple.

Lightweight

Andar is written using only the Python standard library, so it is lightweight and has no external dependencies.

Concepts

PathModel

PathModel is the main class for defining path conventions and managing path structures. It is based on two main components: templates and fields.

Templates are strings that define the names of the fields in the path structure using a simple syntax (inspired by f-strings), for example: "/{folder}/{prefix}_{name}_{suffix}.{ext}"

Fields are the basic components that map an object to a string, and back, in order to build or parse a path. Fields are defined via a class named FieldConf (see next section).

A PathModel can be defined with only a template string, because fields fall back to a default configuration. Once defined, a PathModel can be used to generate a new path or to parse an existing path into its fields. See Quick Start for a simple example. For more details check the Docs.
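As a rough mental model of how one template can serve both directions, the sketch below turns an f-string-like template into a regular expression with named groups. It uses only the standard library and is an illustration, not Andar's actual internals: `template_to_regex` and `DEFAULT_PATTERN` are hypothetical, and Andar's real default pattern may differ.

```python
import re
from string import Formatter

# Assumed default for fields without an explicit pattern (not Andar's default).
DEFAULT_PATTERN = r"[A-Za-z0-9-]+"


def template_to_regex(template, patterns=None):
    """Build a compiled regex with one named group per template field."""
    patterns = patterns or {}
    regex = ""
    for literal, field, _spec, _conversion in Formatter().parse(template):
        regex += re.escape(literal)  # literal text between fields
        if field is not None:
            regex += f"(?P<{field}>{patterns.get(field, DEFAULT_PATTERN)})"
    return re.compile(regex)


template = "/{folder}/{prefix}_{name}_{suffix}.{ext}"
regex = template_to_regex(template)

# Generation uses str.format, parsing uses the derived regex.
path = template.format(
    folder="data", prefix="raw", name="orders", suffix="v1", ext="csv"
)
fields = regex.fullmatch(path).groupdict()

print(path)
print(fields)
```

Note that this simplified version assumes each field name appears only once in the template, since a regular expression cannot reuse a group name.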

FieldConf

FieldConf is the class that defines how to parse and build a given field. It can be customized by specifying its regex pattern and how to convert the input object to a string and vice versa. It also offers a handy way to manage dates and datetimes automatically. See the Examples section for some applied use cases. For more details check the Docs.
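The pairing of a pattern with a two-way conversion can be sketched as follows. `DateField` is a hypothetical stand-in written with the standard library only; it is not Andar's FieldConf and makes no claim about how FieldConf is implemented.

```python
import datetime as dt
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DateField:
    """Hypothetical date field: a regex pattern plus a strftime format."""

    pattern: str
    date_format: str

    def to_string(self, value: dt.date) -> str:
        """Convert a date to its path representation."""
        return value.strftime(self.date_format)

    def from_string(self, text: str) -> dt.date:
        """Validate the text against the pattern, then convert it to a date."""
        if re.fullmatch(self.pattern, text) is None:
            raise ValueError(f"{text!r} does not match {self.pattern!r}")
        return dt.datetime.strptime(text, self.date_format).date()


date_path = DateField(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d")

text = date_path.to_string(dt.date(2025, 12, 1))
value = date_path.from_string(text)

print(text)
print(value)
```

Keeping the pattern and the format together in one object means the same definition drives both path generation and path parsing.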

Quick Start

Simple PathModel definition using default field configurations:

from andar import PathModel

simple_path_model = PathModel(
    template="/{base_folder}/{subfolder}/{base_name}__{suffix}.{extension}"
)

Generate a path:

result_path = simple_path_model.get_path(
    base_folder="parent_folder",
    subfolder="other_folder",
    base_name="mydata",
    suffix="2000-01-01",
    extension="csv",
)
print(result_path)

/parent_folder/other_folder/mydata__2000-01-01.csv

Parse a path:

file_path = "/data/reports/summary__2025-12-31.csv"
parsed_fields = simple_path_model.parse_path(file_path)
print(parsed_fields)

{
    'base_folder': 'data', 
    'subfolder': 'reports', 
    'base_name': 'summary', 
    'suffix': '2025-12-31', 
    'extension': 'csv',
}

Examples

How to create a path generator / parser for a date tree structure

Define a PathModel that follows a date tree folder structure with a datetime suffix, using the following template and fields:

from andar import FieldConf, PathModel, SafePatterns

date_archived_pm = PathModel(
    template="{base_path}/{subfolder}/{date_path}/{date_prefix}_{name}_{datetime_suffix}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "subfolder": FieldConf(pattern=SafePatterns.NAME),
        "date_path": FieldConf(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d"),
        "date_prefix": FieldConf(pattern=r"\d{4}-\d{2}-\d{2}", date_format="%Y-%m-%d"),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "datetime_suffix": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)

Then, to generate the paths, just iterate over the dates:

import datetime as dt

base_path = "/company/reports"
subfolder = "finance"
report_name = "revenue"
extension = "xls"
start_date = dt.date(2025, 12, 1)
report_date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for report_date in report_date_list:
    creation_datetime = dt.datetime.now()
    report_path = date_archived_pm.get_path(
        base_path=base_path,
        subfolder=subfolder,
        date_path=report_date,
        date_prefix=report_date,
        name=report_name,
        datetime_suffix=creation_datetime,
        ext=extension,
    )
    print(report_path)

To parse existing paths, use a library that supports recursive search (e.g. pathlib, glob, os) and yields a full path for each file:

import pathlib

base_path = "/company/reports"
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = date_archived_pm.parse_path(file_path)
    print(parsed_fields)

How to define path conventions for a datalake

For example, Data Mesh proposes conventions for separating data into domains, layers and products. These could be implemented with the following PathModel template and fields:

from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{date}_{product}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product date
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)

To improve traceability, it is good practice to also include the run datetime (i.e. the generation datetime) as a simple versioning system:

from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{product_date}_{product}_{run_datetime}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "product_date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product target date
        "run_datetime": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),  # generation datetime
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)
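With a run datetime in the file name, picking the most recent version for each product date becomes a small grouping problem. The sketch below shows the idea using only the standard library rather than Andar; the `FILE_NAME` regex, the `latest_runs` helper and the sample file names are hypothetical.

```python
import datetime as dt
import re

# Hypothetical regex for the file name part of the convention above.
FILE_NAME = re.compile(
    r"(?P<product_date>\d{8})_(?P<product>[a-z]+)"
    r"_(?P<run_datetime>\d{8}_\d{6})\.(?P<ext>[a-z]+)"
)


def latest_runs(file_names):
    """Map each product date to the file name with the most recent run."""
    latest = {}
    for file_name in file_names:
        match = FILE_NAME.fullmatch(file_name)
        if match is None:
            continue  # skip files that do not follow the convention
        run = dt.datetime.strptime(match["run_datetime"], "%Y%m%d_%H%M%S")
        key = match["product_date"]
        if key not in latest or run > latest[key][0]:
            latest[key] = (run, file_name)
    return {key: name for key, (_run, name) in latest.items()}


names = [
    "20251201_orders_20251202_010000.csv",
    "20251201_orders_20251203_093000.csv",
    "20251202_orders_20251203_010000.csv",
    "notes.txt",
]
selected = latest_runs(names)
print(selected)
```

Older runs stay on disk, so earlier versions remain available for auditing while readers default to the latest one.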

How to reorganize files and folders in a datalake

In this example we will reorganize a flat file structure into a nested one. First define the two PathModels, the old one and the new one:

from andar import FieldConf, PathModel, SafePatterns

old_flat_pm = PathModel(
    template="{base_path}/{category}_{name}_{date}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "category": FieldConf(pattern=SafePatterns.NAME),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)

# we can just update the template if the fields are the same
new_nested_pm = old_flat_pm.update(
    template="{base_path}/{category}/{date}/{name}.{ext}"
)

Example of creating files in a temporary directory using a flat structure with the old PathModel:

import pathlib
import tempfile
import datetime as dt

base_path = tempfile.mkdtemp()
start_date = dt.datetime(2025, 12, 1)
date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for date in date_list:
    file_path = old_flat_pm.get_path(
        base_path=base_path,
        category="sales",
        name="orders",
        date=date,
        ext="csv",
    )
    print(file_path)
    pathlib.Path(file_path).touch()  # create an empty file

Example of nesting the file paths using parse_path of the old PathModel and get_path of the new PathModel:

# First list existing files in target base path
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = old_flat_pm.parse_path(file_path)
    # As the fields are the same we can reuse them directly
    new_file_path = new_nested_pm.get_path(**parsed_fields)
    # create new parent directories
    pathlib.Path(new_file_path).parent.mkdir(parents=True, exist_ok=True)
    # move old file to new location using the new name
    pathlib.Path(file_path).replace(new_file_path)

The same strategy could be adapted to flatten a nested path structure using PathModels.
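For the reverse direction, the general shape of the loop stays the same: parse with the nested convention, then build with the flat one. The sketch below illustrates this with the standard library only, using a hypothetical `NESTED` regex and `FLAT` template in place of the two PathModels, so it should be read as an outline rather than as Andar code.

```python
import pathlib
import re
import tempfile

# Hypothetical stand-ins for the nested (source) and flat (target) conventions.
NESTED = re.compile(
    r"(?P<category>[a-z]+)/(?P<date>\d{8})/(?P<name>[a-z]+)\.(?P<ext>[a-z]+)"
)
FLAT = "{category}_{name}_{date}.{ext}"

# Create one nested file in a temporary directory to work on.
base = pathlib.Path(tempfile.mkdtemp())
nested_file = base / "sales" / "20251201" / "orders.csv"
nested_file.parent.mkdir(parents=True)
nested_file.touch()

# List the files first, then move them, so the listing is not
# affected by the moves.
for path in [p for p in base.rglob("*") if p.is_file()]:
    match = NESTED.fullmatch(path.relative_to(base).as_posix())
    if match is None:
        continue  # leave files that do not follow the nested convention
    path.replace(base / FLAT.format(**match.groupdict()))

flat_names = sorted(p.name for p in base.iterdir() if p.is_file())
print(flat_names)
```

The now-empty date and category directories are left in place here; removing them is a separate clean-up step.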

Documentation

See the official documentation to learn more.

Package name origin

The package name originates from a verse by the Spanish poet Antonio Machado:

"Caminante, no hay camino, se hace camino al andar." ("Traveler, there is no path; the path is made by walking.")

Antonio Machado
