Financial Reporting ETL Pipeline

A general-purpose ETL pipeline that ingests financial Excel workbooks described in config/settings.py, transforms the data, and loads it into a SQLite database, with optional Parquet output.

Project Roadmap

1. ETL Pipeline ← current phase
2. Economic Transformation (inflation, PPP adjustments)
3. Interactive Dashboard
4. AI-powered Analysis

Quick Start

pip install -r requirements.txt

# 1. Drop your Excel file(s) into data/raw/
# 2. Edit config/settings.py → SOURCES to describe the sheets
# 3. Run:
python run_etl.py            # process all configured sources
python run_etl.py data_reel  # process a single table

Project Structure

PFA/
├── config/
│   └── settings.py          # Paths, logging, SOURCES configuration
├── data/
│   ├── raw/                  # Drop Excel files here (.gitignored)
│   └── processed/            # Optional Parquet output
├── db/
│   └── pfa.db                # SQLite database (.gitignored)
├── etl/
│   ├── extract.py            # Read Excel sheets → DataFrames
│   ├── transform.py          # Clean, normalise, reshape data
│   └── load.py               # Write to SQLite / Parquet
├── logs/                     # ETL run logs (.gitignored)
├── run_etl.py                # Main entry point
├── verify_db.py              # Quick DB inspection helper
└── requirements.txt
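
The contents of verify_db.py are not shown here. As a rough idea of what a quick inspection could look like, the sketch below lists each table with its row count using only the standard-library sqlite3 module. The function name `summarize_tables` and the sample `revenue` table are assumptions made for illustration, and an in-memory database is used so the snippet runs without db/pfa.db.

```python
import sqlite3


def summarize_tables(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table (hypothetical helper)."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names come from sqlite_master itself, so quoting them is enough here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# In-memory database for the demo; a real check would connect to db/pfa.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue (partner TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [("Acme", "2024-01", 100.0), ("Acme", "2024-02", 120.0)],
)
print(summarize_tables(conn))
```

Swapping `":memory:"` for the path to db/pfa.db should give the same summary for the tables loaded by the pipeline.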

How to Add a New Data Source

1. Place the Excel file in data/raw/.
2. Open config/settings.py and add an entry to the SOURCES dict:

SOURCES = {
    "my_report.xlsx": {
        "revenue": {
            "sheet_name": "Sheet1",
            "header_row": 0,       # 0-based row with column headers
            "usecols": "A:F",      # Excel column range (None = all)
            "transform_type": "transactional",
        },
    },
}

3. Run `python run_etl.py`.
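
To make the shape of SOURCES concrete (file name, then table name, then options), here is a minimal sketch of how such a dict could be walked, including an optional single-table filter like the one `python run_etl.py data_reel` suggests. The helper `iter_jobs` is hypothetical and is not taken from run_etl.py.

```python
from typing import Iterator, Optional

# Same shape as the SOURCES example: file name -> table name -> options.
SOURCES = {
    "my_report.xlsx": {
        "revenue": {
            "sheet_name": "Sheet1",
            "header_row": 0,
            "usecols": "A:F",
            "transform_type": "transactional",
        },
    },
}


def iter_jobs(sources: dict, only_table: Optional[str] = None) -> Iterator[tuple]:
    """Yield (file_name, table_name, options) for each configured table.

    Hypothetical helper: when only_table is given, only that table is yielded.
    """
    for file_name, tables in sources.items():
        for table_name, options in tables.items():
            if only_table is None or table_name == only_table:
                yield file_name, table_name, options


for file_name, table_name, options in iter_jobs(SOURCES):
    print(file_name, table_name, options["transform_type"])
```

Keeping the table name as the inner key means one workbook can feed several tables, each with its own sheet and transform type.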

Available Transform Types

| Type | Use case |
| --- | --- |
| `transactional` | Row-per-transaction data (partner, month, amount) |
| `budget` | Budget/forecast tables with a label column |
| `balance_sheet` | ID columns + monthly date columns → melted long format |
| `mapping` | Account code → hierarchy levels |
| `aging` | Entity name + aging buckets (clients/suppliers) |
| `time_series` | ID columns + monthly date columns → melted long format |
| `generic` | Clean names, drop empties, deduplicate only |
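
The `balance_sheet` and `time_series` types are both described as melting monthly date columns into long format. The sketch below shows that reshaping with pandas `DataFrame.melt`. The column names `account`, `month` and `amount`, the sample figures, and the helper `melt_monthly` are illustrative assumptions, not the code in etl/transform.py.

```python
import pandas as pd

# Wide layout: one ID column plus one column per month (made-up sample values).
wide = pd.DataFrame(
    {
        "account": ["1000", "2000"],
        "2024-01": [100.0, 250.0],
        "2024-02": [110.0, 240.0],
    }
)


def melt_monthly(df: pd.DataFrame, id_cols: list) -> pd.DataFrame:
    """Turn every non-ID column into (month, amount) rows. Hypothetical helper."""
    long_df = df.melt(id_vars=id_cols, var_name="month", value_name="amount")
    # Sort so rows for the same ID sit together in month order.
    return long_df.sort_values(id_cols + ["month"]).reset_index(drop=True)


long = melt_monthly(wide, ["account"])
print(long)
```

The long layout has one row per account and month, which is usually easier to filter and aggregate in SQL than one column per month.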

Each type accepts an optional transform_opts dict whose entries are passed to the transform as keyword arguments. For example:

"transform_opts": {"label": "topline_net"}   # for budget type
"transform_opts": {"entity_type": "client"}  # for aging type

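One way to picture how transform_opts could reach a transform as keyword arguments is a small registry keyed by transform type. This is a sketch under assumptions: the function names, the `TRANSFORMS` registry, and the use of lists of dicts in place of DataFrames are all made up for illustration and are not the contents of etl/transform.py.

```python
from typing import Callable

Rows = list  # list of dicts, standing in for a DataFrame in this sketch


def transform_budget(rows: Rows, label: str = "budget") -> Rows:
    """Tag each row with a label (assumed behaviour for the budget type)."""
    return [{**row, "label": label} for row in rows]


def transform_aging(rows: Rows, entity_type: str = "client") -> Rows:
    """Tag each row with an entity type (assumed behaviour for the aging type)."""
    return [{**row, "entity_type": entity_type} for row in rows]


def transform_generic(rows: Rows) -> Rows:
    """Return a shallow copy of the rows; takes no options."""
    return [dict(row) for row in rows]


# Hypothetical registry: transform_type -> function.
TRANSFORMS: dict = {
    "budget": transform_budget,
    "aging": transform_aging,
    "generic": transform_generic,
}


def apply_transform(rows: Rows, config: dict) -> Rows:
    """Look up the transform and pass transform_opts through as keyword arguments."""
    transform_type = config.get("transform_type", "generic")
    func: Callable = TRANSFORMS.get(transform_type)
    if func is None:
        raise ValueError(f"Unknown transform_type: {transform_type!r}")
    return func(rows, **config.get("transform_opts", {}))


config = {"transform_type": "budget", "transform_opts": {"label": "topline_net"}}
print(apply_transform([{"amount": 10}], config))
```

With this layout, an option that a transform does not accept fails with a TypeError at call time rather than being silently ignored.
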
Tech Stack

Python 3.12 – pandas, openpyxl
SQLite – lightweight embedded database
Parquet – optional columnar snapshots
