This repository contains a modular ETL orchestration pipeline that moves data from SQL Server into PostgreSQL. It automates the full lifecycle of database teardown, schema creation, query mapping generation, data extraction, and data loading into production and test environments.
The ETL workflow is coordinated through a single entry point (main.py) and integrates reusable modules for:
- Dropping and rebuilding PostgreSQL databases and roles
- Generating source and destination query mappings from a CSV file
- Extracting source data using parameterized SQL SELECT queries
- Loading extracted data into destination tables using INSERT queries
- Optionally seeding a test database with the same data
Usage:
```bash
python3 -m main --database aw_sales --seed-test-database False
```

The ETL pipeline processes data across 35 distinct tables, resulting in a total of 485,625 rows extracted, transformed, and loaded into the destination system. This volume reflects a moderately complex schema with substantial row-level operations, suitable for analytics, reporting, or downstream integration.
| Stat | Count |
|---|---|
| Total Number of Tables | 35 |
| Total Number of Rows | 485,625 |
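
Returning to the usage example above, the two flags map onto a small amount of argument parsing in main.py. The sketch below is an assumption about how that parsing could look; the flag names come from this README, while the explicit string-to-boolean conversion is illustrative (argparse's `type=bool` would treat the string "False" as truthy).

```python
# Hypothetical sketch of the CLI parsing behind the usage example above;
# the flag names come from this README, the boolean handling is an assumption.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="SQL Server to PostgreSQL ETL")
    parser.add_argument("--database", required=True, help="Destination database name")
    parser.add_argument(
        "--seed-test-database",
        type=lambda value: value.lower() in ("true", "1", "yes"),
        default=False,
        help="Also load the extracted data into the test database",
    )
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"database={args.database}, seed_test_database={args.seed_test_database}")
```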
```mermaid
---
config:
  theme: base
  look: neo
  themeVariables:
    primaryColor: "#000"
    primaryTextColor: "#fff"
    fontColor: "#000"
---
sequenceDiagram
autonumber
participant CLI as Command Line
participant Main as main.py
participant PG as PostgreSQL
participant Map as MappingHandler
participant Extract as SourceExtractor
participant Load as DestinationLoader
participant Log as Logger
CLI->>Main: Start ETL with args (database, seed_test_database)
Main->>Log: Log "STARTING ETL"
Main->>PG: drop_database(database)
PG-->>Main: confirm drop
Main->>PG: drop_database(test_database)
PG-->>Main: confirm drop
Main->>PG: drop_role(database)
PG-->>Main: confirm drop
Main->>Map: build_mapping_data("source", mapping.csv, mapping_source.json)
Map-->>Main: confirm source mapping
Main->>Map: build_mapping_data("destination", mapping.csv, mapping_destination.json)
Map-->>Main: confirm destination mapping
Main->>PG: create_database(database)
PG-->>Main: confirm create
Main->>PG: create_role(database)
PG-->>Main: confirm create
Main->>PG: grant_database_permissions(database)
PG-->>Main: confirm grant
Main->>PG: create_database_schemas(database)
PG-->>Main: confirm create
Main->>PG: grant_table_permissions(database)
PG-->>Main: confirm grant
Main->>PG: create_database(test_database)
PG-->>Main: confirm create
Main->>PG: grant_database_permissions(test_database)
PG-->>Main: confirm grant
Main->>PG: create_database_schemas(test_database)
PG-->>Main: confirm create
Main->>PG: grant_table_permissions(test_database)
PG-->>Main: confirm grant
Main->>PG: create_database_tables(database, mapping_destination.json)
PG-->>Main: confirm tables
Main->>PG: create_database_tables(test_database, mapping_destination.json)
PG-->>Main: confirm tables
Main->>Extract: extract_source_data(mapping_source.json)
Extract-->>Main: return source_data
Main->>Load: load_destination_tables(database, mapping_destination.json, source_data)
Load-->>Main: confirm load
alt seed_test_database == True
Main->>Load: load_destination_tables(test_database, mapping_destination.json, source_data)
Load-->>Main: confirm load
end
Main->>Log: Log "ETL COMPLETED"
```
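
Read top to bottom, the diagram corresponds to a linear sequence of function calls. The sketch below condenses that flow using the module functions described later in this README; the import paths, file paths, and the `_test` suffix for the test database are assumptions, not the project's confirmed layout.

```python
# Condensed sketch of the orchestration shown above. The function names come from
# this README; the import paths and test-database naming are assumptions.
from build.build_destination_database import (
    drop_database, drop_role, create_database, create_role,
    grant_database_permissions, grant_table_permissions,
    create_database_schemas, create_database_tables,
)
from utils.query_mapping_handler import build_mapping_data
from extract.get_source_data import extract_source_data
from load.insert_source_data import load_destination_tables


def run_etl(database: str, seed_test_database: bool) -> None:
    test_database = f"{database}_test"  # assumed naming convention

    # Teardown
    drop_database(database)
    drop_database(test_database)
    drop_role(database)

    # Generate query mappings from the CSV definition
    build_mapping_data("source", "mapping.csv", "mapping_source.json")
    build_mapping_data("destination", "mapping.csv", "mapping_destination.json")

    # Rebuild the production database, role, and schemas
    create_database(database)
    create_role(database)
    grant_database_permissions(database)
    create_database_schemas(database)
    grant_table_permissions(database)

    # Rebuild the test database and schemas
    create_database(test_database)
    grant_database_permissions(test_database)
    create_database_schemas(test_database)
    grant_table_permissions(test_database)

    # Create destination tables in both databases
    create_database_tables(database, "mapping_destination.json")
    create_database_tables(test_database, "mapping_destination.json")

    # Extract once, load into production, optionally seed the test database
    source_data = extract_source_data("mapping_source.json")
    load_destination_tables(database, "mapping_destination.json", source_data)
    if seed_test_database:
        load_destination_tables(test_database, "mapping_destination.json", source_data)
```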
| Domain | Package | Minimum Version |
|---|---|---|
| Database Drivers | psycopg2 | 2.9.10 |
| | pyodbc | 5.2.0 |
| Environment | python-decouple | 3.8 |
| | python-dotenv | 1.1.1 |
| CLI Utilities | argparse | 1.4.0 |
| Data Processing | pandas | 2.3.3 |
| Code Formatting | black | 25.9.0 |
Each stage of the pipeline is modular and testable:
- `create_database()`: Creates PostgreSQL databases
- `create_role()`: Creates database roles
- `grant_database_permissions()`: Grants database-level access
- `create_database_schemas()`: Creates schemas
- `create_database_tables()`: Creates tables from SQL files
- `extract_source_data()`: Extracts data from source queries
- `load_destination_tables()`: Loads data into destination tables
- `build_mapping_data()`: Generates query/table mappings from CSV
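
To make the extract/load pair concrete, here is a minimal, illustrative round trip for a single table using the pyodbc and psycopg2 drivers from the dependency table. The connection strings, query text, and table/column names are placeholders, not the project's actual configuration.

```python
# Illustrative single-table extract/load round trip; connection strings, the
# query, and the table/column names below are placeholders.
import pyodbc
import psycopg2
from psycopg2.extras import execute_values

# Extract: run a SELECT against the SQL Server source.
src_conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=source-host;DATABASE=AdventureWorks;UID=etl;PWD=secret"
)
rows = src_conn.cursor().execute(
    "SELECT SalesOrderID, OrderDate, TotalDue FROM Sales.SalesOrderHeader"
).fetchall()
src_conn.close()

# Load: bulk-insert the extracted rows into the PostgreSQL destination.
dst_conn = psycopg2.connect("dbname=aw_sales user=etl password=secret host=destination-host")
with dst_conn, dst_conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO sales.sales_order_header (sales_order_id, order_date, total_due) VALUES %s",
        [tuple(row) for row in rows],
    )
dst_conn.close()
```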
```
├── sql/                                   # SQL scripts and mappings
│   ├── source/
│   ├── destination/
│   ├── mapping.csv                        # Source-to-destination table/query mapping
│   ├── mapping_source.json                # Generated source query mappings
│   └── mapping_destination.json           # Generated destination query mappings
├── logs/                                  # Runtime logging
│   └── app.log                            # Application log output
├── src/
│   ├── build/
│   │   └── build_destination_database.py  # PostgreSQL DDL and permission utilities
│   ├── extract/
│   │   └── get_source_data.py             # Source data extraction logic
│   ├── load/
│   │   └── insert_source_data.py          # Destination data insertion logic
│   ├── utils/
│   │   ├── file_handler.py                # JSON and SQL file utilities
│   │   ├── logging_handler.py             # Structured logging
│   │   └── query_mapping_handler.py       # Mapping generation from CSV
│   └── main.py                            # ETL orchestration entry point
```
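
The mapping files under sql/ tie the two sides together: mapping.csv is the hand-maintained definition, and the two JSON files are generated from it by `build_mapping_data()`. The sketch below shows one plausible shape for that generation step; the CSV column names used here are illustrative assumptions, not the actual schema of mapping.csv.

```python
# Plausible sketch of CSV-to-JSON mapping generation; the CSV column names
# ("table", "source_query", "destination_query") are illustrative assumptions.
import csv
import json


def build_mapping_data(side: str, csv_path: str, json_path: str) -> None:
    column = "source_query" if side == "source" else "destination_query"
    with open(csv_path, newline="") as handle:
        mapping = {row["table"]: row[column] for row in csv.DictReader(handle)}
    with open(json_path, "w") as handle:
        json.dump(mapping, handle, indent=2)


build_mapping_data("source", "sql/mapping.csv", "sql/mapping_source.json")
build_mapping_data("destination", "sql/mapping.csv", "sql/mapping_destination.json")
```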
Structured logging is enabled throughout the pipeline for traceability:
- Success and failure messages for each stage
- Exception handling with stack traces
- Visual markers for pipeline progress
Example output is written to `logs/app.log`.
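
A minimal sketch of a logger that would produce that file, assuming Python's standard logging module (the project's logging_handler.py may be configured differently):

```python
# Minimal logging setup writing to logs/app.log; the real logging_handler.py may differ.
import logging
from pathlib import Path

Path("logs").mkdir(exist_ok=True)
logging.basicConfig(
    filename="logs/app.log",
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)

logger = logging.getLogger("etl")
logger.info("STARTING ETL")
```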
This project welcomes modular enhancements, teardown logic improvements, and onboarding clarity. Please submit pull requests with:
- Isolated, testable changes
- Updated docstrings and logging
- Contributor-friendly scaffolds
This repository is licensed under the MIT License. See LICENSE for details.