This project showcases different ETL (Extract, Transform, Load) methods using Python and MySQL. The goal is to compare execution times for three approaches:
- One-by-One Transfer – Transfers data row by row.
- Import/Export Using CSV – Transfers data using a CSV file.
- Multithreading – Uses parallel processing for faster execution.
After execution, the three methods are compared based on execution time using a graph.
- Python (Scripting language)
- MySQL (Database)
- MySQL Connector (Python library for MySQL)
- Pandas (For CSV handling)
- Threading (For parallel execution)
- Matplotlib (For visualization)
ETL_Project/
│── TransferDataOneByOne.py # Case 1: One-by-One Transfer
│── TransferDataImportExport.py # Case 2: Using Import/Export
│── TransferDataMultithreading.py # Case 3: Using Multithreading
│── CompareExecutionTimes.py # Generates comparison graph
│── config.py # Database configuration
│── README.md # Project documentation
│── requirements.txt # Dependencies
Ensure you have Python installed. Then, install dependencies:
pip install mysql-connector-python pandas threading time matplotlibCreate a database and tables in MySQL:
CREATE DATABASE etl_db;
USE etl_db;
CREATE TABLE source_table (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
age INT
);
CREATE TABLE destination_table (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
age INT
);Insert sample data:
INSERT INTO source_table (name, age) VALUES
('Alice', 25),
('Bob', 30),
('Charlie', 22),
('David', 35);Modify config.py or update connection settings directly in each script:
DB_CONFIG = {
"host": "localhost",
"user": "root",
"password": "yourpassword",
"database": "etl_db",
"port": 3306
}python TransferDataOneByOne.pypython TransferDataImportExport.pypython TransferDataMultithreading.pyAfter running all three scripts, update CompareExecutionTimes.py with execution times and generate a comparison graph:
python CompareExecutionTimes.py| ETL Method | Execution Time (Seconds) |
|---|---|
| One-by-One | X.XX |
| Import/Export | X.XX |
| Multithreading | X.XX |
- One-by-One Transfer: Slowest due to row-by-row insertion.
- Import/Export: Faster due to batch processing.
- Multithreading: Fastest due to parallel execution.
This project highlights how different ETL methods perform in data transfer scenarios. It demonstrates the advantages of using multithreading for efficient processing.
🔹 One-by-One Transfer: Basic method, slowest. 🔹 Import/Export: Medium speed, uses batch operations. 🔹 Multithreading: Fastest, due to parallel execution.