Merged
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
max-line-length = 88
extend-ignore = E501
4 changes: 1 addition & 3 deletions .github/workflows/ci.yml
@@ -4,9 +4,7 @@ on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main


jobs:
  test:
36 changes: 36 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,36 @@
name: Lint

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  lint:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install black flake8 isort

      - name: Run Black
        run: black --check .

      - name: Run Flake8
        run: flake8 .

      - name: Run isort
        run: isort --check-only .
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,10 +1,15 @@
# Ignore Databricks folder
# Ignore Databricks, vscode
.databricks/
.vscode/

# virtual environments
venv*/

# Ignore edgetrain folders
models/
logs/
images/
results/

# Byte-compiled / optimized / DLL files
__pycache__/
2 changes: 2 additions & 0 deletions .isort.cfg
@@ -0,0 +1,2 @@
[settings]
profile = black
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/flake8
    rev: 6.1.0
    hooks:
      - id: flake8

  - repo: https://github.com/timothycrosley/isort
    rev: 5.12.0
    hooks:
      - id: isort
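To activate these hooks locally, the standard pre-commit workflow applies (generic `pre-commit` usage, not commands from this repo):

```shell
# Install pre-commit and register the hooks from .pre-commit-config.yaml
pip install pre-commit
pre-commit install

# Optionally run all hooks (black, flake8, isort) against the whole repo once
pre-commit run --all-files
```

After `pre-commit install`, the hooks run automatically on each `git commit`.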
97 changes: 63 additions & 34 deletions README.md
@@ -1,40 +1,44 @@
# EdgeTrain: Automated Resource Adjustment for Efficient Edge AI Training
**Version: 0.1.1-alpha**
**Version: 0.2.0**

EdgeTrain is a Python package designed to dynamically adjust deep learning training parameters and strategies based on CPU and GPU performance. It optimizes the training process by adjusting batch size and learning rate to ensure efficient training without overutilizing or underutilizing available resources. This package is specifically designed to reduce memory usage for model training on edge AI devices, laptops or other setups that have limited memory.
EdgeTrain is a Python package designed to dynamically adjust deep learning training parameters and strategies based on CPU and GPU performance. It optimizes the training process by adjusting batch size and learning rate to ensure efficient training without overutilizing or underutilizing available resources. This package is specifically designed to balance model training performance and memory usage on edge AI devices, laptops or other setups that have limited memory.

## Features

### Automated Resource Adjustment
EdgeTrain currently adjusts the following hyperparameters based on CPU/GPU usage:
- **Batch Size**: Automatically adjusts batch size for better memory optimization based on resource usage.
- **Learning Rate**: Dynamically adjusts the learning rate to improve training efficiency.

These adjustments optimize resource utilization throughout training, enabling efficient use of available resources on edge AI devices.
### Dynamic Resource-Based Training Adjustments
EdgeTrain monitors CPU and GPU usage in real-time and automatically adjusts hyperparameters during training:
- **Batch Size**: Increases or decreases to optimize memory usage.
- **Learning Rate**: Adjusts based on model performance to improve training efficiency.

### Resource Logging & Visualization
EdgeTrain logs critical system metrics (e.g., CPU and GPU usage) and training parameters (batch size, learning rate) for each epoch. The logs enable post-hoc visualization and analysis of:
EdgeTrain logs system performance and training parameters, allowing post-hoc visualization of:
- Resource utilization over time.
- Training parameter adjustments across epochs.
- Correlations between resource usage and model performance.

The built-in **visualization tools** help you understand how system resources are being utilized and how training parameters evolve during training.
The provided visualization tools illustrate how system resources are being utilized and how training parameters evolve during training.

### Customizable
### Customization and control
EdgeTrain is highly customizable. You can easily modify:
- **Resource Adjustment Thresholds**: Set CPU/GPU usage ranges to trigger adjustments.
- **Training Configuration Settings**: Adjust batch size increment, learning rate adjustments, and more.
- Tailor the optimization process to fit various setups, especially on edge devices with limited resources.
- **Fixed Pruning Strategy**: Pruning is applied with a constant ratio and stripped at the end to improve deployment efficiency.

## Release Notes for v0.2.0
This version introduces a **refined adaptive training strategy with a constant pruning ratio**. Key updates:

## Release Notes
Version: 0.1.1-alpha
- Fixed circular import issue in `create_model.py`. Now users should not encounter import errors during initialization.
- **Score Calculation**: This version now computes an **accuracy score** and a **memory score** based on resource usage and model performance.
- **Parameter Prioritization**: Accuracy and memory scores are weighted according to default or user-defined priority weighting schemes to identify a priority order for parameter adjustment. Only the top-priority parameter is adjusted in each epoch.
  - **Batch size priority** is weighted by memory usage.
  - **Learning rate priority** is inversely weighted by accuracy improvement (i.e., it increases if accuracy stagnates).
- **Fixed Pruning Ratio**: Pruning is applied at a constant ratio and stripped at the end of training.
- **Code Quality Improvements**: Added pre-commit hooks and CI linting for consistency.

## Installation
You can install the latest version of EdgeTrain via pip:

```bash
pip install https://github.com/BradleyEdelman/EdgeTrain/releases/download/v0.1.1-alpha/edgetrain-0.1.1a0.tar.gz
pip install https://github.com/BradleyEdelman/EdgeTrain/releases/download/v0.2.0-alpha/edgetrain-0.2.0.tar.gz
```

Alternatively, clone the repository and install manually:
@@ -45,45 +49,66 @@ git clone https://github.com/BradleyEdelman/edgetrain.git

# Checkout the desired version
cd edgetrain
git checkout tags/v0.1.1-alpha
git checkout tags/v0.2.0

# Install the package
pip install .
```

## Usage
## Usage Example
To use EdgeTrain, simply import the package and configure your training environment. Below is an example of using EdgeTrain with a TensorFlow model:
```python
# Import library
import edgetrain

# Example of resource monitoring and training with dynamic adjustments
train_dataset = {'images': train_images, 'labels': train_labels}
history = edgetrain.dynamic_train(train_dataset, epochs=10, batch_size=32, lr=1e-3, log_file="resource_log.csv", dynamic_adjustments=True)
final_model, history = edgetrain.dynamic_train(
    train_dataset,
    epochs=10,
    batch_size=32,
    lr=1e-3,
    log_file="resource_log.csv",
    dynamic_adjustments=True
)

# Plot resource usage, parameter scoring and prioritization, and parameter values over time
edgetrain.log_usage_plot("resource_log.csv")
```

## File Tree
```
EdgeTrain/
├── edgetrain/
│   ├── __init__.py
│   ├── adjust_train_parameters.py
│   ├── calculate_priorities.py
│   ├── calculate_scores.py
│   ├── create_model.py
│   ├── dynamic_train.py
│   ├── edgetrain_folder
│   ├── resource_adjust.py
│   ├── edgetrain_folder.py
│   ├── resource_monitor.py
│   ├── train_visualize.py
│   └── train_visualize.py
├── notebooks/
│   └── EdgeTrain_example.ipynb
├── tests/
│   ├── __init__.py
│   ├── test_adjust_batch_size.py
│   ├── test_adjust_learning_rate.py
│   ├── test_adjust_train_parameters.py
│   ├── test_calculate_priorities.py
│   ├── test_calculate_scores.py
│   ├── test_create_model_tf.py
│   ├── test_log_usage_once.py
│   ├── test_sys_resources.py
│   ├── test_dynamic_train.py
├── example_notebooks/
│   ├── EdgeTrain_example.ipynb
│   └── test_sys_resources.py
├── .github/workflows/
│   ├── ci.yml
│   └── lint.yml
├── .flake8
├── .gitignore
├── .isort.cfg
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── LICENSE
├── README.md
@@ -93,13 +118,17 @@ EdgeTrain/
```

## Contributions
You can contribute by:
Contributions are welcome:
- Reporting bugs or requesting features: [GitHub Issues](https://github.com/BradleyEdelman/edgetrain/issues)
- Improving documentation: help refine explanations and add examples
- Testing: try EdgeTrain with more complex models and datasets in heavily resource-constrained environments


## License
This project is licensed under the MIT License - see the LICENSE file for details.

## Known Limitations (Alpha)
- The package currently supports TensorFlow only. Support for other frameworks, especially lightweight ones is planned for future releases.
- Model pruning and quantization are future features.
- Resource usage thresholds for dynamic adjustments are in the initial phase and may require tuning based on the training setup.

## Known Limitations (v0.2.0)
- Currently supports **TensorFlow only**. Future updates will expand framework support.
- **Gradient accumulation**: Planned for a future release to further optimize memory usage.
- **Resource usage thresholds** are still in an experimental phase and may require fine-tuning.
21 changes: 15 additions & 6 deletions edgetrain/__init__.py
@@ -1,6 +1,15 @@
from .resource_monitor import sys_resources, log_usage_once
from .resource_adjust import adjust_threads, adjust_batch_size, adjust_grad_accum, adjust_learning_rate
from .edgetrain_folder import get_edgetrain_folder
from .train_visualize import log_usage_plot, log_train_time, training_history_plot
from .create_model import create_model_tf, create_model_torch
from .dynamic_train import dynamic_train
__all__ = [
    "adjust_training_parameters",
    "define_priorities",
    "compute_scores",
    "normalize_scores",
    "check_sparsity",
    "create_model_tf",
    "dynamic_train",
    "get_edgetrain_folder",
    "log_usage_once",
    "sys_resources",
    "log_train_time",
    "log_usage_plot",
    "training_history_plot",
]
53 changes: 53 additions & 0 deletions edgetrain/adjust_train_parameters.py
@@ -0,0 +1,53 @@
from edgetrain import sys_resources


def adjust_training_parameters(
    priority_values, batch_size, lr, accuracy_score, resources=None
):
    """
    Adjust the training parameters (batch size, learning rate) based on the highest priority score,
    moving parameters in the opposite direction if resource usage or accuracy trends improve.

    Parameters:
    - priority_values (dict): Dictionary containing priority scores for batch size and learning rate.
    - batch_size (int): Current batch size.
    - lr (float): Current learning rate.
    - accuracy_score (float): Current accuracy score from the latest epoch (0-1).
    - resources (dict, optional): Precomputed resource usage; queried via sys_resources() when omitted.

    Returns:
    - adjusted_batch_size (int): Adjusted batch size.
    - adjusted_lr (float): Adjusted learning rate.
    """

    # Get system resources
    if resources is None:
        resources = sys_resources()

    # Determine which parameter has the highest priority score
    highest_priority = max(priority_values, key=priority_values.get)

    # Adjust the parameter based on system resources and highest priority score
    if highest_priority == "batch_size":
        # Adjust batch size based on memory usage
        if resources["cpu_memory_percent"] > 75 or resources["gpu_memory_percent"] > 75:
            adjusted_batch_size = max(16, batch_size // 2)  # Halve batch size
        elif (
            resources["cpu_memory_percent"] < 50
            and resources["gpu_memory_percent"] < 50
        ):
            adjusted_batch_size = min(128, batch_size * 2)  # Double batch size
        else:
            adjusted_batch_size = batch_size
        adjusted_lr = lr

    elif highest_priority == "learning_rate":
        # Adjust learning rate based on accuracy score
        if accuracy_score < 0.05:  # Example threshold for low accuracy
            adjusted_lr = max(1e-5, lr * 0.5)  # Reduce learning rate
        elif accuracy_score > 0.95:  # Example threshold for high accuracy
            adjusted_lr = min(1e-2, lr * 1.2)  # Slightly increase learning rate
        else:
            adjusted_lr = lr
        adjusted_batch_size = batch_size

    else:
        # Fall back to current values so an unexpected key cannot leave
        # the return variables unbound
        adjusted_batch_size, adjusted_lr = batch_size, lr

    return adjusted_batch_size, adjusted_lr
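A hypothetical usage sketch of the memory-pressure branch. The stand-in function below condenses the batch-size logic above so the snippet runs standalone; the priority and resource numbers are illustrative, not produced by the package.

```python
def adjust_params_sketch(priority_values, batch_size, lr, resources):
    # Condensed stand-in for the batch-size branch of adjust_training_parameters
    if max(priority_values, key=priority_values.get) == "batch_size":
        if resources["cpu_memory_percent"] > 75 or resources["gpu_memory_percent"] > 75:
            return max(16, batch_size // 2), lr  # halve under memory pressure
        if resources["cpu_memory_percent"] < 50 and resources["gpu_memory_percent"] < 50:
            return min(128, batch_size * 2), lr  # double when headroom is ample
    return batch_size, lr


priorities = {"batch_size": 0.32, "learning_rate": 0.18}
resources = {"cpu_memory_percent": 80, "gpu_memory_percent": 40}
new_bs, new_lr = adjust_params_sketch(priorities, batch_size=64, lr=1e-3, resources=resources)
# CPU memory at 80% trips the >75% threshold, so the batch size is halved to 32
# while the learning rate is left unchanged.
```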
33 changes: 33 additions & 0 deletions edgetrain/calculate_priorities.py
@@ -0,0 +1,33 @@
def define_priorities(normalized_scores, user_priorities=None):
    """
    Calculate priority scores for adjustments based on resource usage and accuracy.

    Parameters:
    - normalized_scores (dict): Dictionary containing normalized scores for memory usage and accuracy.
      - memory_score (float): Score indicating memory usage pressure (0-100).
      - accuracy_score (float): Score indicating stagnation in accuracy improvement (0-1).
    - user_priorities (dict, optional): Optional user-defined priorities for resource conservation and accuracy improvement.

    Returns:
    - priority_value (dict): A dictionary of priority scores for batch size and learning rate.
    """

    # Default weights if user priorities are not provided
    default_priorities = {
        "batch_size_adjustment": 0.4,
        "accuracy_improvement": 0.6,
    }

    # Use user-defined priorities if available
    priorities = user_priorities if user_priorities else default_priorities

    # Calculate weighted priority scores
    priority_value = {
        "batch_size": priorities["batch_size_adjustment"]
        * normalized_scores.get("memory_score"),
        "learning_rate": (
            priorities["accuracy_improvement"] * normalized_scores.get("accuracy_score")
        ),
    }

    return priority_value
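A quick worked example of the default weighting, inlined so it runs standalone (the score values here are illustrative, not computed by the package):

```python
# Default weights from define_priorities; example normalized scores.
priorities = {"batch_size_adjustment": 0.4, "accuracy_improvement": 0.6}
normalized_scores = {"memory_score": 0.8, "accuracy_score": 0.3}

# Same weighted products as in define_priorities above.
priority_value = {
    "batch_size": priorities["batch_size_adjustment"] * normalized_scores["memory_score"],
    "learning_rate": priorities["accuracy_improvement"] * normalized_scores["accuracy_score"],
}

# batch_size priority (~0.32) beats learning_rate priority (~0.18), so the
# batch size is the single parameter adjusted this epoch.
top = max(priority_value, key=priority_value.get)
```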