113 commits
f528cee
initial flwr integration commit
kminhta Oct 21, 2024
f0c41b8
further enabling work
kminhta Oct 21, 2024
aa08b5e
additional updates
kminhta Oct 22, 2024
e0b1103
updates
kminhta Oct 23, 2024
6a815b7
enable supernode process configuration to pull number of partitions a…
kminhta Nov 1, 2024
be6927a
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Nov 22, 2024
6bc91e2
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Dec 7, 2024
a40dc48
update for flwr-nightly
kminhta Dec 7, 2024
3a0e01e
update to flwr 1.14
kminhta Dec 8, 2024
49ef2eb
enable runner to automatically set different client ports
kminhta Dec 16, 2024
f529fe2
add queue-based processing to avoid communication cancellation throug…
kminhta Dec 16, 2024
ffe7b47
add todo
kminhta Dec 16, 2024
a7ebd91
add FLEX component
kminhta Dec 17, 2024
ebf9552
move local grpc server to FLEX
kminhta Dec 17, 2024
8020a2f
change fim to flex
kminhta Dec 17, 2024
700b21b
add method to flex base class for acquiring local grpc client
kminhta Dec 17, 2024
5995fcf
move message conversion out of openfl client methods
kminhta Dec 18, 2024
37893a7
add flwr run to FLEX, add FLEXAssigner
kminhta Dec 18, 2024
48475fd
remove some TODOs and commented out code
kminhta Dec 18, 2024
2c72feb
click path for data.yaml set to true
kminhta Dec 18, 2024
83e6c80
add requirements
kminhta Dec 19, 2024
07f2db7
make importing flwr components conditioned on existing lib
kminhta Dec 19, 2024
da975e5
enable send local task results in order to gracefully terminate
kminhta Dec 20, 2024
d2d6907
fix superlink shutdown
kminhta Dec 20, 2024
18aed1f
gracefully terminate agg processes
kminhta Dec 20, 2024
e290e31
graceful shutdown at collaborators
kminhta Dec 20, 2024
bb000f4
add automatic shutdown to taskrunner
kminhta Dec 20, 2024
81dfdda
remove flower readme, add --insecure as default
kminhta Dec 20, 2024
961982e
add readme
kminhta Dec 20, 2024
ba02e9c
modify plan and update readme
kminhta Dec 20, 2024
7db176d
install openfl instructions
kminhta Dec 20, 2024
bf9f955
update torch and torchvision
kminhta Dec 20, 2024
4a3b37d
expand conditional
kminhta Jan 2, 2025
76b7936
edit taskrunner docstrings
kminhta Jan 2, 2025
ab3093e
update name FLEX to Connector
kminhta Jan 2, 2025
21c13b5
more docstring
kminhta Jan 3, 2025
12ee6df
move app-pytorch to src so that workspace can be properly exported an…
kminhta Jan 7, 2025
27ea8d1
testing gramine
kminhta Jan 9, 2025
caa0ff0
fix monitor subprocess
kminhta Jan 13, 2025
b91e887
adding try excepts for subprocess shutdown
kminhta Jan 14, 2025
1a0149c
termination event fix
kminhta Jan 14, 2025
d1e85d3
more signal handling
kminhta Jan 14, 2025
0f48742
docstrings
kminhta Jan 14, 2025
7b82004
improve docstrings
kminhta Jan 14, 2025
78bd7fd
patch flower
kminhta Jan 16, 2025
f738e49
method to ctrl+c shutdown
kminhta Jan 16, 2025
eb959b2
add os env variable to install flower in workspace
kminhta Jan 17, 2025
753ab7c
save tmp and fab in flwr home
kminhta Jan 17, 2025
806975d
make flwr_home if it doesn't exist
kminhta Jan 17, 2025
8bb0740
fixes to patch
kminhta Jan 17, 2025
cea7bd9
monitor output instead of subprocess calls
kminhta Jan 22, 2025
3d750ef
add exceptions for graceful shutdown to forcefully terminate
kminhta Jan 23, 2025
5fe3c37
updating auto shutdown mechanism to shut off supernode and local grpc…
kminhta Jan 28, 2025
eddb028
update to run shut down command from server
kminhta Jan 28, 2025
121570c
download data beforehand
kminhta Jan 30, 2025
0e74f1c
remove debug break
kminhta Jan 30, 2025
f9b6f20
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Jan 31, 2025
7b84635
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Jan 31, 2025
7664cc6
fixes around log and persistent db
kminhta Jan 31, 2025
f5903e9
fixes to workspace
kminhta Jan 31, 2025
ce9ceb5
update auto shutdown and dataset
kminhta Jan 31, 2025
f996199
give time for child processes to stop
kminhta Jan 31, 2025
3809ff9
write logs set to false to avoid issues with gramine
kminhta Jan 31, 2025
46fa4f9
fix condition to forcefully shutdown process
kminhta Feb 5, 2025
5f48691
add sleep to let client app close
kminhta Feb 6, 2025
07870e8
fixing termination
kminhta Feb 6, 2025
c26c9d2
create a separate try-except block for subprocess
kminhta Feb 6, 2025
96d8570
adjust exception in signal handler
kminhta Feb 7, 2025
80f515a
update connector.yaml
kminhta Feb 10, 2025
d4fa28c
update flwr run command
kminhta Feb 11, 2025
e859d36
fix run-id flag
kminhta Feb 11, 2025
e9f1e36
run_id test
kminhta Feb 11, 2025
efb6752
add debug steps for sgx
kminhta Feb 11, 2025
343de5e
fix flwr run patch
kminhta Feb 11, 2025
38b4f88
new automatic shutdown mech
kminhta Feb 11, 2025
28fa32d
changes to local_grpc_client to enable new shutdown mechanism
kminhta Feb 12, 2025
c219c66
fix automatic_shutdown flag
kminhta Feb 12, 2025
499dedc
debugging
kminhta Feb 12, 2025
9607bb0
more debug lines
kminhta Feb 12, 2025
156ba1f
debug 3
kminhta Feb 12, 2025
951db9d
additional debug statements
kminhta Feb 13, 2025
3af36fe
use process mode and directly track serverapp
kminhta Feb 13, 2025
02c45db
remove some debug stuff
kminhta Feb 13, 2025
d9c9913
remove auto shutdown
kminhta Feb 13, 2025
160ab8a
adding additional failsafes
kminhta Feb 13, 2025
04d173c
logic to save out model
kminhta Feb 14, 2025
6a558c7
stable commit. TODO: fix docstrings. NotImplemented: callbacks
kminhta Feb 20, 2025
306f7ff
stable commit 2.0: switched order of stop to kill serverapp first
kminhta Feb 20, 2025
32e96d0
stable commit 3.0: new self attribute for serverapp
kminhta Feb 20, 2025
7c413f1
stable commit 4.0: adding flower taskrunner and dataloader to init
kminhta Feb 20, 2025
676c273
keep openfl_client at the collaborator
kminhta Feb 20, 2025
7dc9de0
move serverapp callback out
kminhta Feb 20, 2025
70b02a8
attempts to decouple lgs from task runner - still WIP
kminhta Feb 20, 2025
9265d21
add connector utils
kminhta Feb 20, 2025
54cb629
more seamless separate of base class and extensibility of sub class
kminhta Feb 21, 2025
806c0f8
run local grpc server on separate ports
kminhta Feb 21, 2025
1a4bb4b
add flower app installation and fab installation to task runner
kminhta Feb 24, 2025
27866fc
update dataloader to accept datapath
kminhta Feb 24, 2025
95bc634
move installation back to requirements.txt
kminhta Feb 24, 2025
1a34378
add initialize_tensorkey_for_functions method with pass
kminhta Feb 25, 2025
b32054b
debug
kminhta Feb 25, 2025
092c55d
update data loader
kminhta Feb 28, 2025
0525520
fix defaults
kminhta Feb 28, 2025
40b58e4
clean up save function
kminhta Feb 28, 2025
7f25dd0
udpate README.md
kminhta Feb 28, 2025
1253150
pass local_server_port key from yaml to taksrunner
kminhta Mar 3, 2025
5188b3c
remove atomic connection by default
kminhta Mar 3, 2025
f98189f
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Mar 6, 2025
d99d62f
updates
kminhta Mar 6, 2025
1f3de0a
more changes
kminhta Mar 7, 2025
422af90
Merge branch 'develop' into flwr-integration-taskrunner
kminhta Mar 7, 2025
4cba60a
fix headers
kminhta Mar 7, 2025
fb26660
add .workspace
kminhta Mar 7, 2025
2 changes: 2 additions & 0 deletions openfl-workspace/flower-app-pytorch/.workspace
@@ -0,0 +1,2 @@
current_plan_name: default

280 changes: 280 additions & 0 deletions openfl-workspace/flower-app-pytorch/README.md
@@ -0,0 +1,280 @@
# Open(FL)ower

This workspace demonstrates a new functionality in OpenFL to interoperate with [Flower](https://flower.ai/). In particular, a user can now use the Flower API to run on OpenFL infrastructure. OpenFL will act as an intermediary step between the Flower SuperLink and Flower SuperNode to relay messages across the network using OpenFL's transport mechanisms.

## Overview

In this repository, you'll notice a directory under `src` called `app-pytorch`. This is essentially a Flower PyTorch app created using Flower's `flwr new` command that has been modified to run a local federation. The `client_app.py` and `server_app.py` dictate what will be run by the client and server respectively. `task.py` defines the logic that will be executed by each app, such as the model definition, train/test tasks, etc. Under `server_app.py` a section titled "Save Model" is added in order to save the `best.pbuf` and `last.pbuf` models from the experiment in your local workspace under `./save`. This uses native OpenFL logic to store the model as a `.pbuf` in order to later be retrieved by `fx model save` into a native format (limited to `.npz` to be deep learning framework agnostic), but this can be overridden to save the model directly following Flower's recommended method for [saving model checkpoints](https://flower.ai/docs/framework/how-to-save-and-load-model-checkpoints.html).
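As a rough illustration of the override path mentioned above — saving the aggregated parameters directly instead of going through OpenFL's `.pbuf` flow — a framework-agnostic checkpoint helper might look like the sketch below. This is a minimal sketch only: the `save_checkpoint`/`load_checkpoint` names and the plain-`pickle` format are assumptions for illustration, not this workspace's actual API (Flower's how-to uses `torch.save` on a `state_dict`).

```python
import pickle
from pathlib import Path


def save_checkpoint(ndarrays, path="save/last.ckpt"):
    """Persist a list of parameter arrays to disk.

    stdlib pickle is used here only to keep the sketch
    framework-agnostic; swap in torch.save / np.savez as needed.
    """
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("wb") as f:
        pickle.dump(ndarrays, f)
    return p


def load_checkpoint(path="save/last.ckpt"):
    """Reload the parameter arrays saved by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)
```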

## Execution Methods

There are two ways to execute this:

1. Automatic shutdown, which spawns a `server-app` in isolation and triggers experiment termination once it shuts down. (Default/Recommended)
2. Running `SuperLink` and `SuperNode` as [long-lived components](#long-lived-superlink-and-supernode) that will indefinitely wait for new runs. (Limited Functionality)

## Getting Started

### Install OpenFL

Create virtual env
```sh
pip install virtualenv
virtualenv ./venv
source ./venv/bin/activate
```

Install OpenFL from source
```sh
git clone https://github.com/securefederatedai/openfl.git
cd openfl
pip install -e .
```

### Create a Workspace

Start by creating a workspace:

```sh
fx workspace create --template flower-app-pytorch --prefix my_workspace
cd my_workspace
```

This will create a workspace in your current working directory called `./my_workspace` and install the Flower app defined in `./app-pytorch`. This is where the experiment takes place.

### Configure the Experiment
Under `./plan`, you will find the familiar OpenFL YAML files for configuring the experiment. `cols.yaml` and `data.yaml` will be populated by the collaborators that will run the Flower client app, along with the respective data shard or directory they will train and test on.

`plan.yaml` configures the experiment itself. The Open-Flower integration makes a few key changes to the `plan.yaml`:

1. Introduction of a new top-level key (`connector`) to configure a newly introduced component called `Connector`. Specifically, the Flower integration uses a `Connector` subclass called `ConnectorFlower`. This component is run by the aggregator and is responsible for initializing the Flower `SuperLink` and connecting to the OpenFL server. The `SuperLink` parameters can be configured via `connector.settings.superlink_params`. If nothing is supplied, it will simply run `flower-superlink --insecure` with the command's default settings as dictated by Flower. The `ConnectorFlower` can also run the `flwr run` command, configured via `connector.settings.flwr_run_params`. If `flwr_run_params` is not provided, the user is expected to run `flwr run <app>` from the aggregator machine to initiate the experiment.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    superlink_params:
      insecure: True
      serverappio-api-address: 127.0.0.1:9091
      fleet-api-address: 127.0.0.1:9092
      exec-api-address: 127.0.0.1:9093
    flwr_run_params:
      flwr_app_name: "app-pytorch"
      federation_name: "local-poc"
```
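To make the mapping from `superlink_params` to the launched process concrete, here is a hedged sketch of how a connector might translate those YAML settings into a `flower-superlink` command line. The `build_superlink_command` helper is hypothetical — it is not OpenFL's actual implementation — but it reproduces the documented behavior: no params yields `flower-superlink --insecure`, and each supplied key becomes a CLI flag.

```python
def build_superlink_command(superlink_params=None):
    """Translate a superlink_params mapping into CLI arguments.

    A value of True becomes a bare flag (e.g. --insecure); any other
    value becomes a "--key value" pair. With no params, this falls back
    to the documented default of `flower-superlink --insecure`.
    """
    cmd = ["flower-superlink"]
    if not superlink_params:
        return cmd + ["--insecure"]
    for key, value in superlink_params.items():
        flag = f"--{key.replace('_', '-')}"
        if value is True:
            cmd.append(flag)
        else:
            cmd.extend([flag, str(value)])
    return cmd
```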

2. A `ConnectorAssigner` with task groups that explicitly run the `start_client_adapter` task, defined by the task runner, for every authorized collaborator.

```yaml
assigner:
  defaults: plan/defaults/assigner.yaml
  template: openfl.component.ConnectorAssigner
  settings:
    task_groups:
      - name: Connector_Flower
        tasks:
          - start_client_adapter
```

3. A `FlowerTaskRunner` that executes the `start_client_adapter` task. This task starts the Flower `SuperNode` and connects it to the OpenFL client. The `FlowerTaskRunner` also has a setting, `FlowerTaskRunner.settings.auto_shutdown`, which defaults to `True`. When set to `True`, the task runner shuts down the `SuperNode` at the completion of an experiment; otherwise, it runs continuously.

```yaml
task_runner:
  defaults: plan/defaults/task_runner.yaml
  template: openfl.federated.task.runner_flower.FlowerTaskRunner
  settings:
    auto_shutdown: True
```
4. A `FlowerDataLoader` with similar high-level functionality to other OpenFL dataloaders.

**IMPORTANT NOTE**: `aggregator.settings.rounds_to_train` is set to 1. __Do not edit this__. The actual number of rounds for the experiment is controlled by Flower logic inside `./app-pytorch/pyproject.toml`. The entire Flower experiment runs in a single OpenFL round; the aggregator round exists only to stop the OpenFL components once the experiment completes.
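For reference, in a Flower app generated by `flwr new`, the round count typically lives in the app's `pyproject.toml` under the run config — something like the fragment below. The key names follow Flower's template convention; check `./src/app-pytorch/pyproject.toml` for the exact keys this workspace uses.

```toml
[tool.flwr.app.config]
# Controls the number of Flower rounds; OpenFL's rounds_to_train stays at 1.
num-server-rounds = 3
```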

## Running the Workspace
Run the workspace as normal (certify the workspace, initialize the plan, register the collaborators, etc.):

```SH
# Generate a Certificate Signing Request (CSR) for the Aggregator
fx aggregator generate-cert-request

# The CA signs the aggregator's request, which is now available in the workspace
fx aggregator certify --silent

# Initialize FL Plan and Model Weights for the Federation
fx plan initialize

################################
# Setup Collaborator 1
################################

# Create a collaborator named "collaborator1" that will use shard "0"
fx collaborator create -n collaborator1 -d 0

# Generate a CSR for collaborator1
fx collaborator generate-cert-request -n collaborator1

# The CA signs collaborator1's certificate
fx collaborator certify -n collaborator1 --silent

################################
# Setup Collaborator 2
################################

# Create a collaborator named "collaborator2" that will use shard "1"
fx collaborator create -n collaborator2 -d 1

# Generate a CSR for collaborator2
fx collaborator generate-cert-request -n collaborator2

# The CA signs collaborator2's certificate
fx collaborator certify -n collaborator2 --silent

##############################
# Start to Run the Federation
##############################

# Run the Aggregator
fx aggregator start
```

This will prepare the workspace and start the OpenFL aggregator, the Flower `SuperLink`, and the Flower `ServerApp`. You should see something like:

```SH
INFO 🧿 Starting the Aggregator Service.
.
.
.
INFO : Starting Flower SuperLink
WARNING : Option `--insecure` was set. Starting insecure HTTP server.
INFO : Flower Deployment Engine: Starting Exec API on 127.0.0.1:9093
INFO : Flower ECE: Starting ServerAppIo API (gRPC-rere) on 127.0.0.1:9091
INFO : Flower ECE: Starting Fleet API (GrpcAdapter) on 127.0.0.1:9092
.
.
.
INFO : [INIT]
INFO : Using initial global parameters provided by strategy
INFO : Starting evaluation of initial global parameters
INFO : Evaluation returned no results (`None`)
INFO :
INFO : [ROUND 1]
```

### Start Collaborators
Open 2 additional terminals for collaborators.
For collaborator 1's terminal, run:
```SH
fx collaborator start -n collaborator1
```
For collaborator 2's terminal, run:
```SH
fx collaborator start -n collaborator2
```
This will start the collaborator nodes, the Flower `SuperNode`, and the Flower `ClientApp`, and begin running the Flower experiment. You should see something like:

```SH
INFO 🧿 Starting a Collaborator Service.
.
.
.
INFO : Starting Flower SuperNode
WARNING : Option `--insecure` was set. Starting insecure HTTP channel to 127.0.0.1:...
INFO : Starting Flower ClientAppIo gRPC server on 127.0.0.1:...
INFO :
INFO : [RUN 297994661073077505, ROUND 1]
```
### Completion of the Experiment
Upon completion of the experiment, on the `aggregator` terminal, the Flower components should print an experiment summary while the `SuperLink` continues to receive requests from the `SuperNode`:
```SH
INFO : [SUMMARY]
INFO : Run finished 3 round(s) in 93.29s
INFO : History (loss, distributed):
INFO : round 1: 2.0937052175497555
INFO : round 2: 1.8027011854633406
INFO : round 3: 1.6812996898487116
```
If `automatic_shutdown` is enabled, this will be shortly followed by the OpenFL `aggregator` receiving "results" from the `collaborator` and subsequently shutting down:

```SH
INFO Round 0: Collaborators that have completed all tasks: ['collaborator1', 'collaborator2']
INFO Experiment Completed. Cleaning up...
INFO Sending signal to collaborator collaborator2 to shutdown...
INFO Sending signal to collaborator collaborator1 to shutdown...
INFO [OpenFL Connector] Stopping server process with PID: ...
INFO : SuperLink terminated gracefully.
INFO [OpenFL Connector] Server process stopped.
```
Upon completion of the experiment, on the `collaborator` terminals, the Flower components should output information about the run:

```SH
INFO : [RUN ..., ROUND 3]
INFO : Received: evaluate message
INFO : Start `flwr-clientapp` process
INFO : [flwr-clientapp] Pull `ClientAppInputs` for token ...
INFO : [flwr-clientapp] Push `ClientAppOutputs` for token ...
```

If `automatic_shutdown` is enabled, this will be shortly followed by the OpenFL `collaborator` shutting down:

```SH
INFO : SuperNode terminated gracefully.
INFO SuperNode process terminated.
INFO Shutting down local gRPC server...
INFO local gRPC server stopped.
INFO Waiting for tasks...
INFO Received shutdown signal. Exiting...
```
Congratulations, you have run a Flower experiment through OpenFL's task runner!

## Advanced Usage
### Long-lived SuperLink and SuperNode
A user can set `automatic_shutdown: False` in the `Connector` settings of the `plan.yaml`.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    automatic_shutdown: False
```

By doing so, Flower's `ServerApp` and `ClientApp` will still shut down at the completion of the Flower experiment, but the `SuperLink` and `SuperNode` will continue to run. As a result, on the `aggregator` terminal, you will see a constant stream of requests coming from the `SuperNode`:

```SH
INFO : GrpcAdapter.PullTaskIns
INFO : GrpcAdapter.PullTaskIns
INFO : GrpcAdapter.PullTaskIns
```
You can run another experiment by opening another terminal, navigating to this workspace, and running:
```SH
flwr run ./src/app-pytorch
```
It will run another experiment. Once you are done, you can manually shut down OpenFL's `collaborator` and Flower's `SuperNode` with `CTRL+C`. This triggers a task completion by the task runner, which subsequently begins the graceful shutdown of the OpenFL and Flower components.

### Running in SGX Enclave
Gramine does not support all Linux system calls. The Flower FAB is built and installed at runtime; during this, `utime()` is called, which is an [unsupported call](https://gramine.readthedocs.io/en/latest/devel/features.html#list-of-system-calls), resulting in errors or unexpected behavior. To navigate this, when running in an SGX enclave, we opt to build and install the FAB during initialization and package it alongside the OpenFL workspace. To make this work, we introduce some patches to Flower's build command. In addition, since secure enclaves have strict read/write permissions, dictated by a set of trusted/allowed files, we also patch Flower's telemetry in order to consolidate written file locations.

To enable these patches, simply add `patch: True` to the `Connector` and `Task Runner` settings. For the `Task Runner`, also include the name of the Flower app to build and install.

```yaml
connector:
  defaults: plan/defaults/connector.yaml
  template: openfl.component.ConnectorFlower
  settings:
    superlink_params:
      insecure: True
      serverappio-api-address: 127.0.0.1:9091
      fleet-api-address: 127.0.0.1:9092
      exec-api-address: 127.0.0.1:9093
      patch: True
    flwr_run_params:
      flwr_app_name: "app-pytorch"
      federation_name: "local-poc"
      patch: True

task_runner:
  defaults: plan/defaults/task_runner.yaml
  template: openfl.federated.task.runner_flower.FlowerTaskRunner
  settings:
    patch: True
    flwr_app_name: "app-pytorch"
```
5 changes: 5 additions & 0 deletions openfl-workspace/flower-app-pytorch/plan/cols.yaml
@@ -0,0 +1,5 @@
# Copyright (C) 2024 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

collaborators:

2 changes: 2 additions & 0 deletions openfl-workspace/flower-app-pytorch/plan/data.yaml
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.
58 changes: 58 additions & 0 deletions openfl-workspace/flower-app-pytorch/plan/plan.yaml
@@ -0,0 +1,58 @@
# Copyright (C) 2024 Intel Corporation
# Licensed subject to the terms of the separately executed evaluation license agreement between Intel Corporation and you.

aggregator :
  defaults : plan/defaults/aggregator.yaml
  template : openfl.component.Aggregator
  settings :
    rounds_to_train : 1 # DO NOT EDIT. This is to indicate OpenFL communication rounds
    persist_checkpoint : false
    write_logs : false

connector :
  defaults : plan/defaults/connector.yaml
  template : openfl.component.ConnectorFlower
  settings :
    superlink_params :
      insecure : True
      serverappio-api-address : 127.0.0.1:9091 # note [kta-intel]: ServerApp will connect here
      fleet-api-address : 127.0.0.1:9092 # note [kta-intel]: local gRPC client will connect here
      exec-api-address : 127.0.0.1:9093 # note [kta-intel]: port for server-app toml (for flwr run)
    flwr_run_params :
      flwr_app_name : "app-pytorch"
      federation_name : "local-poc"

collaborator :
  defaults : plan/defaults/collaborator.yaml
  template : openfl.component.Collaborator

data_loader :
  defaults : plan/defaults/data_loader.yaml
  template : openfl.federated.data.loader_flower.FlowerDataLoader
  settings :
    collaborator_count : 2

task_runner :
  defaults : plan/defaults/task_runner.yaml
  template : openfl.federated.task.runner_flower.FlowerTaskRunner

network :
  defaults : plan/defaults/network.yaml

assigner :
  defaults : plan/defaults/assigner.yaml
  template : openfl.component.RandomGroupedAssigner
  settings :
    task_groups :
      - name : Connector_Flower
        percentage : 1.0
        tasks :
          - start_client_adapter

tasks :
  defaults : plan/defaults/tasks_connector.yaml
  settings :
    connect_to : Flower

compression_pipeline :
  defaults : plan/defaults/compression_pipeline.yaml
1 change: 1 addition & 0 deletions openfl-workspace/flower-app-pytorch/requirements.txt
@@ -0,0 +1 @@
./src/app-pytorch
@@ -0,0 +1 @@
"""app-pytorch: A Flower / PyTorch app."""