Commit 0aa9f7b

committed
modified run
1 parent f870d8d commit 0aa9f7b

6 files changed

Lines changed: 1154 additions & 0 deletions

DELTA_HPC_SETUP.md

Lines changed: 391 additions & 0 deletions
@@ -0,0 +1,391 @@
# Running OpenFHE NC on Delta HPC

## Prerequisites

1. Access to Delta HPC (NCSA)
2. Active allocation/account
3. SSH access to Delta login nodes

---

## Quick Start

### Step 1: Upload Files to Delta

From your local machine:

```bash
# Upload the batch script
scp run_openfhe_delta.slurm YOUR_USERNAME@login.delta.ncsa.illinois.edu:~/

# Or clone directly on Delta (recommended)
ssh YOUR_USERNAME@login.delta.ncsa.illinois.edu
git clone -b gcn_v2 https://github.com/FedGraph/fedgraph.git
cd fedgraph
```

### Step 2: Check Your Account

On a Delta login node:

```bash
accounts
```

Note the account name in the "Project" column; you'll need it for the batch script.

### Step 3: Edit Batch Script

```bash
cd ~/fedgraph
nano run_openfhe_delta.slurm
```

Change this line:
```bash
#SBATCH --account=REPLACE_WITH_YOUR_ACCOUNT
```

To your actual account, for example:
```bash
#SBATCH --account=bbka-delta-cpu
```
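
The same edit can be scripted, which helps when several batch files need the account set. The sketch below is one way to do it with `sed`; `set_account` is a hypothetical helper name, not something shipped with the repository, and it assumes the script contains a single `#SBATCH --account=` line.

```bash
# set_account SCRIPT ACCOUNT
# Hypothetical helper: rewrite the "#SBATCH --account=" line of SCRIPT in place.
set_account() {
    local script="$1" account="$2"
    # -i.bak edits in place and keeps a backup copy (SCRIPT.bak);
    # this form is accepted by both GNU and BSD sed.
    sed -i.bak "s/^#SBATCH --account=.*/#SBATCH --account=${account}/" "$script"
}
```

For example, `set_account run_openfhe_delta.slurm bbka-delta-cpu` rewrites the directive and leaves the original in `run_openfhe_delta.slurm.bak`.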

### Step 4: Submit Job

```bash
sbatch run_openfhe_delta.slurm
```
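
`sbatch` prints the new job ID when it accepts the job, and capturing it saves copying the number by hand when monitoring. A minimal sketch, assuming the default message format `Submitted batch job <id>` or the `--parsable` format `<id>[;cluster]`; `parse_jobid` is a hypothetical helper name:

```bash
# parse_jobid TEXT
# Hypothetical helper: extract the numeric job ID from sbatch output.
parse_jobid() {
    # Drop the "Submitted batch job " prefix (default output) and
    # any ";cluster" suffix (--parsable output on multi-cluster setups).
    printf '%s\n' "$1" | sed -E 's/^Submitted batch job //; s/;.*$//'
}

# Usage on Delta (shown as comments so the block has no side effects):
#   JOBID=$(parse_jobid "$(sbatch run_openfhe_delta.slurm)")
#   tail -f "openfhe-${JOBID}.out"
```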

### Step 5: Monitor Job

```bash
# Check job status
squeue -u $USER

# Check output (replace JOBID with your actual job ID)
tail -f openfhe-JOBID.out

# Check errors
tail -f openfhe-JOBID.err
```

---

## Option 2: Interactive Session (Faster Testing)

### Quick Interactive Test

```bash
# On Delta login node
srun --account=YOUR_ACCOUNT --partition=cpu-interactive \
  --nodes=1 --tasks=1 --cpus-per-task=8 --mem=16g \
  --time=01:00:00 --pty bash

# Once on compute node, check GLIBC
ldd --version

# Load Python and test OpenFHE
module load python/3.11
pip install --user openfhe==1.2.3.0.24.4

# Quick test
python3 -c "import openfhe; print('OpenFHE works!')"
```

### Full Interactive Setup

```bash
# 1. Make the script executable
chmod +x run_openfhe_interactive_delta.sh

# 2. Edit the script to add your account
nano run_openfhe_interactive_delta.sh
# Change: ACCOUNT="REPLACE_WITH_YOUR_ACCOUNT"

# 3. Run it
./run_openfhe_interactive_delta.sh
```

---

## Comparing Plaintext vs OpenFHE

Create a custom comparison script:

```bash
nano compare_openfhe.slurm
```

Add this content:

```bash
#!/bin/bash
#SBATCH --mem=32g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=cpu
#SBATCH --account=YOUR_ACCOUNT
#SBATCH --job-name=compare_openfhe
#SBATCH --time=03:00:00
#SBATCH -e compare-%j.err
#SBATCH -o compare-%j.out

source $HOME/openfhe_env/bin/activate
cd $HOME/fedgraph

python3 << 'PYEOF'
from fedgraph.federated_methods import run_NC
from attridict import AttriDict

# Test 1: Plaintext
print("\n" + "="*60)
print("TEST 1: PLAINTEXT (Baseline)")
print("="*60)
config = {
    "fedgraph_task": "NC",
    "method": "FedGCN",
    "use_encryption": False,
    "dataset": "cora",
    "num_trainers": 3,
    "num_rounds": 10,
    "seed": 42,
}
run_NC(AttriDict(config))

# Test 2: OpenFHE
print("\n" + "="*60)
print("TEST 2: OPENFHE (Two-Party Threshold HE)")
print("="*60)
config["use_encryption"] = True
config["he_backend"] = "openfhe"
run_NC(AttriDict(config))

print("\n" + "="*60)
print("COMPARISON COMPLETE")
print("Check the output above for accuracy difference")
print("Expected: < 1% difference")
print("="*60)
PYEOF
```

Then submit:

```bash
sbatch compare_openfhe.slurm
```
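
Once both runs have finished, the accuracy gap can be checked against the 1% expectation. The helper below is a sketch under stated assumptions: `within_tolerance` is a hypothetical name, the two accuracies are read by hand from `compare-JOBID.out` and passed as fractions between 0 and 1, and "1%" is interpreted as an absolute difference of 0.01.

```bash
# within_tolerance PLAINTEXT_ACC ENCRYPTED_ACC [MAX_DIFF]
# Hypothetical helper: succeeds (exit 0) when |a - b| <= MAX_DIFF.
# MAX_DIFF defaults to 0.01, i.e. one percentage point (an assumption).
within_tolerance() {
    awk -v a="$1" -v b="$2" -v tol="${3:-0.01}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        exit ((d <= tol) ? 0 : 1)
    }'
}

# Example with made-up numbers:
#   within_tolerance 0.812 0.805 && echo "within 1%" || echo "gap too large"
```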

---

## File System Usage

Following Delta best practices:

- **HOME** (`$HOME`): Store scripts, environments, small files
- **SCRATCH** (`$SCRATCH`): Store datasets, temporary outputs
- **WORK/PROJECTS** (`$WORK`): Store results, checkpoints

Example directory structure:

```bash
$HOME/
├── openfhe_env/          # Python virtual environment
└── fedgraph/             # Git repository

$SCRATCH/
└── fedgraph_results/     # Training outputs, logs
```
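
The layout above can be created from a job script before training starts. This is a minimal sketch: `make_results_dir` and the per-run subdirectory are hypothetical, and the fallback to `$HOME` when `$SCRATCH` is unset is an assumption made so the snippet also runs outside Delta.

```bash
# make_results_dir [RUN_NAME]
# Hypothetical helper: create a per-run directory under fedgraph_results/
# and print its path.
make_results_dir() {
    local base="${SCRATCH:-$HOME}"   # assumption: fall back to HOME if SCRATCH is unset
    local dir="${base}/fedgraph_results/${1:-run}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Example inside a batch script:
#   OUTDIR=$(make_results_dir "cora-${SLURM_JOB_ID}")
```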

---

## GPU Version (Optional)

To use GPU nodes for faster training:

```bash
#!/bin/bash
#SBATCH --mem=64g
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=gpuA40x4     # A40 GPU partition
#SBATCH --account=YOUR_ACCOUNT
#SBATCH --gpus-per-node=1
#SBATCH --job-name=openfhe_nc_gpu
#SBATCH --time=01:00:00
#SBATCH -e openfhe-gpu-%j.err
#SBATCH -o openfhe-gpu-%j.out

module reset
module load python/3.11
module load cuda/11.8            # or appropriate CUDA version

source $HOME/openfhe_env/bin/activate
cd $HOME/fedgraph/tutorials

# Run with GPU support
python FGL_NC_HE.py
```

---

## Troubleshooting

### Issue: GLIBC version error

```bash
# Check GLIBC version on compute node
srun --account=YOUR_ACCOUNT --partition=cpu-interactive \
  --nodes=1 --cpus-per-task=1 --mem=4g --time=00:05:00 \
  ldd --version
```

- **Expected:** GLIBC 2.28 (Delta runs RedHat 8.4, and RedHat 8 ships GLIBC 2.28); confirm with the command above
- **Required:** GLIBC 2.29+ for OpenFHE 1.2.3

If the reported version is older than the required one, try an earlier OpenFHE release:
```bash
pip index versions openfhe    # list the releases available on PyPI (needs a recent pip)
pip install openfhe==1.1.0    # earlier version; pick one that the listing shows
```
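
The comparison against the minimum version can be automated inside a job script. A sketch, assuming GNU `sort -V` is available and that the last field of the first line of `ldd --version` is the GLIBC version (true on glibc-based Linux); `version_ge` and `glibc_version` are hypothetical helper names.

```bash
# version_ge A B
# Hypothetical helper: succeeds when version A >= version B.
version_ge() {
    # sort -V orders version strings; if B sorts first (or equal), A >= B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# glibc_version
# Hypothetical helper: print the GLIBC version reported by ldd.
glibc_version() {
    ldd --version 2>&1 | head -n 1 | awk '{print $NF}'
}

# Example, using the minimum stated above:
#   version_ge "$(glibc_version)" 2.29 || echo "GLIBC too old for this wheel" >&2
```

`sort -V` is used rather than a numeric comparison because `2.100` must rank above `2.29`, which plain floating-point comparison gets wrong.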

### Issue: Module not found

```bash
# Check available Python modules
module avail python

# Try a different Python version
module load python/3.12
```

### Issue: Out of memory

Increase the memory request in the `#SBATCH` directive:
```bash
#SBATCH --mem=64g  # instead of 32g
```
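
To size the new request from data rather than by guessing, the peak memory of a finished job can be read with `sacct -j JOBID --format=JobID,MaxRSS,ReqMem,State`. The sketch below converts a `MaxRSS` value to GiB; `rss_to_gib` is a hypothetical helper, and it assumes the value carries a `K`, `M`, or `G` suffix (anything else is reported as `unknown`).

```bash
# rss_to_gib VALUE
# Hypothetical helper: convert a sacct MaxRSS value such as "16777216K" to GiB.
rss_to_gib() {
    printf '%s\n' "$1" | awk '{
        n = $1 + 0                      # numeric prefix of the value
        u = substr($1, length($1))      # trailing unit letter
        if (u == "K")      n = n / (1024 * 1024)
        else if (u == "M") n = n / 1024
        else if (u == "G") n = n
        else { print "unknown"; next }  # assumption: other formats are not handled
        printf "%.1f\n", n
    }'
}
```

For example, `rss_to_gib 16777216K` prints `16.0`, which would fit within a `--mem=32g` request with room to spare.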

### Issue: Job pending for a long time

```bash
# Check why job is pending
squeue -u $USER -l

# Try interactive partition for testing
#SBATCH --partition=cpu-interactive

# Or use preempt partition (cheaper, may be interrupted)
#SBATCH --partition=cpu-preempt
```

---

## Expected Resource Usage

Approximate figures, based on the FedGCN NC implementation:

| Resource | Cora | Citeseer | Pubmed |
|----------|------|----------|--------|
| **Memory** | ~16GB | ~20GB | ~32GB |
| **Cores** | 8-16 | 8-16 | 16-32 |
| **Time** | ~30 min | ~45 min | ~90 min |
| **Storage** | ~2GB | ~3GB | ~5GB |
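
The memory row of the table can be turned into a lookup so that one batch script serves all three datasets. A sketch only: `mem_for_dataset` is a hypothetical helper, and the values are the approximate figures from the table, without any safety margin.

```bash
# mem_for_dataset NAME
# Hypothetical helper: print the --mem value for a dataset, taken from the
# table above. Returns non-zero for unknown datasets.
mem_for_dataset() {
    case "$1" in
        cora)     echo 16g ;;
        citeseer) echo 20g ;;
        pubmed)   echo 32g ;;
        *)        echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

# Options given on the sbatch command line override #SBATCH directives:
#   sbatch --mem="$(mem_for_dataset pubmed)" run_openfhe_delta.slurm
```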

---

## Monitoring Your Job

### While the job is running:

```bash
# SSH to compute node
squeue -u $USER  # Get node name
ssh NODE_NAME    # e.g., ssh cn042

# Once on node:
top -u $USER
htop             # if available
nvidia-smi       # if using GPU
```

### Check output in real time:

```bash
# Get job ID (first job listed for your user)
JOBID=$(squeue -u $USER -h -o %i | head -1)

# Tail output
tail -f openfhe-${JOBID}.out

# Or use watch
watch -n 5 tail -20 openfhe-${JOBID}.out
```
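
The `squeue` lookup only works while the job is still queued or running. As a complement, the newest output file in the current directory can be located by modification time. A small sketch; `latest_output` is a hypothetical helper and it assumes the `openfhe-JOBID.out` naming used in this guide.

```bash
# latest_output
# Hypothetical helper: print the most recently modified openfhe-*.out file
# in the current directory, or nothing if there is none.
latest_output() {
    # ls -t sorts newest first; errors are silenced when no file matches
    ls -t openfhe-*.out 2>/dev/null | head -n 1
}

# Example:
#   tail -n 50 "$(latest_output)"
```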

---

## Batch Job Array (Multiple Experiments)

To run multiple configurations:

```bash
#!/bin/bash
#SBATCH --array=0-4              # 5 jobs
#SBATCH --mem=32g
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=cpu
#SBATCH --account=YOUR_ACCOUNT
#SBATCH --time=02:00:00
#SBATCH -e array-%A_%a.err       # %A=job ID, %a=array index
#SBATCH -o array-%A_%a.out

# Define datasets and HE backends; entry i is used by array task i
DATASETS=("cora" "citeseer" "pubmed" "cora" "citeseer")
HE_BACKENDS=("openfhe" "openfhe" "openfhe" "tenseal" "tenseal")

DATASET=${DATASETS[$SLURM_ARRAY_TASK_ID]}
HE_BACKEND=${HE_BACKENDS[$SLURM_ARRAY_TASK_ID]}

echo "Running: Dataset=$DATASET, Backend=$HE_BACKEND"

source $HOME/openfhe_env/bin/activate
cd $HOME/fedgraph

# The heredoc delimiter is unquoted so that $HE_BACKEND and $DATASET are expanded by the shell
python3 << PYEOF
from fedgraph.federated_methods import run_NC
from attridict import AttriDict

config = {
    "fedgraph_task": "NC",
    "method": "FedGCN",
    "use_encryption": True,
    "he_backend": "$HE_BACKEND",
    "dataset": "$DATASET",
    "num_trainers": 3,
    "num_rounds": 10,
    "seed": 42,
}
run_NC(AttriDict(config))
PYEOF
```

Save the script as `array_experiment.slurm`, then submit the array job:
```bash
sbatch array_experiment.slurm
```
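
Before spending allocation hours, the index-to-configuration mapping can be checked locally without Slurm. The sketch below reuses the two arrays from the script above and simulates each value of `SLURM_ARRAY_TASK_ID`; `config_for_index` is a hypothetical helper that also rejects indices outside the array.

```bash
DATASETS=("cora" "citeseer" "pubmed" "cora" "citeseer")
HE_BACKENDS=("openfhe" "openfhe" "openfhe" "tenseal" "tenseal")

# config_for_index I
# Hypothetical helper: print "dataset backend" for array index I,
# or fail if I is outside the arrays.
config_for_index() {
    local i="$1"
    if [ "$i" -lt 0 ] || [ "$i" -ge "${#DATASETS[@]}" ]; then
        echo "index $i out of range" >&2
        return 1
    fi
    echo "${DATASETS[$i]} ${HE_BACKENDS[$i]}"
}

# Dry run: one line per task in --array=0-4
for i in 0 1 2 3 4; do
    echo "task $i -> $(config_for_index "$i")"
done
```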

---

## Next Steps

1. **First time setup**: Run interactive session to verify everything works
2. **Single experiment**: Use `run_openfhe_delta.slurm` for single runs
3. **Comparisons**: Use custom scripts to compare plaintext vs OpenFHE
4. **Production**: Use batch arrays for multiple experiments

**See also:**
- `README_OPENFHE.md` - Implementation details
- `OPENFHE_NC_IMPLEMENTATION.md` - Technical documentation
- Delta docs: https://docs.ncsa.illinois.edu/systems/delta/
