| 1 | +# Running OpenFHE NC on Delta HPC |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | + |
| 5 | +1. Access to Delta HPC (NCSA) |
| 6 | +2. Active allocation/account |
| 7 | +3. SSH access to Delta login nodes |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Quick Start |
| 12 | + |
| 13 | +### Step 1: Upload Files to Delta |
| 14 | + |
| 15 | +From your local machine: |
| 16 | + |
| 17 | +```bash |
| 18 | +# Upload the batch script |
| 19 | +scp run_openfhe_delta.slurm YOUR_USERNAME@login.delta.ncsa.illinois.edu:~/ |
| 20 | + |
| 21 | +# Or clone directly on Delta (recommended) |
| 22 | +ssh YOUR_USERNAME@login.delta.ncsa.illinois.edu |
| 23 | +git clone -b gcn_v2 https://github.com/FedGraph/fedgraph.git |
| 24 | +cd fedgraph |
| 25 | +``` |
| 26 | + |
| 27 | +### Step 2: Check Your Account |
| 28 | + |
| 29 | +On a Delta login node: |
| 30 | + |
| 31 | +```bash |
| 32 | +accounts |
| 33 | +``` |
| 34 | + |
| 35 | +Note the account name listed in the "Project" column. You will need it for the batch script. |
| 36 | + |
| 37 | +### Step 3: Edit Batch Script |
| 38 | + |
| 39 | +```bash |
| 40 | +cd ~/fedgraph |
| 41 | +nano run_openfhe_delta.slurm |
| 42 | +``` |
| 43 | + |
| 44 | +Change this line: |
| 45 | +```bash |
| 46 | +#SBATCH --account=REPLACE_WITH_YOUR_ACCOUNT |
| 47 | +``` |
| 48 | + |
| 49 | +To your actual account, for example: |
| 50 | +```bash |
| 51 | +#SBATCH --account=bbka-delta-cpu |
| 52 | +``` |
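If you prefer not to open an editor, the same change can be scripted. The sketch below is a hypothetical helper (the name `set_slurm_account` is not part of the repository) that rewrites the `#SBATCH --account=` line in place and leaves every other line untouched.

```bash
#!/bin/bash
# Hypothetical helper: set the --account directive of a Slurm script.
# Usage: set_slurm_account ACCOUNT SCRIPT
set_slurm_account() {
    local account="$1" script="$2" tmp
    tmp="$(mktemp)" || return 1
    # Rewrite only the "#SBATCH --account=..." line; other lines pass through.
    sed "s|^#SBATCH --account=.*|#SBATCH --account=${account}|" "$script" > "$tmp" \
        && cat "$tmp" > "$script"   # cat keeps the script's permissions
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Example (on Delta):
#   set_slurm_account bbka-delta-cpu ~/fedgraph/run_openfhe_delta.slurm
```

Afterwards, `grep -- '--account' run_openfhe_delta.slurm` should show your account name.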
| 53 | + |
| 54 | +### Step 4: Submit Job |
| 55 | + |
| 56 | +```bash |
| 57 | +sbatch run_openfhe_delta.slurm |
| 58 | +``` |
| 59 | + |
| 60 | +### Step 5: Monitor Job |
| 61 | + |
| 62 | +```bash |
| 63 | +# Check job status |
| 64 | +squeue -u $USER |
| 65 | + |
| 66 | +# Check output (replace JOBID with your actual job ID) |
| 67 | +tail -f openfhe-JOBID.out |
| 68 | + |
| 69 | +# Check errors |
| 70 | +tail -f openfhe-JOBID.err |
| 71 | +``` |
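Replacing `JOBID` by hand is easy to get wrong when several jobs are queued. As a minimal sketch, the hypothetical helper below pulls the numeric ID out of the `Submitted batch job <id>` message that `sbatch` prints, so the log file name can be built automatically.

```bash
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation message.
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Example (on Delta):
#   JOBID=$(parse_job_id "$(sbatch run_openfhe_delta.slurm)")
#   tail -f "openfhe-${JOBID}.out"
```

`sbatch --parsable` is an alternative: it prints the job ID directly, followed by `;cluster` on multi-cluster setups.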
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## Interactive Session (Faster Testing) |
| 76 | + |
| 77 | +### Quick Interactive Test |
| 78 | + |
| 79 | +```bash |
| 80 | +# On Delta login node |
| 81 | +srun --account=YOUR_ACCOUNT --partition=cpu-interactive \ |
| 82 | + --nodes=1 --tasks=1 --cpus-per-task=8 --mem=16g \ |
| 83 | + --time=01:00:00 --pty bash |
| 84 | + |
| 85 | +# Once on compute node, check GLIBC |
| 86 | +ldd --version |
| 87 | + |
| 88 | +# Load Python and test OpenFHE |
| 89 | +module load python/3.11 |
| 90 | +pip install --user openfhe==1.2.3.0.24.4 |
| 91 | + |
| 92 | +# Quick test |
| 93 | +python3 -c "import openfhe; print('OpenFHE works!')" |
| 94 | +``` |
| 95 | + |
| 96 | +### Full Interactive Setup |
| 97 | + |
| 98 | +```bash |
| 99 | +# 1. Make the script executable |
| 100 | +chmod +x run_openfhe_interactive_delta.sh |
| 101 | + |
| 102 | +# 2. Edit the script to add your account |
| 103 | +nano run_openfhe_interactive_delta.sh |
| 104 | +# Change: ACCOUNT="REPLACE_WITH_YOUR_ACCOUNT" |
| 105 | + |
| 106 | +# 3. Run it |
| 107 | +./run_openfhe_interactive_delta.sh |
| 108 | +``` |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Comparing Plaintext vs OpenFHE |
| 113 | + |
| 114 | +Create a custom comparison script: |
| 115 | + |
| 116 | +```bash |
| 117 | +nano compare_openfhe.slurm |
| 118 | +``` |
| 119 | + |
| 120 | +Add this content: |
| 121 | + |
| 122 | +```bash |
| 123 | +#!/bin/bash |
| 124 | +#SBATCH --mem=32g |
| 125 | +#SBATCH --nodes=1 |
| 126 | +#SBATCH --ntasks-per-node=1 |
| 127 | +#SBATCH --cpus-per-task=16 |
| 128 | +#SBATCH --partition=cpu |
| 129 | +#SBATCH --account=YOUR_ACCOUNT |
| 130 | +#SBATCH --job-name=compare_openfhe |
| 131 | +#SBATCH --time=03:00:00 |
| 132 | +#SBATCH -e compare-%j.err |
| 133 | +#SBATCH -o compare-%j.out |
| 134 | + |
| 135 | +source $HOME/openfhe_env/bin/activate |
| 136 | +cd $HOME/fedgraph |
| 137 | + |
| 138 | +python3 << 'PYEOF' |
| 139 | +from fedgraph.federated_methods import run_NC |
| 140 | +from attridict import AttriDict |
| 141 | + |
| 142 | +# Test 1: Plaintext |
| 143 | +print("\n" + "="*60) |
| 144 | +print("TEST 1: PLAINTEXT (Baseline)") |
| 145 | +print("="*60) |
| 146 | +config = { |
| 147 | + "fedgraph_task": "NC", |
| 148 | + "method": "FedGCN", |
| 149 | + "use_encryption": False, |
| 150 | + "dataset": "cora", |
| 151 | + "num_trainers": 3, |
| 152 | + "num_rounds": 10, |
| 153 | + "seed": 42, |
| 154 | +} |
| 155 | +run_NC(AttriDict(config)) |
| 156 | + |
| 157 | +# Test 2: OpenFHE |
| 158 | +print("\n" + "="*60) |
| 159 | +print("TEST 2: OPENFHE (Two-Party Threshold HE)") |
| 160 | +print("="*60) |
| 161 | +config["use_encryption"] = True |
| 162 | +config["he_backend"] = "openfhe" |
| 163 | +run_NC(AttriDict(config)) |
| 164 | + |
| 165 | +print("\n" + "="*60) |
| 166 | +print("COMPARISON COMPLETE") |
| 167 | +print("Check the output above for accuracy difference") |
| 168 | +print("Expected: < 1% difference") |
| 169 | +print("="*60) |
| 170 | +PYEOF |
| 171 | +``` |
| 172 | + |
| 173 | +Then submit: |
| 174 | + |
| 175 | +```bash |
| 176 | +sbatch compare_openfhe.slurm |
| 177 | +``` |
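The comparison script prints the results of both runs but does not check the "< 1% difference" expectation itself. The sketch below is a hypothetical helper that performs this check once you have read the two accuracy values from the output. It assumes the threshold means percentage points (for example 81.2 vs 80.6), which the guide does not state explicitly.

```bash
#!/bin/bash
# Hypothetical helper: succeed when two accuracies (in percent) differ by
# less than a tolerance in percentage points (default 1.0).
# Usage: accuracy_gap_ok PLAINTEXT_ACC ENCRYPTED_ACC [TOLERANCE]
accuracy_gap_ok() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1.0}" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d < tol) }'
}

# Example, with values copied by hand from compare-JOBID.out:
#   accuracy_gap_ok 81.2 80.6 && echo "within tolerance"
```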
| 178 | + |
| 179 | +--- |
| 180 | + |
| 181 | +## File System Usage |
| 182 | + |
| 183 | +Following Delta best practices: |
| 184 | + |
| 185 | +- **HOME** (`$HOME`): Store scripts, environments, small files |
| 186 | +- **SCRATCH** (`$SCRATCH`): Store datasets, temporary outputs |
| 187 | +- **WORK/PROJECTS** (`$WORK`): Store results, checkpoints |
| 188 | + |
| 189 | +Example directory structure: |
| 190 | + |
| 191 | +```bash |
| 192 | +$HOME/ |
| 193 | + ├── openfhe_env/ # Python virtual environment |
| 194 | + └── fedgraph/ # Git repository |
| 195 | + |
| 196 | +$SCRATCH/ |
| 197 | + └── fedgraph_results/ # Training outputs, logs |
| 198 | +``` |
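To keep outputs from different jobs apart, a batch script can derive a per-job directory from this layout. The sketch below is illustrative only: `job_results_dir` is a hypothetical name, and falling back to `$HOME` when `$SCRATCH` is unset is an assumption of this example.

```bash
#!/bin/bash
# Hypothetical helper: print a per-job output directory under $SCRATCH.
# Falls back to $HOME if $SCRATCH is unset, and to "manual" outside Slurm.
job_results_dir() {
    local base="${SCRATCH:-$HOME}"
    echo "${base}/fedgraph_results/${SLURM_JOB_ID:-manual}"
}

# Example inside a batch script:
#   OUT_DIR="$(job_results_dir)"
#   mkdir -p "$OUT_DIR"
```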
| 199 | + |
| 200 | +--- |
| 201 | + |
| 202 | +## GPU Version (Optional) |
| 203 | + |
| 204 | +To use GPU nodes for faster training: |
| 205 | + |
| 206 | +```bash |
| 207 | +#!/bin/bash |
| 208 | +#SBATCH --mem=64g |
| 209 | +#SBATCH --nodes=1 |
| 210 | +#SBATCH --ntasks-per-node=1 |
| 211 | +#SBATCH --cpus-per-task=16 |
| 212 | +#SBATCH --partition=gpuA40x4 # A40 GPU partition |
| 213 | +#SBATCH --account=YOUR_ACCOUNT |
| 214 | +#SBATCH --gpus-per-node=1 |
| 215 | +#SBATCH --job-name=openfhe_nc_gpu |
| 216 | +#SBATCH --time=01:00:00 |
| 217 | +#SBATCH -e openfhe-gpu-%j.err |
| 218 | +#SBATCH -o openfhe-gpu-%j.out |
| 219 | + |
| 220 | +module reset |
| 221 | +module load python/3.11 |
| 222 | +module load cuda/11.8 # or appropriate CUDA version |
| 223 | + |
| 224 | +source $HOME/openfhe_env/bin/activate |
| 225 | +cd $HOME/fedgraph/tutorials |
| 226 | + |
| 227 | +# Run with GPU support |
| 228 | +python FGL_NC_HE.py |
| 229 | +``` |
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +## Troubleshooting |
| 234 | + |
| 235 | +### Issue: GLIBC version error |
| 236 | + |
| 237 | +```bash |
| 238 | +# Check GLIBC version on compute node |
| 239 | +srun --account=YOUR_ACCOUNT --partition=cpu-interactive \ |
| 240 | + --nodes=1 --cpus-per-task=1 --mem=4g --time=00:05:00 \ |
| 241 | + ldd --version |
| 242 | +``` |
| 243 | + |
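| 244 | +**Expected:** depends on the node's OS release. RHEL 8.x (this guide assumes Delta runs RedHat 8.4) ships GLIBC 2.28 and RHEL 9.x ships 2.34, so read the version from the command output instead of assuming 2.31+. |
| 245 | +**Required:** GLIBC 2.29+ for OpenFHE 1.2.3 |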
| 246 | + |
| 247 | +If the version is too old, try an earlier release: |
| 248 | +```bash |
| 249 | +pip install openfhe==1.1.0 # Earlier version |
| 250 | +``` |
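The version check can also be scripted so that a job fails early instead of at import time. This is a minimal sketch, assuming GNU `sort -V` is available on the node. The function names are hypothetical, and the 2.29 threshold is taken from the requirement stated above.

```bash
#!/bin/bash
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the glibc version: the last field on the first line of `ldd --version`.
glibc_version() {
    ldd --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

# Example (on a compute node):
#   version_ge "$(glibc_version)" 2.29 || echo "GLIBC too old for OpenFHE 1.2.3"
```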
| 251 | + |
| 252 | +### Issue: Module not found |
| 253 | + |
| 254 | +```bash |
| 255 | +# Check available Python modules |
| 256 | +module avail python |
| 257 | + |
| 258 | +# Try different Python version |
| 259 | +module load python/3.12 |
| 260 | +``` |
| 261 | + |
| 262 | +### Issue: Out of memory |
| 263 | + |
| 264 | +Increase the memory in the `#SBATCH` directive: |
| 265 | +```bash |
| 266 | +#SBATCH --mem=64g # instead of 32g |
| 267 | +``` |
| 268 | + |
| 269 | +### Issue: Job pending for a long time |
| 270 | + |
| 271 | +```bash |
| 272 | +# Check why job is pending |
| 273 | +squeue -u $USER -l |
| 274 | + |
| 275 | +# Try interactive partition for testing |
| 276 | +#SBATCH --partition=cpu-interactive |
| 277 | + |
| 278 | +# Or use preempt partition (cheaper, may be interrupted) |
| 279 | +#SBATCH --partition=cpu-preempt |
| 280 | +``` |
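The long listing includes a reason column for pending jobs. As a rough sketch, the hypothetical helper below translates three common Slurm reason codes into plain language. It is not exhaustive, and anything else is referred to the Slurm documentation.

```bash
#!/bin/bash
# Hypothetical helper: explain a few common Slurm pending reasons.
explain_pending_reason() {
    case "$1" in
        Priority)   echo "Higher-priority jobs are ahead in the queue." ;;
        Resources)  echo "The job is waiting for the requested resources to become free." ;;
        Dependency) echo "The job is waiting for another job it depends on." ;;
        *)          echo "See the Slurm squeue documentation for reason '$1'." ;;
    esac
}

# Example (on Delta): print job ID, reason code, and explanation.
#   squeue -u "$USER" -h -t PENDING -o "%i %r" | while read -r id reason; do
#       echo "$id: $reason: $(explain_pending_reason "$reason")"
#   done
```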
| 281 | + |
| 282 | +--- |
| 283 | + |
| 284 | +## Expected Resource Usage |
| 285 | + |
| 286 | +Based on the FedGCN NC implementation: |
| 287 | + |
| 288 | +| Resource | Cora Dataset | Citeseer | Pubmed | |
| 289 | +|----------|--------------|----------|--------| |
| 290 | +| **Memory** | ~16GB | ~20GB | ~32GB | |
| 291 | +| **Cores** | 8-16 | 8-16 | 16-32 | |
| 292 | +| **Time** | ~30 min | ~45 min | ~90 min | |
| 293 | +| **Storage** | ~2GB | ~3GB | ~5GB | |
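The table can be turned into `sbatch` command-line flags, which take precedence over the `#SBATCH` directives in the script. The sketch below is one possible mapping. The helper name is hypothetical, the core counts use the upper end of each range, and the time limits are twice the estimates as headroom, which is an assumption of this example.

```bash
#!/bin/bash
# Hypothetical helper: print sbatch flags for a dataset, based on the table.
# Time limits are twice the estimated runtime (assumed headroom).
sbatch_flags_for() {
    case "$1" in
        cora)     echo "--mem=16g --cpus-per-task=16 --time=01:00:00" ;;
        citeseer) echo "--mem=20g --cpus-per-task=16 --time=01:30:00" ;;
        pubmed)   echo "--mem=32g --cpus-per-task=32 --time=03:00:00" ;;
        *)        echo "unknown dataset: $1" >&2; return 1 ;;
    esac
}

# Example (on Delta); the flags are intentionally left unquoted so they split:
#   sbatch $(sbatch_flags_for pubmed) run_openfhe_delta.slurm
```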
| 294 | + |
| 295 | +--- |
| 296 | + |
| 297 | +## Monitoring Your Job |
| 298 | + |
| 299 | +### While the job is running: |
| 300 | + |
| 301 | +```bash |
| 302 | +# SSH to compute node |
| 303 | +squeue -u $USER # Get node name |
| 304 | +ssh NODE_NAME # e.g., ssh cn042 |
| 305 | + |
| 306 | +# Once on node: |
| 307 | +top -u $USER |
| 308 | +htop |
| 309 | +nvidia-smi # if using GPU |
| 310 | +``` |
| 311 | + |
| 312 | +### Check output in real-time: |
| 313 | + |
| 314 | +```bash |
| 315 | +# Get job ID |
| 316 | +JOBID=$(squeue -u $USER -h -o %i | head -1) |
| 317 | + |
| 318 | +# Tail output |
| 319 | +tail -f openfhe-${JOBID}.out |
| 320 | + |
| 321 | +# Or use watch |
| 322 | +watch -n 5 tail -20 openfhe-${JOBID}.out |
| 323 | +``` |
| 324 | + |
| 325 | +--- |
| 326 | + |
| 327 | +## Batch Job Array (Multiple Experiments) |
| 328 | + |
| 329 | +To run multiple configurations, save the following as `array_experiment.slurm`: |
| 330 | + |
| 331 | +```bash |
| 332 | +#!/bin/bash |
| 333 | +#SBATCH --array=0-4 # 5 jobs |
| 334 | +#SBATCH --mem=32g |
| 335 | +#SBATCH --nodes=1 |
| 336 | +#SBATCH --cpus-per-task=16 |
| 337 | +#SBATCH --partition=cpu |
| 338 | +#SBATCH --account=YOUR_ACCOUNT |
| 339 | +#SBATCH --time=02:00:00 |
| 340 | +#SBATCH -e array-%A_%a.err # %A=job ID, %a=array index |
| 341 | +#SBATCH -o array-%A_%a.out |
| 342 | + |
| 343 | +# Define datasets |
| 344 | +DATASETS=("cora" "citeseer" "pubmed" "cora" "citeseer") |
| 345 | +HE_BACKENDS=("openfhe" "openfhe" "openfhe" "tenseal" "tenseal") |
| 346 | + |
| 347 | +DATASET=${DATASETS[$SLURM_ARRAY_TASK_ID]} |
| 348 | +HE_BACKEND=${HE_BACKENDS[$SLURM_ARRAY_TASK_ID]} |
| 349 | + |
| 350 | +echo "Running: Dataset=$DATASET, Backend=$HE_BACKEND" |
| 351 | + |
| 352 | +source $HOME/openfhe_env/bin/activate |
| 353 | +cd $HOME/fedgraph |
| 354 | + |
| 355 | +python3 << PYEOF |
| 356 | +from fedgraph.federated_methods import run_NC |
| 357 | +from attridict import AttriDict |
| 358 | + |
| 359 | +config = { |
| 360 | + "fedgraph_task": "NC", |
| 361 | + "method": "FedGCN", |
| 362 | + "use_encryption": True, |
| 363 | + "he_backend": "$HE_BACKEND", |
| 364 | + "dataset": "$DATASET", |
| 365 | + "num_trainers": 3, |
| 366 | + "num_rounds": 10, |
| 367 | + "seed": 42, |
| 368 | +} |
| 369 | +run_NC(AttriDict(config)) |
| 370 | +PYEOF |
| 371 | +``` |
| 372 | + |
| 373 | +Submit array job: |
| 374 | +```bash |
| 375 | +sbatch array_experiment.slurm |
| 376 | +``` |
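The `--array=0-4` range must match the length of the `DATASETS` and `HE_BACKENDS` arrays, and it is easy to forget one when adding an experiment. As a sketch, the hypothetical helper below derives the range from the arrays and refuses to continue if their lengths differ.

```bash
#!/bin/bash
# Same experiment lists as in the array script above.
DATASETS=("cora" "citeseer" "pubmed" "cora" "citeseer")
HE_BACKENDS=("openfhe" "openfhe" "openfhe" "tenseal" "tenseal")

# Hypothetical helper: print the --array range, or fail on a length mismatch.
array_spec() {
    if [ "${#DATASETS[@]}" -ne "${#HE_BACKENDS[@]}" ]; then
        echo "DATASETS and HE_BACKENDS differ in length" >&2
        return 1
    fi
    echo "0-$(( ${#DATASETS[@]} - 1 ))"
}

# Example (on Delta); the command-line value overrides the #SBATCH --array line:
#   sbatch --array="$(array_spec)" array_experiment.slurm
```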
| 377 | + |
| 378 | +--- |
| 379 | + |
| 380 | +## Next Steps |
| 381 | + |
| 382 | +1. **First time setup**: Run interactive session to verify everything works |
| 383 | +2. **Single experiment**: Use `run_openfhe_delta.slurm` for single runs |
| 384 | +3. **Comparisons**: Use custom scripts to compare plaintext vs OpenFHE |
| 385 | +4. **Production**: Use batch arrays for multiple experiments |
| 386 | + |
| 387 | +**See also:** |
| 388 | +- `README_OPENFHE.md` - Implementation details |
| 389 | +- `OPENFHE_NC_IMPLEMENTATION.md` - Technical documentation |
| 390 | +- Delta docs: https://docs.ncsa.illinois.edu/systems/delta/ |
| 391 | + |