---
title: "Session 3: Storage on Aire"
subtitle: "Understanding File Systems and Data Management"
format: html
---
# Session content
## Session aims
By the end of this session, you will be able to:
- Distinguish between different storage areas and their purposes
- Navigate between storage locations using environment variables
- Monitor your disk usage and quotas
- Transfer files between storage areas and your local machine
[**View Interactive Slides: Storage Systems on Aire**](storage-slides.qmd){.btn .btn-primary target="_blank"}
Scroll to the bottom of this section for practical exercises.
## Storage Areas Overview
The Aire HPC file system includes several special directories:
- **Home directory** (`/users/<username>`, env var `$HOME`) for personal files
- **Scratch directory** (`/mnt/scratch/<username>`, env var `$SCRATCH`) for large, temporary data
- **Flash (NVMe) scratch** (`$TMP_SHARED`, usually `/mnt/flash/tmp/job.<JOB-ID>`) for very fast I/O during jobs
- **Node-local scratch** (`$TMP_LOCAL` or `$TMPDIR`, typically `/tmp/job.<JOB-ID>`) for fast local storage on each compute node
::: {.callout-important}
## Critical Storage Rules
- **Home** is backed up and not automatically purged
- **Scratch and flash** are not backed up and flash is deleted after each job
- **Always** copy important results from temporary storage (flash or node-local) back to your home directory or another permanent area **before** the job ends
:::
## Storage Types and Quotas
| Storage Type | Quota | Backup | Auto-Delete | Best For |
|--------------|-------|--------|-------------|----------|
| **Home** (`$HOME`) | ~30 GB, 1M files | ✅ Yes | ❌ No | Scripts, configs, small files |
| **Scratch** (`$SCRATCH`) | ~1 TB, 500K files | ❌ No | ❌ No | Large datasets, job data |
| **Flash** (`$TMP_SHARED`) | ~1 TB per job | ❌ No | ✅ Yes | I/O-intensive tasks |
| **Local** (`$TMP_LOCAL`) | Node-specific | ❌ No | ✅ Yes | Single-node fast storage |
::: {.callout-warning}
## Data Loss Risk
Data in scratch/flash areas is temporary. These are **not backed up** and may be purged. Always move important data to your home or external storage when done.
:::
## Checking Disk Usage
### Using the `quota` command
Check your current usage and limits:
```bash
quota -s # Human readable format
```
Example output:
```
Disk quotas for user yourusername (uid 12345):
     Filesystem  blocks    quota    limit   grace   files    quota    limit
         /users   15000    30000    33000          200000  1000000  1100000
   /mnt/scratch  100000  1000000  1100000          500000  1500000  1650000
```
### Using the `du` command
Check disk usage of directories:
```bash
du -hs * # Size of each directory
du -hs $HOME # Size of home directory
du -hs $SCRATCH # Size of scratch directory
```
::: {.callout-tip}
## Pro Tip
Run `du -hs *` in your scratch directory to see which subdirectories are taking up the most space. This can be slow for large directories!
:::
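Building on the tip above, piping `du` through `sort -h` orders the output by size, so the biggest consumers appear last. The directories below are throwaway examples created purely for the demonstration:

```bash
# Create a small throwaway tree so the command has something to measure
demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
head -c 100000 /dev/zero > "$demo/big/file"   # ~100 KB
head -c 10 /dev/zero > "$demo/small/file"     # 10 bytes

# Per-directory sizes, sorted so the largest directory is listed last
cd "$demo" && du -hs -- */ | sort -h
```

On Aire you would run `du -hs -- */ | sort -h` from inside `$SCRATCH` (or any directory you want to audit).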
## Navigation Commands
Use standard Linux commands with environment variables:
```bash
cd $HOME # Go to home directory
cd $SCRATCH # Go to scratch directory
cd # Also goes to home
cd ~ # Also goes to home
ls $HOME # List home contents
ls $SCRATCH # List scratch contents
pwd # Show current directory
```
## File Transfer Methods
### Between Storage Areas on Aire
```bash
# Copy TO scratch
cp data.txt $SCRATCH/
# Copy FROM scratch back to home
cp $SCRATCH/output.dat $HOME/results/
# Copy entire directory
cp -r $HOME/myproject $SCRATCH/
```
### To/From Your Local Machine
#### Using scp (Secure Copy)
From local machine **to** Aire:
```bash
# Run on your local machine. Use the full scratch path: $SCRATCH is a
# variable defined on Aire, so your local shell cannot expand it
scp myfile.txt <username>@target-system:/mnt/scratch/<username>/
```
From Aire **to** local machine:
```bash
# Also run from your local machine
scp <username>@target-system:/path/results.txt .
```
#### Using rsync (Recommended for large transfers)
```bash
rsync -avh data/ <username>@target-system:path/data/
```
Benefits of rsync:
- Transfers only files that have changed
- Can resume interrupted transfers (use `--partial` to keep partially transferred files)
- Shows progress with `--info=progress2`
#### Off-campus transfers (with jumphost)
```bash
# Using rsync
rsync -r --info=progress2 -e 'ssh -J username@jump-host' file.txt username@target-system:path/
# Using scp
scp -rq -J username@jump-host username@target-system:path/file.txt local-folder/
```
### Download from Internet
```bash
wget https://example.com/data.zip
curl -O https://example.com/data.zip
```
### GUI Tools
For those who prefer graphical interfaces:
- **FileZilla** (cross-platform SFTP client)
- **WinSCP** (Windows)
- **Cyberduck** (Mac/Windows)
## Best Practices Summary
::: {.callout-important}
## Key Storage Principles
1. **Multiple storage areas**: Use each storage type appropriately
2. **Regular cleanup**: Run `quota -s` regularly and clean up old files
3. **Data transfer workflow**: Transfer input to `$SCRATCH` before jobs, copy results back after
4. **Avoid data loss**: Always backup important data from temporary storage
5. **Organization**: Keep your home directory organized and place large data in scratch only when needed
:::
### Typical Workflow
1. **Prepare**: Upload input data to `$SCRATCH`
2. **Process**: Run jobs using scratch storage
3. **Preserve**: Copy important results to backed-up research storage
4. **Cleanup**: Remove temporary files from `$SCRATCH`
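The four steps above can be sketched end to end. The block below simulates the stage-in/stage-out pattern with temporary directories standing in for the real `$HOME` and `$SCRATCH` (all paths and filenames here are illustrative, and `tr` stands in for an actual compute job):

```bash
# Throwaway stand-ins for the real storage areas (illustrative only)
demo=$(mktemp -d)
home_dir="$demo/home"; scratch_dir="$demo/scratch"
mkdir -p "$home_dir/project" "$scratch_dir/run"

# 1. Prepare: stage input data into scratch
echo "input data" > "$home_dir/project/input.txt"
cp "$home_dir/project/input.txt" "$scratch_dir/run/"

# 2. Process: the job reads and writes entirely inside scratch
tr 'a-z' 'A-Z' < "$scratch_dir/run/input.txt" > "$scratch_dir/run/output.txt"

# 3. Preserve: copy results back to permanent storage before cleanup
cp "$scratch_dir/run/output.txt" "$home_dir/project/"

# 4. Cleanup: remove temporary files from scratch
rm -r "$scratch_dir/run"
```

On Aire the same pattern runs inside a batch job, with `$SCRATCH` (or `$TMP_SHARED` for flash storage) in place of `scratch_dir`.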
# Exercises
Work through these hands-on exercises to practice storage management on Aire.
### Exercise 1: Explore Your Storage Environment
Check your storage locations and current usage:
```bash
# Check your environment variables and location
echo "Home: $HOME"
echo "Scratch: $SCRATCH"
pwd
# Check your disk quotas
quota -s
```
### Exercise 2: Create Organized Directory Structure
Set up a proper directory structure in both storage areas:
```bash
# Create project structure in home (for permanent files)
cd $HOME
mkdir hpc1-practice
cd hpc1-practice
mkdir scripts results
# Create working structure in scratch (for large/temporary data)
cd $SCRATCH
mkdir hpc1-data
cd hpc1-data
mkdir input output
ls -la
```
### Exercise 3: Practice File Operations
Create sample files and practice moving data between storage areas:
```bash
# Create a sample script in home
cd $HOME/hpc1-practice/scripts
nano process_data.sh
```
Add this content to the script:
```bash
#!/bin/bash
echo "Processing data in: $(pwd)"
echo "Available space in scratch:"
df -h $SCRATCH
```
Save with `Ctrl + O`, `Enter`, `Ctrl + X`, then continue:
```bash
# Make executable and test
chmod +x process_data.sh
./process_data.sh
# Create sample data in scratch
echo "sample,value1,value2" > $SCRATCH/hpc1-data/input/data.csv
echo "exp1,10.5,20.3" >> $SCRATCH/hpc1-data/input/data.csv
echo "exp2,8.9,22.1" >> $SCRATCH/hpc1-data/input/data.csv
# Copy important results back to home (simulate job completion)
cp $SCRATCH/hpc1-data/input/data.csv $HOME/hpc1-practice/results/
```
### Exercise 4: Monitor Usage and Clean Up
Practice disk usage monitoring and cleanup:
```bash
# Check sizes of your directories
du -hs $HOME/hpc1-practice
du -hs $SCRATCH/hpc1-data
# Check quota again to see any changes
quota -s
# Practice cleanup - remove temporary files from scratch
rm $SCRATCH/hpc1-data/input/data.csv
ls $SCRATCH/hpc1-data/input/
# Verify important data is safe in home
ls $HOME/hpc1-practice/results/
```
::: {.callout-tip}
## What You've Accomplished
- ✅ Explored storage locations and quotas
- ✅ Created organized directory structures
- ✅ Practiced file creation and copying between storage areas
- ✅ Monitored disk usage and performed cleanup
- ✅ Followed the recommended workflow: work in scratch, save to home
:::
---
# Summary
::: {.callout-note}
## Key Takeaways
- **Multiple storage areas** serve different purposes: home (permanent), scratch (temporary)
- **Understand quotas** and monitor usage with `quota -s` and `du -hs`
- **Use environment variables** like `$HOME` and `$SCRATCH` for navigation
- **Follow the workflow**: work in scratch, save important results to home
- **Data management is critical** - scratch areas are not backed up
- **Transfer tools** like `rsync` and `scp` help move data efficiently
:::
---
## Next Steps
Now you understand how to manage data on Aire! Let's move on to [Session 4: Modules and Software](modules-software.qmd) to learn how to access and use different software packages.
## Additional Resources
- [Aire Storage Documentation](https://arcdocs.leeds.ac.uk/aire/system/storage_filesystem.html)
- [Data Management Best Practices](https://arcdocs.leeds.ac.uk/aire/usage/file_data_management/best_practices.html)
- [File Transfer Guide](https://arcdocs.leeds.ac.uk/aire/usage/file_data_management/start.html)
- [Linux File Permissions Guide](https://www.linux.com/training-tutorials/understanding-linux-file-permissions/)
- [rsync Tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-rsync-to-sync-local-and-remote-directories)