12 changes: 10 additions & 2 deletions collections/_projects/storage_systems_design.md
layout: post
title: Leverage Blue Waters I/O Traces to Size and Allocate Future Storage Systems
date: 2023-04-26
updated: 2026-02-02
navbar: Research
subnavbar: Projects
project_url:

This profusion of different technologies and architectures makes the use of these resources complex and their sizing at the machine design stage risky. One approach to partially solving these problems is to model supercomputers equipped with several levels of storage, coupled with the simulation of the scheduling of an execution history of large-scale I/O intensive applications. This type of simulation allows us to observe the behavior of storage tiers in the face of real-world workloads. Recently, following the decommissioning of the machine, several years of execution traces (including I/O traces) of applications that ran on Blue Waters have been made public. This mass of information is invaluable to feed simulations and study the architecture of modern storage systems.

In this JLESC project, we propose to analyze Darshan traces and Lustre metrics from several years of Blue Waters production to feed StorAlloc {% cite monniot:hal-03683568 --file external/storage_systems_design.bib %} and Fives {% cite monniotEtAl2024 --file jlesc.bib %}, two storage system simulators developed within Inria's KerData team. The goal of this work is twofold: to provide a post-mortem study of the sizing of Blue Waters' storage system and to explore the design of future highly storage-disaggregated HPC systems.
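For illustration, a minimal sketch of the kind of per-job aggregation such a trace-analysis pipeline might perform before feeding a simulator. The CSV schema below (`job_id`, `bytes_read`, `bytes_written`, `io_time_s`) is hypothetical, not the actual Darshan format:

```python
import csv
import io

# Hypothetical per-job I/O counters; the column names are illustrative
# and do not correspond to the real Darshan counter schema.
SAMPLE = """job_id,bytes_read,bytes_written,io_time_s
42,1073741824,536870912,12.5
42,2147483648,0,30.0
7,0,4294967296,18.2
"""

def aggregate_jobs(text):
    """Sum I/O volume and I/O time per job, as a simulator input might need."""
    jobs = {}
    for row in csv.DictReader(io.StringIO(text)):
        j = jobs.setdefault(row["job_id"], {"bytes": 0, "io_time_s": 0.0})
        j["bytes"] += int(row["bytes_read"]) + int(row["bytes_written"])
        j["io_time_s"] += float(row["io_time_s"])
    return jobs

jobs = aggregate_jobs(SAMPLE)
```

Real Darshan logs are binary and far richer (per-module counters, per-file records); this only shows the shape of the reduction step.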

## Results for 2023/2024
We introduce Fives, a storage system simulator built on WRENCH and SimGrid, two established simulation frameworks. Fives, currently under development, is able to reproduce the behavior of a Lustre file system. Using Darshan execution traces to both calibrate and validate the simulator, Fives extracts metrics and correlation indices that show a reasonable level of accuracy between real and simulated I/O times. The traces currently used by Fives come from machines for which only aggregated Darshan traces are publicly available; we are now working on feeding Blue Waters traces into the simulator.
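As an illustration of the validation step, one such correlation index is the Pearson coefficient between measured and simulated I/O times, sketched below. The numbers are made up, and Fives' actual calibration and validation metrics are more elaborate:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

real = [12.0, 30.5, 18.2, 45.0]       # measured I/O times (s), illustrative
simulated = [11.5, 28.0, 20.1, 47.3]  # simulator output (s), illustrative
r = pearson(real, simulated)          # close to 1.0 means good agreement
```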

Finally, using Blue Waters traces among other datasets, we proposed an in-depth study of access temporality on large-scale storage systems. This work was accepted at IPDPS 2025 {% cite boitoEtAl2025 --file jlesc.bib %}.
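To illustrate what access temporality can mean in practice, here is a minimal sketch (not the method of the IPDPS 2025 study) of one simple temporality metric: the inter-arrival times between successive accesses to the same file.

```python
from collections import defaultdict

def inter_arrival_times(events):
    """events: iterable of (file_id, timestamp) pairs.
    Returns, per file, the gaps between successive accesses."""
    last = {}
    gaps = defaultdict(list)
    for fid, t in sorted(events, key=lambda e: e[1]):
        if fid in last:
            gaps[fid].append(t - last[fid])
        last[fid] = t
    return dict(gaps)

# Toy trace: (file, access time in seconds)
events = [("a", 0.0), ("b", 1.0), ("a", 2.5), ("a", 3.0), ("b", 9.0)]
gaps = inter_arrival_times(events)  # {"a": [2.5, 0.5], "b": [8.0]}
```

The distribution of these gaps (bursty vs. evenly spread) is one way to characterize how accesses cluster in time.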

## Results for 2025/2026

We presented the work accepted at IPDPS 2025 on I/O temporality across several trace datasets, including Blue Waters' {% cite boitoEtAl2025 --file jlesc.bib %}. A follow-up to this work is in preparation.

We continued work on MOSAIC to extend I/O pattern detection, in particular through improved periodicity detection, a reinforced clustering algorithm, and the consideration of file temperature (as a function of access frequency). This work has been submitted {% cite jolivelEtAl2025 --file jlesc.bib %}.
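As an illustration of the file-temperature idea, a minimal sketch (not MOSAIC's actual algorithm) that scores files by exponentially decayed access frequency, so that recent accesses weigh more than old ones:

```python
import math

def temperature(access_times, now, half_life=3600.0):
    """Exponentially decayed access count for one file.
    half_life is the age (s) at which an access counts half as much;
    the value 3600 s is an arbitrary illustrative choice."""
    decay = math.log(2) / half_life
    return sum(math.exp(-decay * (now - t)) for t in access_times)

# Two files with the same access count but different recency:
hot = temperature([9900.0, 9950.0, 9990.0], now=10000.0)
cold = temperature([100.0, 200.0, 300.0], now=10000.0)
# hot > cold: the recently accessed file scores as "warmer"
```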

Work on the Blue Waters traces was presented several times in 2025. The dataset is now also used to feed a pattern-driven I/O benchmark project.

## References
{% bibliography --cited --file jlesc.bib %}
