From ddd126d41e7d18f6a69e7f403d4f3f7d3c27dd45 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fran=C3=A7ois=20Tessier?= <57344436+hephtaicie@users.noreply.github.com>
Date: Mon, 2 Feb 2026 14:16:42 +0100
Subject: [PATCH] Update storage_systems_design.md

---
 collections/_projects/storage_systems_design.md | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/collections/_projects/storage_systems_design.md b/collections/_projects/storage_systems_design.md
index cf6024c1..27a486a6 100644
--- a/collections/_projects/storage_systems_design.md
+++ b/collections/_projects/storage_systems_design.md
@@ -2,7 +2,7 @@
 layout: post
 title: Leverage Blue Waters I/O Traces to Size and Allocate Future Storage Systems
 date: 2023-04-26
-updated: 2025-01-31
+updated: 2026-02-02
 navbar: Research
 subnavbar: Projects
 project_url:
@@ -24,7 +24,7 @@ In recent years, to address the growing gap between computing power and I/O band
 
 This profusion of different technologies and architectures makes the use of these resources complex and their sizing at the machine design stage risky. One approach to partially solving these problems is to model supercomputers equipped with several levels of storage, coupled with the simulation of the scheduling of an execution history of large-scale I/O intensive applications. This type of simulation allows us to observe the behavior of storage tiers in the face of real-world workloads. Recently, following the decommissioning of the machine, several years of execution traces (including I/O traces) of applications that ran on Blue Waters have been made public. This mass of information is invaluable to feed simulations and study the architecture of modern storage systems.
 
-In this JLESC project, we propose to analyze Darshan traces and Lustre metrics from several years of Blue Waters production to feed StorAlloc {% cite monniot:hal-03683568 --file external/storage_systems_design.bib %}, a simulator of a storage-aware job scheduler developed within the Inria KerData’s team. The goal of work is twofold: to provide a post-mortem study on the sizing of Blue Waters’ storage system and to explore the design of future highly storage-disaggregated HPC systems.
+In this JLESC project, we propose to analyze Darshan traces and Lustre metrics from several years of Blue Waters production to feed StorAlloc {% cite monniot:hal-03683568 --file external/storage_systems_design.bib %} and FIVES {% cite monniotEtAl2024 --file jlesc.bib %}, two simulators of a storage systems developed within the Inria KerData’s team. The goal of work is twofold: to provide a post-mortem study on the sizing of Blue Waters’ storage system and to explore the design of future highly storage-disaggregated HPC systems.
 
 ## Results for 2023/2024
 We introduce Fives, a storage system simulator based on WRENCH and SimGrid, two simulation frameworks in the field. Fives, currently under development, is capable of reproducing the behavior of a Lustre file system. Using Darshan execution traces to both calibrate and validate the simulator, Fives can extract a number of metrics and correlation indices demonstrating a reasonable level of accuracy between real and simulated I/O times. The traces currently used in Fives come from machines for which only aggregated Darshan traces are publicly available. We are currently working on using Blue Waters traces to feed our simulator.
@@ -38,7 +38,15 @@ A conference paper presenting Fives {% cite monniotEtAl2024 --file jlesc.bib %},
 We also introduced MOSAIC {% cite jolivelEtAl2024 --file jlesc.bib %}, an approach to categorize execution traces and give information about the general behavior of applications from an I/O perspective.
 we analyze a full year of I/O execution traces of Blue Waters from which, we determine a set of non-exclusive categories to describe the I/O behavior of jobs, including the temporality and the periodicity of the accesses and the metadata overhead. This paper has been accepted in the SC'24 PDSW workshop. This work is currently being pursued, with several lines of research focusing in particular on automating the clustering of I/O operations.
 
-Finally, still using Blue Waters traces among others, we proposed an in-depth study of access temporality on large-scale storage systems. This work has been accepted at IPDPS 2025 {% cite boitoEtAl2024 --file jlesc.bib %}.
+Finally, still using Blue Waters traces among others, we proposed an in-depth study of access temporality on large-scale storage systems. This work has been accepted at IPDPS 2025 {% cite boitoEtAl2025 --file jlesc.bib %}.
+
+## Results for 2025/2026
+
+We presented the work accepted at IPDPS 2025 on the study of I/O temporality in datasets of I/O traces, including Blue Waters' {% cite boitoEtAl2025 --file jlesc.bib %}. A follow-up to this work is in preparation.
+
+We continued work on MOSAIC to extend I/O pattern detection, in particular through improved periodicity detection, a reinforced clustering algorithm, and the consideration of file temperature (as a function of access frequency). This work has been submitted {% cite jolivelEtAl2025 --file jlesc.bib %}.
+
+Work on Blue Waters traces was presented several times in 2025. This dataset is now also being used to feed into a pattern-driven I/O benchmark project.
 
 ## References
 {% bibliography --cited --file jlesc.bib %}