5. File formats
This page summarizes the file formats used in OSP as inputs and outputs. Note that some example input files are available in the /data folder.
OSP allows the computational DAGs to be read in
This is a custom file format in OSP which represents every computational DAG as a hyperDAG [1]. That is, for each non-sink vertex
The first few lines of these files can be comment lines, starting with '%'.
The first non-comment line contains three integers '
This is followed by
This is followed by
This is followed by
In .dot files, the computations are described as DAGs. This format allows the properties of each vertex and edge to be listed with human-readable labels. Some example files in this format can be found in the /data/dot folder.
The first line here contains 'digraph G {', and the last line contains '}'. Between these, we first have
0[work_weight="5";comm_weight="4";mem_weight="3";type="0";];
Here the first integer (
We then have a separate line for each directed edge of the DAG. An example for an edge description line is:
1->2 [comm_weight="1";];
which specifies that there is a directed edge from vertex
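As an illustration of the vertex and edge lines described above, here is a minimal Python sketch that extracts the labelled attributes from such a file. It is not OSP's own reader: the regular expressions are modelled only on the two example lines shown here, the function names are hypothetical, all attribute values are assumed to be integers, and the lines for vertices 1 and 2 in the example are made up for the demonstration.

```python
import re

# Patterns modelled only on the two example lines from this page; real files
# may contain variations that this sketch does not handle.
VERTEX_RE = re.compile(r'^\s*(\d+)\s*\[(.*)\]\s*;?\s*$')
EDGE_RE = re.compile(r'^\s*(\d+)\s*->\s*(\d+)\s*\[(.*)\]\s*;?\s*$')
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_attributes(text):
    """Turn 'key="value";key="value";' into a dict of integer values."""
    return {key: int(value) for key, value in ATTR_RE.findall(text)}


def parse_dot(lines):
    """Return (vertices, edges) from the lines of a .dot file.

    vertices maps a vertex index to its attribute dict;
    edges maps a (source, target) pair to its attribute dict.
    """
    vertices, edges = {}, {}
    for line in lines:
        edge = EDGE_RE.match(line)
        if edge:
            key = (int(edge.group(1)), int(edge.group(2)))
            edges[key] = parse_attributes(edge.group(3))
            continue
        vertex = VERTEX_RE.match(line)
        if vertex:
            vertices[int(vertex.group(1))] = parse_attributes(vertex.group(2))
    return vertices, edges


# Vertex 0 and the edge are the examples from this page; the lines for
# vertices 1 and 2 are hypothetical.
example = [
    'digraph G {',
    '0[work_weight="5";comm_weight="4";mem_weight="3";type="0";];',
    '1[work_weight="2";comm_weight="1";mem_weight="1";type="0";];',
    '2[work_weight="3";comm_weight="1";mem_weight="2";type="0";];',
    '1->2 [comm_weight="1";];',
    '}',
]
vertices, edges = parse_dot(example)
print(vertices[0])    # {'work_weight': 5, 'comm_weight': 4, 'mem_weight': 3, 'type': 0}
print(edges[(1, 2)])  # {'comm_weight': 1}
```

Lines that match neither pattern, such as the opening 'digraph G {' and the closing '}', are simply skipped.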
Finally, OSP can also construct computational DAGs from a sparse lower triangular matrix, interpreting each row as a vertex, and each non-zero entry as an edge. This coincides with the natural DAG representation of the scheduling problems for SpTRSV tasks [4].
The sparse matrix files are expected in the standard Matrix Market format. They may begin with several comment lines, each starting with the character '%'. The first non-comment line contains three integers: the number of rows and columns in the matrix (which must be equal in our case, let us denote it by
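The row-as-vertex, entry-as-edge interpretation can be sketched in a few lines of Python. This is an illustration rather than OSP's reader, and it rests on several assumptions that the text does not spell out: the file uses the Matrix Market coordinate layout (a size line with rows, columns and number of non-zeros, then one entry per line), the 1-based indices are shifted to 0-based vertex indices, diagonal entries produce no edge, and an entry in row i, column j with j < i becomes the edge j -> i (the dependency direction of a lower triangular solve). The function name is hypothetical.

```python
def read_lower_triangular_dag(lines):
    """Build (num_vertices, edges) from Matrix Market coordinate lines.

    Assumptions of this sketch: 1-based file indices are mapped to 0-based
    vertex indices, diagonal entries create no edge, and an entry in row i,
    column j (with j < i) becomes the edge j -> i.
    """
    # Drop blank lines and comment lines (including the '%%MatrixMarket' banner).
    data = [ln.split() for ln in lines if ln.strip() and not ln.startswith('%')]
    rows, cols, nonzeros = (int(x) for x in data[0][:3])
    if rows != cols:
        raise ValueError('matrix must be square')
    edges = []
    for entry in data[1:1 + nonzeros]:
        i, j = int(entry[0]) - 1, int(entry[1]) - 1
        if j > i:
            raise ValueError('matrix must be lower triangular')
        if j < i:
            edges.append((j, i))
    return rows, edges


example = [
    '%%MatrixMarket matrix coordinate real general',
    '% a 3x3 lower triangular matrix',
    '3 3 5',
    '1 1 1.0',
    '2 1 0.5',
    '2 2 1.0',
    '3 2 0.25',
    '3 3 1.0',
]
n, dag_edges = read_lower_triangular_dag(example)
print(n, dag_edges)  # 3 [(0, 1), (1, 2)]
```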
The second vital part of the input is the architecture or machine model for the scheduling; we often use the .arch extension for these files.
The first line of the file must contain 3-6 non-negative integers, separated by spaces. The first three integers are the main parameters of the BSP model: 1) the number of available processing units
If the first line contains 6 numbers, and the last number is 1, then we have a heterogeneous problem instance. In this case, the next line must contain
Finally, the next
The file might also have comment lines between the different blocks, starting with the '%' character; these lines are ignored.
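The rules for the first parameter line can be checked with a short Python sketch. It covers only what is stated above: comment lines are ignored, the line holds 3 to 6 non-negative integers, and a 6-number line ending in 1 marks a heterogeneous instance. The function names are hypothetical, skipping blank lines is an assumption, and the parameter values in the example are made up.

```python
def read_arch_header(lines):
    """Return the integers on the first non-comment line of an .arch file."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith('%'):
            continue  # comment line (or blank line, an assumption of this sketch)
        values = [int(token) for token in stripped.split()]
        if not 3 <= len(values) <= 6:
            raise ValueError('expected 3 to 6 integers on the first line')
        if any(v < 0 for v in values):
            raise ValueError('all parameters must be non-negative')
        return values
    raise ValueError('no parameter line found')


def is_heterogeneous(values):
    """Heterogeneous instance: 6 numbers, the last of which is 1."""
    return len(values) == 6 and values[-1] == 1


# Hypothetical file contents; the numbers carry no particular meaning.
header = read_arch_header(['% example machine', '4 1 5 0 0 1'])
print(header[0])                    # 4, the number of processing units
print(is_heterogeneous(header))     # True
print(is_heterogeneous([4, 1, 5]))  # False
```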
OSP can output the BSP schedule into a text file if desired.
The first line of this text file starts with '%%', and textually describes the number of processors and supersteps.
This is followed by a line with 4 distinct integers, separated by spaces. The first three of these numbers are the number of vertices
This is followed by
If we have an explicit communication schedule, then this is followed by several further lines, each with four different integers '
Note that there are also several other problem variants in OSP.
In pebbling problems, the input is the same as in a scheduling problem. The output file here is a textual description of the schedules, describing which vertices are computed, loaded and saved in each step.
In hypergraph partitioning problems, the input can be read in several different ways. If the input is a computational DAG file with '.hdag' extension, then a hypergraph is created from its hyperDAG interpretation [1]. There is also an alternative read function for computational DAGs from '.hdag' files, which instead interprets each directed edge as a hyperedge of size 2. Finally, one can also read a sparse matrix from an '.mtx' file, and form a hypergraph from this based on the fine-grained model [6].
The output in partitioning problems is again a text file. The first line of this is a comment line starting with '%%', specifying the number of parts in text. This is followed by
- [1] Pal Andras Papp, Georg Anegg, and Albert-Jan N. Yzelman. Partitioning hypergraphs is hard: Models, inapproximability, and applications. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 415–425, 2023. (full preprint)
- [2] Pal Andras Papp, Georg Anegg, Aikaterini Karanasiou, and Albert-Jan N. Yzelman. Efficient multiprocessor scheduling in increasingly realistic models. In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 463–474, 2024. (full preprint)
- [3] HyperDAG_DB: database of computational DAGs in hyperDAG format, available on GitHub.
- [4] Toni Boehnlein, Pal Andras Papp, Raphael S. Steiner, Christos K. Matzoros, and Albert-Jan N. Yzelman. Efficient parallel scheduling for sparse triangular solvers. In 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2025. (full preprint)
- [5] Pal Andras Papp, Georg Anegg, and Albert-Jan N. Yzelman. DAG scheduling in the BSP model. In Proceedings of the 50th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), pages 238–253, 2025. (full preprint)
- [6] Timon E. Knigge and Rob H. Bisseling. An improved exact algorithm and an NP-completeness proof for sparse matrix bipartitioning. Parallel Computing, 96:102640, 2020.