Below, we describe the supported file formats. Aside from these formats, ftio and predictor support ZMQ as described here, which avoids creating intermediate files. Furthermore, ftio also provides an API to different tools (e.g., GekkoFS) and allows easy and direct use of the Python functions as described and demonstrated here.
The currently supported file formats are:
All units in the files are by default SI units.
ftio supports JSON files as input. These files can be either custom-generated or from TMIO. Below, we explain both and provide links to some examples.
ftio supports custom JSON files that can be at the rank, node, or application level. Below, we present JSON files with rank- or application-level metrics.
Note
ftio works on the application level as mentioned here. Internally, the tool overlaps rank-level metrics to obtain application-level metrics. If the metrics are not at the application level (i.e., if the timestamps are not sorted in increasing order), the syntax for rank-level metrics should be used.
The JSON file provided to ftio should have the following structure:
{
"write_sync":{
"total_bytes": 1.68e+09,
"number_of_ranks": 4,
"bandwidth": {
"b_rank_avr": [
18854940.934542, 35493987.414814, 26656768.595012, 20879002.716997,
28234996.909554, 21386514.858547, 26588927.552098, 18408319.012778,
66591429.795722, 64718227.754890, 20303409.879130, 39347431.382128,
21904017.847817, 26657866.416951, 38344190.339886, 67774121.679019,
24178995.499522, 24457899.521490, 77675298.475351, 71081178.788930,
69537332.852923, 72595355.489592, 20328465.984519, 22970008.962767,
32817028.455917, 18802080.851357, 38077100.664442, 27360825.411258,
39622620.596783, 35450821.093752, 68597882.969851, 34332203.448823],
"t_rank_s": [
0.060111, 2.920865, 5.801426, 8.470562,
11.072364, 13.558426, 16.081736, 18.744577,
0.060017, 2.920780, 5.801340, 8.470471,
11.072274, 13.558340, 16.081650, 18.744492,
0.059990, 2.920736, 5.801290, 8.470427,
11.072234, 13.558294, 16.081607, 18.744447,
0.060026, 2.920790, 5.801344, 8.470485,
11.072288, 13.558349, 16.081659, 18.744501],
"t_rank_e": [
2.840750, 4.397983, 7.768236, 10.981639,
12.929237, 16.009915, 18.053565, 21.592681,
0.847337, 3.730889, 8.383606, 9.802929,
13.465844, 15.525069, 17.448971, 19.518073,
2.228352, 5.064370, 6.476264, 9.208017,
11.826200, 14.280500, 18.660690, 21.026937,
1.657636, 5.709247, 7.178256, 10.386685,
12.395491, 15.037266, 16.845951, 20.271603]
}
}
}
The associated example is provided here and can be passed to ftio using:
ftio custom.json
The -m flag selects the desired mode from the trace. If the JSON file contains only a single mode (as in this example), the -m flag is not strictly needed:
# Explicitly specify the mode
ftio custom.json -m write_sync
ftio automatically detects the source of the JSON file. To skip this detection, specify the source with the -s|--source flag, that is, -s "tmio" for TMIO or -s "custom" for custom files.
Several fields in the above example are self-explanatory. The metrics with _rank_ in their names represent the rank-level metrics. As ftio operates on the application level, these metrics are internally overlapped.
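The overlapping step can be pictured as a sweep over the start and end times of all rank-level intervals: at any moment, the application-level bandwidth is the sum of the bandwidths of the ranks that are active. The following is a minimal sketch of this idea only, not ftio's actual implementation; the function name overlap_ranks is hypothetical.

```python
def overlap_ranks(b_rank_avr, t_rank_s, t_rank_e):
    """Sum rank-level bandwidths over time (illustrative sketch only)."""
    events = []
    for bandwidth, start, end in zip(b_rank_avr, t_rank_s, t_rank_e):
        events.append((start, bandwidth))   # a rank starts transferring
        events.append((end, -bandwidth))    # a rank stops transferring
    events.sort()

    b_overlap, t_overlap = [], []
    current = 0.0
    for time, delta in events:
        current += delta
        if t_overlap and t_overlap[-1] == time:
            # Several events at the same time: keep only the final value.
            b_overlap[-1] = current
        else:
            t_overlap.append(time)
            b_overlap.append(current)
    return b_overlap, t_overlap


# Two ranks: 10 B/s during [0, 2] and 20 B/s during [1, 3].
b, t = overlap_ranks([10.0, 20.0], [0.0, 1.0], [2.0, 3.0])
print(t)  # [0.0, 1.0, 2.0, 3.0]
print(b)  # [10.0, 30.0, 20.0, 0.0]
```

The result is a step function: each entry in the second list marks the time at which the summed bandwidth changes to the corresponding value in the first list.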
The application-level metrics can also be provided directly. As the example custom_app.json shows, the three rank-level bandwidth metrics from before (i.e., *_rank_*) are simply replaced by two metrics, b_overlap_avr and t_overlap:
{
"write_sync":{
"total_bytes": 1.68e+09,
"number_of_ranks": 4,
"bandwidth": {
"b_overlap_avr": [1000000000,2000000000,5000000000,2000000000,1000000000,0],
"t_overlap": [1, 2, 4, 6, 7, 8]
}
}
}
With:
- b_overlap_avr representing the bandwidth at the application level
- t_overlap representing the time when new values for the application-level bandwidth are attained. For the above example, this means that the bandwidth at time 1 s was 1 GB/s, changed to 2 GB/s at 2 s, and so on.
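Read this way, the two vectors describe a step function: each bandwidth value holds from its timestamp until the next one. The sketch below looks up the bandwidth at an arbitrary time using the values from the example above. The helper bandwidth_at is hypothetical, and treating the bandwidth as zero before the first timestamp is an assumption made for illustration.

```python
from bisect import bisect_right

b_overlap_avr = [1000000000, 2000000000, 5000000000, 2000000000, 1000000000, 0]
t_overlap = [1, 2, 4, 6, 7, 8]


def bandwidth_at(time, b_values, t_values):
    """Return the bandwidth (B/s) valid at `time` for a step function."""
    # Index of the last timestamp that is <= time.
    index = bisect_right(t_values, time) - 1
    if index < 0:
        return 0  # assumption: no I/O before the first timestamp
    return b_values[index]


print(bandwidth_at(1.5, b_overlap_avr, t_overlap))  # 1000000000
print(bandwidth_at(4, b_overlap_avr, t_overlap))    # 5000000000
print(bandwidth_at(0.5, b_overlap_avr, t_overlap))  # 0
```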
Several fields shown in the JSON files with rank- or application-level metrics are optional. In a simpler form, a JSON file only needs the field bandwidth, and thus can have the following form:
{
"bandwidth": {
"b_rank_avr": [
18854940.934542, 35493987.414814, 26656768.595012, 20879002.716997,
28234996.909554, 21386514.858547, 26588927.552098, 18408319.012778,
66591429.795722, 64718227.754890, 20303409.879130, 39347431.382128,
21904017.847817, 26657866.416951, 38344190.339886, 67774121.679019,
24178995.499522, 24457899.521490, 77675298.475351, 71081178.788930,
69537332.852923, 72595355.489592, 20328465.984519, 22970008.962767,
32817028.455917, 18802080.851357, 38077100.664442, 27360825.411258,
39622620.596783, 35450821.093752, 68597882.969851, 34332203.448823],
"t_rank_s": [
0.060111, 2.920865, 5.801426, 8.470562,
11.072364, 13.558426, 16.081736, 18.744577,
0.060017, 2.920780, 5.801340, 8.470471,
11.072274, 13.558340, 16.081650, 18.744492,
0.059990, 2.920736, 5.801290, 8.470427,
11.072234, 13.558294, 16.081607, 18.744447,
0.060026, 2.920790, 5.801344, 8.470485,
11.072288, 13.558349, 16.081659, 18.744501],
"t_rank_e": [
2.840750, 4.397983, 7.768236, 10.981639,
12.929237, 16.009915, 18.053565, 21.592681,
0.847337, 3.730889, 8.383606, 9.802929,
13.465844, 15.525069, 17.448971, 19.518073,
2.228352, 5.064370, 6.476264, 9.208017,
11.826200, 14.280500, 18.660690, 21.026937,
1.657636, 5.709247, 7.178256, 10.386685,
12.395491, 15.037266, 16.845951, 20.271603]
}
}
If the bandwidth field is omitted and only two vectors are provided, ftio assumes these metrics are at the application level. Furthermore, it searches for the first match of b* and t* to map to b_overlap_avr and t_overlap, respectively. Hence, this form also works with ftio:
{
"b": [1000000000,2000000000,5000000000,2000000000,1000000000,0],
"t": [1, 2, 4, 6, 7, 8]
}
Alternatively, the JSON files can be loaded into ftio manually, and the required fields can be passed through the API.
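The first-match rule for the two-vector form can be sketched as follows. This only illustrates the described behavior and is not ftio's code; the function name map_minimal_fields is hypothetical, and raising a KeyError on a missing vector is an assumption.

```python
def map_minimal_fields(data):
    """Map the first 'b*' and 't*' keys to the application-level fields."""
    b_key = next((key for key in data if key.startswith("b")), None)
    t_key = next((key for key in data if key.startswith("t")), None)
    if b_key is None or t_key is None:
        raise KeyError("expected one field starting with 'b' and one with 't'")
    return {
        "bandwidth": {
            "b_overlap_avr": data[b_key],
            "t_overlap": data[t_key],
        }
    }


minimal = {
    "b": [1000000000, 2000000000, 5000000000, 2000000000, 1000000000, 0],
    "t": [1, 2, 4, 6, 7, 8],
}
print(map_minimal_fields(minimal)["bandwidth"]["t_overlap"])  # [1, 2, 4, 6, 7, 8]
```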
In the offline mode of TMIO, a JSON file is generated that contains 7 fields. Besides the standard fields (read sync, write sync, read async, and write async), TMIO also generates fields for the required bandwidth in case of asynchronous I/O. Furthermore, the I/O time is also captured. Consequently, a JSON file has the following structure:
{
"read_sync":{
"total_bytes": 1.34e+09,
"max_bytes_per_rank": 1.68e+08,
"max_bytes_per_phase": 2.10e+06,
"max_io_phases_per_rank": 80,
"total_io_phases": 640,
"max_io_ops_in_phase": 1,
"max_io_ops_per_rank": 80,
"total_io_ops": 640,
"number_of_ranks": 8,
"bandwidth": {
"weighted_harmonic_mean": 1.20e+09,
"harmonic_mean": 1.20e+09,
"arithmetic_mean": 1.31e+09,
"median": 1.18e+09,
"max": 5.28e+09,
"min": 7.35e+08,
"b_rank_avr": [1582905557.128160, 1338049664.395335, 1216357988.172627, 1263707703.960846, 1183417554.729751, 1149204355.590840, 1213748335.340853, 1386897971.519520, ...],
"b_rank_sum": [1582905557.128160, 1338049664.395335, 1216357988.172627, 1263707703.960846, 1183417554.729751, 1149204355.590840, 1213748335.340853, 1386897971.519520, ...],
"t_rank_s": [ 2.575119, 2.576482, 2.578066, 2.579800, 2.581467, 2.583249, 2.585093, 2.586831, ...],
"t_rank_e": [2.576444, 2.578050, 2.579791, 2.581460, 2.583239, 2.585074, 2.586821, 2.588343, ...],
⋮
}},
"read_async_t":{...},
"read_async_b":{...},
"write_async_t":{...},
"write_async_b":{...},
"write_sync":{...},
"io_time":{"delta_t_agg": 1.58e+02,"delta_t_agg_io": 1.43e+02, ...}
}
An example with 8 ranks is located here.
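To see which entries of such a file carry bandwidth data before choosing a mode, the file can be inspected with the standard json module. The sketch below uses a heavily shortened inline trace with made-up values instead of a real TMIO file, and the helper modes_with_bandwidth is hypothetical; that every listed entry is a valid argument for the -m flag is an assumption.

```python
import json

# Shortened stand-in for a TMIO trace (values are illustrative only).
trace_text = """
{
  "read_sync": {"number_of_ranks": 8, "bandwidth": {"b_rank_avr": [1.0], "t_rank_s": [0.0], "t_rank_e": [1.0]}},
  "write_sync": {"number_of_ranks": 8, "bandwidth": {"b_rank_avr": [2.0], "t_rank_s": [1.0], "t_rank_e": [2.0]}},
  "io_time": {"delta_t_agg": 158.0, "delta_t_agg_io": 143.0}
}
"""


def modes_with_bandwidth(trace):
    """List top-level entries that contain a 'bandwidth' field."""
    return [
        name
        for name, entry in trace.items()
        if isinstance(entry, dict) and "bandwidth" in entry
    ]


trace = json.loads(trace_text)
print(modes_with_bandwidth(trace))  # ['read_sync', 'write_sync']
```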
TBD
384.msgpack provides an example with 384 ranks and IOR. This file was generated using TMIO (online or offline).
TBD
ftio supports standard Darshan files. By default, ftio first tries to read the DXT trace.
The DXT mode can be specified with the flag -x DXT_MODE, where DXT_MODE is either DXT_POSIX or DXT_MPIIO (default). If the file does not contain a DXT trace, ftio tries to read the heat map. In both cases, ftio uses pydarshan to read the file.
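The fallback order described above can be summarized in a few lines. This is a sketch of the decision logic only and does not call pydarshan; the function select_darshan_source is hypothetical, and the module name HEATMAP as well as the error raised when neither source exists are assumptions.

```python
def select_darshan_source(modules, dxt_mode="DXT_MPIIO"):
    """Pick the data source: the requested DXT module first, then the heat map."""
    if dxt_mode in modules:
        return dxt_mode
    if "HEATMAP" in modules:  # assumed name of the heat map module
        return "HEATMAP"
    raise ValueError("trace contains neither a DXT trace nor a heat map")


print(select_darshan_source(["POSIX", "DXT_POSIX", "HEATMAP"], "DXT_POSIX"))  # DXT_POSIX
print(select_darshan_source(["POSIX", "HEATMAP"]))                            # HEATMAP
```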
As the collected values are at the rank level, ftio internally overlaps them to obtain the application-level bandwidth. An example trace can be downloaded from the website https://hpcioanalysis.zdv.uni-mainz.de/, for example, the trace of Nek5000 with 2048 ranks executed on the Mogon II cluster. After downloading, rename the file to nek_2048.darshan. ftio can now be called on the complete trace via:
ftio nek_2048.darshan
Pass the -e no flag to skip generating plots and obtain the result from ftio directly on the command line:
ftio nek_2048.darshan -e no
To limit the time window to 56,000 s, pass the -te 56000 flag as follows:
ftio nek_2048.darshan -te 56000 -e no
Simply specify the folder where the traces are located:
ftio folder
ftio supports parsing custom file formats using regex. The parsed values can be scaled if they are not in SI units.
The file must currently have the txt extension.
For that, two dictionaries (pattern and translate) must be provided in a custom file, similar to the convert function.
- pattern (dict[str, str]): dictionary containing the name and a regex expression to find the custom pattern.
- translate (dict[str, tuple[str, (optional)float]]): dictionary containing the matching field from sample.py and the matching name from the pattern. The unit scale can optionally be specified.
An example is provided in example/txt. Navigate to this folder. There, a file called custom_input.py is located and contains the following:
pattern = {
"avg_thruput_mib": r"avg_thruput_mib:\s+\[([\d.\d,\s]+)\]",
"end_t_micro": r"end_t_micro:\s+\[([\d,\s]+)\]",
"start_t_micro": r"start_t_micro:\s+\[([\d,\s]+)\]",
"total_bytes": r"total_bytes:\s+(\d+)",
"total_iops": r"total_iops:\s+(\d+)",
}
# Define map according to sample.py class, along with the scale if any:
# ftio_field: ("custom_name", scale)
# ftio unit are default in bytes, b/s, ...
# scale applies ftio_field = custom_name*scale
translate = {
"bandwidth": {
"b_rank_avr": ("avg_thruput_mib",1.07*1e+6),
"t_rank_e": ("end_t_micro", 1e-3),
"t_rank_s": ("start_t_micro", 1e-3)
},
"total_bytes": "total_bytes",
"max_io_ops_per_rank": "total_iops"
}
The fields in translate must match the fields in sample.py. The field bandwidth and its three fields (b_rank_avr, t_rank_e, and t_rank_s) are mandatory. Though these fields are indexed with _rank_, they can be on any level higher than rank. Note that in case the application-level bandwidth is provided, t_rank_e is not needed.
Other supported fields include (from sample.py):
# cutout from /ftio/parse/sample.py
⋮
def __init__(self, values, io_type, args):
self.type # 'read_sync', 'read_async_t', 'write_async_t','write_sync'
self.max_bytes_per_rank # maximum bytes transferred per rank per phase
self.max_bytes_phase # maximum bytes transferred per rank during all phases
self.total_bytes # total transferred bytes
self.max_io_phases_per_rank # maximum I/O phases
self.total_io_phases # total I/O phases
self.max_io_ops_per_rank # maximum I/O operations per rank
self.max_io_ops_in_phase # maximum I/O operation per phase
self.total_io_ops # Total I/O operations
self.number_of_ranks # number of ranks that did I/O
self.bandwidth # Dictionary containing the fields b_rank_avr, t_rank_e, and t_rank_s
⋮
Next, the fields indicated by custom_input.py match the fields in the file 2.txt, which contains, for example:
avg_thruput_mib: [0.0,0.0,1000.0,1000.0,0.0,0.0,1000.0,1000.0,0.0,0.0,1000.0,1000.0,0.0,0.0]
start_t_micro: [0500,0000,10500,10000,20500,20000,30500,30000,40500,40000,50500,50000,60500,60000]
end_t_micro: [5000,4500,15000,14500,25000,24500,35000,34500,45000,44500,55000,54500,65000,64500]
hostname: "XXX"
pid: 2063022
total_bytes: 15000
total_iops: 1024
Finally, ftio can be executed with the -cf flag pointing to the location of custom_input.py.
For this example, all files are in the current working directory example/txt.
Thus, ftio can be simply executed by:
ftio 2.txt -cf custom_input.py
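To illustrate what the pattern and translate dictionaries express, the sketch below applies a reduced version of them to a shortened text sample: each regex extracts the raw values, and the optional scale is multiplied in. This is not ftio's parser. The helpers parse_field and apply_translate are hypothetical, and the scales and sample values are chosen only to keep the arithmetic easy to follow.

```python
import re

pattern = {
    "avg_thruput_mib": r"avg_thruput_mib:\s+\[([\d.\d,\s]+)\]",
    "start_t_micro": r"start_t_micro:\s+\[([\d,\s]+)\]",
    "total_bytes": r"total_bytes:\s+(\d+)",
}

# ftio_field: ("custom_name", scale); illustrative scales, not real unit factors.
translate = {
    "bandwidth": {
        "b_rank_avr": ("avg_thruput_mib", 2.0),
        "t_rank_s": ("start_t_micro", 0.5),
    },
    "total_bytes": "total_bytes",
}

sample_text = """avg_thruput_mib: [0.0,1000.0]
start_t_micro: [0500,10500]
total_bytes: 15000
"""


def parse_field(text, regex, scale=1.0):
    """Extract the captured group, split on commas, and apply the scale."""
    match = re.search(regex, text)
    if match is None:
        return None
    values = [float(v) * scale for v in match.group(1).split(",") if v.strip()]
    # Simplification: a single captured value is returned as a scalar.
    return values if len(values) > 1 else values[0]


def apply_translate(text, pattern, translate):
    """Build a dictionary of ftio field names from the custom text."""
    result = {}
    for field, spec in translate.items():
        if isinstance(spec, dict):
            result[field] = apply_translate(text, pattern, spec)
            continue
        name, *rest = spec if isinstance(spec, tuple) else (spec,)
        scale = rest[0] if rest else 1.0
        result[field] = parse_field(text, pattern[name], scale)
    return result


parsed = apply_translate(sample_text, pattern, translate)
print(parsed["bandwidth"]["b_rank_avr"])  # [0.0, 2000.0]
print(parsed["bandwidth"]["t_rank_s"])    # [250.0, 5250.0]
print(parsed["total_bytes"])              # 15000.0
```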