| 1 | +--- |
| 2 | +hide: |
| 3 | + - toc |
| 4 | +--- |
| 5 | + |
| 6 | +Within an HDF5 container, datasets may be stored using compact, chunked, or contiguous layouts. Objects are identified by slash-separated (`/`) paths. Non-leaf nodes are groups (`h5::gr_t`), while leaf nodes are datasets (`h5::ds_t`) and named types (`h5::dt_t`). Groups, datasets, and named types may also carry attributes (`h5::at_t`). At first glance, an HDF5 container resembles a regular file system, but with a richer and more specialized API. |
| 7 | + |
| 8 | +#### :material-view-module:{.icon} Chunked Layout and Partial I/O |
| 9 | +<img src="../assets/hdf5-dataset-layout-chunked.svg" alt="Chunked Layout" |
| 10 | + class="float-right w-2/5 border-2 border-solid border-red rounded-lg shadow-lg ml-4 transition-transform duration-500 hover:scale-110 bg-transparent"/> |
| 11 | + |
| 12 | +Chunked layout is the standard way to work efficiently with large datasets when full-array I/O is unnecessary. Instead of storing data as one contiguous region, the dataset is divided into fixed-size chunks, which enables partial reads and writes and allows the use of filters such as compression. In H5CPP, chunked storage is requested by adding `h5::chunk{...}` to the dataset creation property list. This implicitly selects `h5::layout_chunked`. |
| 13 | + |
| 14 | +```cpp |
| 15 | +h5::ds_t ds = h5::create<double>(fd, "dataset", ..., |
| 16 | + h5::chunk{4, 8} | h5::fill_value<double>{3} | h5::gzip{9}); |
| 17 | +``` |
| 18 | + |
| 19 | +Here, `fd` is an open HDF5 file descriptor of type `h5::fd_t`, and the omitted arguments define the dataset extent. H5CPP supports partial I/O through hyperslab-style selections using `h5::offset{...}`, `h5::stride{...}`, and `h5::block{...}`. For example, given `arma::mat M(20, 16);`, the following call writes `M` into a larger dataset at a given offset, here with a stride of 2 along both dimensions: `h5::write(ds, M, h5::offset{4, 8}, h5::stride{2, 2});` H5CPP infers the memory address, datatype, and dimensions of supported objects and forwards them to the underlying HDF5 calls. When working with raw pointers, or with types not yet recognized by H5CPP, the logical shape must be specified explicitly with `h5::count{...}`, as in `h5::write(ds, M.memptr(), h5::count{5, 10});` For common cases, dataset creation and a full-object write can be expressed in a single call. The following creates a chunked dataset, applies a fill value and gzip compression, and writes the full contents of the matrix `M`: |
| 20 | + |
| 21 | +```cpp |
| 22 | +h5::write(fd, "dataset", M, h5::chunk{4, 8} | h5::fill_value<double>{3} | h5::gzip{9}); |
| 23 | +``` |
| 24 | +See the [examples](examples.md) section for complete working cases. |
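
To make the effect of a chunk shape concrete, the sketch below computes how many chunks are needed to cover a dataset extent. It is plain C++ with no HDF5 dependency; `chunk_grid` is a hypothetical helper introduced only for illustration and is not part of H5CPP.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an H5CPP call): number of chunks required along
// each dimension to cover `extent` with chunks of shape `chunk`.
// Edge chunks may be only partially used, hence the rounding up.
template <std::size_t N>
std::array<std::size_t, N> chunk_grid(const std::array<std::size_t, N>& extent,
                                      const std::array<std::size_t, N>& chunk) {
    std::array<std::size_t, N> grid{};
    for (std::size_t d = 0; d < N; ++d) {
        assert(chunk[d] > 0);                              // a chunk dimension cannot be zero
        grid[d] = (extent[d] + chunk[d] - 1) / chunk[d];   // ceiling division
    }
    return grid;
}
```

For the 20 x 16 matrix `M` stored with `h5::chunk{4, 8}`, `chunk_grid<2>({20, 16}, {4, 8})` yields a 5 x 2 grid, so the dataset occupies 10 chunks and a partial read only needs to touch the chunks that intersect the selection.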
| 25 | + |
| 26 | +#### :material-arrow-expand-horizontal:{.icon} Contiguous Layout and I/O Access |
| 27 | +<img src="../assets/hdf5-dataset-layout-contiguous.svg" alt="Contiguous Layout" |
| 28 | + class="float-left w-2/5 border-2 border-solid border-red rounded-lg shadow-lg mr-4 transition-transform duration-500 hover:scale-110 bg-transparent"/> |
| 29 | + |
| 30 | +Contiguous layout is the simplest storage mode in HDF5. The dataset is stored as one continuous block on disk, which makes it a good fit when the entire object is typically written or read in a single operation. This layout works well for **small datasets** and for workloads that do not require compression, chunking, or partial I/O. It also keeps metadata overhead low, which can be beneficial when working with many small objects. When no filtering is requested and the object is written in one shot, H5CPP will generally choose this layout automatically. |
| 31 | + |
| 32 | +**Example:** the following call opens `arma.h5`, creates a dataset with the appropriate dimensions and datatype, and writes the full content of the vector in a single operation. |
| 33 | + |
| 34 | +```cpp |
| 35 | +arma::vec V({1.,2.,3.,4.,5.,6.,7.,8.}); |
| 36 | +h5::write("arma.h5", "one shot create write", V); |
| 37 | +``` |
| 38 | + |
| 39 | +To request contiguous layout explicitly, pass the `h5::contiguous` flag in the dataset creation property list. |
| 40 | + |
| 41 | +The dataset created by the example above has a layout conceptually equivalent to the following `h5dump`-style listing: |
| 42 | + |
| 43 | +```text |
| 44 | +DATASET "one shot create write" { |
| 45 | + DATATYPE H5T_IEEE_F64LE |
| 46 | + DATASPACE SIMPLE { ( 8 ) / ( 8 ) } |
| 47 | + STORAGE_LAYOUT { |
| 48 | + CONTIGUOUS |
| 49 | + SIZE 64 |
| 50 | + OFFSET 5888 |
| 51 | + } |
| 52 | + FILTERS { |
| 53 | + NONE |
| 54 | + } |
| 55 | + FILLVALUE { |
| 56 | + FILL_TIME H5D_FILL_TIME_IFSET |
| 57 | + VALUE H5D_FILL_VALUE_DEFAULT |
| 58 | + } |
| 59 | + ALLOCATION_TIME { |
| 60 | + H5D_ALLOC_TIME_LATE |
| 61 | + } |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +The trade-off is straightforward: contiguous layout is simple and efficient for full-object I/O, but it does not support chunk-based filtering or efficient partial access. For small, dense datasets, that is often exactly the right choice. |
| 66 | + |
| 67 | + |
| 68 | +#### :material-package-variant-closed:{.icon} Compact Layout |
| 69 | +<img src="../assets/hdf5-dataset-layout-compact.svg" alt="Compact Layout" |
| 70 | + class="float-right w-2/5 border-2 border-solid border-red rounded-lg shadow-lg ml-4 transition-transform duration-500 hover:scale-110 bg-transparent"/> |
| 71 | + |
| 72 | +Compact layout stores the entire dataset payload directly in the object header rather than in a separate data block. This minimizes indirection and can be very efficient for **small datasets** that are written and read as a whole. Because both metadata and data live together, access is simple and overhead is low. The trade-off is capacity and flexibility. Compact datasets are size-limited by the available object header space, so they are not suitable for larger arrays, append-style workflows, compression, or partial I/O. In practice, compact layout is most useful for very small fixed-size objects where minimizing storage overhead matters more than scalability. For anything expected to grow, be filtered, or accessed in pieces, chunked or contiguous layout is the better choice. |
| 73 | + |
| 74 | + |
| 75 | +#### :material-ruler-square:{.icon} Dataspaces and Dimensions |
| 76 | + |
| 77 | +A dataspace describes how data is shaped and mapped between memory and file storage. In practical terms, it tells HDF5 how an in-memory region corresponds to a dataset on disk, or how a region on disk should be read back into memory. For example, a contiguous block of memory may be interpreted as a vector, matrix, or cube-shaped dataset depending on the associated dataspace. A dataspace may have fixed extent, bounded extensible extent, or unlimited extent along one or more dimensions. When working with supported objects, H5CPP derives the in-memory dataspace automatically. When passing raw pointers to I/O operations, the relevant shape must be provided explicitly, and the file selection determines how much memory is transferred. The following descriptors define dataset dimensions: |
| 78 | + |
| 79 | +* `h5::current_dims{i,j,k,...}` — current extent of the dataset |
| 80 | +* `h5::max_dims{i,j,k,...}` — maximum extent; use `H5S_UNLIMITED` for an unbounded dimension |
| 81 | +* `h5::chunk{i,j,k,...}` — chunk shape for chunked datasets; a suitable chunk layout can significantly improve performance |
| 82 | + |
| 83 | +The following descriptors define read or write selections within a dataset: |
| 84 | + |
| 85 | +* `h5::offset{i,j,k,...}` — starting coordinates of the selection |
| 86 | +* `h5::stride{i,j,k,...}` — step between selected elements |
| 87 | +* `h5::block{i,j,k,...}` — size of each selected block |
| 88 | +* `h5::count{i,j,k,...}` — number of blocks to transfer |
| 89 | + |
| 90 | +**Note:** `h5::stride`, `h5::block`, and scatter/gather-style selections are not available when `h5::high_throughput` is enabled, as that mode prioritizes simplified high-bandwidth transfer paths. |
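
The way offset, stride, block, and count combine can be sketched for a single dimension as follows. This is a standalone illustration of regular hyperslab arithmetic as used by HDF5, not H5CPP code; `slab_dim`, `selected_indices`, and `required_extent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One dimension of a regular hyperslab: `count` blocks of `block` elements,
// with consecutive blocks starting `stride` elements apart.
struct slab_dim {
    std::size_t offset;
    std::size_t stride;
    std::size_t block;
    std::size_t count;
};

// Enumerate the dataset indices selected along this dimension.
std::vector<std::size_t> selected_indices(const slab_dim& s) {
    // Blocks must not overlap, so the stride has to cover at least one block.
    assert(s.block > 0 && (s.count <= 1 || s.stride >= s.block));
    std::vector<std::size_t> idx;
    idx.reserve(s.count * s.block);
    for (std::size_t c = 0; c < s.count; ++c)
        for (std::size_t b = 0; b < s.block; ++b)
            idx.push_back(s.offset + c * s.stride + b);
    return idx;
}

// Smallest dataset extent along this dimension that contains the selection.
std::size_t required_extent(const slab_dim& s) {
    return s.count == 0 ? 0 : s.offset + (s.count - 1) * s.stride + s.block;
}
```

Assuming a block size of 1 and a count equal to the number of rows, writing 20 rows at offset 4 with stride 2 selects rows 4, 6, ..., 42, so `required_extent({4, 2, 1, 20})` is 43 and the target dataset needs at least that many rows.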
| 91 | + |
| 92 | + |
| 93 | +# :material-tag-multiple-outline:{.icon} Attributes |
| 94 | + |
| 95 | +Attributes are the standard HDF5 mechanism for attaching side-band metadata to groups (`h5::gr_t`), datasets (`h5::ds_t`), and named datatypes (`h5::dt_t`). In H5CPP, the same core I/O model applies here as well, so attributes support the same family of storage-capable object types described in the [Supported Types](#supported-objects) section. In the definitions below, `P ::= h5::ds_t | h5::gr_t | h5::ob_t | h5::dt_t` denotes any valid HDF5 object handle, while `std::tuple<Field...>` represents a sequence of pairwise attribute operations, where consecutive tuple elements are interpreted as attribute-name / attribute-value pairs. |
| 96 | + |
| 97 | +```cpp |
| 98 | +h5::at_t acreate(const P& parent, const std::string& name, args_t&&... args); |
| 99 | +h5::at_t aopen(const P& parent, const std::string& name, const h5::acpl_t& acpl); |
| 100 | +T aread(const P& parent, const std::string& name, const h5::acpl_t& acpl); |
| 101 | +h5::at_t awrite(const P& parent, const std::string& name, const T& ref, const h5::acpl_t& acpl); |
| 102 | +void awrite(const P& parent, const std::tuple<Field...>& fields, const h5::acpl_t& acpl); |
| 103 | +``` |
| 104 | + |
| 105 | +**Example:** the following snippet creates three attributes and attaches them to an `h5::gr_t` group. It also illustrates mixed use of the HDF5 C API and H5CPP, including how a raw `hid_t` handle returned by the C API may be wrapped in the RAII-enabled `h5::gr_t` handle. |
| 106 | +```cpp |
| 107 | +auto attributes = std::make_tuple( |
| 108 | + "author", "steven varga", "company","vargaconsulting.ca", "year", 2019); |
| 109 | +// a raw `hid_t` returned by the HDF5 C API is wrapped in the RAII-capable `h5::gr_t` handle: |
| 110 | +h5::gr_t group{H5Gopen(fd,"/my-data-set", H5P_DEFAULT)}; |
| 111 | +// the templates unpack the std::tuple pairwise, creating and writing 3 attributes on `group` |
| 112 | +h5::awrite(group, attributes); |
| 113 | +``` |
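
The pairwise reading of a tuple can be illustrated without HDF5. The following C++17 sketch walks a tuple two elements at a time and hands each name/value pair to a callback. It is one possible way to express the idea, not H5CPP's actual implementation; `for_each_pair` and `describe` are hypothetical helpers, and `describe` merely formats the pairs as text where a real implementation would write attributes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Call fn(name, value) with elements 2*I and 2*I+1 for every pair index I.
template <class Tuple, class Fn, std::size_t... I>
void for_each_pair_impl(const Tuple& t, Fn&& fn, std::index_sequence<I...>) {
    (fn(std::get<2 * I>(t), std::get<2 * I + 1>(t)), ...);  // C++17 fold expression
}

// Consecutive tuple elements are treated as name/value pairs.
template <class... Field, class Fn>
void for_each_pair(const std::tuple<Field...>& t, Fn&& fn) {
    static_assert(sizeof...(Field) % 2 == 0, "expected name/value pairs");
    for_each_pair_impl(t, fn, std::make_index_sequence<sizeof...(Field) / 2>{});
}

// Hypothetical stand-in for the attribute write: renders "name=value;" text.
template <class... Field>
std::string describe(const std::tuple<Field...>& fields) {
    std::ostringstream out;
    for_each_pair(fields, [&out](const auto& name, const auto& value) {
        out << name << '=' << value << ';';
    });
    return out.str();
}
```

Because the pairing is resolved at compile time, each value keeps its own type, and a tuple with an odd number of elements is rejected by the `static_assert` instead of failing at run time.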