
Fix memory safety vulnerabilities in high-level and VFD code #6140

Open
brtnfld wants to merge 93 commits into HDFGroup:develop from brtnfld:snyk

@brtnfld (Collaborator) commented Jan 2, 2026

Address multiple CWE-415 (double-free), CWE-416 (use-after-free), and CWE-122 (buffer overflow) vulnerabilities identified by static analysis:

  • hl/src/H5DS.c: Fix double-free in H5DSis_scale() by setting buf to NULL after free and adding NULL check in cleanup path

  • hl/src/H5LT.c: Fix multiple memory issues:

    • Set myinput to NULL after free in H5LTtext_to_dtype()
    • Add NULL check in realloc_and_append() to prevent use-after-free
    • Refactor duplicated stmp handling by creating H5LT_append_dtype_super_text() helper function, eliminating ~50 lines of repeated code across 4 case blocks
  • hl/src/H5TB.c: Replace unsafe strcpy() with strncpy() in H5TBget_field_info() using HLTB_MAX_FIELD_LEN constant to prevent buffer overflow

  • hl/src/H5TBpublic.h: Document buffer size requirements for field_names parameter

  • src/H5FDstdio.c: Fix inconsistent resource cleanup in H5FD_stdio_open() by using file->fp instead of f throughout error paths

  • src/H5VLnative.c: Add assert checks for obj and file parameters in H5VL_native_get_file_struct() following internal API conventions

SAFE project work.
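Here is a minimal sketch of the free-and-null pattern behind the H5DSis_scale() fix. It assumes a simplified control flow, and check_class_text() is a hypothetical function, not HDF5 code. The buffer is freed on the normal path and the pointer is cleared immediately. When a later step fails and jumps to the shared cleanup label, nothing is left to free a second time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the H5DSis_scale() flow: allocate a buffer,
 * inspect it, free it early, then run a later step that may still fail
 * and jump to a shared cleanup label.
 * Returns 1 if the text is "DIMENSION_SCALE", 0 if not, -1 on error. */
int check_class_text(const char *attr_text, int fail_late)
{
    char *buf = NULL;
    int   ret = 0;

    assert(attr_text != NULL);

    size_t len = strlen(attr_text);
    buf        = malloc(len + 1);
    if (buf == NULL)
        goto error;
    memcpy(buf, attr_text, len + 1);

    ret = (strcmp(buf, "DIMENSION_SCALE") == 0) ? 1 : 0;

    /* Done with buf on the success path: free it AND clear the pointer,
     * so the cleanup path below cannot free it a second time. */
    free(buf);
    buf = NULL;

    /* A later step that can still fail (simulated by fail_late). */
    if (fail_late)
        goto error;

    return ret;

error:
    /* Cleanup path: free only what is still owned. free(NULL) is a no-op
     * in C, so the explicit check mainly documents intent. */
    if (buf != NULL)
        free(buf);
    return -1;
}
```

If the buf = NULL line were removed, the fail_late path would call free() on a pointer that was already freed. That is the CWE-415 case.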



Fixes memory safety vulnerabilities in HDF5 codebase, addressing double-free, use-after-free, and buffer overflow issues across multiple files.

  • Memory Safety Fixes:
    • H5DS.c: Fix double-free in H5DSis_scale() by setting buf to NULL after free and adding NULL check in cleanup.
    • H5LT.c: Set myinput to NULL after free in H5LTtext_to_dtype(), add NULL check in realloc_and_append(), and factor duplicated code into the new helper H5LT_append_dtype_super_text().
    • H5TB.c: Replace strcpy() with strncpy() in H5TBget_field_info() to prevent buffer overflow.
  • Documentation:
    • H5TBpublic.h: Document buffer size requirements for field_names parameter.
  • Resource Management:
    • H5FDstdio.c: Use file->fp consistently in H5FD_stdio_open() for error paths.
  • Assertions:
    • H5VLnative.c: Add assert checks for obj and file parameters in H5VL_native_get_file_struct().

This description was created by Ellipsis for 7b22833.

@brtnfld brtnfld added the Component - C Library Core C library issues (usually in the src directory) label Jan 2, 2026
@brtnfld brtnfld added the HDFG-internal Internally coded for use by the HDF Group label Jan 2, 2026
@github-project-automation github-project-automation bot moved this to To be triaged in HDF5 - TRIAGE & TRACK Jan 2, 2026
Comment thread hl/src/H5DS.c Fixed
Comment thread hl/src/H5DS.c Outdated
Comment thread hl/src/H5TBpublic.h Outdated
Comment thread src/H5FDstdio.c

/* Use the value in the property list */
if (H5Pget_file_locking(fapl_id, &unused, &file->ignore_disabled_file_locks) < 0) {
fclose(file->fp);
Collaborator

What was the issue with these close calls? file->fp should be the same as f at this point

Collaborator Author

Snyk flags it for clarity on resource ownership. The concern is that it's not explicit which pointer "owns" the resource after the assignment. Once you've assigned file->fp = f, the FILE* is conceptually owned by the file structure, and using file->fp in cleanup makes this ownership clear.
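The ownership argument above can be illustrated with a minimal sketch of an open routine and its error paths. my_file_t, my_open() and my_close() are hypothetical stand-ins, not the H5FD_stdio_open() source. Before the assignment, the local f is the only owner. After file->fp = f, the record owns the stream, so later error paths close it through file->fp. Behavior is identical either way, because both pointers refer to the same FILE object. Only the readability of ownership changes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down driver file record (not H5FD_stdio_t). */
typedef struct {
    FILE *fp;
    int   flag;
} my_file_t;

/* Hypothetical open routine. `fail_after_assign` simulates a later
 * step (e.g. a property-list query) failing after ownership moved. */
my_file_t *my_open(const char *name, const char *mode, int fail_after_assign)
{
    assert(name != NULL && mode != NULL);

    FILE *f = fopen(name, mode);
    if (f == NULL)
        return NULL;

    my_file_t *file = calloc(1, sizeof *file);
    if (file == NULL) {
        fclose(f); /* ownership not transferred yet: close via f */
        return NULL;
    }
    file->fp = f; /* from here on, the record owns the stream */

    if (fail_after_assign) {
        fclose(file->fp); /* same object as f, but names the owner */
        free(file);
        return NULL;
    }
    return file;
}

/* Release a record returned by my_open(). */
void my_close(my_file_t *file)
{
    if (file != NULL) {
        fclose(file->fp);
        free(file);
    }
}
```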

Comment thread hl/src/H5LT.c Outdated
buf = tmp_realloc;
}

if (!buf)
Collaborator

The intent of the _no_user_buf parameter isn't really obvious, but this check seems to overlap with the same check inside that block, which would imply that buf is allowed to be passed in as NULL in the false case. I'm guessing this check was added because of the strlen(buf) below. We should determine whether buf was ever intended to be allowed as NULL.

Collaborator Author

@brtnfld Feb 13, 2026

Added comments; side-stepped the question of whether we should assert/error more clearly.

Comment thread hl/src/H5TBprivate.h Outdated
#define HLTB_MAX_FIELD_LEN 255
#define TABLE_CLASS "TABLE"
#define TABLE_VERSION "3.0"
/* HLTB_MAX_FIELD_LEN is now defined in H5TBpublic.h */
Collaborator

Harmless, but it's probably unnecessary to document that a macro used to be in this file

Collaborator Author

fixed

Address a breaking API change introduced in commit 7b22833 where
H5TBget_field_info unconditionally wrote a null terminator at byte 254
(HLTB_MAX_FIELD_LEN - 1), requiring all callers to allocate 255-byte
buffers regardless of actual field name length.

Changes:
- hl/src/H5TB.c: Implement smart truncation that only enforces the
  255-byte limit when field names actually exceed it. For typical
  short field names, only the actual string length plus null terminator
  is written, preserving backward compatibility with existing code
  using smaller buffers.

- hl/src/H5TBpublic.h: Update documentation for HLTB_MAX_FIELD_LEN
  and H5TBget_field_info to clarify that 255-byte buffers are only
  required for exceptionally long field names. Short names are copied
  exactly without padding.

- hl/test/test_table.c: Add new test case "field info with small buffers
  (backward compatibility)" that verifies the function works correctly
  with 32-byte buffers for typical field names, ensuring no buffer
  overflow occurs.

This fix maintains the security improvement (preventing unbounded writes
from the original strcpy) while avoiding the compatibility hazard of
requiring all existing code to be updated.

Fixes: CWE-122 (Heap-based Buffer Overflow) - user-side compatibility issue
Maintains: Security fix from 7b22833
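The truncation rule described in this commit can be sketched as a small bounded copy. The sketch also covers the boundary cases listed in a later commit (253, 254, 255 and 1000 characters). copy_field_name() is a hypothetical helper that models the described behavior of H5TBget_field_info() for a single name. It is not the library code, and the only value taken from the PR is HLTB_MAX_FIELD_LEN = 255.

```c
#include <assert.h>
#include <string.h>

#define HLTB_MAX_FIELD_LEN 255 /* value quoted in the PR; buffer size incl. NUL */

/* Hypothetical helper: short names are copied exactly (length + NUL), so
 * small caller buffers keep working; names that would not fit in
 * HLTB_MAX_FIELD_LEN bytes are cut to HLTB_MAX_FIELD_LEN - 1 characters.
 * Returns the number of characters written, excluding the NUL. */
size_t copy_field_name(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= HLTB_MAX_FIELD_LEN)
        len = HLTB_MAX_FIELD_LEN - 1; /* 254 chars + NUL = 255 bytes */

    memcpy(dst, src, len); /* bounded copy, unlike strcpy() */
    dst[len] = '\0';       /* written at dst[len], not always at dst[254] */
    return len;
}
```

The terminator is written at dst[len]. So a caller that passes a 32-byte buffer for an 11-character name touches only 12 bytes. That is the backward-compatibility property the new small-buffer test is meant to check.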
Add comprehensive documentation and assertions to address review feedback
about the ambiguous intent of the _no_user_buf parameter and redundant
NULL checks in realloc_and_append (H5LT.c).

Changes:
- Enhanced function header comment to document:
  * Two operating modes (library-managed vs user-provided buffer)
  * Explicit parameter descriptions
  * Preconditions that buf must never be NULL

- Added inline comments explaining:
  * Mode 1 (library-managed): buf initialized via calloc, can reallocate
  * Mode 2 (user-provided): fixed-size buffer, no reallocation
  * Why there are two NULL checks (defensive programming)

- Added assertion before the second NULL check to:
  * Document the API contract (buf must be valid)
  * Aid debugging in development builds
  * Make it clear this check is for defensive programming

The second NULL check (line ~1978) is intentionally redundant:
- In library-managed mode: already checked at line ~1945
- In user-provided mode: catches caller errors
- Prevents strlen(buf) crash regardless of mode

This addresses the review comment about unclear intent and overlapping
checks, making it explicit that buf=NULL is never a valid input, and
the checks are defensive programming against logic/caller errors.

Addresses: Review feedback from jhendersonHDF on commit 7b22833
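The two operating modes and the consolidated NULL handling can be sketched as follows. append_text() is a hypothetical, simplified analogue of realloc_and_append(). The growth policy (doubling) and the truncation behavior in user-buffer mode are assumptions chosen for illustration, not the HDF5 implementation. The assert() documents the contract in debug builds. The runtime check keeps release builds, which are compiled with NDEBUG, from reaching strlen(NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified analogue of realloc_and_append().
 *   lib_managed == true : buf is heap-allocated and may be grown with realloc().
 *   lib_managed == false: buf is a fixed-size user buffer of *cap bytes;
 *                         it is never reallocated and the text is truncated.
 * buf must never be NULL in either mode. Returns the (possibly moved)
 * buffer, or NULL on failure; in library-managed mode the old buffer is
 * freed on failure, so the caller must not use it afterwards. */
char *append_text(bool lib_managed, size_t *cap, char *buf, const char *add)
{
    /* API contract: catch caller bugs in debug builds ... */
    assert(buf != NULL);
    assert(cap != NULL && *cap > 0);
    /* ... and still fail safely in release builds. */
    if (buf == NULL)
        return NULL;
    if (add == NULL)
        return buf;

    size_t used = strlen(buf);
    size_t need = used + strlen(add) + 1;

    if (lib_managed && need > *cap) {
        size_t new_cap = *cap;
        while (new_cap < need)
            new_cap *= 2;
        /* Grow through a temporary so the block is not lost if realloc() fails. */
        char *tmp = realloc(buf, new_cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf  = tmp;
        *cap = new_cap;
    }

    /* Append as much as fits, always leaving room for the terminator. */
    if (used < *cap - 1)
        strncat(buf, add, *cap - 1 - used);
    return buf;
}
```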
The comment '/* HLTB_MAX_FIELD_LEN is now defined in H5TBpublic.h */'
documents a past refactoring but provides no useful information for
current development. The constant is properly defined in H5TBpublic.h
and available through the include at line 20.

Removes code archaeology that doesn't aid understanding.
… API

Changes to hl/src/H5LT.c:
- Simplify defensive redundancy in realloc_and_append()
- Consolidate triple NULL check (lines 1945, 1972, 1979) into single
  assertion + runtime check at function start
- Improves code clarity while maintaining identical safety guarantees
- No functional change: both debug (assertion) and production (runtime)
  safety preserved

Changes to hl/test/test_table.c:
- Add comprehensive HLTB_MAX_FIELD_LEN boundary testing
- Tests field name truncation at exact boundaries:
  * 253 chars: no truncation (253 + null = 254 < 255)
  * 254 chars: no truncation (254 + null = 255 = limit)
  * 255 chars: truncates to 254 (255 + null = 256 > limit)
  * 1000 chars: truncates to 254 (extreme case)
- Complements existing small-buffer backward compatibility test
- Verifies truncation logic in H5TB.c:3037-3040 works correctly
- Fix compiler warnings: remove unused boundary_field_sizes variable,
  initialize boundary_names_out to NULL
Document the following changes in release_docs/CHANGELOG.md:

Library section:
- Fixed file descriptor leaks in stdio VFD error paths (H5FDstdio.c)
- Added defensive NULL pointer checks in native VOL connector (H5VLnative.c)

High-Level Library section:
- Fixed critical buffer overflow vulnerability in H5TBget_field_info() (CWE-120)
  * SECURITY FIX: Replaced unbounded strcpy() with bounds-checked memcpy()
  * Field names of 255 or more chars are now safely truncated to 254 chars
  * Backward compatibility preserved for small buffers
- Made HLTB_MAX_FIELD_LEN constant public (moved to H5TBpublic.h)
- Fixed memory leaks and improved safety in H5LT functions (H5LT.c)
  * Added NULL check after strdup() in H5LTtext_to_dtype()
  * Enhanced documentation for realloc_and_append()
- Eliminated code duplication in H5LT datatype conversion
  * New helper function H5LT_append_dtype_super_text() reduces ~80 lines
  * Improves maintainability across ENUM, VLEN, ARRAY, COMPLEX handlers
- Fixed use-after-free risk in H5DSis_scale() (H5DS.c)

These entries correspond to commits 7b22833 through 31edfe0.
jhendersonHDF and others added 25 commits April 17, 2026 08:47
Fix issue where chunked datasets could get setup with an incorrect
chunking index type in parallel HDF5

Fix issue where metadata cache images with an undefined address
and size of 0 couldn't be properly decoded

Fix issue where a flag in H5Cimage.c wasn't getting set correctly
for release builds of the library, leading to incorrect error
checking when reconstructing metadata cache entries
Disable float16 support in the undefined behavior sanitizer workflow for now, as it
causes a crash in UBSan
The link checker can't access the ACM URL and therefore fails. The change in
this PR is a workaround that provides the URL but prevents the link checker
from accessing it. Please do not add https://.
* Fix display of '--' options in documentation

* Fix more formatting
* fixed assignment of size in the wrapper

* Call H5DSget_label directly from Fortran wrapper

Replace the intermediate C wrapper h5dsget_label_c with a direct
bind(c) call to H5DSget_label from H5DSget_label_f. This eliminates
the malloc/free of a temporary buffer and the associated failure path
where size was incorrectly set when H5DSget_label failed. The Fortran
wrapper now handles the C-to-Fortran string conversion (equivalent to
HD5packFstring) by blank-padding the buffer from the returned label
length to the end.

* Remove unused h5dsget_label_c C wrapper
… Array Indexing information which is embedded within the data layout message. (HDFGroup#6333)
* Consolidate documentation under doc/ directory

Move user-facing guides from release_docs/ and doxygen/ into a single
doc/ root. release_docs/ now holds only release artifacts (changelogs,
history, release process, maintainer info).

- git mv release_docs/INSTALL*.md, USING_*.md, README_HPC.md,
  BuildSystemNotes.md, AutotoolsToCMakeOptions.md,
  HDF5_Library_2.0.0_Migration_Guide.md → doc/
- git mv doxygen/ → doc/doxygen/
- Update CMakeLists.txt: HDF5_DOXYGEN_DIR and add_subdirectory path
- Update CMakeInstallation.cmake: all install paths for moved files
- Update bin/make_vers: hardcoded doxygen/ path substitution
- Update doc/doxygen/CMakeLists.txt: EXAMPLES_DIRECTORY and comments
- Update README.md, CONTRIBUTING.md, SECURITY.md, config/README.md,
  release_docs/RELEASE_PROCESS.md: links to moved files
- Update doxygen .dox files: release_docs/ URLs for moved guides
- Rewrite release_docs/README.md for narrowed scope

* Add HDF5_DOCS_DIR variable for doc/ root path

Introduce HDF5_DOCS_DIR = ${HDF5_SOURCE_DIR}/doc so that
CMakeInstallation.cmake and future callers reference the doc/
directory symbolically rather than by hardcoded path.
HDF5_DOXYGEN_DIR is now derived from HDF5_DOCS_DIR.
Updated NVERDOT and NVERDASH environment variables to version 26.3.
* ci: add gate job to CodeQL workflow for text-only PRs

Remove paths-ignore from the workflow trigger and add a check-changes
job with dorny/paths-filter to detect code changes at the job level.
This ensures the workflow always triggers so the codeql-complete gate
job can report a passing status when analyze is skipped, preventing
text-only PRs from being blocked by required status checks.

* ci: check both check-changes and analyze results in gate job

Add check-changes to the needs array of codeql-complete so that a
failure in the change-detection job is not silently treated as a
skipped analysis.
so that the "Require code scanning results" branch
protection rule is satisfied for text-only PRs.
Restrict empty SARIF upload to pull_request events only, so that
push-to-develop (e.g. after merging a text-only PR) does not overwrite
the real CodeQL results in the Security tab with an empty SARIF.
…ttings (HDFGroup#6280)

When a global API version is set (e.g., H5_USE_16_API), functions
introduced after that version now default to their earliest version
(version 1) instead of the latest. This prevents breakage when an
application uses an older API setting but calls functions that were
later versioned.
Updates the requirements on [actions/checkout](https://github.com/actions/checkout), [actions/download-artifact](https://github.com/actions/download-artifact), [actions/cache](https://github.com/actions/cache), [lukka/get-cmake](https://github.com/lukka/get-cmake), [actions/setup-java](https://github.com/actions/setup-java), [EndBug/add-and-commit](https://github.com/endbug/add-and-commit), [github/codeql-action](https://github.com/github/codeql-action), [advanced-security/filter-sarif](https://github.com/advanced-security/filter-sarif), [codespell-project/actions-codespell](https://github.com/codespell-project/actions-codespell), [azure/trusted-signing-action](https://github.com/azure/trusted-signing-action), [vmactions/freebsd-vm](https://github.com/vmactions/freebsd-vm), [julia-actions/setup-julia](https://github.com/julia-actions/setup-julia), [msys2/setup-msys2](https://github.com/msys2/setup-msys2), [vmactions/openbsd-vm](https://github.com/vmactions/openbsd-vm) and [softprops/action-gh-release](https://github.com/softprops/action-gh-release) to permit the latest version.

* Keep vmactions/openbsd-vm@271a1ba # v1.3.4 until ssh doesn't fail with newer version.
The loop in H5O__dtype_decode_helper() that computes nelem by multiplying array dimension sizes has no per-step overflow check.

This produces incorrect element counts that propagate through type conversion, vlen iteration, and size calculations.

Add a per-step overflow guard inside the multiplication loop so the wrap is caught before it happens.
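Here is a sketch of a per-step overflow guard, assuming the dimension sizes are held as size_t. dims_to_nelem() is a hypothetical helper, not the code in H5O__dtype_decode_helper(). Before each multiplication, it compares the running product against SIZE_MAX / dims[u]. This detects the wrap without ever performing the overflowing multiplication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply array dimension sizes into an element
 * count, stopping as soon as the next product would wrap.
 * Returns true and stores the product in *nelem on success; returns
 * false (leaving *nelem untouched) on overflow. */
bool dims_to_nelem(const size_t *dims, unsigned ndims, size_t *nelem)
{
    assert(nelem != NULL && (ndims == 0 || dims != NULL));

    size_t n = 1;
    for (unsigned u = 0; u < ndims; u++) {
        /* Check BEFORE multiplying: for dims[u] != 0,
         * n * dims[u] > SIZE_MAX  iff  n > SIZE_MAX / dims[u]. */
        if (dims[u] != 0 && n > SIZE_MAX / dims[u])
            return false;
        n *= dims[u];
    }
    *nelem = n;
    return true;
}
```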
* Allow setting HDF5_INSTALL_JNI_LIB_DIR to specify install location for the JNI shared library

* No library versioning for Java JNI
@brtnfld brtnfld requested a review from epourmal as a code owner April 17, 2026 13:47
brtnfld added a commit to brtnfld/hdf5 that referenced this pull request Apr 17, 2026
…FIELD_LEN public

- H5TBget_field_info: replace unbounded strcpy with bounds-checked memcpy
  guarded by HLTB_MAX_FIELD_LEN; names >= 255 chars truncated safely (CWE-120)
- Move HLTB_MAX_FIELD_LEN from H5TBprivate.h to H5TBpublic.h with docs so
  callers can correctly size their field_names[] buffers
- test_table.c: add backward-compat (32-byte buffer) and boundary-length
  (253/254/255/1000-char names) tests for H5TBget_field_info
- CMakeTests.cmake: add test_boundary.h5 to cleanup list
- CHANGELOG.md: add H5TBget_field_info security fix and HLTB_MAX_FIELD_LEN
  entries; HDFGroup#6140's H5DSis_scale buf=NULL entry is superseded by our broader
  cleanup already on this branch

Labels

Component - C Library Core C library issues (usually in the src directory) HDFG-internal Internally coded for use by the HDF Group

Projects

Status: In progress
