Problem
qat.service fails to start on 420xx hardware with:
QAT driver is still not present after 20s. Aborting qat_init
even when the hardware and drivers are fully functional.
Root cause
get_module_state() builds a grep pattern for every name in SUPPORTED_DRIVER_NAMES (qat_4xxx qat_420xx) and returns the fifth field (module state) for every match in /proc/modules:
get_module_state() {
CMD=""
for SUPPORTED_DRIVER_NAME in $SUPPORTED_DRIVER_NAMES;
do
CMD="$CMD -e ^$SUPPORTED_DRIVER_NAME"
done
echo "$(cat /proc/modules | grep $CMD | cut -d' ' -f5)"
}
On 420xx hardware the kernel loads both modules (qat_420xx for the physical devices and qat_4xxx as a dependency), so get_module_state() returns two lines:
Live
Live
check_driver() then compares this multi-line string against the literal "Live":
while [ "$CURRENT_STATE" != "Live" ]
"Live\nLive" != "Live" is always true, so the loop runs until the 20-second timeout and the service exits with an error, despite both drivers being fully ready.
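The comparison failure can be reproduced without QAT hardware. The following is a minimal sketch, not the actual qat_init code: SAMPLE_MODULES is a hypothetical excerpt in /proc/modules format (sizes and addresses are made up), and module_states is an illustrative stand-in for get_module_state().

```shell
#!/bin/sh
# Hypothetical /proc/modules excerpt; sizes, refcounts and addresses are made up.
SAMPLE_MODULES='qat_420xx 45056 0 - Live 0x0000000000000000
qat_4xxx 49152 0 - Live 0x0000000000000000
intel_qat 434176 2 qat_4xxx,qat_420xx, Live 0x0000000000000000'

# Stand-in for get_module_state(): print field 5 (state) of every matching module.
module_states() {
    printf '%s\n' "$SAMPLE_MODULES" \
        | grep -e '^qat_4xxx' -e '^qat_420xx' \
        | cut -d' ' -f5
}

CURRENT_STATE=$(module_states)

# Same test as check_driver(): a two-line value never equals the single word "Live".
if [ "$CURRENT_STATE" != "Live" ]; then
    echo "mismatch"
else
    echo "match"
fi
```

With both modules listed, CURRENT_STATE holds two lines and the script takes the "mismatch" branch. Removing the qat_4xxx line from the sample leaves a single "Live" and the comparison passes, which mirrors the rmmod workaround.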
Reproduction
On any 420xx host (PCI device ID 0x4946):
# Both modules are present even though qat_4xxx has no devices bound
lsmod | grep qat
# qat_420xx ...
# qat_4xxx ... ← loaded, no bound devices
# intel_qat ... 2 qat_4xxx,qat_420xx
# Simulate what get_module_state() returns
grep -e '^qat_4xxx' -e '^qat_420xx' /proc/modules | cut -d' ' -f5
# Live
# Live ← two lines; never matches the string "Live"
systemctl start qat
# → "QAT driver is still not present after 20s. Aborting qat_init"
The workaround is to manually unload the idle module first (rmmod qat_4xxx), after which only one line is returned and the comparison succeeds.
Proposed Fix
Replace the string-equality check with a grep that succeeds as soon as any driver in the output reaches the Live state:
# before
while [ "$CURRENT_STATE" != "Live" ]
# after
while ! echo "$CURRENT_STATE" | grep -q "^Live$"
This is the intended semantics: the relevant hardware driver (qat_420xx) being Live is a sufficient signal to proceed. The state of co-loaded but idle drivers (qat_4xxx) is irrelevant.
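The behaviour of the new condition can be checked in isolation. This is a sketch under stated assumptions: any_live, get_state and wait_for_driver are hypothetical names introduced here, get_state simulates the two-line output seen on 420xx hosts, and the 20-second limit is taken from the error message rather than from the script itself.

```shell
#!/bin/sh
# Succeeds if at least one line of the given text is exactly "Live".
any_live() {
    printf '%s\n' "$1" | grep -q '^Live$'
}

# Hypothetical stand-in for get_module_state() on a 420xx host.
get_state() {
    printf 'Live\nLive\n'
}

# Illustrative polling loop; the timeout mirrors the 20s in the error message.
wait_for_driver() {
    timeout=20
    elapsed=0
    CURRENT_STATE=$(get_state)
    while ! any_live "$CURRENT_STATE"; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
        CURRENT_STATE=$(get_state)
    done
    return 0
}

wait_for_driver && echo "driver ready"
```

Because the pattern is anchored on both sides, only a line that is exactly "Live" counts. A mix such as one module in Loading and another in Live also passes, which matches the "any driver is Live" behaviour described above.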
Testing
Verified on an Intel 420xx SR630v3 system running RHEL 9 (kernel 5.14.0-687.el9.x86_64, qatlib 25.08.0) with both qat_4xxx and qat_420xx loaded. Prior to the fix, systemctl start qat timed out consistently; after the fix, the service starts cleanly and all 8 PFs are configured correctly.