tracing: add pre-validation for tracepoints and LSM hooks#4708
tracing: add pre-validation for tracepoints and LSM hooks#4708mtardy merged 4 commits intocilium:mainfrom
Conversation
✅ Deploy Preview for tetragon ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
9173159 to
8f17b4a
Compare
olsajiri
left a comment
There was a problem hiding this comment.
thanks, left some comments
| // For raw tracepoints, argument index must be <= 5 | ||
| for i, arg := range spec.Args { | ||
| if arg.Index > 5 { | ||
| return nil, fmt.Errorf("raw tracepoint %s/%s can read up to 5 arguments, but index %d was requested in args[%d]", |
There was a problem hiding this comment.
checks and errors like this are now doubled, also in lsm case, is there any way we could have preValidate hook and probe setup code sharing the same check?
| for i := range tracepoints { | ||
| for _, sel := range tracepoints[i].Selectors { | ||
| for _, act := range sel.MatchActions { | ||
| if act.Action == "NotifyEnforcer" && len(enforcers) == 0 { |
There was a problem hiding this comment.
there's HasNotifyEnforcerAction , it's kprobe specific, but perhaps we could change the argument to be spec.Selectors directly
| require.Error(t, err) | ||
| } | ||
|
|
||
| func TestTracepointValidationWrongSubsystem(t *testing.T) { |
There was a problem hiding this comment.
we have validation tests in separate files for each probe, let's add files for tracepoint and lsm as well
88e1191 to
71918e3
Compare
olsajiri
left a comment
There was a problem hiding this comment.
looks good, left one comment, thanks
| specArg := &specArgs[argIdx] | ||
| if specArg.Index >= nfields { | ||
| return nil, fmt.Errorf("tracepoint %s/%s has %d fields but field %d was requested", info.Subsys, info.Event, nfields, specArg.Index) | ||
| if err := validateTracepointArg(info.Subsys, info.Event, nfields, specArg.Index, argIdx); err != nil { |
There was a problem hiding this comment.
I think these changes should be squashed into previous commit
85be55e to
36f212b
Compare
|
CI failure seems to be unrelated. |
| // It checks that the kernel supports BPF LSM, that each hook exists in BTF, | ||
| // and that arguments and selectors are valid. | ||
| func preValidateLsmHooks(lsmHooks []v1alpha1.LsmHookSpec) error { | ||
| if !bpf.HasLSMPrograms() || !config.EnableLargeProgs() { |
There was a problem hiding this comment.
we have the same check in createGenericLsmSensor, let's keep just one,
I have slight preference on keep the one in createGenericLsmSensor .. but not really strong ;-)
| } | ||
| } | ||
| } else { | ||
| // For raw tracepoints, argument index must be <= 5 |
There was a problem hiding this comment.
this belongs to previous commit
| // Use the pre-loaded tracepoint info from validation if available, | ||
| // otherwise create a new one (format will be loaded in buildGenericTracepointArgs). | ||
| var tp tracepoint.Tracepoint | ||
| if valInfo != nil && valInfo.tp != nil { |
There was a problem hiding this comment.
could valInfo.tp be nil ?
There was a problem hiding this comment.
No, valInfo.tp should never be nil if it comes from the pre-validation step. I only added thevalInfo.tp != nilcheck to be extra safe and avoid any potential nil pointer panics in case someone manually calls this function without running validation first.If you think it's unnecessary, I can remove it!
| Event: conf.Event, | ||
| // Use the pre-loaded tracepoint info from validation if available, | ||
| // otherwise create a new one (format will be loaded in buildGenericTracepointArgs). | ||
| var tp tracepoint.Tracepoint |
There was a problem hiding this comment.
the rest of the code uses &tp, so let's start with that and use the pointer from the beginning
There was a problem hiding this comment.
Thanks,this perfectly avoids copying the struct.
18f62c8 to
2bfbbe1
Compare
Currently, if a tracingpolicy has a bogus tracepoint with a wrong subsystem or event name or out-of-bounds arguments, tetragon goes ahead and creates BPF maps and programs before failing at the attach stage. This wastes kernel resources and gives confusing error messages. This commit adds a pre-validation step for tracepoints similar to what it already does for kprobes. Before creating any BPF resources, it now verify that the subsystem and event fields are not empty and that the tracepoint actually exists in tracefs for non-raw tracepoints. Also, it checks that argument indices are within the tracepoint's field count, raw tracepoint argument indices are at most 5, and that NotifyEnforcer actions have matching enforcers in the specification. If any of these checks fail, the policy is rejected early with a clear error message, and no BPF resources are ever created. Before this patch (ex: raw tracepoint with index > 5): ``` time="2024-02-28T10:15:30Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-raw-tp time="2024-02-28T10:15:30Z" level=debug msg="tetragon, map loaded." sensor=generic_tracepoint map=fdinstall_map time="2024-02-28T10:15:30Z" level=debug msg="tetragon, map loaded." sensor=generic_tracepoint map=tp_calls ... time="2024-02-28T10:15:31Z" level=debug msg="map was unloaded" map=fdinstall_map time="2024-02-28T10:15:31Z" level=debug msg="map was unloaded" map=tp_calls ... time="2024-02-28T10:15:31Z" level=warn msg="Server AddTracingPolicy request failed" error="sensor generic_tracepoint from collection bogus-raw-tp failed to load: failed prog ... attaching 'generic_rawtp_event' failed: no such file or directory" metadata.name=bogus-raw-tp ``` After this patch: ``` time="2024-02-28T10:16:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-raw-tp time="2024-02-28T10:16:00Z" level=warn msg="Server AddTracingPolicy request failed" error="policy handler 'tracing' failed loading policy 'bogus-raw-tp': tracepoint validation failed: error in spec.tracepoints[0]: raw tracepoint (raw_syscalls/sys_enter) can read up to 5 arguments, but 10 was requested" metadata.name=bogus-raw-tp ``` Signed-off-by: Aritra Dey <adey01027@gmail.com>
718984c to
295927b
Compare
| } | ||
|
|
||
| // Validate the hook exists in BTF by looking up bpf_lsm_<hook> | ||
| btfFunc := "bpf_lsm_" + f.Hook |
There was a problem hiding this comment.
nit, we do this also in addLsm would be good to have helper that returns the name
Similar to what is done for tracepoints in commit 9968f04, this adds a pre-validation step for LSM hooks. Before creating any BPF resources, we now verify that the kernel actually supports BPF LSM programs, the hook name is not empty, and the hook exists in BTF by looking up its associated bpf_lsm prefix. We also ensure that selectors are valid for LSM without MatchReturnArgs, argument indices are within the maximum bound of 4, and that argument types are properly recognized. If a user writes a policy with a non-existent LSM hook name, they get a clear error right away instead of a confusing failure deep in the BPF loading code. Before this patch (ex: running with LSM not supported by kernel): ``` time="2024-02-28T10:20:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-lsm time="2024-02-28T10:20:00Z" level=debug msg="tetragon, map loaded." sensor=generic_lsm map=config_map time="2024-02-28T10:20:01Z" level=debug msg="map was unloaded" map=config_map time="2024-02-28T10:20:01Z" level=warn msg="Server AddTracingPolicy request failed" error="sensor generic_lsm from collection bogus-lsm failed to load: failed prog ... attaching 'generic_lsm_event' failed: ... no such file or directory..." metadata.name=bogus-lsm ``` After this patch: ``` time="2024-02-28T10:21:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-lsm time="2024-02-28T10:21:00Z" level=warn msg="Server AddTracingPolicy request failed" error="policy handler 'tracing' failed loading policy 'bogus-lsm': lsm validation failed: does your kernel support the bpf LSM? You can enable LSM BPF by modifying the GRUB configuration /etc/default/grub with GRUB_CMDLINE_LINUX=\"lsm=bpf\"" metadata.name=bogus-lsm ``` Signed-off-by: Aritra Dey <adey01027@gmail.com>
295927b to
37dd8d5
Compare
| // phase. It is passed to createGenericTracepoint so that we don't need to load | ||
| // the tracepoint format from tracefs a second time. | ||
| type tpValidateInfo struct { | ||
| tp *tracepoint.Tracepoint |
There was a problem hiding this comment.
would that be more simple if your struct just hold the type instead of a pointer to the type? is it useful that to be able to be null?
There was a problem hiding this comment.
Yes,there's no need for tp to be nullable inside since the entire tpValidateInfo wrapper can just be nil if validation fails,updated it!
| ) | ||
|
|
||
| func TestLsmValidationBogusHook(t *testing.T) { | ||
|
|
There was a problem hiding this comment.
nit: remove all the empty lines after the function name
This commit follows the same pattern used by kprobes to prevent duplicate work. Instead of just checking and throwing away the results, the preValidateTracepoints function now returns the pre-loaded tracepoint information including format and fields, and passes it directly through to createGenericTracepointSensor. This avoids loading the tracepoint format from tracefs twice, which typically happens once during validation and once during sensor creation. The format is now loaded once in preValidateTracepoint and then reused in buildGenericTracepointArgs, which skips LoadFormat when the format is already present. Signed-off-by: Aritra Dey <adey01027@gmail.com>
37dd8d5 to
da4431a
Compare
This commit adds tests to ensure that invalid TracingPolicies for tracepoints and LSM hooks are properly rejected during the pre-validation phase, before any BPF resources get created. Signed-off-by: Aritra Dey <adey01027@gmail.com>
da4431a to
1a38543
Compare
|
@mtardy since it's already approved,is there anything blocking this PR to merge |
Description
This PR introduces pre validation for tracepoints and LSM hooks for tracingpolicy, following up on the initial pre-validation work that was done for kprobes in PR #830(See commit descriptions for more details).
Changelog