tracing: add pre-validation for tracepoints and LSM hooks #4708

Merged
mtardy merged 4 commits into cilium:main from AritraDey-Dev:pr/aritra/tp-prevalidate
Apr 8, 2026

@AritraDey-Dev
Member

Description

This PR introduces pre-validation for tracepoints and LSM hooks in tracing policies, following up on the initial pre-validation work done for kprobes in PR #830. See the commit descriptions for more details.

Changelog

Added pre-validation for tracepoints and LSM hooks to reject invalid `TracingPolicies` before BPF resources are created.

@AritraDey-Dev AritraDey-Dev requested a review from a team as a code owner February 28, 2026 06:56
@AritraDey-Dev AritraDey-Dev requested a review from FedeDP February 28, 2026 06:56

netlify Bot commented Feb 28, 2026

Deploy Preview for tetragon ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | da4431a |
| 🔍 Latest deploy log | https://app.netlify.com/projects/tetragon/deploys/69d13197c9f6620008ba5c3e |
| 😎 Deploy Preview | https://deploy-preview-4708--tetragon.netlify.app |

@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch 5 times, most recently from 9173159 to 8f17b4a Compare February 28, 2026 10:42
Contributor

@olsajiri olsajiri left a comment

thanks, left some comments

// For raw tracepoints, argument index must be <= 5
for i, arg := range spec.Args {
    if arg.Index > 5 {
        return nil, fmt.Errorf("raw tracepoint %s/%s can read up to 5 arguments, but index %d was requested in args[%d]",
Contributor

checks and errors like this are now doubled, also in lsm case, is there any way we could have preValidate hook and probe setup code sharing the same check?

Member Author

yes done thanks.
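
One way to avoid the doubled check is a single helper that both the pre-validation step and the probe setup call. The sketch below is hypothetical: `validateRawTracepointArg` and `maxRawTracepointArgIndex` are illustrative names, not necessarily what the PR ended up with. Only the bound of 5 and the error wording are taken from the quoted snippet.

```go
package main

import "fmt"

// maxRawTracepointArgIndex mirrors the "index must be <= 5" rule from the
// quoted snippet. The constant name is an assumption made for illustration.
const maxRawTracepointArgIndex = 5

// validateRawTracepointArg is a hypothetical shared helper. Both the
// pre-validation step and the probe setup code would call it, so the check
// and its error message live in exactly one place.
func validateRawTracepointArg(subsys, event string, index uint32, argIdx int) error {
	if index > maxRawTracepointArgIndex {
		return fmt.Errorf("raw tracepoint %s/%s can read up to %d arguments, but index %d was requested in args[%d]",
			subsys, event, maxRawTracepointArgIndex, index, argIdx)
	}
	return nil
}
```

With a helper like this, the setup path keeps its safety check without carrying a second copy of the error string.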

for i := range tracepoints {
    for _, sel := range tracepoints[i].Selectors {
        for _, act := range sel.MatchActions {
            if act.Action == "NotifyEnforcer" && len(enforcers) == 0 {
Contributor

there's HasNotifyEnforcerAction; it's kprobe-specific, but perhaps we could change its argument to be spec.Selectors directly
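
A rough sketch of what a selector-based variant could look like follows. `Selector` and `ActionSelector` here are minimal stand-in types, and `hasNotifyEnforcerAction` is an illustrative name; the real `HasNotifyEnforcerAction` signature and the real selector types in Tetragon may differ.

```go
package main

// ActionSelector and Selector are minimal stand-ins for the TracingPolicy
// selector types; they are assumptions made so the sketch compiles on its own.
type ActionSelector struct {
	Action string
}

type Selector struct {
	MatchActions []ActionSelector
}

// hasNotifyEnforcerAction reports whether any selector carries a
// NotifyEnforcer action. Because it takes the selectors directly rather
// than a kprobe spec, kprobes, tracepoints, and LSM hooks could share it.
func hasNotifyEnforcerAction(selectors []Selector) bool {
	for _, sel := range selectors {
		for _, act := range sel.MatchActions {
			if act.Action == "NotifyEnforcer" {
				return true
			}
		}
	}
	return false
}
```

A caller would then reject the policy when this returns true and no enforcers are defined, as in the quoted loop.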

    require.Error(t, err)
}

func TestTracepointValidationWrongSubsystem(t *testing.T) {
Contributor

we have validation tests in separate files for each probe, let's add files for tracepoint and lsm as well

Member Author

yeah sounds good!

@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch 5 times, most recently from 88e1191 to 71918e3 Compare March 8, 2026 15:23
@AritraDey-Dev AritraDey-Dev requested a review from olsajiri March 8, 2026 15:25
Contributor

@olsajiri olsajiri left a comment

looks good, left one comment, thanks

specArg := &specArgs[argIdx]
if specArg.Index >= nfields {
    return nil, fmt.Errorf("tracepoint %s/%s has %d fields but field %d was requested", info.Subsys, info.Event, nfields, specArg.Index)
if err := validateTracepointArg(info.Subsys, info.Event, nfields, specArg.Index, argIdx); err != nil {
Contributor

I think these changes should be squashed into previous commit

@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch 2 times, most recently from 85be55e to 36f212b Compare March 9, 2026 13:52
@AritraDey-Dev AritraDey-Dev requested a review from olsajiri March 10, 2026 11:44
@AritraDey-Dev
Member Author

CI failure seems to be unrelated.

Comment thread pkg/sensors/tracing/genericlsm.go Outdated
// It checks that the kernel supports BPF LSM, that each hook exists in BTF,
// and that arguments and selectors are valid.
func preValidateLsmHooks(lsmHooks []v1alpha1.LsmHookSpec) error {
    if !bpf.HasLSMPrograms() || !config.EnableLargeProgs() {
Contributor

we have the same check in createGenericLsmSensor, let's keep just one.
I have a slight preference for keeping the one in createGenericLsmSensor, but not a strong one ;-)

        }
    }
} else {
    // For raw tracepoints, argument index must be <= 5
Contributor

this belongs to previous commit

// Use the pre-loaded tracepoint info from validation if available,
// otherwise create a new one (format will be loaded in buildGenericTracepointArgs).
var tp tracepoint.Tracepoint
if valInfo != nil && valInfo.tp != nil {
Contributor

could valInfo.tp be nil ?

Member Author

No, valInfo.tp should never be nil if it comes from the pre-validation step. I only added the `valInfo.tp != nil` check to be extra safe and avoid a nil pointer panic in case someone calls this function manually without running validation first. If you think it's unnecessary, I can remove it!

Event: conf.Event,
// Use the pre-loaded tracepoint info from validation if available,
// otherwise create a new one (format will be loaded in buildGenericTracepointArgs).
var tp tracepoint.Tracepoint
Contributor

the rest of the code uses &tp, so let's start with that and use the pointer from the beginning

Member Author

Thanks, this avoids copying the struct.
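
For illustration, the pointer-first approach discussed in this thread might look roughly like the sketch below. `Tracepoint` and `tpValidateInfo` are simplified stand-ins for the types in the quoted snippet, and `pickTracepoint` is a hypothetical helper name, not code from the PR.

```go
package main

// Tracepoint is a minimal stand-in for Tetragon's tracepoint.Tracepoint.
type Tracepoint struct {
	Subsys string
	Event  string
}

// tpValidateInfo stands in for the pre-validation result discussed in the
// thread; its exact fields here are assumptions.
type tpValidateInfo struct {
	tp *Tracepoint
}

// pickTracepoint returns the pre-loaded tracepoint when validation supplied
// one and otherwise allocates a fresh one. Working with a pointer from the
// start means later code can pass it along without copying the struct.
func pickTracepoint(valInfo *tpValidateInfo, subsys, event string) *Tracepoint {
	if valInfo != nil && valInfo.tp != nil {
		return valInfo.tp
	}
	return &Tracepoint{Subsys: subsys, Event: event}
}
```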

@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch 2 times, most recently from 18f62c8 to 2bfbbe1 Compare March 11, 2026 15:07
Currently, if a tracing policy has a bogus tracepoint with a wrong
subsystem or event name or out-of-bounds arguments, Tetragon goes
ahead and creates BPF maps and programs before failing at the attach
stage. This wastes kernel resources and gives confusing error messages.

This commit adds a pre-validation step for tracepoints, similar to what
Tetragon already does for kprobes. Before creating any BPF resources, it now
verifies that the subsystem and event fields are not empty and, for non-raw
tracepoints, that the tracepoint actually exists in tracefs. It also
checks that argument indices are within the tracepoint's field count, that
raw tracepoint argument indices are at most 5, and that NotifyEnforcer
actions have matching enforcers in the specification.

If any of these checks fail, the policy is rejected early with a clear
error message, and no BPF resources are ever created.
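
As a rough illustration of the first two checks described above, the sketch below validates the names and then looks for the event's `format` file under a tracefs root. It is not the PR's implementation: `preValidateTracepoint` as written here, its parameters, and its error strings are assumptions, and Tetragon uses its own tracepoint package rather than a bare `os.Stat`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// preValidateTracepoint is a hypothetical sketch of the early checks: both
// names must be set and, for non-raw tracepoints, the event must exist under
// the given tracefs root (commonly mounted at /sys/kernel/tracing).
func preValidateTracepoint(tracefsRoot, subsys, event string, raw bool) error {
	if subsys == "" {
		return errors.New("tracepoint subsystem is empty")
	}
	if event == "" {
		return errors.New("tracepoint event is empty")
	}
	if raw {
		// This sketch does not look raw tracepoints up in tracefs.
		return nil
	}
	format := filepath.Join(tracefsRoot, "events", subsys, event, "format")
	if _, err := os.Stat(format); err != nil {
		return fmt.Errorf("tracepoint %s/%s not found: %w", subsys, event, err)
	}
	return nil
}
```

Because nothing here touches BPF, a failing policy is rejected before any maps or programs are created.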

Before this patch (e.g., a raw tracepoint with index > 5):

```
time="2024-02-28T10:15:30Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-raw-tp
time="2024-02-28T10:15:30Z" level=debug msg="tetragon, map loaded." sensor=generic_tracepoint map=fdinstall_map
time="2024-02-28T10:15:30Z" level=debug msg="tetragon, map loaded." sensor=generic_tracepoint map=tp_calls
...
time="2024-02-28T10:15:31Z" level=debug msg="map was unloaded" map=fdinstall_map
time="2024-02-28T10:15:31Z" level=debug msg="map was unloaded" map=tp_calls
...
time="2024-02-28T10:15:31Z" level=warn msg="Server AddTracingPolicy request failed" error="sensor generic_tracepoint from collection bogus-raw-tp failed to load: failed prog ... attaching 'generic_rawtp_event' failed: no such file or directory" metadata.name=bogus-raw-tp
```

After this patch:

```
time="2024-02-28T10:16:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-raw-tp
time="2024-02-28T10:16:00Z" level=warn msg="Server AddTracingPolicy request failed" error="policy handler 'tracing' failed loading policy 'bogus-raw-tp': tracepoint validation failed: error in spec.tracepoints[0]: raw tracepoint (raw_syscalls/sys_enter) can read up to 5 arguments, but 10 was requested" metadata.name=bogus-raw-tp
```

Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch 2 times, most recently from 718984c to 295927b Compare March 11, 2026 15:28
@AritraDey-Dev AritraDey-Dev requested a review from olsajiri March 12, 2026 16:33
@olsajiri olsajiri added the release-note/minor This PR introduces a minor user-visible change label Mar 18, 2026
Contributor

@olsajiri olsajiri left a comment

thanks

Comment thread pkg/sensors/tracing/genericlsm.go Outdated
}

// Validate the hook exists in BTF by looking up bpf_lsm_<hook>
btfFunc := "bpf_lsm_" + f.Hook
Contributor

nit: we also do this in addLsm; it would be good to have a helper that returns the name
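
Such a helper could be as small as the sketch below. The names `lsmBTFPrefix` and `lsmBTFFuncName` are hypothetical; only the `"bpf_lsm_" + f.Hook` expression comes from the quoted code.

```go
package main

// lsmBTFPrefix is the prefix of the BTF function that backs an LSM hook,
// taken from the "bpf_lsm_" + f.Hook expression in the quoted snippet.
const lsmBTFPrefix = "bpf_lsm_"

// lsmBTFFuncName returns the BTF function name for an LSM hook, so that
// validation and attachment build the name the same way.
func lsmBTFFuncName(hook string) string {
	return lsmBTFPrefix + hook
}
```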

Similar to what is done for tracepoints in commit 9968f04, this
adds a pre-validation step for LSM hooks. Before creating any BPF
resources, we now verify that the kernel actually supports BPF LSM
programs, that the hook name is not empty, and that the hook exists in
BTF by looking up bpf_lsm_<hook>. We also ensure that selectors
are valid for LSM (without MatchReturnArgs), that argument indices are
within the maximum bound of 4, and that argument types are recognized.

If a user writes a policy with a non-existent LSM hook name, they get
a clear error right away instead of a confusing failure deep in the
BPF loading code.
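
The hook-level checks from this commit message can be sketched as follows. This is an assumption-laden illustration: `preValidateLsmHook` as written here, the `hookInBTF` callback that stands in for the real BTF lookup, and the error strings are all hypothetical, and the sketch reads "maximum bound of 4" as "4 is the highest allowed index".

```go
package main

import (
	"errors"
	"fmt"
)

// maxLsmArgIndex reflects the "maximum bound of 4" from the commit message,
// interpreted here as the highest permitted argument index.
const maxLsmArgIndex = 4

// preValidateLsmHook is a hypothetical sketch. hookInBTF abstracts the BTF
// lookup of bpf_lsm_<hook> so the logic can be exercised without a kernel.
func preValidateLsmHook(hook string, argIndices []uint32, hookInBTF func(name string) bool) error {
	if hook == "" {
		return errors.New("lsm hook name is empty")
	}
	if !hookInBTF("bpf_lsm_" + hook) {
		return fmt.Errorf("lsm hook %q not found in BTF", hook)
	}
	for i, idx := range argIndices {
		if idx > maxLsmArgIndex {
			return fmt.Errorf("lsm hook %q: args[%d] has index %d, maximum is %d",
				hook, i, idx, maxLsmArgIndex)
		}
	}
	return nil
}
```

Selector and argument-type validation are left out to keep the sketch short.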

Before this patch (e.g., running on a kernel without BPF LSM support):

```
time="2024-02-28T10:20:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-lsm
time="2024-02-28T10:20:00Z" level=debug msg="tetragon, map loaded." sensor=generic_lsm map=config_map
time="2024-02-28T10:20:01Z" level=debug msg="map was unloaded" map=config_map
time="2024-02-28T10:20:01Z" level=warn msg="Server AddTracingPolicy request failed" error="sensor generic_lsm from collection bogus-lsm failed to load: failed prog ... attaching 'generic_lsm_event' failed: ... no such file or directory..." metadata.name=bogus-lsm
```

After this patch:

```
time="2024-02-28T10:21:00Z" level=debug msg="Received an AddTracingPolicy request" metadata.name=bogus-lsm
time="2024-02-28T10:21:00Z" level=warn msg="Server AddTracingPolicy request failed" error="policy handler 'tracing' failed loading policy 'bogus-lsm': lsm validation failed: does your kernel support the bpf LSM? You can enable LSM BPF by modifying the GRUB configuration /etc/default/grub with GRUB_CMDLINE_LINUX=\"lsm=bpf\"" metadata.name=bogus-lsm
```

Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch from 295927b to 37dd8d5 Compare March 18, 2026 12:26
@mtardy mtardy self-requested a review March 20, 2026 14:05
Member

@mtardy mtardy left a comment

lgtm, just two small remarks

// phase. It is passed to createGenericTracepoint so that we don't need to load
// the tracepoint format from tracefs a second time.
type tpValidateInfo struct {
    tp *tracepoint.Tracepoint
Member

would it be simpler if your struct just held the type instead of a pointer to the type? is it useful for it to be nullable?

Member Author

Yes, there's no need for tp to be nullable, since the entire tpValidateInfo wrapper can just be nil if validation fails. Updated it!

)

func TestLsmValidationBogusHook(t *testing.T) {

Member

nit: remove all the empty lines after the function name

This commit follows the same pattern used by kprobes to prevent
duplicate work. Instead of checking and then discarding the results,
the preValidateTracepoints function now returns the pre-loaded tracepoint
information, including format and fields, and passes it directly through
to createGenericTracepointSensor.

This avoids loading the tracepoint format from tracefs twice, which
would otherwise happen once during validation and once during sensor creation.
The format is now loaded once in preValidateTracepoint and then
reused in buildGenericTracepointArgs, which skips LoadFormat
when the format is already present.
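
The load-once behaviour described above can be sketched like this. `Tracepoint`, `Format`, and `ensureFormat` are simplified stand-ins invented for the example; the `load` callback plays the role of the tracefs read and is not Tetragon's actual `LoadFormat` signature.

```go
package main

// Format and Tracepoint are minimal stand-ins for Tetragon's tracepoint
// types; the field layout is an assumption.
type Format struct {
	Fields []string
}

type Tracepoint struct {
	Subsys string
	Event  string
	Format *Format // nil until the format has been loaded
}

// ensureFormat loads the format only when it is not already present, so a
// tracepoint that went through pre-validation is not read from tracefs again.
func ensureFormat(tp *Tracepoint, load func(subsys, event string) (*Format, error)) error {
	if tp.Format != nil {
		return nil
	}
	f, err := load(tp.Subsys, tp.Event)
	if err != nil {
		return err
	}
	tp.Format = f
	return nil
}
```

Calling `ensureFormat` during validation and again during sensor creation results in a single call to `load`.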

Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch from 37dd8d5 to da4431a Compare April 4, 2026 15:43
This commit adds tests to ensure that invalid TracingPolicies
for tracepoints and LSM hooks are properly rejected
during the pre-validation phase, before any BPF resources get created.

Signed-off-by: Aritra Dey <adey01027@gmail.com>
@AritraDey-Dev AritraDey-Dev force-pushed the pr/aritra/tp-prevalidate branch from da4431a to 1a38543 Compare April 4, 2026 16:20
@AritraDey-Dev AritraDey-Dev requested a review from mtardy April 4, 2026 16:20
Member

@mtardy mtardy left a comment

thanks!

@AritraDey-Dev
Member Author

@mtardy since it's already approved, is there anything blocking this PR from merging?

@mtardy mtardy merged commit 39a850d into cilium:main Apr 8, 2026
49 checks passed
@AritraDey-Dev AritraDey-Dev deleted the pr/aritra/tp-prevalidate branch April 8, 2026 08:42
Labels

release-note/minor This PR introduces a minor user-visible change

3 participants