Openmetric stats improvements #617

Draft
maxime-leroy wants to merge 12 commits into DPDK:main from maxime-leroy:openmetric_stats_improvements

@maxime-leroy
Collaborator

@maxime-leroy maxime-leroy commented May 13, 2026

modules/infra/api/iface.c extends OpenMetrics collection for interfaces with new metrics and labels.

New metrics:

  • iface_speed_bps (gauge): converts speed from Mbps to bits/sec; 0 when unknown or link down
  • iface_last_change_seconds (gauge): UNIX wall-clock timestamp of last operational state change (0 if never transitioned)
  • iface_rx_drops, iface_rx_errors, iface_tx_errors (counters): hardware-level drop/error counts from ethdev driver (non-zero only for PORT type)
  • iface_rx_broadcast_packets, iface_rx_multicast_packets, iface_tx_broadcast_packets, iface_tx_multicast_packets (counters): obtained by querying PMD xstats with driver-specific name aliases (e.g., rx_broadcast_packets vs ingress_broadcast_frames)
  • iface_rx_unicast_packets, iface_tx_unicast_packets (counters): computed via saturating subtraction (total - broadcast - multicast)

New labels on all per-iface metrics:

  • parent: parent interface name for VLAN sub-interfaces (enables stack reconstruction from a single scrape without separate API calls)
  • vrf or domain: already present; clarified as either VRF name (for VRF-mode interfaces) or domain/bridge name
  • mac: primary MAC address formatted as string; defaults to 00:00:00:00:00:00 when unavailable

Implementation details:

  • Operational state transitions tracked via iface_status_event() handler using monotonic clock timestamps to remain immune to NTP/PTP adjustments; rebased to wall clock during metric emission using computed mono_to_wall offset
  • Broadcast/multicast xstats queried from DPDK PMD with fallback to 0 for drivers not exposing this breakdown (virtio, tap, null, vhost) or non-port interface types
  • Hardware drops/errors (from rte_eth_stats) emitted only for PORT interfaces; 0 for other types to maintain consistent metric cardinality
  • API initialization subscribes to iface status events (up/down/post-add) to enable last-change tracking

@maxime-leroy maxime-leroy marked this pull request as draft May 13, 2026 14:34
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e542d6bf-e189-49af-8364-66495404969c

📥 Commits

Reviewing files that changed from the base of the PR and between 0ab251b and 26513a5.

📒 Files selected for processing (1)
  • modules/infra/api/iface.c

📝 Walkthrough

This PR extends the interface metrics model with hardware-level observability. It adds link speed, operational state change time, drops/errors, and broadcast/multicast/unicast packet counters. Each metric scrape is enriched with VRF/domain, VLAN parent, and MAC address labels. Status transitions are tracked via monotonic timestamps and rebased to wall-clock during emission. DPDK PMD xstats are queried, aliased to generic metrics (defaulting to 0 when unavailable), and unicast counters are derived by saturating subtraction from totals. The status event handler is wired into initialization to populate last-change data.


Comment thread modules/infra/api/iface.c Outdated
Comment thread modules/infra/api/iface.c Outdated
Comment thread modules/infra/api/iface.c Outdated
Comment thread modules/infra/api/iface.c Outdated
Add the interface id and primary MAC address as labels on the per-iface
metrics. The id label allows a metric series to be correlated with the
interface even when its name changes. The mac label exposes the L2
address alongside the other interface attributes already published.

This keeps the OpenMetrics endpoint self-sufficient: monitoring
consumers no longer need a side request to associate a series with its
underlying interface id or hardware address.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_speed_bps as a gauge metric reporting the current link speed
in bits per second. The value is derived from gr_iface.speed (which is
stored in Megabit/sec) multiplied by 1e6. A value of 0 indicates the
speed is unknown or the link is down.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_last_change_seconds gauge reporting the UNIX timestamp of the
most recent operational status transition for each interface. The
per-interface timestamp is updated from a callback subscribed to the
POST_ADD, STATUS_UP and STATUS_DOWN events.

The transition is recorded in the CLOCK_MONOTONIC domain so the value
is immune to wall-clock jumps (typically caused by NTP/PTP corrections
shortly after boot). At metric emit time, the stored monotonic seconds
are rebased onto the current wall clock using the live offset between
CLOCK_REALTIME and CLOCK_MONOTONIC. The mono-to-wall offset is
computed once per scrape rather than per interface.

A separate per-id "set" flag is used as the sentinel for "no transition
recorded yet" since CLOCK_MONOTONIC.tv_sec is 0 in the first second
after boot and cannot be used as a sentinel itself. In that case the
metric value is 0, letting consumers distinguish "never transitioned"
from a transition that genuinely happened at boot time.

Exposing the last change as a metric avoids having to subscribe to the
event stream or poll a separate API just to know when an interface last
went up or down.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
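
The monotonic-to-wall rebasing described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the PR: the `struct last_change` type and all function names are hypothetical, and only the POSIX `clock_gettime()` call is a real API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

// Hypothetical per-interface record; field names are illustrative.
struct last_change {
    bool set;         // false until a first transition is recorded
    int64_t mono_sec; // CLOCK_MONOTONIC seconds at the transition
};

// Record a transition in the monotonic domain (immune to wall-clock jumps).
static void last_change_mark(struct last_change *lc) {
    struct timespec ts;
    assert(lc != NULL);
    clock_gettime(CLOCK_MONOTONIC, &ts);
    lc->mono_sec = ts.tv_sec;
    lc->set = true;
}

// Offset to add to a monotonic timestamp to obtain a wall-clock one.
// Intended to be computed once per scrape, not once per interface.
static int64_t mono_to_wall_offset(void) {
    struct timespec wall, mono;
    clock_gettime(CLOCK_REALTIME, &wall);
    clock_gettime(CLOCK_MONOTONIC, &mono);
    return (int64_t)wall.tv_sec - (int64_t)mono.tv_sec;
}

// Value to emit: 0 when no transition was ever recorded, so the separate
// "set" flag, not mono_sec == 0, acts as the sentinel.
static int64_t last_change_wall(const struct last_change *lc, int64_t offset) {
    return lc->set ? lc->mono_sec + offset : 0;
}
```

Because the offset is taken at scrape time, a transition recorded before an NTP correction is reported against the corrected wall clock on later scrapes.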
port_metrics_collect already exposes iface_port_rx_missed (imissed) and
iface_port_tx_errors (oerrors). Add iface_port_rx_errors to also surface
ierrors from rte_eth_stats. These hardware counters are port-only by
nature: drops and bad frames happen on the NIC before any sub-interface
demux (VLAN, tunnel) can attribute them, so the iface_port_ namespace
is the correct home rather than a generic iface_ metric with a 0
fallback for non-ports.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
@maxime-leroy maxime-leroy force-pushed the openmetric_stats_improvements branch from 26513a5 to 3b59ed7 Compare May 19, 2026 13:19
Add six counter metrics splitting the per-port packet counts into
broadcast, multicast and unicast components:

  iface_port_rx_broadcast_packets
  iface_port_rx_multicast_packets
  iface_port_tx_broadcast_packets
  iface_port_tx_multicast_packets
  iface_port_rx_unicast_packets
  iface_port_tx_unicast_packets

DPDK PMDs do not use a common naming scheme for these counters in
their xstats. The collector matches a short list of aliases per metric
to cover the drivers grout enables in meson.build plus dpaa2 (added
out-of-tree on LX2160A-based platforms):

  rx_/tx_unicast_packets, rx_/tx_broadcast_packets, rx_/tx_multicast_packets
      i40e, ice, iavf, mlx5
  rx_/tx_broadcast_packets, rx_/tx_multicast_packets only
      ixgbe (no unicast xstat)
  ingress_/egress_broadcast_frames, ingress_/egress_multicast_frames only
      dpaa2 (NXP DPNI terminology; no unicast xstat)

Drivers that expose only per-queue counters (virtio) or none at all
(vmxnet3, tap, null, vhost) get no series here.

Unicast is read directly from xstats when the PMD exposes it. Otherwise
it is derived from rte_eth_stats.ipackets/opackets minus broadcast and
multicast, but only when both broadcast and multicast were found in
xstats so the subtraction has a complete breakdown to work from. The
DPDK contract guarantees ipackets/opackets = unicast + bcast + mcast,
so the fallback stays inside the HW counter domain.

Metrics are emitted only for entries actually found (or derived from a
complete breakdown). PMDs without xstat support produce no series for
the missing classes.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
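
The alias matching described in this commit message could look roughly like the sketch below. The `struct xstat_alias` and `struct xstat` types and the `xstat_lookup()` helper are hypothetical and only illustrate the idea; in DPDK the names and values would come from `rte_eth_xstats_get_names()` and `rte_eth_xstats_get()`, which this standalone example replaces with a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ALIASES 2

// One generic counter and the PMD-specific xstat names that map to it.
struct xstat_alias {
    const char *names[MAX_ALIASES]; // NULL-padded
};

// Aliases for RX broadcast, taken from the driver list above.
static const struct xstat_alias rx_bcast = {
    {"rx_broadcast_packets", "ingress_broadcast_frames"},
};

// Hypothetical flattened view of one PMD xstat (name plus value).
struct xstat {
    const char *name;
    uint64_t value;
};

// Return true and store the value if any alias matches an exposed xstat.
// On a miss, *out is left untouched so the caller can skip the series.
static bool xstat_lookup(const struct xstat_alias *alias, const struct xstat *xs,
                         size_t n, uint64_t *out) {
    assert(alias != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t a = 0; a < MAX_ALIASES && alias->names[a] != NULL; a++) {
            if (strcmp(xs[i].name, alias->names[a]) == 0) {
                *out = xs[i].value;
                return true;
            }
        }
    }
    return false;
}
```

Returning a found/not-found flag rather than a default of 0 matches the rule stated above: a PMD that lacks the xstat produces no series instead of a misleading zero.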
The vrf label on per-iface metrics is currently the numeric vrf_id
formatted as a string ("1", "2", ...), whereas the domain label
emitted in the non-VRF branch is already resolved to the parent
interface name. Align the two by resolving vrf_id to the VRF iface
and emitting its name.

Before:
  grout_iface_up{name="te9.835",mode="VRF",vrf="1",...} 1
After:
  grout_iface_up{name="te9.835",mode="VRF",vrf="main",...} 1

Now that VRFs are first-class iface objects with a stable name, the
numeric id is mostly an internal allocation artefact and the name is
what an operator or monitoring consumer expects to see.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a vlan-specific metrics_collect callback that attaches the parent
interface name as a label on every VLAN metric. This lets consumers
reconstruct the iface stack from a single OpenMetrics scrape without
keeping the iface_metrics_collect dispatcher aware of VLAN internals.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_port_rx_bytes and iface_port_tx_bytes counter metrics sourced
from rte_eth_stats.ibytes/obytes. These complement the per-class packet
counters added in the previous commit so the OpenMetrics endpoint can
expose a coherent set of HW-level byte and packet counts at the port
level.

The existing iface_rx_bytes/tx_bytes metrics stay as the SW per-iface
view, used by VLAN, BOND, BRIDGE and other virtual iface types where
no PMD xstats are available. Consumers that need a port-level total
matching the PMD's view (independent of how subinterfaces split the
traffic via VLAN demux) read iface_port_rx_bytes/tx_bytes and fall
back to iface_rx_bytes/tx_bytes for non-port iface types.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
When a VLAN-tagged packet arrives on a port, iface_input.c reassigns
d->iface to the matched VLAN sub-iface before calling IFACE_STATS_INC,
which means the parent port's iface_stats.rx_packets / rx_bytes stay
at zero for all VLAN-tagged traffic. The TX path in iface_output.c has
the symmetric behavior on packets emitted via a VLAN sub-iface.

This diverges from Linux which double-counts: ip -s link on the
physical netdev shows all wire traffic, while ip -s link on a VLAN
sub-iface shows the per-VLAN subset. The current grout behavior
results in `grcli interface stats` and the OpenMetrics scrape showing
a port with apparently zero traffic when all of it is VLAN-tagged.

Match the Linux behavior: on RX, increment the parent port's stats
in addition to the destination VLAN sub-iface; on TX, increment the
parent port's stats in addition to the source VLAN sub-iface. A second
batched accumulator (IFACE_STATS_VARS_AS / IFACE_STATS_INC_AS) is used
to preserve the existing per-iface batching optimization across the
burst, with one accumulator for the destination iface (vlan or port if
no demux) and a separate one for the parent port. The hot path is
unchanged when no VLAN demux is involved.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add a global gauge metric grout_process_start_seconds reporting the
UNIX timestamp (seconds, CLOCK_REALTIME) at which the grout daemon
started. Captured once in an RTE_INIT constructor and emitted from a
small "process" metrics collector.

The intent is to let consumers detect a daemon restart, which is the
only event that resets the counters grout exposes.

The value is captured before NTP may have synchronised the wall clock,
so it can be wrong during the very first scrapes; consumers that care
re-read it later and detect a change just like any other reset event.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Add iface_discontinuity_seconds, a per-iface gauge reporting the wall
clock at which this iface's counters last suffered a discontinuity.
Updated by direct calls to iface_discontinuity_mark() from the few
control paths that reset counters:

  - iface_create() in control/iface.c: counters start at 0 alongside
    the iface_stats zeroing already done there.
  - port_mtu_set() and port_reconfig() in control/port.c after a
    successful PMD stop+start cycle: the PMD resets its HW counters
    on every stop+start.
  - iface_discontinuity_reset_all() called from stats_reset(): every
    known iface is bulk-marked after iface_stats is zeroed.

Stored in the monotonic domain and rebased onto wall clock at emit
time using the same pattern as iface_last_change_seconds, so a reset
recorded before NTP synchronisation is exposed with the correct epoch
once the wall clock has been corrected.

Per-iface granularity lets consumers tell exactly when each iface's
counters last started afresh, regardless of whether the daemon
restarted, a stats reset was issued, or a port was reconfigured
through a stop+start cycle.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
rte_eth_stats.oerrors does not have a uniform semantic across PMDs.
On dpaa2 it is filled exclusively from egress_discarded_frames (TX
drops, not errors). On i40e/ice/iavf/mlx5 it is the sum of tx_errors
and tx_discards. Reporting it as iface_port_tx_errors conflates the
two categories.

Add iface_port_tx_dropped sourced from a new alias entry in the xstats
lookup table (tx_dropped_packets on Intel, egress_discarded_frames on
dpaa2) and rewrite iface_port_tx_errors to be oerrors minus that
value, saturating to zero. The split yields:

  Intel  : tx_errors = E, tx_dropped = D
  dpaa2  : tx_errors = 0, tx_dropped = D
  other  : tx_errors = oerrors, tx_dropped unset

PMDs without a tx_dropped xstat (ixgbe, virtio, tap, ...) keep the
previous behaviour for iface_port_tx_errors and emit no series for
iface_port_tx_dropped.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
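
The split described in this commit message amounts to a saturating subtraction. The sketch below uses hypothetical names (`struct tx_split`, `tx_errors_split()`) and assumes the caller already knows whether a tx_dropped xstat alias was found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Result of splitting oerrors into "errors" and "dropped".
struct tx_split {
    uint64_t errors;  // value for iface_port_tx_errors
    uint64_t dropped; // value for iface_port_tx_dropped, if has_dropped
    bool has_dropped; // false: emit no iface_port_tx_dropped series
};

// Split rte_eth_stats.oerrors using the optional tx_dropped xstat.
static struct tx_split tx_errors_split(uint64_t oerrors, bool dropped_found,
                                       uint64_t dropped) {
    // Without a tx_dropped xstat, keep the previous behaviour.
    struct tx_split s = {.errors = oerrors, .dropped = 0, .has_dropped = false};
    if (dropped_found) {
        s.dropped = dropped;
        s.has_dropped = true;
        // Saturate to zero: the two counters are not read atomically,
        // so dropped may momentarily exceed oerrors.
        s.errors = oerrors > dropped ? oerrors - dropped : 0;
    }
    return s;
}
```

With oerrors = 13 and dropped = 10 (the Intel case, E = 3) this yields errors = 3. With oerrors = dropped = 10 (the dpaa2 case) it yields errors = 0.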
@maxime-leroy maxime-leroy force-pushed the openmetric_stats_improvements branch from 3b59ed7 to 509ffeb Compare May 19, 2026 13:43
Comment thread modules/infra/api/iface.c
Comment on lines +278 to +280
char mac_str[18] = "00:00:00:00:00:00";
if (iface_get_eth_addr(iface, &mac_addr) == 0)
    snprintf(mac_str, sizeof(mac_str), ETH_F, &mac_addr);
Collaborator

simpler to ignore the return value of iface_get_eth_addr().

struct rte_ether_addr mac_addr = {0};
char mac_str[18];
iface_get_eth_addr(iface, &mac_addr);
snprintf(mac_str, sizeof(mac_str), ETH_F, &mac_addr);
metrics_labels_add(&ctx, "mac", mac_str, NULL);

Comment thread modules/infra/api/iface.c
Comment on lines +316 to +317
// the metric. 0 means unknown / link down.
metric_emit(&ctx, &m_speed_bps, (uint64_t)iface->speed * 1000000ULL);
Collaborator

To be checked, but I think "unknown" is UINT32_MAX.
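
If the reviewer's hunch holds, the conversion would need to map the sentinel to 0 before multiplying. The sketch below assumes the speed field uses `UINT32_MAX` for "unknown", which is the value of DPDK's `RTE_ETH_SPEED_NUM_UNKNOWN`. The `SPEED_MBPS_UNKNOWN` macro and the helper name are hypothetical, and whether `iface->speed` really carries that sentinel is exactly what remains to be checked.

```c
#include <assert.h>
#include <stdint.h>

// Assumed sentinel for "unknown speed" (same value as DPDK's
// RTE_ETH_SPEED_NUM_UNKNOWN); not confirmed for gr_iface.speed.
#define SPEED_MBPS_UNKNOWN UINT32_MAX

// Convert a link speed in Mbit/s to bit/s; 0 means unknown or link down.
static uint64_t speed_mbps_to_bps(uint32_t speed_mbps) {
    if (speed_mbps == SPEED_MBPS_UNKNOWN)
        return 0;
    // Widen before multiplying so fast links do not overflow 32 bits.
    return (uint64_t)speed_mbps * 1000000ULL;
}
```

Without the sentinel check, an unknown speed would be exported as roughly 4.29e15 bit/s instead of 0.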

METRIC_GAUGE(m_rxq_size, "iface_port_rxq_size", "Number of descriptors in RX queues.");
METRIC_GAUGE(m_txq_size, "iface_port_txq_size", "Number of descriptors in TX queues.");
METRIC_COUNTER(m_rx_missed, "iface_port_rx_missed", "Number of packets dropped by HW.");
METRIC_COUNTER(m_rx_errors, "iface_port_rx_errors", "Number of RX packets with errors.");
Collaborator

Perhaps also add iface_port_rx_nobuf --> stats.rx_nombuf

Comment on lines +872 to +890
// Derive unicast from ipackets - bcast - mcast (HW domain, DPDK
// contract) when the PMD lacks a unicast xstat. Require a complete
// bcast/mcast breakdown so partial data does not overcount unicast.
if (stats_ok && !xstat_found[XSTAT_RX_UNICAST] && xstat_found[XSTAT_RX_BROADCAST]
    && xstat_found[XSTAT_RX_MULTICAST]) {
    uint64_t bc_mc = xstat_values[XSTAT_RX_BROADCAST]
        + xstat_values[XSTAT_RX_MULTICAST];
    xstat_values
        [XSTAT_RX_UNICAST] = stats.ipackets > bc_mc ? stats.ipackets - bc_mc : 0;
    xstat_found[XSTAT_RX_UNICAST] = true;
}
if (stats_ok && !xstat_found[XSTAT_TX_UNICAST] && xstat_found[XSTAT_TX_BROADCAST]
    && xstat_found[XSTAT_TX_MULTICAST]) {
    uint64_t bc_mc = xstat_values[XSTAT_TX_BROADCAST]
        + xstat_values[XSTAT_TX_MULTICAST];
    xstat_values
        [XSTAT_TX_UNICAST] = stats.opackets > bc_mc ? stats.opackets - bc_mc : 0;
    xstat_found[XSTAT_TX_UNICAST] = true;
}
Collaborator

This needs to be addressed in the DPDK drivers xstats_get callbacks. If a NIC does not expose unicast, it should fabricate a computed counter.

const struct iface_info_vlan *vlan = iface_info_vlan(iface);
const struct iface *parent = iface_from_id(vlan->parent_id);

metrics_labels_add(ctx, "parent", parent ? parent->name : "[deleted]", NULL);
Collaborator

Maybe add a vlan-specific metric for VLAN ID.

rte_edge_t edge;

IFACE_STATS_VARS(tx);
IFACE_STATS_VARS_AS(tx_port);
Collaborator

This seems overkill. We don't need separate counters.

Comment on lines 96 to +100
IFACE_STATS_INC(tx, m, d->iface);
// Match Linux double-counting: also tally on the parent
// port when the packet was emitted via a VLAN sub-iface.
if (parent != NULL)
    IFACE_STATS_INC_AS(tx_port, tx, m, parent);
Collaborator

You could move IFACE_STATS_INC() before resolving the vlan subinterface and do the double counting that way without an additional condition check.
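
One possible reading of this suggestion, sketched outside of grout: count on the interface the packet was emitted on, then count again inside the branch that already resolves the VLAN sub-interface to its parent port, so that no extra condition is introduced. The `struct iface` layout and both helpers are hypothetical, and the per-burst batching done by the real `IFACE_STATS_*` macros is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified interface object.
struct iface {
    struct iface *parent; // NULL for a port, the port for a VLAN sub-iface
    uint64_t tx_packets;
    uint64_t tx_bytes;
};

static void stats_inc(struct iface *iface, uint32_t len) {
    iface->tx_packets++;
    iface->tx_bytes += len;
}

// Account one transmitted packet and return the port that sends it.
static struct iface *tx_account(struct iface *iface, uint32_t len) {
    assert(iface != NULL);
    // Count on the emitting interface before any resolution.
    stats_inc(iface, len);
    if (iface->parent != NULL) {
        // Stand-in for the existing VLAN resolution branch: the second
        // increment rides along with the switch to the parent port.
        iface = iface->parent;
        stats_inc(iface, len);
    }
    return iface;
}
```

Untagged traffic takes a single increment, as before. VLAN traffic is counted on both the sub-interface and the port, which is the Linux-like double counting the commit aims for.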

Comment thread main/metrics.c

RTE_INIT(metrics_process_init) {
    struct timespec ts = {0};
    clock_gettime(CLOCK_REALTIME, &ts);
Collaborator

maybe use gr_clock() ?

Comment thread modules/infra/api/iface.c
Comment on lines +247 to +251
METRIC_GAUGE(
    m_discontinuity,
    "iface_discontinuity_seconds",
    "UNIX timestamp (seconds) of the most recent counter discontinuity."
);
Collaborator

OpenMetrics is a time series db, you shouldn't need to expose timing information which is redundant.

[XSTAT_TX_UNICAST] = {{"tx_unicast_packets", NULL}, &m_tx_unicast},
[XSTAT_TX_BROADCAST] = {{"tx_broadcast_packets", "egress_broadcast_frames"}, &m_tx_bcast},
[XSTAT_TX_MULTICAST] = {{"tx_multicast_packets", "egress_multicast_frames"}, &m_tx_mcast},
[XSTAT_TX_DROPPED] = {{"tx_dropped_packets", "egress_discarded_frames"}, &m_tx_dropped},
Collaborator

I would really prefer if we could find a unified API that does not involve vendor-specific xstats for such counters.

Could we add support in struct rte_eth_stats for tx_discards, rx_broadcast and rx_multicast?

@david-marchand any opinion?

2 participants