
chore: Refactoring all_fields iterator. Using OwnedTargetPath instead of KeyString.#25059

Open
lololozhkin wants to merge 6 commits into vectordotdev:master from lololozhkin:lololozhkin/issue-21077


@lololozhkin

@lololozhkin lololozhkin commented Mar 28, 2026

Summary

The all_fields iterator now emits OwnedTargetPath values instead of KeyString values.

Vector configuration

--

How did you test this PR?

Ran all the tests, both unit and integration, and added some new tests.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

@lololozhkin lololozhkin requested a review from a team as a code owner March 28, 2026 06:36
@github-actions
Contributor

github-actions Bot commented Mar 28, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@github-actions github-actions Bot added domain: sources Anything related to the Vector's sources domain: transforms Anything related to Vector's transform components domain: sinks Anything related to the Vector's sinks domain: core Anything related to core crates i.e. vector-core, core-common, etc labels Mar 28, 2026
@lololozhkin lololozhkin changed the title refactor: using OwnedTargetPath for all_fields iterator instead of KeyString chore: Refactoring all_fields iterator. Using OwnedTargetPath instead of KeyString. Mar 28, 2026
@pront pront added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Mar 30, 2026
@pront
Member

pront commented Mar 30, 2026

Thank you for your contribution! Before we can merge this PR, please sign our Contributor License Agreement.

To sign, copy and post the phrase below as a new comment on this PR.

Note: If the bot says your username was not found, the email used in your git commit may not be linked to your GitHub account. Fix this at github.com/settings/emails, then comment recheck to retry.

I have read the CLA Document and I hereby sign the CLA

You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Hi @lololozhkin, please follow the instructions above and we will take a look.

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@pront
Member

pront commented Apr 4, 2026

Please sign the CLA and we will take a look at the PR afterwards, thanks.

@lololozhkin
Author

recheck

@lololozhkin
Author

I have read the CLA Document and I hereby sign the CLA

@lololozhkin
Author

recheck

@lololozhkin
Author

@pront done! Sorry for the long wait; I didn't understand where I should leave the comment :)

Member

@pront pront left a comment


Thank you for this PR @lololozhkin! With this PR, paths returned by the iterator can be used directly for lookups/inserts without re-parsing.
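To illustrate why structured paths avoid re-parsing, here is a minimal sketch built on simplified stand-in types. Value, Segment, collect_leaf_paths and get are hypothetical names, not Vector's LogEvent, OwnedSegment or the real all_fields iterator (which is lazy rather than collecting into a Vec). Each leaf is reported with a list of segments, and that list is fed straight back into a lookup with no string parsing or unquoting, even for a field name like my-field.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins: not Vector's Value/OwnedSegment types.
#[derive(Debug, PartialEq)]
enum Value {
    Bytes(String),
    Object(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

#[derive(Debug, Clone, PartialEq)]
enum Segment {
    Field(String), // raw field name, never quoted
    Index(usize),
}

/// Depth-first walk that records every non-container leaf with its structured path.
/// `prefix` holds the segments of the current ancestors.
fn collect_leaf_paths<'a>(
    value: &'a Value,
    prefix: &mut Vec<Segment>,
    out: &mut Vec<(Vec<Segment>, &'a Value)>,
) {
    match value {
        Value::Object(map) => {
            for (key, child) in map {
                prefix.push(Segment::Field(key.clone()));
                collect_leaf_paths(child, prefix, out);
                prefix.pop();
            }
        }
        Value::Array(items) => {
            for (idx, child) in items.iter().enumerate() {
                prefix.push(Segment::Index(idx));
                collect_leaf_paths(child, prefix, out);
                prefix.pop();
            }
        }
        leaf => out.push((prefix.clone(), leaf)),
    }
}

/// Follows a structured path: no string parsing or unquoting is needed.
fn get<'a>(mut value: &'a Value, path: &[Segment]) -> Option<&'a Value> {
    for segment in path {
        value = match (segment, value) {
            (Segment::Field(key), Value::Object(map)) => map.get(key)?,
            (Segment::Index(idx), Value::Array(items)) => items.get(*idx)?,
            _ => return None,
        };
    }
    Some(value)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn every_emitted_path_round_trips() {
        let mut root = BTreeMap::new();
        root.insert("my-field".to_string(), Value::Array(vec![Value::Bytes("x".into())]));
        let event = Value::Object(root);
        let mut out = Vec::new();
        collect_leaf_paths(&event, &mut Vec::new(), &mut out);
        for (path, leaf) in &out {
            assert_eq!(get(&event, path), Some(*leaf));
        }
    }
}
```

The design point is that the path is data rather than text: whether it is later rendered quoted or unquoted is a separate decision, which is what the rest of this review is about.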


VRL's serialize_field wraps any field containing characters outside [A-Za-z0-9_@] in quotes and escapes inner " and \ characters. So a field like my-field becomes "my-field" (with literal quote characters) when .to_string() is called.
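A minimal sketch of the quoting rule as described above, assuming the character class [A-Za-z0-9_@] and the escaping of " and \ are the whole rule. quote_field is a hypothetical name; this is not VRL's actual serialize_field code, so edge cases such as empty field names may behave differently there.

```rust
/// Hypothetical re-implementation of the rule described in the review:
/// a field is emitted bare only if every character is in [A-Za-z0-9_@];
/// otherwise it is wrapped in `"` with inner `"` and `\` backslash-escaped.
fn quote_field(field: &str) -> String {
    let is_plain = field
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '@');
    if is_plain {
        return field.to_owned();
    }
    let mut out = String::with_capacity(field.len() + 2);
    out.push('"');
    for c in field.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn dash_forces_quotes() {
        assert_eq!(quote_field("message"), "message");
        assert_eq!(quote_field("my-field"), "\"my-field\"");
    }
}
```

This is why a sink that calls .to_string() on such a path sends the seven characters "my-field" including the quotes, rather than my-field.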

This matters in three places:

  • InfluxDB sink
    (logs.rs#L293-L300) calls
    key.path.to_string() directly. The old code emitted the raw KeyString without quoting. This is a wire-format change that would silently break
    downstream queries matching on field names with special chars.

  • Honeycomb sink (encoder.rs#L42) passes
    log.convert_to_fields() into serde_json::json!(), which hits the new Serialize impl that also calls .to_string() for event-prefix keys. Same quoting issue for JSON keys.

  • FieldsIter Serialize impl ([all_fields.rs#L176-L182](https://github.com/lololozhkin/vector/blob/34078ba63a38b48332cd91b0aaeda1087ab4887c/lib/vector-core/src/event/util/log/all_fields.rs#L176-L182)) is a third place building flat string keys, with the same quoting behavior.

The New Relic sink handles this correctly with [build_unquoted_value_path](https://github.com/lololozhkin/vector/blob/34078ba63a38b48332cd91b0aaeda1087ab4887c/src/sinks/new_relic/model.rs#L280-L294), but it's local to that sink. Since InfluxDB, Honeycomb (via Serialize), and New Relic all need the same "flat unquoted path string" logic, it would be worth centralizing this helper and reusing it across all three sites. (The TODO in the code already suggests doing this in VRL, but it's fine to put it in a Vector util for now.)

One small thing on the helper: the match only handles Field and Index. An explicit _ => unreachable!() arm would guard against future OwnedSegment variants.
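One possible shape for a centralized helper, sketched against a simplified Segment enum that stands in for VRL's OwnedSegment. The name unquoted_path and the dotted rendering with [index] brackets are assumptions, not the New Relic sink's actual code. It includes the `_ => unreachable!()` arm suggested above; with this local two-variant enum the arm can never match, so the unreachable-pattern lint is silenced, whereas against an enum that later gains variants it turns an unexpected variant into a loud panic.

```rust
/// Simplified stand-in for VRL's owned path segment (hypothetical; the real
/// type lives in the `vrl` crate and may gain variants over time).
#[derive(Debug, Clone, PartialEq)]
pub enum Segment {
    Field(String),
    Index(isize),
}

/// Hypothetical shared helper: renders a path as a flat string without
/// quoting, e.g. `my-field.tags[0]` rather than `"my-field".tags[0]`.
#[allow(unreachable_patterns)] // the wildcard is unreachable for this local two-variant enum
pub fn unquoted_path(segments: &[Segment]) -> String {
    let mut out = String::new();
    for segment in segments {
        match segment {
            Segment::Field(name) => {
                // Separate fields with dots, except at the very start.
                if !out.is_empty() {
                    out.push('.');
                }
                out.push_str(name);
            }
            Segment::Index(idx) => {
                out.push('[');
                out.push_str(&idx.to_string());
                out.push(']');
            }
            // Guard suggested in the review: fail loudly on an unexpected variant.
            _ => unreachable!("unsupported path segment: {segment:?}"),
        }
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn renders_without_quotes() {
        let path = [Segment::Field("my-field".into()), Segment::Index(0)];
        assert_eq!(unquoted_path(&path), "my-field[0]");
    }
}
```

Note that the unquoted form is lossy: a single field named a.b and a nested path a then b both render as a.b, so a helper like this only suits sinks that genuinely want flat names.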

Comment on lines +118 to +131
```rust
fn make_path(&mut self, component: PathComponent<'a>) -> OwnedTargetPath {
    let segments = self
        .path
        .iter()
        .chain(iter::once(&component))
        .map(|val| match val {
            PathComponent::Key(key_string) => OwnedSegment::Field((*key_string).to_owned()),
            PathComponent::Index(idx) => OwnedSegment::Index(*idx as isize),
        })
        .collect();

    OwnedTargetPath {
        prefix: self.path_prefix,
        path: OwnedValuePath { segments },
    }
}
```
Member


The problem

Currently make_path iterates self.path and calls .to_owned() on every ancestor field name for every leaf. For a field at depth D, that's D string clones per leaf.

Perf optimization idea

Maintain a Vec<OwnedSegment> on the iterator, pushing/popping in tandem with push() / pop(). When we hit a leaf, we can do:

```rust
self.segments.push(leaf_segment);
let path = OwnedTargetPath {
    prefix: self.path_prefix,
    path: OwnedValuePath { segments: self.segments.clone() },
};
self.segments.pop();
```

This would help reduce allocations in hot paths that iterate over every field of every event.

Author


Wait, I can't follow. Please check my logic: vec.clone() clones the segments in the same way, calling clone() on each element, right? So for each leaf we would still copy the whole vec, including all D ancestors. Moreover, each key would be cloned one extra time when it is put into that vec.

As I understand it, real savings would come from storing and returning borrowed (non-owned) paths. But that's a lot of work...

Please tell me if my reasoning is wrong; I really want to understand.
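A small experiment supports the author's reasoning. The sketch below uses a hypothetical Seg type that counts every string copy (both the borrowed-to-owned conversion and Clone) and compares rebuilding the path per leaf, as make_path does, with the push/clone/pop stack suggested in the review. For a leaf at depth D = 3, rebuilding performs D + 1 = 4 copies, while the stack performs 1 copy to own the leaf plus D + 1 = 4 for the Vec clone, so 5; the ancestor pushes are excluded because they are paid once per subtree. Getting below D copies per leaf would likely need borrowed or reference-counted segments.

```rust
use std::cell::Cell;
use std::iter;

thread_local! {
    // Number of string payloads copied so far (owned conversions + clones).
    static COPIES: Cell<usize> = Cell::new(0);
}

fn bump() {
    COPIES.with(|c| c.set(c.get() + 1));
}

/// Hypothetical stand-in for an owned field segment that counts its copies.
#[derive(Debug, PartialEq)]
struct Seg(String);

impl Seg {
    /// Copies a borrowed key into an owned segment, counting the copy.
    fn from_borrowed(key: &str) -> Self {
        bump();
        Seg(key.to_owned())
    }
}

impl Clone for Seg {
    fn clone(&self) -> Self {
        bump();
        Seg(self.0.clone())
    }
}

/// Runs `f` and returns its result plus the number of string copies it made.
fn count_copies<T>(f: impl FnOnce() -> T) -> (T, usize) {
    COPIES.with(|c| c.set(0));
    let out = f();
    (out, COPIES.with(|c| c.get()))
}

/// Current approach: rebuild the full path from borrowed ancestors per leaf.
fn rebuild(ancestors: &[&str], leaf: &str) -> Vec<Seg> {
    ancestors
        .iter()
        .copied()
        .chain(iter::once(leaf))
        .map(Seg::from_borrowed)
        .collect()
}

/// Suggested approach: keep an owned stack, push the leaf, clone, pop.
fn via_stack(stack: &mut Vec<Seg>, leaf: &str) -> Vec<Seg> {
    stack.push(Seg::from_borrowed(leaf));
    let path = stack.clone();
    stack.pop();
    path
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stack_clone_does_not_save_copies() {
        let ancestors = ["a", "b", "c"]; // depth D = 3
        let (path_a, copies_a) = count_copies(|| rebuild(&ancestors, "leaf"));
        // Ancestor pushes are paid once per subtree, so build the stack before counting.
        let mut stack: Vec<Seg> = ancestors.iter().copied().map(Seg::from_borrowed).collect();
        let (path_b, copies_b) = count_copies(|| via_stack(&mut stack, "leaf"));
        assert_eq!(path_a, path_b);
        assert_eq!((copies_a, copies_b), (4, 5));
    }
}
```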

@lololozhkin
Author

@pront thanks for your reply! I'll address your comments.

@lololozhkin
Author

@pront Hi again! I have some questions about your review. First, I should mention that I tried not to change how fields are serialized/deserialized; I used the docs and the code as the source of truth. My questions:

  • InfluxDB sink. You said that the paths used there are unquoted, but the code in master uses convert_to_fields, whose docs say that paths are emitted in quoted form. The code agrees: convert_to_fields calls all_fields, which creates an iterator with quoted paths.
  • The same question about Honeycomb: it also uses convert_to_fields, which means its paths are quoted.
  • The Serialize impl: yes, I agree that it now uses quoted notation. Before, it wrapped KeyStrings that were built inside the iterator according to its quoting settings. What would you suggest instead? I think it's better to remove that impl entirely and build the paths on the caller side, to avoid any confusion about quoting. Do you agree?

@pront
Member

pront commented Apr 9, 2026

@pront Hi again! I have some questions about your review. First, I should mention that I tried not to change how fields are serialized/deserialized; I used the docs and the code as the source of truth. My questions:

  • InfluxDB sink. You said that the paths used there are unquoted, but the code in master uses convert_to_fields, whose docs say that paths are emitted in quoted form. The code agrees: convert_to_fields calls all_fields, which creates an iterator with quoted paths.
  • The same question about Honeycomb: it also uses convert_to_fields, which means its paths are quoted.

Hey @lololozhkin, I think you are right about InfluxDB and Honeycomb. I will double-check myself later, but I am pretty occupied at the moment with higher-priority items.

  • The Serialize impl: yes, I agree that it now uses quoted notation. Before, it wrapped KeyStrings that were built inside the iterator according to its quoting settings. What would you suggest instead? I think it's better to remove that impl entirely and build the paths on the caller side, to avoid any confusion about quoting. Do you agree?

I agree, let's make the quoting decision visible at the call site instead of hidden in the iterator.


Labels

domain: core Anything related to core crates i.e. vector-core, core-common, etc domain: sinks Anything related to the Vector's sinks domain: sources Anything related to the Vector's sources domain: transforms Anything related to Vector's transform components no-changelog Changes in this PR do not need user-facing explanations in the release changelog


Development

Successfully merging this pull request may close these issues.

Refactor field iterators to return OwnedTargetPaths
