29 changes: 16 additions & 13 deletions CIPs/cip-145.md
@@ -37,9 +37,14 @@ below, and would result in pruning of Data Events coinciding with Time Events.
* A `fork point` is the last common ancestor of two events `A` and `B`.
* A `merge point` is the first common descendant of two events `A` and `B`.
* `A` is a `covered event` if another event `B` has `A`'s CID in its `prev` field.
* Said differently, if event `B` has event `A`'s CID in its `prev` field, then event `B` "covers" event `A`. By
transitivity, event `B` also covers every event covered by event `A`.
* `A` is an `uncovered event` if there is no event with `A`'s CID in its `prev` field.
* A `diverged stream` has more than one uncovered event.
* A `converged stream` has a single uncovered event.
* A `dominant Data Event` is one that is not covered by another Data Event.
* A `non-dominant Data Event` is one that is covered by another Data Event. This applies transitively through Time
Events.
* A `diverged stream` has more than one dominant Data Event.
* A `converged stream` has a single dominant Data Event.
* An `invalid Data Event` is one that has either an invalid signature or an expired CACAO.
* A `valid Data Event` is one that has a valid signature and a valid CACAO (if applicable).
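The covered/dominant definitions above can be sketched in code. This is a minimal illustration, not part of the CIP: the dict-based event model and the function names are hypothetical, and `prev` is simplified to a list of parent IDs.

```python
# Hypothetical event model: {id: {"type": "data" | "time", "prev": [ids]}}.

def covered_cids(events):
    """An event is covered if any other event lists its CID in `prev`."""
    return {cid for ev in events.values() for cid in ev["prev"]}

def uncovered(events):
    """Uncovered events: no event has their CID in its `prev` field."""
    return [cid for cid in events if cid not in covered_cids(events)]

def dominant_data_events(events):
    """Dominant Data Events: not covered by another Data Event, directly or
    transitively (including through intervening Time Events)."""
    def ancestors(cid, seen=None):
        seen = set() if seen is None else seen
        for p in events[cid]["prev"]:
            if p not in seen:
                seen.add(p)
                ancestors(p, seen)
        return seen
    data = [c for c, e in events.items() if e["type"] == "data"]
    covered_by_data = set()
    for c in data:
        covered_by_data |= {a for a in ancestors(c) if events[a]["type"] == "data"}
    return [c for c in data if c not in covered_by_data]

# A diverged stream has more than one dominant Data Event:
events = {
    "init": {"type": "data", "prev": []},
    "A":    {"type": "data", "prev": ["init"]},
    "B":    {"type": "data", "prev": ["init"]},  # A and B share a fork point
}
assert sorted(dominant_data_events(events)) == ["A", "B"]  # diverged

# A multi-prev merge point covers both branches and converges the stream:
events["merge"] = {"type": "data", "prev": ["A", "B"]}
assert dominant_data_events(events) == ["merge"]  # converged
```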

@@ -51,17 +56,15 @@ branches are pruned. Since pruning branches that contain Data Events would resul…
a Data Event to contain multiple ancestor events in its `prev` field so that a new merge point Data Event can cover
multiple events.

1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we only consider branches that contain valid,
uncovered Data Events.
2. For branches that contain valid, uncovered Data Events, we only consider the first Data Event on a branch after the
fork.
3. The Data Event that is covered by the earliest Time Event wins (see event `A` in fig. 5).
4. If two uncovered Data Events are covered by Time Events at the same block height, the Data Event with the lower CID
wins.

Using multi-prev Data Events allows us to reduce the number of uncovered events and converge the stream so that there is
only a single uncovered event, without any data abandoned on pruned branches. The stream's converged/diverged state can
be determined by looking at the `prev` fields of all the Data Events for that stream.
1. If a stream is in a diverged state (see events `A` and `B` in fig. 5), we consider branches that contain dominant
Contributor:
Why would time events get pruned?
Consider Fig. 6:

* Data B expires at time 3<n<4
* Data C has a time event at 4
* If Time 3 is pruned, then it's unclear why Data B should remain valid

@m0ar (Dec 19, 2023):

Good point. Since the data A branch is the tip according to the rest of the rules in fig 4, I'm not sure why we should try to merge the descendants of T1 🤔

Contributor Author:

Yes, good question. @AaronGoldman and I had a good discussion about this scenario.
In this example, Data C can only be created after Data B has been validated to either be within the expiration timeout or to already have a valid Time Event.

We can assume that the author of Data C validated Data B because Data C follows Data B.

Of course, if Time 3 for Data B is available before Data C is created, the CID of Time 3 will be included in Data C's multi-prev.

To your point, @m0ar, including Time 1's descendants allows multi-prev to cover existing history (even if pruned) in the stream state, whereas today, pruned events are lost from the stream state.

Contributor:

> We can assume that the author of Data C validated Data B because Data C follows Data B.

Can we really make this assumption? In theory the author of Data C could "import" Data B into the log even though it's invalid, e.g. Time 3 doesn't exist or is too late.

Contributor Author:

> > We can assume that the author of Data C validated Data B because Data C follows Data B.
>
> Can we really make this assumption? In theory the author of Data C could "import" Data B into the log even though it's invalid, e.g. Time 3 doesn't exist or is too late.

Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.

Contributor:

> We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.

Agree that Data C's validity is not predicated on the validity of A or B. However, the validity of Data B is predicated on Time 3 being available. Therefore, it doesn't seem right to say that we can prune Time 3.

Contributor Author:

> > We feel that the validity of Data C does not require proof of validity of Data A or Data B. However, establishing the latter's time bounds does require recording Time 2 and Time 3. Time 2 is already part of Data C's history. Including Time 3 in a future multi-prev would ensure that it also becomes part of the stream history.
>
> Agree that Data C's validity is not predicated on the validity of A or B. However, the validity of Data B is predicated on Time 3 being available. Therefore, it doesn't seem right to say that we can prune Time 3.

Yes, what would you think about including Time 3 in the multi-prev of the next event to be added to the stream? That's what we meant to say here:

> Events like Time 3 are definitely worth pinning, and potentially worth including in the multi-prev of subsequent Data Events.

What if a new event Data D (occurring after Time 4) had a prev of [Time 4, Time 3]? Even though Time 3 will not take precedence during tip selection, it will always remain part of the stream history.

Or, let's say, Time 3 didn't show up until we already had Time 4 -> Data D -> Time 5 -> Data E -> Time 6. Then Data F (occurring after Time 6) would have a prev of [Time 6, Time 3]. Data B would remain in an "unverified" state until Time 3 was discovered.

Tracking Data B's validity this way would be a little more complicated than the usual flow, but always possible. Moreover, now all events related to the stream would be part of the DAG, which is great.

Contributor:

Yes, this is what I mean! Just wanted to be clear that Time 3 can't be pruned if we expect the creator of Data D to include it in prev.

Contributor:

@smrz2001 since it seems that we are in agreement, maybe you can update the language to be clear that the TimeEvent doesn't get pruned?

Data Events. Branches containing only Time Events will get pruned.
2. For branches that contain dominant Data Events, consider the first Data Event on a branch after the fork.
3. The Data Event that is covered by the earliest Time Event wins (see event `A` in fig. 5).
4. If two Data Events share the earliest timestamp, then the branch of the Data Event with the lower CID wins.
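The precedence rules above can be sketched as follows. This is an illustration only, not part of the CIP: each candidate is assumed to be a valid, dominant Data Event taken from a branch after the fork, annotated with the block height of the earliest Time Event covering it. Treating unanchored events (`None`) as later than any anchored event is an assumption of this sketch.

```python
def select_tip(candidates):
    """candidates: list of (cid, anchor_height_or_None) pairs.

    Rule 3: the Data Event covered by the earliest Time Event wins;
    unanchored events (height None) are assumed to sort after anchored ones.
    Rule 4: ties on block height are broken by the lower CID.
    """
    def key(c):
        cid, height = c
        # (unanchored?, height, cid): lexicographic min implements rules 3-4.
        return (height is None, height if height is not None else 0, cid)
    return min(candidates, key=key)[0]

# Fig. 5-style example: A is anchored earlier than B, so A's branch wins.
assert select_tip([("bafyA", 10), ("bafyB", 12)]) == "bafyA"
# Same block height: the lower CID wins.
assert select_tip([("bafyB", 10), ("bafyA", 10)]) == "bafyA"
```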
Contributor:

What does it mean for a Data Event to "win" in the context of multiple prev?

@smrz2001 (Dec 19, 2023):

We updated the wording. A Data Event "winning" here meant that that Data Event would be marked as the tip by the protocol. An application would be able to use this information to create a merge Data Event, though other candidate CIDs would also be present in the `prev` field.

Contributor:

Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the prev field? I don't follow why the protocol needs to care about "winning"?

Contributor:

e.g. The protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its `prev` array when it's being constructed.

Commenter:

I get where you're going, but I'm not sure how well interoperability would work if two projects have different ideas on tip consensus 🤔

Or do you mean the ceramic node when you say application here? As in, the stream type implementation would decide how to resolve conflicts? I think that would make sense if so, as there may be other valid interpretations of this depending on the stream type.

Contributor:

Yes, when I say protocol here I'm referring to the event streaming protocol. Stream type handlers are an application.

Commenter:

Makes full-on sense then 👌

Contributor Author:

> Ok I don't understand why we need to distinguish between winning and non-winning tips? Why not just call them all tips and put them all in the prev field? I don't follow why the protocol needs to care about "winning"?
>
> e.g. The protocol just gives the application a list of tips. It's up to the application to decide what is "winning" and what's not. It's also up to the application to choose the order of tips in its `prev` array when it's being constructed.

To answer your question, @oed, there are two reasons for this:

* It is simpler for applications to just be given a tip per some default, predictable algorithm. They can choose to override this order, but don't have to.
* There is always some eventually consistent state of a diverged stream, even if the controller never comes back to create the Merge Event, because the default precedence rules can be used to determine the tip.

Contributor:

Wouldn't the application need to be given all tips anyway? In order to include them all in the prev array? Are we simply talking about the ordering of the CIDs in the returned array here?


Using multi-prev Data Events allows us to reduce the number of dominant Data Events and converge the stream so that
there is only a single dominant Data Event, without any data abandoned on pruned branches. The stream's
converged/diverged state can be determined by looking at the `prev` fields of all the Data Events for that stream.

Events that have invalid signatures cannot be tips, even if uncovered. This has important implications for Data Events
with expired CACAOs. In figures 5 and 6, if we assume event `A` has an expired CACAO, the aggregation layer can choose
[…]