Persist streamed quotes#4579
Conversation
|
Reminder: Please consider backward compatibility when modifying the API specification.
Caused by: |
|
Claude finished @fleupold's task in 1m 52s —— View job PR Review: Persist streamed quotesReviewed the diff against I left one inline comment on the new persistence block in
Both are worth a conscious decision, but not blocking. The latency / DB-pressure tradeoffs are already acknowledged in the PR description, so I won't restate them. Nothing else stood out — the mock setup ( |
There was a problem hiding this comment.
Code Review
This pull request updates the streaming quote endpoint to persist generated quotes and return their corresponding IDs, aligning its behavior with the one-shot quote endpoint. The changes span the OpenAPI documentation, quote handler, streaming logic, and tests. Feedback was provided to terminate the stream immediately upon a database save failure rather than continuing, as a database error is likely a persistent infrastructure issue.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
squadgazzz
left a comment
There was a problem hiding this comment.
I have the following concerns:
- The stored quotes set now depends on the requesting client's timeout. A stream that closes early persists only the fast solvers' quotes and drops slower solvers, some of which would have returned a better price. Those partial results then sit in the shared pool, and unrelated orders gonna reuse them through
find. Wouldn't one client's timeout choice leak into another user's quote quality? The stored quote becomes the order's reference quote. quotes::findmight get slower for orders placed without a quote id. On prod, the planner runs it as a full seq scan, so the cost tracks the total row count. WETH/USDC alone is already ~500 of ~5000 non-expired quotes. Storing one row per solver would inflate that significantly.
| assert!( | ||
| response.id.is_none(), | ||
| "streaming quotes should have id == null, got {:?}", | ||
| response.id.is_some(), |
There was a problem hiding this comment.
Does anything prove that /order with that id hits the stored quote instead of re-quoting?
There was a problem hiding this comment.
Good catch, added that to quote_stream_smoke.
| match storage.save(quote.data.clone()).await { | ||
| Ok(id) => quote.id = Some(id), | ||
| Err(err) => { | ||
| yield Err(CalculateQuoteError::Other(err)); | ||
| continue; | ||
| } | ||
| } |
There was a problem hiding this comment.
This happens per quote on the hot path, so a slow write delays the event, and a failed write turns a good quote into an error event for that solver. Maybe it makes sense to still stream the quote with an id: null instead of returning an error?
There was a problem hiding this comment.
Regarding the slow write, this is the case with the regular order path today and I don't think our DB roundtrip time is material compared to the overhead of sending an auction to all connected solvers.
Regarding the failure case, I think your suggestion makes sense.
fleupold
left a comment
There was a problem hiding this comment.
Wouldn't one client's timeout choice leak into another user's quote quality? The stored quote becomes the order's reference quote.
find_quote is only invoked when an order is placed without a quote id (in which case you are right we are trying to find an existing quote with the exact same amounts). I don't see how this is an issue in practice though. A good integration should only place orders with quote ids.
Also note, that already today someone requesting an optimal quote with a very short timeout will create an entry that another order may hit when being placed without a quote id.
quotes::find might get slower for orders placed without a quote id. On prod, the planner runs it as a full seq scan, so the cost tracks the total row count.
It's true that right now it's a SeqScan, but an index exists so the reason for this is likely that the current table size is too small to justify using the index today. Once it grows (max 20x), I'm pretty confident the index will protect us from getting too slow. In any case we have existing metrics around db latency, which we can monitor.
| match storage.save(quote.data.clone()).await { | ||
| Ok(id) => quote.id = Some(id), | ||
| Err(err) => { | ||
| yield Err(CalculateQuoteError::Other(err)); | ||
| continue; | ||
| } | ||
| } |
There was a problem hiding this comment.
Regarding the slow write, this is the case with the regular order path today and I don't think our DB roundtrip time is material compared to the overhead of sending an auction to all connected solvers.
Regarding the failure case, I think your suggestion makes sense.
| assert!( | ||
| response.id.is_none(), | ||
| "streaming quotes should have id == null, got {:?}", | ||
| response.id.is_some(), |
There was a problem hiding this comment.
Good catch, added that to quote_stream_smoke.
Oh, that's right, I completely forgot about it. |
And currently, this I used this query to get some stats for orders that were created w/ and w/o quote id: https://victorialogs.dev.cow.fi/goto/bfqs2t7z0k4jkf?orgId=1 So the number of orders with no id is quite high and we probably need to adjust the |
| // error event. | ||
| match storage.save(quote.data.clone()).await { | ||
| Ok(id) => quote.id = Some(id), | ||
| Err(err) => tracing::warn!(?err, "failed to persist streamed quote"), |
There was a problem hiding this comment.
Maybe an error would help more here to catch DB issues.
| Err(err) => tracing::warn!(?err, "failed to persist streamed quote"), | |
| Err(err) => tracing::error!(?err, "failed to persist streamed quote"), |
Description
Streamed quotes currently don't carry a quote ID making it impossible to reuse them during order placement. This is problematic for a couple of reasons
Changes
How to test
Existing tests
Alternatives considered
https://app.notion.com/p/cownation/RFC-Quote-Persistence-for-Streaming-Quotes-38f8da5f04ca81a98f0fc016a3a37e81 contains two alternative AI generated options:
The first option leads to having no quote id unless we wait for the full timeout. This defeats the purpose of being able to quickly place orders as soon as we have a quote that is "good enough". Especially with the high deadlines suggested in cowprotocol/docs#638 I think we would not end up getting the final event in a good amount of cases.
In the second option, if the client terminates the connection prematurely there may be subjectivity around which quotes have been received (and therefore what the client saw as "best").
The main downside of the approach in this PR is that it adds a small amount of latency on the critical path (for the storage write), which in the grand scheme of current quote latency is likely irrelevant. It will also create more pressure on the DB, however since unused quotes are very short lived I don't expect this to be a concern in practice. It strikes me as the simplest approach therefore.