From 5b44551822a1fb9255b252d0b6425ed6e095bf98 Mon Sep 17 00:00:00 2001
From: Chida82
Date: Wed, 20 May 2026 14:51:12 +0200
Subject: [PATCH 1/2] Fix grammar and clarity in README.md for low-latency experience and session management

---
 README.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 76dbb586..e534805e 100644
--- a/README.md
+++ b/README.md
@@ -177,17 +177,19 @@ by the on-disk KV cache itself. Moreover the tools and the system prompt are
 all designed vertically for DeepSeek v4 Flash. This provides a few advantages:


-* Low latency experience, bounded mainly by the prefill speed limits. Displaying of generated text, tool calling, start of a new session are always instantaneous.
+* Low-latency experience, bounded mainly by prefill speed limits. Generated text display, tool calls, and new session startup are always instantaneous.
 * Live progress bar during prefill time.
 * No DSML tool calling conversion, the tools are handled natively in the LLM format.
-* KV cache mismatch are impossible by construction, the current state is always the truth.
+* KV cache mismatches are impossible by construction, the current state is always the truth.
 * Everything is tuned for this model.
-* Ability to switch session with `/list` and `/switch` without any prefill stage.
+* Ability to switch sessions with `/list` and `/switch` without any prefill stage.
+
+
+However, while the system already works, there is still a lot of work to do
+before it is ready for prime time. Once the agent reaches its desired shape,
+we will *likely* split the server and the client, creating a stateful,
+session-based protocol that can recreate all of this in a client-server setup.

-However while the system already works, there is a lot of work to do
-in order to make it ready for prime time. When finally the agent will reach
-the wanted shape, we will *likely* split the server and the client creating a stateful
-session-based protocol that can recreate all that in a client-server way.

 ## Benchmarking


From 14493132f9d39bde37c5461aa48a128d04d80b4b Mon Sep 17 00:00:00 2001
From: Chida82
Date: Wed, 20 May 2026 14:51:12 +0200
Subject: [PATCH 2/2] revert for low latency experience description

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e534805e..494fdaa3 100644
--- a/README.md
+++ b/README.md
@@ -177,7 +177,7 @@ by the on-disk KV cache itself. Moreover the tools and the system prompt are
 all designed vertically for DeepSeek v4 Flash. This provides a few advantages:


-* Low-latency experience, bounded mainly by prefill speed limits. Generated text display, tool calls, and new session startup are always instantaneous.
+* Low latency experience, bounded mainly by prefill speed limits. Generated text display, tool calls, and new session startup are always instantaneous.
 * Live progress bar during prefill time.
 * No DSML tool calling conversion, the tools are handled natively in the LLM format.
 * KV cache mismatches are impossible by construction, the current state is always the truth.