Commit 5ad09cb

Merge branch '3.8-dev'
# Conflicts:
# docs/src/dev/provider/index.asciidoc
# docs/src/reference/gremlin-applications.asciidoc
2 parents 3c1aea1 + e679b6c commit 5ad09cb

4 files changed

Lines changed: 143 additions & 134 deletions

docs/src/reference/gremlin-applications.asciidoc

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -2335,7 +2335,7 @@ describeGraph(HadoopGraph)
23352335
----
23362336
23372337
[[gremlin-mcp]]
2338-
=== Gremlin MCP
2338+
== Gremlin MCP
23392339
23402340
Gremlin MCP integrates Apache TinkerPop with the Model Context Protocol (MCP) so that MCP‑capable assistants (for
23412341
example, desktop chat clients that support MCP) can discover your graph, run Gremlin traversals and exchange graph data
@@ -2383,7 +2383,7 @@ The Gremlin MCP server exposes these tools:
23832383
* `translate_gremlin_query` — Translates a Gremlin query to a target language variant with optional normalization.
23842384
* `format_gremlin_query` — Formats a Gremlin query using gremlint.
23852385
2386-
==== Schema discovery
2386+
=== Schema discovery
23872387
23882388
Schema discovery is the foundation that lets humans and AI assistants reason about a graph without prior tribal
23892389
knowledge. By automatically mapping the graph’s structure and commonly observed patterns, it produces a concise,
@@ -2403,7 +2403,7 @@ Schema discovery uses Gremlin traversals and sampling to uncover the following i
24032403
* Relationship patterns - Connectivity is derived from the labels of edges and their incident vertices.
24042404
* Enums - Properties with a small set of distinct values may be surfaced as enumerations to promote precise filters.
24052405
2406-
==== Formatting traversals
2406+
=== Formatting traversals
24072407
24082408
Gremlin is much easier to understand when it is properly formatted with appropriate line breaks and indents. An AI
24092409
assistant can format Gremlin using Gremlint via the `format_gremlin_query` MCP tool, which accepts any string input and
@@ -2487,7 +2487,7 @@ For example, the assistant may execute a traversal like the following:
24872487
g.V().hasLabel('person').has('age', gt(30)).out('knows').values('name')
24882488
----
24892489
2490-
==== Configuring an MCP Client
2490+
=== Configuring an MCP Client
24912491
24922492
The MCP client is responsible for launching the Gremlin MCP server and providing connection details for the Gremlin
24932493
endpoint the server should use.

docs/src/reference/gremlin-variants.asciidoc

Lines changed: 7 additions & 9 deletions
@@ -2878,15 +2878,13 @@ therefore cardinality functions that take a value like `list()`, `set()`, and `s
28782878
[[gremlin-python-limitations]]
28792879
=== Limitations
28802880
2881-
* Traversals that return a `Set` *might* be coerced to a `List` in Python. In the case of Python, number equality
2882-
is different from JVM languages which produces different `Set` results when those types are in use. When this case
2883-
is detected during deserialization, the `Set` is coerced to a `List` so that traversals return consistent
2884-
results within a collection across different languages. If a `Set` is needed then convert `List` results
2885-
to `Set` manually.
2886-
* Traversals that return a `Set` containing non-hashable items, such as `Dictionary`, `Set` and `List`, will be coerced
2887-
into a `List` during deserialization. Python requires set elements to be hashable, for which Gremlin does not. If a
2888-
`Set` is needed, convert elements to hashable equivalents manually (e.g. `dict` to `HashableDict`, `list` to `tuple`,
2889-
`set` to `frozenset`).
2881+
* Traversals that return a `Set` may be coerced to a `List` in Python in two cases. First, when the `Set` contains
2882+
mixed numeric types (e.g. `int` and `float`), because Python number equality differs from the JVM — a Java `Set` of
2883+
`[1, 1.0d]` has two elements, but Python considers `1 == 1.0` and would collapse them to one, so the `Set` is coerced to
2884+
a `List` to preserve all elements consistently across languages. Second, when the `Set` contains non-hashable items such
2885+
as `Dictionary`, `Set`, or `List`, because Python requires set elements to be hashable while Gremlin does not, the `Set`
2886+
is also coerced to a `List`. For this case, if a `Set` is needed, convert elements to hashable equivalents manually
2887+
(e.g. `dict` to `HashableDict`, `list` to `tuple`, `set` to `frozenset`).
28902888
* Gremlin is capable of returning `Dictionary` results that use non-hashable keys (e.g. Dictionary as a key) and Python
28912889
does not support that at a language level. Using GraphSON 3.0 or GraphBinary (after 3.5.0) makes it possible to return
28922890
such results. However, it may not be possible to serialize these maps so they can't be re-inserted (or round tripped).
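
The two `Set`-to-`List` cases described in the first bullet above can be reproduced in plain Python without a Gremlin connection. The following is a minimal sketch, not part of Gremlin-Python: `to_hashable` and `restore_set` are hypothetical helper names, and a `frozenset` of key/value pairs is used as a stand-in for `HashableDict` so that the example needs no imports.

```python
def to_hashable(value):
    """Recursively convert unhashable containers to hashable equivalents.

    Hypothetical helper for illustration; dicts become a frozenset of
    (key, value) pairs as a stand-in for HashableDict.
    """
    if isinstance(value, dict):
        return frozenset((to_hashable(k), to_hashable(v)) for k, v in value.items())
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_hashable(v) for v in value)
    return value


def restore_set(items):
    """Rebuild a set from a coerced list by making each element hashable."""
    return {to_hashable(item) for item in items}


# Case 1: Python considers 1 == 1.0 (and both hash the same), so a set
# built from mixed numeric types keeps only one of the two elements.
mixed_numbers = {1, 1.0}
print(len(mixed_numbers))

# Case 2: lists and dicts are not hashable, so they must be converted
# before the coerced list can be turned back into a set.
coerced = [[1, 2], [1, 2], {"name": "marko"}]
print(len(restore_set(coerced)))
```

Note that converting a coerced `List` of mixed numeric types back to a `set` reintroduces the collapse shown in the first case, so that conversion is only lossless when numeric equality is not a concern.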

docs/src/upgrade/release-3.7.x.asciidoc

Lines changed: 126 additions & 1 deletion
@@ -61,7 +61,7 @@ Gremlin Javascript now supports Node 22 and 24 alongside Node 20.
6161
6262
Gremlin Go has been upgraded to Go version 1.25.
6363
64-
==== Python Set Deserialization with Non-Hashable Elements
64+
==== Python Set-to-List Fallback
6565
6666
Traversals that return a `Set` containing non-hashable items (such as `Dictionary`, `Set`, or `List`) previously caused
6767
a `TypeError` during deserialization in Gremlin-Python. These results are now coerced to a `List` to avoid errors. This
@@ -70,10 +70,135 @@ Python hashable types manually (e.g. `dict` to `HashableDict`, `list` to `tuple`
7070
7171
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3232[TINKERPOP-3232]
7272
73+
==== Remote Transaction Improvements
74+
75+
The Java driver now supports reusing existing pooled WebSocket connections for session-based requests rather than
76+
establishing a dedicated connection per session. This behavior is controlled by the `Cluster.Builder` option
77+
`reuseConnectionsForSessions`, which defaults to `false`.
78+
79+
When enabled, a `Client.SessionedChildClient` will attempt to borrow a connection from the connection pool of a standard
80+
`Client` rather than opening its own WebSocket connection. This avoids the overhead of the TCP handshake and WebSocket
81+
upgrade for each session, which can be significant when issuing many short-lived transactions.
82+
83+
[source,java]
84+
----
85+
// Enable connection reuse for sessions
86+
Cluster cluster = Cluster.build(host)
87+
.reuseConnectionsForSessions(true)
88+
.create();
89+
----
90+
91+
This feature was designed specifically for use with remote transactions, where sessions are short-lived and terminate
92+
after a `commit()` or `rollback()`. It should not be used for classic long-running session use cases where a session
93+
is used for purposes other than transactions, such as the remote console.
94+
95+
===== Server Configuration
96+
97+
When using `reuseConnectionsForSessions`, the server should be configured to close sessions immediately after a graph
98+
operation such as `commit()` or `rollback()` completes. Without this behavior, sessions may remain open until the session
99+
timeout expires, potentially leading to a buildup of idle sessions on the server side.
100+
101+
Some remote graph providers handle this automatically and require no additional configuration. For the reference Gremlin
102+
Server, this is controlled by the `closeSessionPostGraphOp` setting, which should be set to `true`. Users of other graph
103+
providers should consult their provider's documentation to determine whether this behavior is enabled by default,
104+
requires explicit configuration, or is unsupported.
105+
106+
[source,yaml]
107+
----
108+
# gremlin-server.yaml
109+
closeSessionPostGraphOp: true
110+
----
111+
112+
IMPORTANT: Failing to enable `closeSessionPostGraphOp` on the server when using `reuseConnectionsForSessions` on the
113+
client will result in sessions that are not properly cleaned up. These leaked sessions will accumulate until the
114+
configured `sessionLifetimeTimeout` is reached, consuming server resources unnecessarily.
115+
116+
===== Performance
117+
118+
Performance was measured with an ad-hoc benchmark application. The application executes a configurable number of
119+
complete transaction lifecycles (begin, mutate, commit) and reports throughput and latency percentiles. Each transaction
120+
opens a session, submits one or more `addV()` operations, commits, and closes the session.
121+
122+
The benchmark varies the following parameters:
123+
124+
* *Concurrent clients* (`threads`): The number of threads issuing transactions simultaneously. A value of 1 means
125+
transactions are executed sequentially by a single client. Higher values simulate multiple application threads or
126+
service instances issuing transactions concurrently against the same server.
127+
* *Connection pool size* (`pool`): The number of WebSocket connections maintained in the pool when
128+
`reuseConnectionsForSessions` is enabled. When reuse is disabled, each session creates its own dedicated connection
129+
and this parameter does not apply (shown as `n/a`).
130+
* *Transaction weight* (`weight`): "light" transactions perform a single `addV()` plus commit. "heavy" transactions
131+
perform ten `addV()` operations plus commit, simulating a more substantial unit of work per transaction.
132+
133+
Tests were conducted both locally (client and server on the same machine) and remotely (client on the US west coast,
134+
server on the US east coast) to isolate the effect of network latency on connection setup overhead. Each scenario
135+
executed 1000 transactions after a warmup phase of 50 transactions.
136+
137+
*Local Results (same machine)*
138+
139+
[cols="3,1,1,1", options="header"]
140+
|=========================================================
141+
|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
142+
|1 client, light |23.1 |26.7 |1.16x
143+
|8 clients, light |25.2 |28.5 |1.13x
144+
|16 clients, light |25.4 |27.9 |1.10x
145+
|1 client, heavy |26.0 |26.9 |1.03x
146+
|8 clients, heavy |26.4 |27.9 |1.06x
147+
|16 clients, heavy |25.8 |26.5 |1.03x
148+
|=========================================================
149+
150+
*Remote Results (west coast to east coast)*
151+
152+
[cols="3,1,1,1", options="header"]
153+
|=========================================================
154+
|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
155+
|1 client, light |3.6 |7.6 |2.10x
156+
|8 clients, light |15.6 |23.0 |1.48x
157+
|16 clients, light |15.4 |25.3 |1.64x
158+
|1 client, heavy |1.4 |1.8 |1.26x
159+
|8 clients, heavy |9.2 |10.8 |1.17x
160+
|16 clients, heavy |14.5 |15.9 |1.10x
161+
|=========================================================
162+
163+
The "Best-Reuse" column reflects the highest throughput observed across all tested pool sizes (2, 4, and 8 connections)
164+
for each scenario.
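
The `Speedup` column can be read as the ratio of the two throughput columns. The sketch below recomputes it for the remote results, with the figures copied from the table above; the `speedup` function name is an assumption for illustration. Because the published throughputs are rounded to one decimal place, a recomputed ratio can differ from the published speedup in the last digit for some rows, presumably because the published values were derived from unrounded measurements.

```python
def speedup(no_reuse_tps, best_reuse_tps):
    """Return the throughput ratio of the reuse run over the no-reuse run."""
    return best_reuse_tps / no_reuse_tps


# (No-Reuse tx/s, Best-Reuse tx/s) copied from the remote results table.
remote_results = {
    "1 client, light": (3.6, 7.6),
    "8 clients, light": (15.6, 23.0),
    "16 clients, light": (15.4, 25.3),
    "1 client, heavy": (1.4, 1.8),
    "8 clients, heavy": (9.2, 10.8),
    "16 clients, heavy": (14.5, 15.9),
}

for configuration, (no_reuse, best_reuse) in remote_results.items():
    ratio = speedup(no_reuse, best_reuse)
    print(f"{configuration}: {ratio:.2f}x")
```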
165+
166+
The benefit of connection reuse is most pronounced in remote scenarios with light transactions. When the network
167+
round-trip cost is high and the transaction payload is small, the WebSocket connection setup overhead represents a
168+
larger proportion of the total transaction time. In the single-client remote light workload, connection reuse yielded a
169+
2.10x throughput improvement, as the connection handshake cost dominated the per-transaction time. With 16 concurrent
170+
clients in the same remote light scenario, throughput improved from 15.4 tx/s to 25.3 tx/s (1.64x), as the connection
171+
pool amortized the setup cost across many parallel sessions.
172+
173+
As transaction weight increases, the relative benefit diminishes because the graph operations themselves become the
174+
bottleneck rather than connection setup. In the local heavy workload scenarios, the improvement was only 3-6%, as the
175+
connection overhead was already negligible relative to the cost of the graph mutations. Even in the remote heavy
176+
scenarios, the improvement was limited to 10-26%, as the ten `addV()` operations per transaction shifted the time
177+
distribution toward server-side processing.
178+
179+
In summary, `reuseConnectionsForSessions` provides the greatest benefit when:
180+
181+
* Network latency between client and server is significant (remote deployments)
182+
* Transactions are lightweight (few operations per transaction)
183+
* Many short-lived transactions are issued in sequence or concurrently
184+
185+
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3213[TINKERPOP-3213]
186+
73187
=== Upgrading for Providers
74188
75189
==== Graph System Providers
76190
191+
===== Session Changes
192+
193+
An option has been added to the Java GLV (`reuseConnectionsForSessions`) that allows for borrowing open WebSocket
194+
connections for sessions. This is primarily to reduce the overhead of new connection setup per session. This can lead
195+
to large performance gains in remote transaction scenarios where there are many small mutation traversals.
196+
197+
This option is disabled by default on the driver but providers may want to add an option that will allow sessions to end
198+
on the successful completion of a graph operation (commit/rollback). This will prevent a buildup of sessions if a user
199+
has enabled this option, because the driver will *not* close the underlying WebSocket connection as a signal to end the
200+
session. Gremlin Server has added an option like this called `closeSessionPostGraphOp`. Remote graph providers are
201+
encouraged to add the same functionality.
77202
78203
==== Graph Driver Providers
79204

docs/src/upgrade/release-3.8.1.asciidoc

Lines changed: 6 additions & 120 deletions
@@ -128,131 +128,17 @@ This more compact representation presents a form much more in line with the manu
128128
room to improve, Gremlint now produces a format that is more likely to be usable without additional manual formatting
129129
intervention.
130130
131-
==== Remote Transaction Performance Improvements
132-
133-
The Java driver now supports reusing existing pooled WebSocket connections for session-based requests rather than
134-
establishing a dedicated connection per session. This behavior is controlled by the `Cluster.Builder` option
135-
`reuseConnectionsForSessions`, which defaults to `false`.
136-
137-
When enabled, a `Client.SessionedChildClient` will attempt to borrow a connection from the connection pool of a standard
138-
`Client` rather than opening its own WebSocket connection. This avoids the overhead of the TCP handshake and WebSocket
139-
upgrade for each session, which can be significant when issuing many short-lived transactions.
140-
141-
[source,java]
142-
----
143-
// Enable connection reuse for sessions
144-
Cluster cluster = Cluster.build(host)
145-
.reuseConnectionsForSessions(true)
146-
.create();
147-
----
148-
149-
This feature was designed specifically for use with remote transactions, where sessions are short-lived and terminate
150-
after a `commit()` or `rollback()`. It should not be used for classic long-running session use cases where a session
151-
is used for purposes other than transactions such as remote console.
152-
153-
===== Server Configuration
154-
155-
When using `reuseConnectionsForSessions`, the server must be configured with `closeSessionPostGraphOp` set to `true`.
156-
This setting instructs the server to close the session immediately after a graph-level operation such as `commit()` or
157-
`rollback()` completes. Without this setting, sessions will not be closed until the session timeout expires, leading to
158-
a buildup of idle sessions on the server side.
159-
160-
[source,yaml]
161-
----
162-
# gremlin-server.yaml
163-
closeSessionPostGraphOp: true
164-
----
165-
166-
IMPORTANT: Failing to enable `closeSessionPostGraphOp` on the server when using `reuseConnectionsForSessions` on the
167-
client will result in sessions that are not properly cleaned up. These leaked sessions will accumulate until the
168-
configured `sessionLifetimeTimeout` is reached, consuming server resources unnecessarily.
169-
170-
===== Performance
171-
172-
Performance was measured with an ad-hoc benchmark application. The application executes a configurable number of
173-
complete transaction lifecycles (begin, mutate, commit) and reports throughput and latency percentiles. Each transaction
174-
opens a session, submits one or more `addV()` operations, commits, and closes the session.
175-
176-
The benchmark varies the following parameters:
177-
178-
* *Concurrent clients* (`threads`): The number of threads issuing transactions simultaneously. A value of 1 means
179-
transactions are executed sequentially by a single client. Higher values simulate multiple application threads or
180-
service instances issuing transactions concurrently against the same server.
181-
* *Connection pool size* (`pool`): The number of WebSocket connections maintained in the pool when
182-
`reuseConnectionsForSessions` is enabled. When reuse is disabled, each session creates its own dedicated connection
183-
and this parameter does not apply (shown as `n/a`).
184-
* *Transaction weight* (`weight`): "light" transactions perform a single `addV()` plus commit. "heavy" transactions
185-
perform ten `addV()` operations plus commit, simulating a more substantial unit of work per transaction.
186-
187-
Tests were conducted both locally (client and server on the same machine) and remotely (client on the US west coast,
188-
server on the US east coast) to isolate the effect of network latency on connection setup overhead. Each scenario
189-
executed 1000 transactions after a warmup phase of 50 transactions.
190-
191-
*Local Results (same machine)*
192-
193-
[cols="3,1,1,1", options="header"]
194-
|=========================================================
195-
|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
196-
|1 client, light |23.1 |26.7 |1.16x
197-
|8 clients, light |25.2 |28.5 |1.13x
198-
|16 clients, light |25.4 |27.9 |1.10x
199-
|1 client, heavy |26.0 |26.9 |1.03x
200-
|8 clients, heavy |26.4 |27.9 |1.06x
201-
|16 clients, heavy |25.8 |26.5 |1.03x
202-
|=========================================================
203-
204-
*Remote Results (west coast to east coast)*
205-
206-
[cols="3,1,1,1", options="header"]
207-
|=========================================================
208-
|Configuration |No-Reuse (tx/s) |Best-Reuse (tx/s) |Speedup
209-
|1 client, light |3.6 |7.6 |2.10x
210-
|8 clients, light |15.6 |23.0 |1.48x
211-
|16 clients, light |15.4 |25.3 |1.64x
212-
|1 client, heavy |1.4 |1.8 |1.26x
213-
|8 clients, heavy |9.2 |10.8 |1.17x
214-
|16 clients, heavy |14.5 |15.9 |1.10x
215-
|=========================================================
216-
217-
The "Best-Reuse" column reflects the highest throughput observed across all tested pool sizes (2, 4, and 8 connections)
218-
for each scenario.
219-
220-
The benefit of connection reuse is most pronounced in remote scenarios with light transactions. When the network
221-
round-trip cost is high and the transaction payload is small, the WebSocket connection setup overhead represents a
222-
larger proportion of the total transaction time. In the single-client remote light workload, connection reuse yielded a
223-
2.10x throughput improvement, as the connection handshake cost dominated the per-transaction time. With 16 concurrent
224-
clients in the same remote light scenario, throughput improved from 15.4 tx/s to 25.3 tx/s (1.64x), as the connection
225-
pool amortized the setup cost across many parallel sessions.
226-
227-
As transaction weight increases, the relative benefit diminishes because the graph operations themselves become the
228-
bottleneck rather than connection setup. In the local heavy workload scenarios, the improvement was only 3-6%, as the
229-
connection overhead was already negligible relative to the cost of the graph mutations. Even in the remote heavy
230-
scenarios, the improvement ranged from 10-26%, as the ten `addV()` operations per transaction shifted the time
231-
distribution toward server-side processing.
232-
233-
In summary, `reuseConnectionsForSessions` provides the greatest benefit when:
234-
235-
* Network latency between client and server is significant (remote deployments)
236-
* Transactions are lightweight (few operations per transaction)
237-
* Many short-lived transactions are issued in sequence or concurrently
238-
239-
See: link:https://issues.apache.org/jira/browse/TINKERPOP-3213[TINKERPOP-3213]
240-
241131
=== Upgrading for Providers
242132
243133
==== Graph System Providers
244134
245-
===== Closing Sessions On Graph Operations
246-
247-
An option has been added to the Java GLV (`reuseConnectionsForSessions`) that allows for borrowing open WebSocket
248-
connections for sessions. This is primarily to reduce the overhead of new connection setup per session. This can lead
249-
to large performance gains in remote transaction scenarios where there are many small mutation traversals.
135+
===== New Gherkin Test Tags
250136
251-
This option is disabled by default on the driver but providers may want to add an option that will allow sessions to end
252-
on the successful completion of a graph operation (commit/rollback). This will prevent a buildup of sessions if a user
253-
has enabled this option as the driver will *not* close the underlying WebSocket connection as a signal to end the
254-
session. Gremlin Server has added an option like this called `closeSessionPostGraphOp`. Remote graph providers are
255-
encouraged to add the same functionality.
137+
New Gherkin test tags have been added for scenarios that require the graph to support specific property value types:
138+
`@AllowDateTimePropertyValues`, `@AllowListPropertyValues`, `@AllowMapPropertyValues`, `@AllowSetPropertyValues`, and
139+
`@AllowUUIDPropertyValues`. Providers whose graphs do not support storing these types as property values should exclude
140+
the relevant tags in their `@CucumberOptions`. The full list of tags can be found
141+
link:https://tinkerpop.apache.org/docs/3.8.1/dev/developer/#gherkin-tags[here].
256142
257143
==== Graph Driver Providers
258144
