
cassandra-stress prints END but does not exit after ConvictionPolicy errors #915

@timtimb0t

Description


Spotted in: https://argus.scylladb.com/tests/scylla-cluster-tests/8f9a1bef-c19a-4d0d-b198-da196a99e148

Loader logs:


[2026-05-28 11:12:50.529] total,     833690459,   57339,   57339,   57339,    17.4,    15.1,    38.8,    55.5,    86.4,   116.3,10800.0,  0.03598,      0,      0,       0,       0,       0,       0
[2026-05-28 11:12:50.529] total,     833705650,   53916,   53916,   53916,    18.3,    16.7,    38.6,    47.9,    53.7,    58.6,10800.3,  0.03596,      0,      0,       0,       0,       0,       0
[2026-05-28 11:12:50.529] Results:
[2026-05-28 11:12:50.529] Op rate                   :   77,193 op/s  [WRITE: 77,193 op/s]
[2026-05-28 11:12:50.529] Partition rate            :   77,193 pk/s  [WRITE: 77,193 pk/s]
[2026-05-28 11:12:50.529] Row rate                  :   77,193 row/s [WRITE: 77,193 row/s]
[2026-05-28 11:12:50.529] Latency mean              :   12.9 ms [WRITE: 12.9 ms]
[2026-05-28 11:12:50.529] Latency median            :    9.8 ms [WRITE: 9.8 ms]
[2026-05-28 11:12:50.529] Latency 95th percentile   :   29.3 ms [WRITE: 29.3 ms]
[2026-05-28 11:12:50.529] Latency 99th percentile   :   48.6 ms [WRITE: 48.6 ms]
[2026-05-28 11:12:50.529] Latency 99.9th percentile :  203.2 ms [WRITE: 203.2 ms]
[2026-05-28 11:12:50.529] Latency max               : 44459.6 ms [WRITE: 44,459.6 ms]
[2026-05-28 11:12:50.529] Total partitions          : 833,705,650 [WRITE: 833,705,650]
[2026-05-28 11:12:50.529] Total errors              :          0 [WRITE: 0]
[2026-05-28 11:12:50.529] Total GC count            : 0
[2026-05-28 11:12:50.529] Total GC memory           : 0.000 KiB
[2026-05-28 11:12:50.529] Total GC time             :    0.0 seconds
[2026-05-28 11:12:50.529] Avg GC time               :    NaN ms
[2026-05-28 11:12:50.529] StdDev GC time            :    0.0 ms
[2026-05-28 11:12:50.529] Total operation time      : 03:00:00

[2026-05-28 11:12:50.529] END
[2026-05-28 11:24:19.709] ERROR [cluster1-nio-worker-0] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-53, inFlight=0, closed=false failed, remaining is negative -4
[2026-05-28 11:24:19.709] ERROR [cluster1-nio-worker-7] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-54, inFlight=0, closed=false failed, remaining is negative -6
[2026-05-28 11:24:19.709] ERROR [cluster1-nio-worker-4] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-50, inFlight=0, closed=false failed, remaining is negative -7
[2026-05-28 11:24:19.710] ERROR [cluster1-nio-worker-5] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-51, inFlight=0, closed=false failed, remaining is negative -2
[2026-05-28 11:24:19.710] ERROR [cluster1-nio-worker-1] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-55, inFlight=0, closed=false failed, remaining is negative -1
[2026-05-28 11:24:19.710] ERROR [cluster1-nio-worker-1] 2026-05-28 11:24:19,299 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-48, inFlight=0, closed=false failed, remaining is negative -8
[2026-05-28 11:24:19.710] ERROR [cluster1-nio-worker-6] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-52, inFlight=0, closed=false failed, remaining is negative -5
[2026-05-28 11:24:19.710] ERROR [cluster1-nio-worker-3] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-49, inFlight=0, closed=false failed, remaining is negative -3
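All eight driver errors target 10.12.2.51 and carry driver timestamps of 11:24:19,298 or 11:24:19,299. When triaging longer loader logs, a small helper can summarize such lines. The class below is a hypothetical sketch (it is not part of cassandra-stress or the Java driver, and the field names are our own). It assumes the exact ConvictionPolicy message format shown above and extracts the host, port, connection id, inFlight count, closed flag and the reported negative "remaining" value from each line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConvictionLogSummary {
    // Matches the driver message tail, e.g.
    // "Connection/10.12.2.51:9042-53, inFlight=0, closed=false failed, remaining is negative -4"
    private static final Pattern LINE = Pattern.compile(
            "Connection/([0-9.]+):(\\d+)-(\\d+), inFlight=(\\d+), closed=(true|false) failed, "
            + "remaining is negative (-\\d+)");

    // One parsed ConvictionPolicy record (hypothetical field names, not the driver's).
    record Entry(String host, int port, int connectionId, int inFlight, boolean closed, int remaining) {}

    // Returns one Entry per matching line; lines without the pattern (e.g. "END") are skipped.
    static List<Entry> parse(List<String> lines) {
        List<Entry> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                out.add(new Entry(
                        m.group(1),
                        Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)),
                        Integer.parseInt(m.group(4)),
                        Boolean.parseBoolean(m.group(5)),
                        Integer.parseInt(m.group(6))));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[2026-05-28 11:12:50.529] END",
                "[2026-05-28 11:24:19.709] ERROR [cluster1-nio-worker-0] 2026-05-28 11:24:19,298 "
                + "ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] "
                + "Connection/10.12.2.51:9042-53, inFlight=0, closed=false failed, remaining is negative -4");
        for (Entry e : parse(sample)) {
            // Prints: 10.12.2.51:9042 conn=53 inFlight=0 closed=false remaining=-4
            System.out.printf("%s:%d conn=%d inFlight=%d closed=%b remaining=%d%n",
                    e.host(), e.port(), e.connectionId(), e.inFlight(), e.closed(), e.remaining());
        }
    }
}
```

Applied to the eight lines above, it yields connection ids 48 to 55, each with a distinct remaining value from -1 to -8.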

SCT error:

(CassandraStressEvent Severity.CRITICAL) period_type=end event_id=3fa6587b-433d-4e90-883e-5c195a1af316 duration=3h34m30s: node=Node longevity-10gb-3h-2026-1-loader-node-8f9a1bef-2 [98.80.139.218 | 10.12.3.138] (Type: c6i.xlarge) (rack: RACK1)
stress_cmd=cassandra-stress write cl=QUORUM duration=180m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native -rate threads=1000 -pop seq=1..10000000 -log interval=5
errors:
Stress command execution failed with: Command did not complete within 12870 seconds!

Command: 'sudo  docker exec a4e3f0c612832820cca125e2d9caf65e5173c769674b9f213fcc2d1bc2bc9c4c /bin/sh -c \'echo TAG: loader_idx:2-cpu_idx:0-keyspace_idx:1; STRESS_TEST_MARKER=TG9PEC5F75PDBE8F3ILO; cassandra-stress write no-warmup cl=QUORUM duration=180m -schema keyspace=keyspace1 \'"\'"\'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=SizeTieredCompactionStrategy)\'"\'"\' -mode cql3 native -rate threads=1000 -pop seq=1..10000000 -log interval=5 -node 10.12.0.237,10.12.2.51,10.12.2.29,10.12.1.30,10.12.2.214,10.12.2.227 -errors skip-unsupported-columns\''

Stdout:

END
ERROR [cluster1-nio-worker-0] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-53, inFlight=0, closed=false failed, remaining is negative -4
ERROR [cluster1-nio-worker-7] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-54, inFlight=0, closed=false failed, remaining is negative -6
ERROR [cluster1-nio-worker-4] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-50, inFlight=0, closed=false failed, remaining is negative -7
ERROR [cluster1-nio-worker-5] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-51, inFlight=0, closed=false failed, remaining is negative -2
ERROR [cluster1-nio-worker-1] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-55, inFlight=0, closed=false failed, remaining is negative -1
ERROR [cluster1-nio-worker-1] 2026-05-28 11:24:19,299 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-48, inFlight=0, closed=false failed, remaining is negative -8
ERROR [cluster1-nio-worker-6] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-52, inFlight=0, closed=false failed, remaining is negative -5
ERROR [cluster1-nio-worker-3] 2026-05-28 11:24:19,298 ConvictionPolicy.java:122 - [ip-10-12-2-51.ec2.internal/10.12.2.51:9042] Connection/10.12.2.51:9042-49, inFlight=0, closed=false failed, remaining is negative -3

Stderr:

at com.datastax.shaded.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
at com.datastax.shaded.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at com.datastax.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
at com.datastax.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at com.datastax.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
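The timing in the SCT event is consistent with the process being stopped by the SCT command timeout rather than exiting on its own: the event duration (3h34m30s) is exactly the 12870 s timeout, which is 2070 s (34.5 min) longer than the 180-minute stress duration. A quick `java.time` check of that arithmetic (the inputs come from the event above; how SCT derives its timeout is not shown in this issue):

```java
import java.time.Duration;

public class TimeoutBudget {
    // Formats a Duration in the "3h34m30s" style used by the SCT event line.
    static String hms(Duration d) {
        return d.toHours() + "h" + d.toMinutesPart() + "m" + d.toSecondsPart() + "s";
    }

    public static void main(String[] args) {
        Duration stress = Duration.ofMinutes(180);     // duration=180m in the stress command
        Duration timeout = Duration.ofSeconds(12870);  // "did not complete within 12870 seconds"
        Duration headroom = timeout.minus(stress);     // slack left after the nominal run

        System.out.println("timeout  = " + hms(timeout));  // 3h34m30s
        System.out.println("headroom = " + headroom.getSeconds() + " s (" + hms(headroom) + ")"); // 2070 s (0h34m30s)
    }
}
```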

The issue happened on loader-2:

Loader: 10.12.3.138
Stress marker: TG9PEC5F75PDBE8F3ILO
Container: a4e3f0c612832820cca125e2d9caf65e5173c769674b9f213fcc2d1bc2bc9c4c

The run with the same stress marker on loader-1 (10.12.3.189) finished successfully with status 0.
The output after END contains Java driver ConvictionPolicy errors against 10.12.2.51 (the node was alive at that moment, but had earlier been targeted by disruptive nemeses).
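The loader timestamps show that the cassandra-stress process was still running well after it printed END: the first ConvictionPolicy line was written 689 s (about 11.5 minutes) later. A minimal `java.time` sketch of that subtraction, assuming the `yyyy-MM-dd HH:mm:ss.SSS` prefix format of the loader log lines (the class and method names are illustrative only):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PostEndGap {
    // Loader log prefix format, e.g. "2026-05-28 11:12:50.529"
    private static final DateTimeFormatter LOADER_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Elapsed time between two loader-prefix timestamps.
    static Duration gap(String endTs, String errorTs) {
        return Duration.between(LocalDateTime.parse(endTs, LOADER_TS),
                                LocalDateTime.parse(errorTs, LOADER_TS));
    }

    public static void main(String[] args) {
        // END line vs. first ConvictionPolicy line on loader-2
        Duration d = gap("2026-05-28 11:12:50.529", "2026-05-28 11:24:19.709");
        System.out.println(d.toMillis() + " ms"); // 689180 ms
    }
}
```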
