
Corrupted block detected during decompression #102

@YutingWang98

Hi, we have recently been seeing zstd "Corrupted block detected" errors during shuffle reads:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 300 in stage 7.0 failed 4 times, most recent failure: Lost task 300.3 in stage 7.0 (TID 5866) (100.65.134.162 executor 200): com.github.luben.zstd.ZstdException: Corrupted block detected
	at com.github.luben.zstd.ZstdDecompressCtx.decompressByteArray(ZstdDecompressCtx.java:216)
	at com.github.luben.zstd.Zstd.decompressByteArray(Zstd.java:409)
	at org.apache.spark.shuffle.rss.BlockDownloaderPartitionRecordIterator.fetchNextDeserializationIterator(BlockDownloaderPartitionRecordIterator.scala:178)

It does not seem to be related to the input files, since the Spark job succeeded when we retried it. Any idea what causes this, and whether it is related to the RSS client or server? Thanks
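One way to narrow down where the corruption happens is to record a checksum of each compressed block when it is written and verify it on the read side before the bytes reach `Zstd.decompressByteArray`. This is a minimal sketch under that assumption, not something the RSS client is known to do. If verification fails, the bytes were most likely damaged in transit or at rest on the shuffle server. If it passes but zstd still reports a corrupted block, the problem is more likely on the writer/compression side. The class `BlockChecksum` and its methods are hypothetical. Only `java.util.zip.CRC32` comes from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Hypothetical helper (not part of the RSS client): computes a CRC32 over each
 * compressed shuffle block when it is written and re-checks it before the block
 * is handed to the zstd decompressor on the read side.
 */
public final class BlockChecksum {
    private BlockChecksum() {}

    /** CRC32 of the compressed bytes, recorded on the writer side. */
    public static long checksum(byte[] compressed) {
        CRC32 crc = new CRC32();
        crc.update(compressed, 0, compressed.length);
        return crc.getValue();
    }

    /** True if the bytes read back match the checksum recorded at write time. */
    public static boolean verify(byte[] compressed, long expected) {
        return checksum(compressed) == expected;
    }

    public static void main(String[] args) {
        byte[] block = "compressed shuffle block".getBytes(StandardCharsets.UTF_8);
        long recorded = checksum(block);

        // Simulate a single bit flip somewhere between write and read.
        byte[] received = Arrays.copyOf(block, block.length);
        received[3] ^= 0x01;

        System.out.println("intact block ok:    " + verify(block, recorded));
        System.out.println("corrupted block ok: " + verify(received, recorded));
    }
}
```

CRC32 detects every single-bit flip, so the intact block verifies and the modified copy does not. In a real reader, a failed check would be reported as a fetch/transport error (and retried) rather than surfacing as a `ZstdException`.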
