Oximeter: prevent ClickHouse from becoming inoperable when disk is full #10513

@jmcarp

ClickHouse needs free disk space to service certain workloads and background processes. Inserting data obviously requires available disk space, but so do updates and deletes, which are implemented as mutations that must be written to disk; background merges also require scratch space. If the ClickHouse disk is too full, ClickHouse may be unable to merge parts, or even to delete enough data to recover from the full disk.

The recommendation for ensuring adequate free space for background operations and cleanup is to set keep_free_space_bytes to some reasonable value. In practice, however, "reasonable" means "a multiple of the largest expected part size", and we use the default max_bytes_to_merge_at_max_space_in_pool of 150 GiB. If we don't want to reserve more than 150 GiB of buffer space for ClickHouse, we may also want to lower max_bytes_to_merge_at_max_space_in_pool.
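
To make the sizing trade-off concrete, here is a rough arithmetic sketch. It is not Oximeter's actual logic: the helper name `reserve_for_merges`, the multipliers, the 20 GiB merge cap, and the 1 TiB disk size are all hypothetical, chosen only for illustration. The one real value is the 150 GiB default for max_bytes_to_merge_at_max_space_in_pool.

```python
GIB = 1024 ** 3

# ClickHouse's default max_bytes_to_merge_at_max_space_in_pool (150 GiB).
DEFAULT_MAX_MERGE_BYTES = 150 * GIB


def reserve_for_merges(max_merge_bytes: int, multiple: int = 1) -> int:
    """Candidate keep_free_space_bytes: a multiple of the largest expected part.

    Hypothetical sizing rule; the largest part a merge can produce is
    approximated by the merge size cap.
    """
    if max_merge_bytes <= 0 or multiple <= 0:
        raise ValueError("max_merge_bytes and multiple must be positive")
    return max_merge_bytes * multiple


# Assumed disk size, for illustration only.
disk_bytes = 1024 * GIB

# With the default merge cap, even a 1x multiple reserves 150 GiB.
default_reserve = reserve_for_merges(DEFAULT_MAX_MERGE_BYTES)

# With a hypothetical 20 GiB merge cap, a 2x multiple reserves only 40 GiB.
lowered_reserve = reserve_for_merges(20 * GIB, multiple=2)

for label, reserve in [("default cap", default_reserve),
                       ("lowered cap", lowered_reserve)]:
    share = reserve / disk_bytes
    print(f"{label}: keep_free_space_bytes={reserve} ({share:.1%} of disk)")
```

Under these assumptions, lowering the merge cap shrinks the space that has to be held back, at the cost of capping how large merged parts can grow.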

Tagging as postmortem, since this came up during discussion of a recent ClickHouse outage related to a full disk.

Labels: postmortem (Generated during a postmortem discussion.)