
feat: Optimize Nimble metadata IO with MetadataCache and pinned caching (#16948)

Closed
xiaoxmeng wants to merge 1 commit into facebookincubator:main from xiaoxmeng:export-D98260058


xiaoxmeng (Contributor) commented Mar 28, 2026

Summary:

X-link: facebookincubator/nimble#616

CONTEXT: Nimble's metadata IO (stripe groups, cluster index, chunk index) uses a weak-pointer cache (ReferenceCountedCache) that expires an entry as soon as its last reference is dropped. For workloads that access multiple stripes, metadata is therefore re-read and re-parsed on every stripe access. Additionally, when AsyncDataCache is enabled, metadata and data are cached indiscriminately; because metadata is small and hot while data is large and often cold, this leads to cache pollution.
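To make the re-read problem concrete, here is a minimal C++ sketch of a cache that stores only weak pointers. `WeakOnlyCache` and `countBuildsAcrossStripes` are hypothetical names used for illustration; this is a model of the expire-on-last-release behavior described above, not Nimble's ReferenceCountedCache. Two consecutive stripe accesses that share stripe group 0 each drop their reference, so the entry expires between them and the metadata is built twice.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical weak-pointer-only cache modeling the expire-on-last-release
// behavior described above. It is not Nimble's ReferenceCountedCache.
template <typename K, typename V>
class WeakOnlyCache {
 public:
  using Builder = std::function<std::shared_ptr<V>(const K&)>;

  explicit WeakOnlyCache(Builder builder) : builder_(std::move(builder)) {}

  std::shared_ptr<V> get(const K& key) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = entries_.find(key);
    if (it != entries_.end()) {
      if (auto value = it->second.lock()) {
        return value;  // Someone still holds a strong ref: cache hit.
      }
    }
    // Missing or expired: the metadata has to be read and parsed again.
    auto value = builder_(key);
    entries_[key] = value;  // Only a weak_ptr is retained.
    return value;
  }

 private:
  Builder builder_;
  std::mutex mutex_;
  std::map<K, std::weak_ptr<V>> entries_;
};

// Two stripes share stripe group 0, and each stripe access drops its
// reference before the next one starts.
int countBuildsAcrossStripes() {
  int builds = 0;
  WeakOnlyCache<int, std::string> cache([&builds](const int& group) {
    ++builds;
    return std::make_shared<std::string>("stripe-group-" + std::to_string(group));
  });
  for (int stripe = 0; stripe < 2; ++stripe) {
    auto group = cache.get(0);
    assert(*group == "stripe-group-0");
  }  // `group` is released at the end of each iteration, so the entry expires.
  return builds;
}
```

In this sketch `countBuildsAcrossStripes()` returns 2: the second stripe access rebuilds metadata that the first access had just parsed.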

WHAT:

  1. MetadataCache: A new thread-safe template cache (MetadataCache.h) that replaces ReferenceCountedCache and supports three entry types:

    • Preloaded: strong refs that are moved to the weak cache after first access
    • Pinned: strong refs held when pinEntries_=true; never evicted
    • Cached: weak refs (existing behavior)

    Lookups take a read lock (shared_mutex) on the fast path, and the builder is called outside both locks.
  2. TabletReader: Migrated stripeGroupCache_, clusterIndexCache_, and chunkIndexCache_ from ReferenceCountedCache to MetadataCache. Added a pinMetadataInCache option that pins parsed metadata objects with strong references.

  3. ReaderOptions: Added a pinMetadataInCache() getter/setter to velox::dwio::common::ReaderOptions. When enabled, metadata is pinned in the reader's MetadataCache so that it survives across stripe accesses.

  4. ReaderBase: Replaced the prevStripeIdentifier_ class member with a local variable in setStripe(), since it only needs to stay alive for the duration of the call.

  5. Test coverage:

    • MetadataCacheTest: 9 tests, including preload, customBuilder, preloadDoesNotOverwrite (parameterized over pin/non-pin mode), and a concurrent fuzzer test with 8 threads.
    • E2EFilterTest: Extended the parameterization to cover pinMetadataInCache.
    • E2EIndexTest: Extended the parameterization to cover pinMetadataInCache and added a test loop over enableCache.
    • SelectiveNimbleReaderTest: Added a pinMetadataInCache test that loops over enableCache.
  6. Benchmark tuning: Changed the chunk_index_min_avg_chunks default to 1.0, commented out the custom sizeClassSizes, and removed setIOExecutor from the cluster index benchmark.
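The three entry types and the locking scheme from item 1 can be modeled with a small C++17 sketch. `MetadataCacheSketch`, its `preload`/`get` methods, and the two demo functions are hypothetical and only approximate the described behavior; in particular, the exact rule that preloadDoesNotOverwrite checks, and how preloaded entries behave in pinned mode, are assumptions here. Lookups first try a `std::shared_lock`; on a miss, a preloaded entry is handed out and demoted to a weak ref, otherwise the builder runs with no lock held and the result is installed under an exclusive lock after re-checking for a concurrent insert.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical sketch of the three entry types; not Nimble's MetadataCache.
template <typename K, typename V>
class MetadataCacheSketch {
 public:
  using Builder = std::function<std::shared_ptr<V>(const K&)>;

  MetadataCacheSketch(Builder builder, bool pinEntries)
      : builder_(std::move(builder)), pinEntries_(pinEntries) {}

  // Preloaded: held strongly until the first get(). Assumed here: preload
  // never overwrites an entry that is already present.
  void preload(const K& key, std::shared_ptr<V> value) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (findLocked(key) == nullptr) {
      preloaded_.emplace(key, std::move(value));  // Keeps an earlier preload.
    }
  }

  std::shared_ptr<V> get(const K& key) {
    {
      std::shared_lock<std::shared_mutex> lock(mutex_);  // Fast path: read lock.
      if (auto value = findLocked(key)) return value;
    }
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (auto it = preloaded_.find(key); it != preloaded_.end()) {
      // First access to a preloaded entry: move it to the weak cache.
      std::shared_ptr<V> value = std::move(it->second);
      preloaded_.erase(it);
      cached_[key] = value;
      return value;
    }
    lock.unlock();
    std::shared_ptr<V> value = builder_(key);  // Slow path: no lock held.
    lock.lock();
    if (auto existing = findLocked(key)) return existing;  // Lost a race.
    if (pinEntries_) {
      pinned_[key] = value;  // Pinned: strong ref, never evicted.
    } else {
      cached_[key] = value;  // Cached: weak ref, expires with the last user.
    }
    return value;
  }

 private:
  // Caller must hold mutex_ in shared or exclusive mode.
  std::shared_ptr<V> findLocked(const K& key) const {
    if (auto it = pinned_.find(key); it != pinned_.end()) return it->second;
    if (auto it = cached_.find(key); it != cached_.end()) return it->second.lock();
    return nullptr;
  }

  Builder builder_;
  const bool pinEntries_;
  std::shared_mutex mutex_;
  std::map<K, std::shared_ptr<V>> preloaded_;
  std::map<K, std::shared_ptr<V>> pinned_;
  std::map<K, std::weak_ptr<V>> cached_;
};

// Counts builds for two accesses that each drop their reference afterwards.
int buildsForTwoAccesses(bool pinEntries) {
  int builds = 0;
  MetadataCacheSketch<int, std::string> cache(
      [&builds](const int& key) {
        ++builds;
        return std::make_shared<std::string>("meta-" + std::to_string(key));
      },
      pinEntries);
  for (int i = 0; i < 2; ++i) {
    auto value = cache.get(1);  // Released at the end of each iteration.
  }
  return builds;
}

// A preloaded entry is served once without building, then only weakly held.
int buildsAfterPreload() {
  int builds = 0;
  MetadataCacheSketch<int, std::string> cache(
      [&builds](const int& key) {
        ++builds;
        return std::make_shared<std::string>("meta-" + std::to_string(key));
      },
      /*pinEntries=*/false);
  cache.preload(7, std::make_shared<std::string>("preloaded"));
  std::string first = *cache.get(7);   // Served from the preloaded strong ref.
  std::string second = *cache.get(7);  // That ref has expired, so it is rebuilt.
  return (first == "preloaded" && second == "meta-7") ? builds : -1;
}
```

In this sketch `buildsForTwoAccesses(false)` returns 2, `buildsForTwoAccesses(true)` returns 1, and `buildsAfterPreload()` returns 1. Running the builder with no lock held keeps a slow metadata read from blocking other lookups; the trade-off is that two threads missing on the same key may both build it, and the re-check under the exclusive lock keeps whichever value was installed first.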

Reviewed By: tanjialiang, HuamengJiang, kewang1024

Differential Revision: D98260058

xiaoxmeng requested a review from majetideepak as a code owner March 28, 2026 07:03
netlify Bot commented Mar 28, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit fc3ed98
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/69c89f4b14725200089250f7

meta-cla Bot added the CLA Signed label Mar 28, 2026
meta-codesync Bot commented Mar 28, 2026

@xiaoxmeng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D98260058.

github-actions Bot commented Mar 28, 2026

Build Impact Analysis

Directly Changed Targets

Target Changed Files
physical_size_aggregator_test Options.h
spark_aggregation_fuzzer_test HiveConfig.h, Options.h
velox_aggregates_GeometryAggregateTest HiveConfig.h, Options.h
velox_aggregates_reduce_agg_bm HiveConfig.h, Options.h
velox_aggregates_simple_aggregates_bm HiveConfig.h, Options.h
velox_aggregates_string_keys_bm HiveConfig.h, Options.h
velox_aggregates_test_group0 HiveConfig.h, Options.h
velox_aggregates_test_group1 HiveConfig.h, Options.h
velox_aggregates_test_group2 HiveConfig.h, Options.h
velox_aggregates_test_group3 HiveConfig.h, Options.h
velox_aggregates_test_group4 HiveConfig.h, Options.h
velox_aggregation_fuzzer HiveConfig.h, Options.h
velox_aggregation_fuzzer_base HiveConfig.h, Options.h
velox_aggregation_fuzzer_test HiveConfig.h, Options.h
velox_aggregation_result_verifier HiveConfig.h, Options.h
velox_aggregation_runner_test HiveConfig.h
velox_cache_fuzzer_lib CachedBufferedInput.h
velox_common_test Options.h
velox_core_test HiveConfig.h, Options.h
velox_driver_test HiveConfig.h, Options.h
velox_dwio_arrow_parquet_writer Options.h
velox_dwio_arrow_parquet_writer_test Options.h
velox_dwio_cache_test CachedBufferedInput.h, Options.h
velox_dwio_common CachedBufferedInput.h, Options.h
velox_dwio_common_compression Options.h
velox_dwio_common_test CachedBufferedInput.h, CachedBufferedInputTest.cpp, Options.h
velox_dwio_common_test_utils Options.h
velox_dwio_dwrf_buffered_output_stream_test Options.h
velox_dwio_dwrf_byte_rle_encoder_test Options.h
velox_dwio_dwrf_byte_rle_test Options.h
velox_dwio_dwrf_column_reader_test Options.h
velox_dwio_dwrf_column_statistics_test Options.h
velox_dwio_dwrf_common Options.h
velox_dwio_dwrf_compression_test Options.h
velox_dwio_dwrf_config_test Options.h
velox_dwio_dwrf_decompression_test Options.h
velox_dwio_dwrf_decryption_test Options.h
velox_dwio_dwrf_dictionary_encoder_test Options.h
velox_dwio_dwrf_dictionary_encoding_utils_test Options.h
velox_dwio_dwrf_encryption_test Options.h
velox_dwio_dwrf_flush_policy_test Options.h
velox_dwio_dwrf_index_builder_test Options.h
velox_dwio_dwrf_int_direct_test Options.h
velox_dwio_dwrf_int_encoder_test Options.h
velox_dwio_dwrf_layout_planner_test Options.h
velox_dwio_dwrf_reader Options.h
velox_dwio_dwrf_reader_base_test Options.h
velox_dwio_dwrf_reader_test Options.h
velox_dwio_dwrf_rle_test Options.h
velox_dwio_dwrf_rlev1_encoder_test Options.h
velox_dwio_dwrf_stripe_dictionary_cache_test Options.h
velox_dwio_dwrf_stripe_reader_base_test Options.h
velox_dwio_dwrf_stripe_stream_test Options.h
velox_dwio_dwrf_utils Options.h
velox_dwio_dwrf_utils_test Options.h
velox_dwio_dwrf_writer Options.h
velox_dwio_dwrf_writer_context_test Options.h
velox_dwio_dwrf_writer_encoding_manager_test Options.h
velox_dwio_dwrf_writer_sink_test Options.h
velox_dwio_dwrf_writer_test Options.h
velox_dwio_iceberg_reader_benchmark HiveConfig.h, Options.h
velox_dwio_iceberg_reader_benchmark_lib HiveConfig.h, Options.h
velox_dwio_native_parquet_reader Options.h
velox_dwio_orc_column_statistics_test Options.h
velox_dwio_orc_reader Options.h
velox_dwio_orc_reader_filter_test Options.h
velox_dwio_orc_reader_test Options.h
velox_dwio_parquet_page_reader_test Options.h
velox_dwio_parquet_reader Options.h
velox_dwio_parquet_reader_benchmark Options.h
velox_dwio_parquet_reader_benchmark_lib Options.h
velox_dwio_parquet_reader_test Options.h
velox_dwio_parquet_table_scan_test HiveConfig.h, Options.h
velox_dwio_parquet_tpch_test HiveConfig.h, Options.h
velox_dwio_parquet_writer Options.h
velox_dwio_text_reader Options.h
velox_dwio_text_reader_register Options.h
velox_dwio_text_writer Options.h
velox_dwio_text_writer_register Options.h
velox_dwrf_column_writer_index_test Options.h
velox_dwrf_column_writer_stats_test Options.h
velox_dwrf_column_writer_test Options.h
velox_dwrf_e2e_filter_test Options.h
velox_dwrf_e2e_reader_test Options.h
velox_dwrf_e2e_writer_test Options.h
velox_dwrf_float_column_writer_benchmark Options.h
velox_dwrf_int_encoder_benchmark Options.h
velox_dwrf_statistics_builder_utils_test Options.h
velox_dwrf_test_utils Options.h
velox_dwrf_writer_extended_test Options.h
velox_dwrf_writer_flush_test Options.h
velox_example_operator_extensibility HiveConfig.h, Options.h
velox_example_scan_orc Options.h
velox_exchange_benchmark HiveConfig.h, Options.h
velox_exchange_fuzzer HiveConfig.h, Options.h
velox_exec_SpatialJoinTest HiveConfig.h, Options.h
velox_exec_bm_duplicate_project HiveConfig.h, Options.h
velox_exec_infra_test HiveConfig.h, Options.h
velox_exec_test_group0 HiveConfig.h, Options.h
velox_exec_test_group1 HiveConfig.h, Options.h
velox_exec_test_group2 HiveConfig.h, Options.h
velox_exec_test_group3 HiveConfig.h, Options.h
velox_exec_test_group4 HiveConfig.h, Options.h
velox_exec_test_group5 HiveConfig.h, Options.h
velox_exec_test_group6 HiveConfig.h, Options.h
velox_exec_test_group7 HiveConfig.h, Options.h
velox_exec_test_lib HiveConfig.h, Options.h
velox_exec_util_test_group0 HiveConfig.h, Options.h
velox_expression_fuzzer HiveConfig.h
velox_expression_runner_test HiveConfig.h
velox_expression_test HiveConfig.h, Options.h
velox_filemetadata_test Options.h
velox_filter_project_benchmark HiveConfig.h, Options.h
velox_functions_aggregates_test_lib HiveConfig.h, Options.h
velox_functions_spark_aggregates_test HiveConfig.h, Options.h
velox_functions_window_test_lib HiveConfig.h, Options.h
velox_fuzzer_connector_test HiveConfig.h, Options.h
velox_fuzzer_util HiveConfig.h, Options.h
velox_gcs HiveConfig.h
velox_gcs_file_test HiveConfig.h
velox_gcs_insert_test HiveConfig.h, Options.h
velox_gcs_multiendpoints_test HiveConfig.h, Options.h
velox_gcsfile_example HiveConfig.h
velox_hash_join_build_benchmark HiveConfig.h, Options.h
velox_hdfs_insert_test HiveConfig.h, Options.h
velox_hive_config HiveConfig.cpp, HiveConfig.h
velox_hive_connector CachedBufferedInput.h, HiveConfig.cpp, HiveConfig.h, HiveConnectorUtil.cpp, Options.h
velox_hive_connector_test HiveConfig.h, Options.h
velox_hive_iceberg_insert_test HiveConfig.h, Options.h
velox_hive_iceberg_splitreader HiveConfig.h, Options.h
velox_hive_iceberg_test HiveConfig.h, Options.h
velox_hive_paimon_connector HiveConfig.h, Options.h
velox_hive_paimon_split Options.h
velox_hive_paimon_split_test Options.h
velox_in_10_min_demo HiveConfig.h, Options.h
velox_join_fuzzer HiveConfig.h, Options.h
velox_key_encoder_test HiveConfig.h, Options.h
velox_mark_distinct_fuzzer_lib HiveConfig.h, Options.h
velox_mark_sorted_benchmark HiveConfig.h, Options.h
velox_memory_arbitration_fuzzer HiveConfig.h, Options.h
velox_memory_test HiveConfig.h, Options.h
velox_orderby_benchmark HiveConfig.h, Options.h
velox_parquet_e2e_filter_test Options.h
velox_parquet_writer_sink_test Options.h
velox_parquet_writer_test HiveConfig.h, Options.h
velox_query_benchmark HiveConfig.h, Options.h
velox_query_trace_replayer_base HiveConfig.h, Options.h
velox_row_number_fuzzer_base_lib HiveConfig.h
velox_row_number_fuzzer_lib HiveConfig.h, Options.h
velox_rpc_operator_test HiveConfig.h, Options.h
velox_s3file_test HiveConfig.h, Options.h
velox_s3insert_test HiveConfig.h, Options.h
velox_s3metrics_test HiveConfig.h, Options.h
velox_s3multiendpoints_test HiveConfig.h, Options.h
velox_s3read_test HiveConfig.h, Options.h
velox_s3registration_test HiveConfig.h, Options.h
velox_simple_aggregate_test HiveConfig.h, Options.h
velox_sort_benchmark Options.h
velox_spark_query_runner Options.h
velox_spark_query_runner_test HiveConfig.h, Options.h
velox_spark_windows_test HiveConfig.h, Options.h
velox_spatial_join_benchmark HiveConfig.h, Options.h
velox_spatial_join_fuzzer HiveConfig.h, Options.h
velox_streaming_aggregation_benchmark HiveConfig.h, Options.h
velox_table_evolution_fuzzer_test HiveConfig.h, Options.h
velox_text_reader_test Options.h
velox_text_writer_test Options.h
velox_tool_trace_test HiveConfig.h, Options.h
velox_topn_row_number_fuzzer_lib HiveConfig.h, Options.h
velox_tpcds_connector_test HiveConfig.h, Options.h
velox_tpch_benchmark HiveConfig.h, Options.h
velox_tpch_benchmark_lib HiveConfig.h, Options.h
velox_tpch_connector_test HiveConfig.h, Options.h
velox_tpch_speed_test HiveConfig.h, Options.h
velox_wave_benchmark HiveConfig.h, Options.h
velox_wave_dwio Options.h
velox_wave_exec HiveConfig.h, Options.h
velox_wave_exec_test HiveConfig.h, Options.h
velox_wave_mock_file Options.h
velox_wave_mock_reader HiveConfig.h, Options.h
velox_window_fuzzer HiveConfig.h, Options.h
velox_window_fuzzer_test HiveConfig.h, Options.h
velox_window_prefixsort_benchmark HiveConfig.h, Options.h
velox_window_sub_partitioned_sort_benchmark HiveConfig.h, Options.h
velox_writer_fuzzer HiveConfig.h, Options.h
velox_writer_fuzzer_test HiveConfig.h

Selective Build Targets (building these covers all 293 affected)

cmake --build _build/release --target aggregate_companion_functions_test physical_size_aggregator_test presto_sql_test spark_aggregation_fuzzer_test spark_expression_fuzzer_test velox_abfs_test velox_aggregates_GeometryAggregateTest velox_aggregates_reduce_agg_bm velox_aggregates_simple_aggregates_bm velox_aggregates_string_keys_bm velox_aggregates_test_group0 velox_aggregates_test_group1 velox_aggregates_test_group2 velox_aggregates_test_group3 velox_aggregates_test_group4 velox_aggregation_fuzzer_test velox_aggregation_runner_test velox_benchmark_array_writer_no_nulls velox_benchmark_array_writer_with_nulls velox_benchmark_map_writer_no_nulls velox_benchmark_map_writer_with_nulls velox_benchmark_nested_array_writer_no_nulls velox_benchmark_nested_array_writer_with_nulls velox_cache_fuzzer velox_common_compression_test velox_common_test velox_core_test velox_driver_test velox_duckdb_conversion_test velox_dwio_arrow_parquet_writer_test velox_dwio_cache_test velox_dwio_common_bitpack_decoder_benchmark velox_dwio_common_data_buffer_benchmark velox_dwio_common_int_decoder_benchmark velox_dwio_common_test velox_dwio_dwrf_buffered_output_stream_test velox_dwio_dwrf_byte_rle_encoder_test velox_dwio_dwrf_byte_rle_test velox_dwio_dwrf_checksum_test velox_dwio_dwrf_column_reader_test velox_dwio_dwrf_column_statistics_test velox_dwio_dwrf_compression_test velox_dwio_dwrf_config_test velox_dwio_dwrf_data_buffer_holder_test velox_dwio_dwrf_decompression_test velox_dwio_dwrf_decryption_test velox_dwio_dwrf_dictionary_encoder_test velox_dwio_dwrf_dictionary_encoding_utils_test velox_dwio_dwrf_encoding_selector_test velox_dwio_dwrf_encryption_test velox_dwio_dwrf_flush_policy_test velox_dwio_dwrf_index_builder_test velox_dwio_dwrf_int_direct_test velox_dwio_dwrf_int_encoder_test velox_dwio_dwrf_layout_planner_test velox_dwio_dwrf_ratio_checker_test velox_dwio_dwrf_reader_base_test velox_dwio_dwrf_reader_test velox_dwio_dwrf_rle_test velox_dwio_dwrf_rlev1_encoder_test 
velox_dwio_dwrf_stream_labels_test velox_dwio_dwrf_stripe_dictionary_cache_test velox_dwio_dwrf_stripe_reader_base_test velox_dwio_dwrf_stripe_stream_test velox_dwio_dwrf_utils_test velox_dwio_dwrf_writer_context_test velox_dwio_dwrf_writer_encoding_manager_test velox_dwio_dwrf_writer_sink_test velox_dwio_dwrf_writer_test velox_dwio_iceberg_reader_benchmark velox_dwio_orc_column_statistics_test velox_dwio_orc_reader_filter_test velox_dwio_orc_reader_test velox_dwio_parquet_common_test velox_dwio_parquet_page_reader_test velox_dwio_parquet_reader_benchmark velox_dwio_parquet_reader_test velox_dwio_parquet_rlebp_decoder_test velox_dwio_parquet_structure_decoder_benchmark velox_dwio_parquet_structure_decoder_test velox_dwio_parquet_table_scan_test velox_dwio_parquet_thrift_test velox_dwio_parquet_tpch_test velox_dwrf_column_writer_index_test velox_dwrf_column_writer_stats_test velox_dwrf_column_writer_test velox_dwrf_e2e_filter_test velox_dwrf_e2e_reader_test velox_dwrf_e2e_writer_test velox_dwrf_float_column_writer_benchmark velox_dwrf_int_encoder_benchmark velox_dwrf_statistics_builder_utils_test velox_dwrf_writer_extended_test velox_dwrf_writer_flush_test velox_example_operator_extensibility velox_example_scan_orc velox_exchange_benchmark velox_exchange_fuzzer velox_exec_SpatialJoinTest velox_exec_bm_duplicate_project velox_exec_infra_test velox_exec_test_group0 velox_exec_test_group1 velox_exec_test_group2 velox_exec_test_group3 velox_exec_test_group4 velox_exec_test_group5 velox_exec_test_group6 velox_exec_test_group7 velox_exec_util_test_group0 velox_expression_fuzzer_test velox_expression_fuzzer_unit_test velox_expression_runner_test velox_expression_runner_unit_test velox_expression_test velox_expression_verifier_unit_test velox_filemetadata_test velox_filter_project_benchmark velox_function_dynamic_link_test velox_function_registry_test velox_functions_aggregates_test velox_functions_benchmarks_compare velox_functions_benchmarks_row_writer_no_nulls 
velox_functions_benchmarks_simdjson_function_with_expr velox_functions_benchmarks_string_writer_no_nulls velox_functions_benchmarks_url velox_functions_iceberg_test velox_functions_lib_test velox_functions_prestosql_benchmarks_array_contains velox_functions_prestosql_benchmarks_array_min_max velox_functions_prestosql_benchmarks_array_position velox_functions_prestosql_benchmarks_array_sum velox_functions_prestosql_benchmarks_bitwise velox_functions_prestosql_benchmarks_cardinality velox_functions_prestosql_benchmarks_comparisons velox_functions_prestosql_benchmarks_concat velox_functions_prestosql_benchmarks_date_time velox_functions_prestosql_benchmarks_field_reference velox_functions_prestosql_benchmarks_generic velox_functions_prestosql_benchmarks_in velox_functions_prestosql_benchmarks_map_input velox_functions_prestosql_benchmarks_map_subscript velox_functions_prestosql_benchmarks_map_zip_with velox_functions_prestosql_benchmarks_not velox_functions_prestosql_benchmarks_regexp_replace velox_functions_prestosql_benchmarks_row velox_functions_prestosql_benchmarks_string_ascii_utf_functions velox_functions_prestosql_benchmarks_uuid_cast velox_functions_prestosql_benchmarks_width_bucket velox_functions_prestosql_benchmarks_zip velox_functions_prestosql_benchmarks_zip_with velox_functions_spark_aggregates_test velox_functions_spark_test velox_functions_test velox_fuzzer_connector_test velox_gcs_file_test velox_gcs_insert_test velox_gcs_multiendpoints_test velox_gcsfile_example velox_hash_benchmark velox_hash_join_build_benchmark velox_hash_join_list_result_benchmark velox_hash_join_prepare_join_table_benchmark velox_hdfs_file_test velox_hdfs_insert_test velox_hive_connector_test velox_hive_iceberg_insert_test velox_hive_iceberg_test velox_hive_paimon_connector velox_hive_paimon_data_file_meta_test velox_hive_paimon_deletion_file_test velox_hive_paimon_row_kind_test velox_hive_paimon_split_test velox_hive_partition_function_benchmark velox_in_10_min_demo 
velox_join_fuzzer velox_key_encoder_test velox_mark_distinct_fuzzer velox_mark_sorted_benchmark velox_memory_arbitration_fuzzer velox_memory_test velox_orderby_benchmark velox_parquet_e2e_filter_test velox_parquet_writer_sink_test velox_parquet_writer_test velox_presto_types_fuzzer_utils_test velox_query_replayer velox_re2_functions_benchmarks velox_read_benchmark velox_row_number_fuzzer velox_rpc_operator_test velox_s3config_test velox_s3file_test velox_s3finalize_test velox_s3insert_test velox_s3metrics_test velox_s3multiendpoints_test velox_s3read_test velox_s3registration_test velox_serializer_test_group0 velox_simple_aggregate_test velox_sort_benchmark velox_spark_query_runner_test velox_spark_windows_test velox_sparksql_benchmarks_from_json velox_sparksql_benchmarks_get_funcs velox_sparksql_benchmarks_in velox_spatial_join_benchmark velox_spatial_join_fuzzer velox_spiller_aggregate_benchmark velox_spiller_join_benchmark velox_streaming_aggregation_benchmark velox_table_evolution_fuzzer_test velox_text_reader_test velox_text_writer_test velox_tool_trace_test velox_topn_row_number_fuzzer velox_tpcds_connector_test velox_tpch_benchmark velox_tpch_connector_test velox_tpch_speed_test velox_trace_file_tool velox_wave_benchmark velox_wave_exec_test velox_window_fuzzer_test velox_window_prefixsort_benchmark velox_window_sub_partitioned_sort_benchmark velox_windows_agg_test velox_windows_rank_test velox_windows_value_test velox_writer_fuzzer_test

Total affected: 293/555 targets

Warning: 1 file(s) could not be mapped to any target. A full build may be needed.

  • velox/docs/configs.rst
All affected targets (293)
  • aggregate_companion_functions_test
  • physical_size_aggregator_test
  • presto_sql_test
  • spark_aggregation_fuzzer_test
  • spark_expression_fuzzer_test
  • velox_abfs
  • velox_abfs_test
  • velox_aggregates_GeometryAggregateTest
  • velox_aggregates_reduce_agg_bm
  • velox_aggregates_simple_aggregates_bm
  • velox_aggregates_string_keys_bm
  • velox_aggregates_test_group0
  • velox_aggregates_test_group1
  • velox_aggregates_test_group2
  • velox_aggregates_test_group3
  • velox_aggregates_test_group4
  • velox_aggregation_fuzzer
  • velox_aggregation_fuzzer_base
  • velox_aggregation_fuzzer_test
  • velox_aggregation_result_verifier
  • velox_aggregation_runner_test
  • velox_benchmark_array_writer_no_nulls
  • velox_benchmark_array_writer_with_nulls
  • velox_benchmark_map_writer_no_nulls
  • velox_benchmark_map_writer_with_nulls
  • velox_benchmark_nested_array_writer_no_nulls
  • velox_benchmark_nested_array_writer_with_nulls
  • velox_cache_fuzzer
  • velox_cache_fuzzer_lib
  • velox_common_compression_test
  • velox_common_test
  • velox_core_test
  • velox_driver_test
  • velox_duckdb_conversion_test
  • velox_dwio_arrow_parquet_writer
  • velox_dwio_arrow_parquet_writer_lib
  • velox_dwio_arrow_parquet_writer_test
  • velox_dwio_arrow_parquet_writer_test_lib
  • velox_dwio_arrow_parquet_writer_util_lib
  • velox_dwio_cache_test
  • velox_dwio_common
  • velox_dwio_common_bitpack_decoder_benchmark
  • velox_dwio_common_compression
  • velox_dwio_common_data_buffer_benchmark
  • velox_dwio_common_int_decoder_benchmark
  • velox_dwio_common_test
  • velox_dwio_common_test_utils
  • velox_dwio_dwrf_buffered_output_stream_test
  • velox_dwio_dwrf_byte_rle_encoder_test
  • velox_dwio_dwrf_byte_rle_test
  • velox_dwio_dwrf_checksum_test
  • velox_dwio_dwrf_column_reader_test
  • velox_dwio_dwrf_column_statistics_test
  • velox_dwio_dwrf_common
  • velox_dwio_dwrf_compression_test
  • velox_dwio_dwrf_config_test
  • velox_dwio_dwrf_data_buffer_holder_test
  • velox_dwio_dwrf_decompression_test
  • velox_dwio_dwrf_decryption_test
  • velox_dwio_dwrf_dictionary_encoder_test
  • velox_dwio_dwrf_dictionary_encoding_utils_test
  • velox_dwio_dwrf_encoding_selector_test
  • velox_dwio_dwrf_encryption_test
  • velox_dwio_dwrf_flush_policy_test
  • velox_dwio_dwrf_index_builder_test
  • velox_dwio_dwrf_int_direct_test
  • velox_dwio_dwrf_int_encoder_test
  • velox_dwio_dwrf_layout_planner_test
  • velox_dwio_dwrf_ratio_checker_test
  • velox_dwio_dwrf_reader
  • velox_dwio_dwrf_reader_base_test
  • velox_dwio_dwrf_reader_test
  • velox_dwio_dwrf_rle_test
  • velox_dwio_dwrf_rlev1_encoder_test
  • velox_dwio_dwrf_stream_labels_test
  • velox_dwio_dwrf_stripe_dictionary_cache_test
  • velox_dwio_dwrf_stripe_reader_base_test
  • velox_dwio_dwrf_stripe_stream_test
  • velox_dwio_dwrf_utils
  • velox_dwio_dwrf_utils_test
  • velox_dwio_dwrf_writer
  • velox_dwio_dwrf_writer_context_test
  • velox_dwio_dwrf_writer_encoding_manager_test
  • velox_dwio_dwrf_writer_sink_test
  • velox_dwio_dwrf_writer_test
  • velox_dwio_faulty_file_sink
  • velox_dwio_iceberg_reader_benchmark
  • velox_dwio_iceberg_reader_benchmark_lib
  • velox_dwio_native_parquet_reader
  • velox_dwio_orc_column_statistics_test
  • velox_dwio_orc_reader
  • velox_dwio_orc_reader_filter_test
  • velox_dwio_orc_reader_test
  • velox_dwio_parquet_common
  • velox_dwio_parquet_common_test
  • velox_dwio_parquet_page_reader_test
  • velox_dwio_parquet_reader
  • velox_dwio_parquet_reader_benchmark
  • velox_dwio_parquet_reader_benchmark_lib
  • velox_dwio_parquet_reader_test
  • velox_dwio_parquet_rlebp_decoder_test
  • velox_dwio_parquet_structure_decoder_benchmark
  • velox_dwio_parquet_structure_decoder_test
  • velox_dwio_parquet_table_scan_test
  • velox_dwio_parquet_thrift_test
  • velox_dwio_parquet_tpch_test
  • velox_dwio_parquet_writer
  • velox_dwio_text_reader
  • velox_dwio_text_reader_register
  • velox_dwio_text_writer
  • velox_dwio_text_writer_register
  • velox_dwrf_column_writer_index_test
  • velox_dwrf_column_writer_stats_test
  • velox_dwrf_column_writer_test
  • velox_dwrf_e2e_filter_test
  • velox_dwrf_e2e_reader_test
  • velox_dwrf_e2e_writer_test
  • velox_dwrf_float_column_writer_benchmark
  • velox_dwrf_int_encoder_benchmark
  • velox_dwrf_statistics_builder_utils_test
  • velox_dwrf_test_utils
  • velox_dwrf_writer_extended_test
  • velox_dwrf_writer_flush_test
  • velox_example_operator_extensibility
  • velox_example_scan_orc
  • velox_exchange_benchmark
  • velox_exchange_fuzzer
  • velox_exec_SpatialJoinTest
  • velox_exec_bm_duplicate_project
  • velox_exec_infra_test
  • velox_exec_test_group0
  • velox_exec_test_group1
  • velox_exec_test_group2
  • velox_exec_test_group3
  • velox_exec_test_group4
  • velox_exec_test_group5
  • velox_exec_test_group6
  • velox_exec_test_group7
  • velox_exec_test_lib
  • velox_exec_util_test_group0
  • velox_expression_fuzzer
  • velox_expression_fuzzer_test
  • velox_expression_fuzzer_unit_test
  • velox_expression_runner
  • velox_expression_runner_test
  • velox_expression_runner_unit_test
  • velox_expression_test
  • velox_expression_test_utility
  • velox_expression_verifier
  • velox_expression_verifier_unit_test
  • velox_filemetadata_test
  • velox_filter_project_benchmark
  • velox_function_dynamic_link_test
  • velox_function_registry_test
  • velox_functions_aggregates_test
  • velox_functions_aggregates_test_lib
  • velox_functions_benchmarks_compare
  • velox_functions_benchmarks_row_writer_no_nulls
  • velox_functions_benchmarks_simdjson_function_with_expr
  • velox_functions_benchmarks_string_writer_no_nulls
  • velox_functions_benchmarks_url
  • velox_functions_iceberg_test
  • velox_functions_lib_test
  • velox_functions_prestosql_benchmarks_array_contains
  • velox_functions_prestosql_benchmarks_array_min_max
  • velox_functions_prestosql_benchmarks_array_position
  • velox_functions_prestosql_benchmarks_array_sum
  • velox_functions_prestosql_benchmarks_bitwise
  • velox_functions_prestosql_benchmarks_cardinality
  • velox_functions_prestosql_benchmarks_comparisons
  • velox_functions_prestosql_benchmarks_concat
  • velox_functions_prestosql_benchmarks_date_time
  • velox_functions_prestosql_benchmarks_field_reference
  • velox_functions_prestosql_benchmarks_generic
  • velox_functions_prestosql_benchmarks_in
  • velox_functions_prestosql_benchmarks_map_input
  • velox_functions_prestosql_benchmarks_map_subscript
  • velox_functions_prestosql_benchmarks_map_zip_with
  • velox_functions_prestosql_benchmarks_not
  • velox_functions_prestosql_benchmarks_regexp_replace
  • velox_functions_prestosql_benchmarks_row
  • velox_functions_prestosql_benchmarks_string_ascii_utf_functions
  • velox_functions_prestosql_benchmarks_uuid_cast
  • velox_functions_prestosql_benchmarks_width_bucket
  • velox_functions_prestosql_benchmarks_zip
  • velox_functions_prestosql_benchmarks_zip_with
  • velox_functions_spark_aggregates_test
  • velox_functions_spark_test
  • velox_functions_test
  • velox_functions_test_lib
  • velox_functions_window_test_lib
  • velox_fuzzer_connector_test
  • velox_fuzzer_util
  • velox_gcs
  • velox_gcs_file_test
  • velox_gcs_insert_test
  • velox_gcs_multiendpoints_test
  • velox_gcsfile_example
  • velox_hash_benchmark
  • velox_hash_join_build_benchmark
  • velox_hash_join_list_result_benchmark
  • velox_hash_join_prepare_join_table_benchmark
  • velox_hdfs
  • velox_hdfs_file_test
  • velox_hdfs_insert_test
  • velox_hive_config
  • velox_hive_connector
  • velox_hive_connector_test
  • velox_hive_iceberg_insert_test
  • velox_hive_iceberg_splitreader
  • velox_hive_iceberg_test
  • velox_hive_paimon_connector
  • velox_hive_paimon_data_file_meta_test
  • velox_hive_paimon_deletion_file_test
  • velox_hive_paimon_row_kind_test
  • velox_hive_paimon_split
  • velox_hive_paimon_split_test
  • velox_hive_partition_function_benchmark
  • velox_in_10_min_demo
  • velox_join_fuzzer
  • velox_key_encoder_test
  • velox_mark_distinct_fuzzer
  • velox_mark_distinct_fuzzer_lib
  • velox_mark_sorted_benchmark
  • velox_memory_arbitration_fuzzer
  • velox_memory_test
  • velox_orderby_benchmark
  • velox_parquet_e2e_filter_test
  • velox_parquet_writer_sink_test
  • velox_parquet_writer_test
  • velox_presto_types_fuzzer_utils_test
  • velox_query_benchmark
  • velox_query_replayer
  • velox_query_trace_replayer_base
  • velox_re2_functions_benchmarks
  • velox_read_benchmark
  • velox_row_number_fuzzer
  • velox_row_number_fuzzer_base_lib
  • velox_row_number_fuzzer_lib
  • velox_rpc_operator_test
  • velox_s3config_test
  • velox_s3file_test
  • velox_s3finalize_test
  • velox_s3fs
  • velox_s3insert_test
  • velox_s3metrics_test
  • velox_s3multiendpoints_test
  • velox_s3read_test
  • velox_s3registration_test
  • velox_serializer_test_group0
  • velox_simple_aggregate_test
  • velox_sort_benchmark
  • velox_spark_query_runner
  • velox_spark_query_runner_test
  • velox_spark_windows_test
  • velox_sparksql_benchmarks_from_json
  • velox_sparksql_benchmarks_get_funcs
  • velox_sparksql_benchmarks_in
  • velox_spatial_join_benchmark
  • velox_spatial_join_fuzzer
  • velox_spiller_aggregate_benchmark
  • velox_spiller_aggregate_benchmark_base
  • velox_spiller_join_benchmark
  • velox_spiller_join_benchmark_base
  • velox_streaming_aggregation_benchmark
  • velox_table_evolution_fuzzer_test
  • velox_text_reader_test
  • velox_text_writer_test
  • velox_tool_trace_test
  • velox_topn_row_number_fuzzer
  • velox_topn_row_number_fuzzer_lib
  • velox_tpcds_connector_test
  • velox_tpch_benchmark
  • velox_tpch_benchmark_lib
  • velox_tpch_connector_test
  • velox_tpch_speed_test
  • velox_trace_file_tool
  • velox_trace_file_tool_base
  • velox_wave_benchmark
  • velox_wave_dwio
  • velox_wave_exec
  • velox_wave_exec_test
  • velox_wave_mock_file
  • velox_wave_mock_reader
  • velox_window_fuzzer
  • velox_window_fuzzer_test
  • velox_window_prefixsort_benchmark
  • velox_window_sub_partitioned_sort_benchmark
  • velox_windows_agg_test
  • velox_windows_rank_test
  • velox_windows_value_test
  • velox_writer_fuzzer
  • velox_writer_fuzzer_test

Fast path • Graph from main@ee7708ef8383697e111815f40d60a0a1a49a8d34

meta-codesync Bot changed the title from "feat: Optimize Nimble metadata IO with MetadataCache and pinned caching" to "feat: Optimize Nimble metadata IO with MetadataCache and pinned caching (#16948)" Mar 28, 2026
xiaoxmeng added a commit to xiaoxmeng/nimble that referenced this pull request Mar 28, 2026
…ng (facebookincubator#616)

Summary:
X-link: facebookincubator/velox#16948


CONTEXT: Nimble's metadata IO (stripe groups, cluster index, chunk index) uses a weak-pointer cache (ReferenceCountedCache) that expires entries as soon as the last reference is dropped. For workloads that access multiple stripes, metadata gets re-read and re-parsed on every stripe access. Additionally, when AsyncDataCache is enabled, both metadata and data are cached indiscriminately — metadata is small and hot while data is large and often cold, leading to cache pollution.

WHAT:
1. **MetadataCache**: New thread-safe template cache (MetadataCache.h) replacing ReferenceCountedCache with three entry types:
   - Preloaded: strong refs moved to weak cache after first access
   - Pinned: strong refs when `pinEntries_=true`, never evicted
   - Cached: weak refs (existing behavior)
   Uses read-lock fast path (shared_mutex) with builder called outside both locks.

2. **TabletReader**: Migrated stripeGroupCache_, clusterIndexCache_, chunkIndexCache_ from ReferenceCountedCache to MetadataCache. Added `pinMetadataInCache` option to pin parsed metadata objects with strong references.

3. **ReaderOptions**: Added `pinMetadataInCache()` getter/setter to velox::dwio::common::ReaderOptions. When enabled, pins metadata in the reader's MetadataCache so it survives across stripe accesses.

4. **ReaderBase**: Replaced `prevStripeIdentifier_` class member with local variable in `setStripe()` — only needs to stay alive during the call.

5. **Test coverage**:
   - MetadataCacheTest: 9 tests including preload, customBuilder, preloadDoesNotOverwrite (parameterized over pin/non-pin mode), and concurrent fuzzer test with 8 threads.
   - E2EFilterTest: Extended parameterization with pinMetadataInCache.
   - E2EIndexTest: Extended parameterization with pinMetadataInCache + test loop over enableCache.
   - SelectiveNimbleReaderTest: Added pinMetadataInCache test with enableCache loop.

6. **Benchmark tuning**: Changed chunk_index_min_avg_chunks default to 1.0, commented out custom sizeClassSizes, removed setIOExecutor in cluster index benchmark.
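The three entry types and the locking scheme from item 1 can be sketched as follows. This is a hypothetical `MetadataCacheSketch`, not the actual `MetadataCache.h`. Only `pinEntries_`, the three entry types, the shared-lock fast path, and building outside the locks come from the summary. The `get`/`preload`/`lookupLocked` names, the single constructor-supplied builder (no custom-builder overload), and moving preloaded entries into the weak map even in pin mode (following item 1's wording literally) are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch of the three entry types; not the real MetadataCache.h.
template <typename Key, typename Value>
class MetadataCacheSketch {
 public:
  using Builder = std::function<std::shared_ptr<Value>(const Key&)>;

  MetadataCacheSketch(Builder builder, bool pinEntries)
      : builder_(std::move(builder)), pinEntries_(pinEntries) {}

  // Seeds a strong reference; never overwrites a live or preloaded entry.
  void preload(const Key& key, std::shared_ptr<Value> value) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (lookupLocked(key) == nullptr) {
      preloaded_.emplace(key, std::move(value));  // No-op if already preloaded.
    }
  }

  std::shared_ptr<Value> get(const Key& key) {
    {
      // Fast path: shared (read) lock, hits on pinned or still-alive entries.
      std::shared_lock<std::shared_mutex> lock(mutex_);
      if (auto value = lookupLocked(key)) {
        return value;
      }
    }
    {
      // Preloaded entries are consumed under the exclusive lock and moved
      // into the weak map on first access.
      std::unique_lock<std::shared_mutex> lock(mutex_);
      if (auto value = lookupLocked(key)) {
        return value;
      }
      auto it = preloaded_.find(key);
      if (it != preloaded_.end()) {
        auto value = std::move(it->second);
        preloaded_.erase(it);
        cached_[key] = value;
        return value;
      }
    }
    // Build outside both locks so slow IO/parsing does not block other keys.
    // Two threads may build the same key concurrently; the first insert wins.
    auto built = builder_(key);
    std::unique_lock<std::shared_mutex> lock(mutex_);
    if (auto existing = lookupLocked(key)) {
      return existing;
    }
    if (pinEntries_) {
      pinned_[key] = built;  // Strong ref: never evicted.
    } else {
      cached_[key] = built;  // Weak ref: expires with its last user.
    }
    return built;
  }

 private:
  // Caller must hold mutex_ (shared or exclusive); does not mutate state.
  std::shared_ptr<Value> lookupLocked(const Key& key) const {
    if (auto it = pinned_.find(key); it != pinned_.end()) {
      return it->second;
    }
    if (auto it = cached_.find(key); it != cached_.end()) {
      return it->second.lock();
    }
    return nullptr;
  }

  Builder builder_;
  const bool pinEntries_;
  std::shared_mutex mutex_;
  std::unordered_map<Key, std::shared_ptr<Value>> preloaded_;
  std::unordered_map<Key, std::shared_ptr<Value>> pinned_;
  std::unordered_map<Key, std::weak_ptr<Value>> cached_;
};
```

The re-check after re-acquiring the exclusive lock is what makes calling the builder outside the locks safe: a thread that loses the race discards its own result and returns the value the winner inserted. In pin mode, two back-to-back `get(1)` calls run the builder once; in weak mode, they run it twice if the caller drops the first result.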

Reviewed By: tanjialiang, HuamengJiang, kewang1024

Differential Revision: D98260058
xiaoxmeng added seven more commits on Mar 28, 2026 and one on Mar 29, 2026 to xiaoxmeng/nimble that referenced this pull request, each with the same commit message as above.
…ng (facebookincubator#16948)

Summary:
Pull Request resolved: facebookincubator#16948

X-link: facebookincubator/nimble#616

Reviewed By: tanjialiang, HuamengJiang, kewang1024

Differential Revision: D98260058
meta-codesync Bot closed this in 60fea0d on Mar 29, 2026
meta-codesync Bot pushed a commit to facebookincubator/nimble that referenced this pull request Mar 29, 2026
…ng (#616)

Summary:
X-link: facebookincubator/velox#16948

Pull Request resolved: #616

Reviewed By: tanjialiang, HuamengJiang, kewang1024

Differential Revision: D98260058

fbshipit-source-id: 0b35ba8ae543578ab5e761d652c23cff26011675
@meta-codesync

meta-codesync Bot commented Mar 29, 2026

This pull request has been merged in 60fea0d.

@aditi-pandit
Collaborator

@xiaoxmeng, @Yuhta: Wondering if some of these ideas could be leveraged for ParquetReader as well. Do you all have any plans?

@majetideepak @yingsu00


Labels

CLA Signed, fb-exported, Merged, meta-exported


4 participants