vector_of_kll_floats_sketches.get_quantiles() returns wrong values with float32 #63

@tyler-rt

Description

#!/usr/bin/env python3
"""
Minimal example: vector_of_kll_floats_sketches.get_quantiles() returns WRONG VALUES with float32
"""
import numpy as np
from datasketches import vector_of_kll_floats_sketches

# Create test data: 1000 samples between -100 and -10
np.random.seed(42)
test_data = np.random.uniform(-100, -10, size=(1000, 1)).astype(np.float32)

print("Test data: 1000 samples between -100 and -10")
print(f"True min: {test_data.min():.2f}, True max: {test_data.max():.2f}")

# Create sketch and add data
kll = vector_of_kll_floats_sketches(200, 1)
kll.update(test_data)

# Request p0.0001 (should be ~-100) and p0.9999 (should be ~-10)
ranks_list = [0.0001, 0.9999]
ranks_array32 = np.array(ranks_list, dtype=np.float32)
ranks_array64 = np.array(ranks_list, dtype=np.float64)


print("\n" + "="*60)
print("BUG: numpy array with dtype=np.float32 returns WRONG quantiles")
print("="*60)

quants_array = kll.get_quantiles(ranks_array32)
print(f"\nWith numpy array with dtype=np.float32: {ranks_array32}")
print(f"  p0.0001 = {quants_array[0][0]:.2f}  (expected: ~-100)")
print(f"  p0.9999 = {quants_array[0][1]:.2f}  (expected: ~-10)")
print(f"  ✗ WRONG: Both values near minimum!")

quants_array64 = kll.get_quantiles(ranks_array64)
print(f"\nWith numpy array with dtype=np.float64: {ranks_array64}")
print(f"  p0.0001 = {quants_array64[0][0]:.2f}  (expected: ~-100)")
print(f"  p0.9999 = {quants_array64[0][1]:.2f}  (expected: ~-10)")
print(f"  ✓ CORRECT")

Output:

Test data: 1000 samples between -100 and -10
True min: -99.58, True max: -10.03

============================================================
BUG: numpy array with dtype=np.float32 returns WRONG quantiles
============================================================

With numpy array with dtype=np.float32: [1.000e-04 9.999e-01]
  p0.0001 = -98.69  (expected: ~-100)
  p0.9999 = -99.50  (expected: ~-10)
  ✗ WRONG: Both values near minimum!

With numpy array with dtype=np.float64: [1.000e-04 9.999e-01]
  p0.0001 = -99.50  (expected: ~-100)
  p0.9999 = -10.28  (expected: ~-10)
  ✓ CORRECT
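
A possible workaround, based only on the observation above that float64 ranks produce correct quantiles: cast the ranks to float64 before calling get_quantiles(). The helper name `get_quantiles_f64` is hypothetical and not part of datasketches; this is a minimal sketch and does not address the underlying float32 handling.

```python
import numpy as np


def get_quantiles_f64(sketch, ranks):
    """Call sketch.get_quantiles() with the ranks cast to float64.

    Hypothetical helper (not part of datasketches). It assumes only that
    `sketch` exposes get_quantiles(ranks), as used in the script above.
    """
    # Convert any array-like (list, float32 array, ...) to a contiguous
    # float64 array, the dtype that returned correct values in this report.
    ranks64 = np.ascontiguousarray(ranks, dtype=np.float64)
    return sketch.get_quantiles(ranks64)
```

With the script above, `get_quantiles_f64(kll, ranks_array32)` would pass the same ranks as the float64 call, up to float32 rounding of the rank values.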
