Fix unit processing in timestamps #81

Open

jubeira wants to merge 1 commit into QuantAMMProtocol:main from balancer:fix-data-processing

jubeira commented Apr 29, 2026

There's an inconsistency in the scripts that download and parse the pricing data:

python scripts/download_data.py USDC

    File "/home/juan/prj/quantammsim/scripts/download_data.py", line 38, in <module>
      update_historic_data(token, DATA_DIR_STR)
    File "/home/juan/prj/quantammsim/quantammsim/utils/data_processing/historic_data_utils.py", line 920, in update_historic_data
      raise Exception(
  Exception: Invalid unix timestamp difference found at index 0 (1970-01-01 00:25:02.942000). All differences should be 60000ms (1 minute).
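For readers unfamiliar with the check that raises here, a hypothetical sketch reconstructed only from the error message: the helper name check_minute_spacing is made up, and the real logic in update_historic_data may differ in detail (for example, in which timestamp it reports).

```python
import numpy as np


def check_minute_spacing(unix_ms: np.ndarray) -> None:
    """Hypothetical re-creation of the spacing check behind the error above.

    Raises if any gap between consecutive epoch-millisecond timestamps
    is not exactly 60000 ms (1 minute).
    """
    diffs = np.diff(unix_ms)
    bad = np.flatnonzero(diffs != 60_000)
    if bad.size:
        i = int(bad[0])
        raise ValueError(
            f"Invalid unix timestamp difference found at index {i}: "
            f"got {int(diffs[i])} ms, expected 60000 ms"
        )


# Well-formed minute data passes silently.
check_minute_spacing(np.array([1502942400000, 1502942460000, 1502942520000]))
```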

Diagnosis

The error isn't caused by the downloaded data — it's caused by a pandas 3.x behavior change in how DatetimeIndex/Series are converted to int64.

Root cause

In quantammsim/utils/data_processing/amalgamated_data_utils.py:36-46 (forward_fill_ohlcv_data):

  full_index = pd.date_range(
      start=pd.to_datetime(df.index.min(), unit="ms"),
      end=pd.to_datetime(df.index.max(), unit="ms"),
      freq="1min",
  )
  full_index = full_index.astype(np.int64) // 10**6   # ← assumes ns under the hood

In pandas ≤2.x, pd.date_range(...) always produced datetime64[ns], so astype(np.int64) // 10**6 correctly converted ns → ms. In pandas 3.0, pd.to_datetime(..., unit="ms") produces datetime64[ms], and pd.date_range propagates that resolution. astype(np.int64) now already returns ms, so dividing by 10**6 again yields tiny garbage values.
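A minimal reproduction of the conversion, as a sketch: the three timestamps match the example below, and as_unit is used to pin each resolution explicitly so the snippet behaves the same on pandas 2.x and 3.x (it assumes pandas ≥ 2.0, where as_unit is available).

```python
import numpy as np
import pandas as pd

# Three consecutive minutes starting at 1502942400000 ms.
idx = pd.date_range(
    start=pd.Timestamp(1502942400000, unit="ms"),
    periods=3,
    freq="1min",
)

# Pin the resolution explicitly so the demo does not depend on the pandas version.
ns_idx = idx.as_unit("ns")  # what pandas <= 2.x produced by default
ms_idx = idx.as_unit("ms")  # what pandas 3.0 produces, per the diagnosis above

# The old conversion: correct only when the underlying integers are nanoseconds.
old_on_ns = ns_idx.astype(np.int64) // 10**6
old_on_ms = ms_idx.astype(np.int64) // 10**6

print(old_on_ns.tolist())  # [1502942400000, 1502942460000, 1502942520000]
print(old_on_ms.tolist())  # [1502942, 1502942, 1502942]
```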

Example:

  • Index dtype: datetime64[ms]
  • OLD: astype(int64) // 10**6 → [1502942, 1502942, 1502942] # BUG
  • NEW: as_unit("ms").astype(int64) → [1502942400000, 1502942460000, 1502942520000]

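1502942 interpreted as an epoch timestamp in ms is 1970-01-01 00:25:02.942000, exactly the value in the error. Because the second division leaves only 0.06 per minute before flooring, consecutive rows of the index built by forward_fill_ohlcv_data collapse onto the same garbage timestamp, so the first difference comes out as 0 (or occasionally 1) instead of 60000 and the equality check fails at index 0.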
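A quick numeric check of that arithmetic, sketched with synthetic data: the 20-minute array below is illustrative and not taken from the downloaded archives.

```python
import numpy as np
import pandas as pd

# Correct epoch-ms timestamps for 20 consecutive minutes from 1502942400000.
good_ms = 1502942400000 + 60_000 * np.arange(20, dtype=np.int64)

# The buggy path divides ms by 10**6 once more: each minute now adds only
# 0.06 before flooring, so consecutive rows collapse onto the same integer.
bad_ms = good_ms // 10**6

print(pd.to_datetime(bad_ms[0], unit="ms"))  # the 1970 timestamp from the error
print(np.diff(bad_ms)[:3])  # [0 0 0]
print(np.unique(bad_ms))  # [1502942 1502943]
```

All 20 minutes end up sharing just two distinct values, so no difference is ever 60000.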

Why now?

Two contributing factors coincided:

  1. Binance Vision changed the timestamp resolution of its monthly archives from ms to µs starting Jan 2025, and its daily archives are also in µs (16-digit values like 1775001600000000). The existing get_binance_vision_data lambdas at historic_data_utils.py:712-718 happen to handle this correctly (16 digits → //10⁶ → *1000).
  2. But the µs values in turn cause pd.to_datetime(..., unit="ms") to produce ms-resolution timestamps, which then hit the latent bug above.
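The normalization described in item 1 can be sketched as below. normalize_to_ms is a hypothetical helper that only mirrors the arithmetic stated above (16 digits → //10⁶ → *1000); it is not the actual code of the get_binance_vision_data lambdas.

```python
def normalize_to_ms(raw: int) -> int:
    """Hypothetical helper, not the lambdas in historic_data_utils.py.

    16-digit (µs) values are floored to whole seconds with // 10**6 and then
    scaled by 1000 to ms; 13-digit values are assumed to be ms already.
    """
    digits = len(str(raw))
    if digits == 16:
        return raw // 10**6 * 1000
    if digits == 13:
        return raw
    raise ValueError(f"unexpected timestamp width ({digits} digits): {raw}")


print(normalize_to_ms(1775001600000000))  # 1775001600000 (µs daily archive)
print(normalize_to_ms(1502942400000))     # 1502942400000 (ms, unchanged)
```

Going through whole seconds discards any sub-second part, which is harmless for 1-minute candles whose open times fall on whole seconds.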

Proposed fix

Force the unit explicitly with as_unit("ms") before converting to int64, as in the NEW line of the example above. The conversion then no longer depends on whichever resolution pandas inferred.
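A minimal sketch of the fix applied to the forward_fill_ohlcv_data excerpt. The DataFrame and its close column are made up, and the reindex + ffill step is an assumption about what the function does with full_index, inferred from its name rather than from the source.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: an epoch-ms index with one missing minute.
df = pd.DataFrame(
    {"close": [1.0, 2.0]},
    index=pd.Index([1502942400000, 1502942520000], dtype=np.int64),
)

full_index = pd.date_range(
    start=pd.to_datetime(df.index.min(), unit="ms"),
    end=pd.to_datetime(df.index.max(), unit="ms"),
    freq="1min",
)
# Pin the resolution to ms before reading the raw integers; this yields ms
# whether pandas built the range at ns (2.x) or ms (3.x) resolution.
full_index = full_index.as_unit("ms").astype(np.int64)

# Assumed forward-fill step: fill the missing minute from the previous row.
filled = df.reindex(full_index).ffill()

print(full_index.tolist())  # [1502942400000, 1502942460000, 1502942520000]
print(filled["close"].tolist())  # [1.0, 1.0, 2.0]
```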
