Fix unit processing in timestamps by jubeira · Pull Request #81 · QuantAMMProtocol/quantammsim

jubeira · 2026-04-29T16:52:11Z

There's an inconsistency in the scripts that download and parse the pricing data:

python scripts/download_data.py USDC

    File "/home/juan/prj/quantammsim/scripts/download_data.py", line 38, in <module>
      update_historic_data(token, DATA_DIR_STR)
    File "/home/juan/prj/quantammsim/quantammsim/utils/data_processing/historic_data_utils.py", line 920, in update_historic_data
      raise Exception(
  Exception: Invalid unix timestamp difference found at index 0 (1970-01-01 00:25:02.942000). All differences should be 60000ms (1 minute).

Diagnosis

The error isn't caused by the downloaded data. It's caused by a pandas 3.x behavior change in how DatetimeIndex/Series values are converted to int64.

Root cause

In quantammsim/utils/data_processing/amalgamated_data_utils.py:36-46 (forward_fill_ohlcv_data):

  full_index = pd.date_range(
      start=pd.to_datetime(df.index.min(), unit="ms"),
      end=pd.to_datetime(df.index.max(), unit="ms"),
      freq="1min",
  )
  full_index = full_index.astype(np.int64) // 10**6   # ← assumes ns under the hood

In pandas ≤2.x, pd.date_range(...) always produced datetime64[ns], so astype(np.int64) // 10**6 correctly converted ns → ms. In pandas 3.0, pd.to_datetime(..., unit="ms") produces datetime64[ms], and pd.date_range propagates that resolution. astype(np.int64) therefore already returns milliseconds since the epoch, and dividing by 10**6 again yields tiny garbage values.

Example:

Index dtype: datetime64[ms]
OLD: astype(int64) // 10**6 → [1502942, 1502942, 1502942] # BUG
NEW: as_unit("ms").astype(int64) → [1502942400000, 1502942460000, 1502942520000]
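The example can be reproduced with the short script below. It pins the index resolution with as_unit instead of relying on the installed pandas version, so it behaves the same on 2.x and 3.x. old_conversion and fixed_conversion are illustrative names, not functions from the repository.

```python
import numpy as np
import pandas as pd

# Three consecutive minute open times, in milliseconds since the epoch.
MS_VALUES = [1502942400000, 1502942460000, 1502942520000]

# Pin the resolution explicitly so the demo does not depend on the pandas version:
# idx_ms mimics what pandas 3.x produces, idx_ns what pandas <=2.x produced.
idx_ms = pd.to_datetime(MS_VALUES, unit="ms").as_unit("ms")
idx_ns = idx_ms.as_unit("ns")


def old_conversion(idx: pd.DatetimeIndex) -> list[int]:
    # Original pattern: assumes the int64 view is in nanoseconds.
    return (idx.astype(np.int64) // 10**6).tolist()


def fixed_conversion(idx: pd.DatetimeIndex) -> list[int]:
    # Force ms first, so the int64 view is always milliseconds.
    return idx.as_unit("ms").astype(np.int64).tolist()


for name, idx in [("ns index", idx_ns), ("ms index", idx_ms)]:
    print(name, idx.dtype, old_conversion(idx), fixed_conversion(idx))
```

On the ns index both conversions agree; on the ms index only the as_unit version returns real millisecond timestamps.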

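Interpreted as milliseconds since the epoch, 1502942 is 1970-01-01 00:25:02.942000, exactly the value in the error. Each of these garbage integers spans 10**6 ms (about 16.7 minutes), so consecutive minutes collapse onto the same value and the reindex in forward_fill_ohlcv_data inherits them. The differences come out as 0 (or occasionally 1) instead of 60000, and the very first one fails the equality check.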
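The arithmetic behind the error message can be checked directly. first_bad_diff is a hypothetical stand-in for the spacing check in update_historic_data, whose code is not shown in this PR; it only reproduces the idea of comparing consecutive differences against 60000 ms.

```python
import numpy as np
import pandas as pd

ONE_MINUTE_MS = 60_000

# A garbage value from the buggy conversion, read back as ms since the epoch.
garbage_ts = pd.to_datetime(1502942, unit="ms")
print(garbage_ts)  # the timestamp quoted in the error message

# One garbage integer covers 10**6 ms, i.e. about 16.7 one-minute candles.
minutes_per_bucket = 10**6 / ONE_MINUTE_MS
print(f"{minutes_per_bucket:.2f} minutes share each garbage value")


def first_bad_diff(ms_timestamps, expected_ms=ONE_MINUTE_MS):
    """Hypothetical check: index of the first difference != expected_ms, else None."""
    diffs = np.diff(np.asarray(ms_timestamps, dtype=np.int64))
    bad = np.flatnonzero(diffs != expected_ms)
    return int(bad[0]) if bad.size else None


print(first_bad_diff([1502942, 1502942, 1502942]))  # buggy output: 0
print(first_bad_diff([1502942400000, 1502942460000, 1502942520000]))  # correct ms: None
```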

Why now?

Two contributing changes landed at once:

Binance Vision changed the timestamp resolution in its monthly archives from ms to µs starting January 2025, and its daily archives use µs (16-digit values like 1775001600000000). The existing get_binance_vision_data lambdas at historic_data_utils.py:712-718 happen to handle this correctly: 16-digit values are floor-divided by 10⁶ (µs → s) and then multiplied by 1000 (s → ms).
But the µs values in turn cause pd.to_datetime(..., unit="ms") to produce ms-resolution timestamps, which then hit the latent bug above.
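The digit-count rule described above can be sketched as follows. normalize_open_time_to_ms is a hypothetical helper, not the actual lambdas in historic_data_utils.py, and it assumes any value that is not 16 digits long is already in ms.

```python
import pandas as pd


def normalize_open_time_to_ms(value: int) -> int:
    """Hypothetical helper mirroring the '16 digits -> //10**6 -> *1000' rule."""
    if len(str(value)) == 16:  # µs since the epoch
        # µs -> whole seconds -> ms; exact for minute-aligned open times
        return value // 10**6 * 1000
    return value  # assumed to already be ms (13 digits)


daily_us = 1775001600000000  # 16-digit µs value from a daily archive
ms = normalize_open_time_to_ms(daily_us)
print(ms, pd.to_datetime(ms, unit="ms"))
```

Dividing by 10**6 and multiplying back by 1000 discards any sub-second part; value // 1000 would be exact, but for minute candles both give the same result.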

Proposed fix

Force the unit explicitly before converting to int64. With the unit pinned, the conversion no longer depends on whichever resolution pandas chose.
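A minimal sketch of the patched conversion, assuming the DataFrame is indexed by integer ms open times as in forward_fill_ohlcv_data. minute_index_ms and forward_fill_minutes are hypothetical names, and this is not necessarily the exact code in the commit.

```python
import numpy as np
import pandas as pd


def minute_index_ms(start_ms: int, end_ms: int) -> pd.Index:
    """Every minute between start_ms and end_ms, as int64 ms since the epoch."""
    full_index = pd.date_range(
        start=pd.to_datetime(start_ms, unit="ms"),
        end=pd.to_datetime(end_ms, unit="ms"),
        freq="1min",
    )
    # Pin the unit before taking the int64 view, whatever resolution pandas chose.
    return full_index.as_unit("ms").astype(np.int64)


def forward_fill_minutes(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex onto a gap-free minute grid and carry the last row forward."""
    full_index = minute_index_ms(df.index.min(), df.index.max())
    return df.reindex(full_index).ffill()


df = pd.DataFrame(
    {"close": [1.0, 2.0]},
    index=[1502942400000, 1502942520000],  # one missing minute in between
)
filled = forward_fill_minutes(df)
print(filled)
```

Because the index stays in integer ms end to end, the reindexed rows line up with the original ones and every consecutive difference is 60000.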

Fix unit processing in timestamps.

0326003

jubeira wants to merge 1 commit into QuantAMMProtocol:main from balancer:fix-data-processing

jubeira commented Apr 29, 2026
