small features: add option to save cache in parquet, save judge input, improve error handling of openrouter #35

Conversation
completion_A: str  # completion of the first model
completion_B: str  # completion of the second model
judge_completion: str  # output of the judge
judge_input: str | None = None  # input that was passed to the judge
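A hypothetical sketch of how these fields might sit together in a record type (the class name `JudgeResult` and the surrounding code are assumptions; the diff only shows the fields):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JudgeResult:  # hypothetical name; the diff only shows the fields
    completion_A: str  # completion of the first model
    completion_B: str  # completion of the second model
    judge_completion: str  # output of the judge
    judge_input: str | None = None  # input that was passed to the judge


r = JudgeResult("answer A", "answer B", "A is better")
print(r.judge_input)  # None until the caller opts in to storing it
```

Making `judge_input` default to `None` keeps the feature opt-in, which matters later in the thread when serialization of missing values comes up.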
Should this be added to the estimate_elo_ratings.py workflow as well?
It uses judge_and_parse_prefs from this file, so it is updated as well, if I am not mistaken.
Yes, but we are dropping it here because we are constructing the DataFrame manually.
Good catch, I will change the elo-rating workflow to also store the input.
return x


for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].apply(_to_python).astype(str)
I think we should be careful here if the dataframe can contain missing values. Calling .astype(str) on missing values (None or np.nan) converts them into the strings "None" and "nan". When the parquet file is read back, they would be treated as strings instead of missing values.
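This lossiness is easy to demonstrate; a minimal sketch of the concern:

```python
import numpy as np
import pandas as pd

# An object column with genuine missing values alongside a real string.
s = pd.Series(["hello", None, np.nan], dtype=object)

# .astype(str) stringifies the missing values too:
converted = s.astype(str)
print(converted.tolist())  # ['hello', 'None', 'nan']

# After a parquet round trip these cells come back as the literal
# strings "None" / "nan", and isna() can no longer detect them:
print(converted.isna().any())  # False
```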
Yes, I agree, but I don't see another way to serialize to parquet. This conversion does lose the missingness information, but I think all downstream code should probably exclude empty strings too when computing annotations.
I searched a bit and I think we have two safe solutions:
Option 1: Parquet + JSON Sidecar
We selectively stringify only the complex objects (like dicts/lists) and leave None/NaN untouched, filtering with something like this:

df_cache[col] = df_cache[col].apply(
    lambda x: x if x is None or (isinstance(x, float) and np.isnan(x)) else repr(_to_python(x))
)

and save a small .meta.json tracking which columns were altered. Then, when reading, we call ast.literal_eval only on the cells that were serialized and are not missing.
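A minimal sketch of Option 1's round trip, under two simplifying assumptions: the .meta.json bookkeeping is omitted, and repr is applied directly instead of going through the _to_python helper mentioned above:

```python
import ast

import numpy as np
import pandas as pd


def _is_missing(x):
    """Treat None and float NaN as missing, everything else as a value."""
    return x is None or (isinstance(x, float) and np.isnan(x))


# Serialize: stringify only the non-missing complex cells.
col = pd.Series([{"a": 1}, None, [1, 2]], dtype=object)
serialized = col.apply(lambda x: x if _is_missing(x) else repr(x))

# Deserialize: literal_eval only the cells that were stringified.
restored = serialized.apply(lambda x: x if _is_missing(x) else ast.literal_eval(x))
print(restored.tolist())  # [{'a': 1}, None, [1, 2]]
```

Because the missing cells are never stringified, isna() still works after reading the parquet file back, at the cost of the extra sidecar metadata in the real implementation.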
Option 2: Compressed Pickle (.pkl.gz)
We could switch to df.to_pickle(cache_file, compression="gzip"). It would be simpler, but probably less storage-efficient.
Otherwise we could leave it like this and note it for later, if you think this is not a big issue at the moment.
Option 1 seems like overkill, given that the option is for a util that is deactivated by default.
Option 2 is a strong no, as we would be replacing the issue with a security issue (and we would also lose Parquet's benefit of columnar storage compression).
I would be keen to merge as is, possibly adding a comment.
Alternatively, I am also fine with dropping the feature and keeping it in my branch.
Then we could merge it as is so we can use it too 👍
…, improve error handling of openrouter, remove compute_cohen_kappa
Reasons: