Description
I'm doing a concat_rows with a dataframe with category columns, and got this warning:
CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
Repro:
DF.concat_rows(
  DF.new(%{s: ["s1"]}, dtypes: [s: :category]),
  DF.new(%{s: ["s2"]}, dtypes: [s: :category])
)

I found some Polars docs that explain the concept behind this warning very well: https://docs.pola.rs/user-guide/expressions/categorical-data-and-enums
However, there are a couple of problems with the way this warning shows up in Explorer:
- Explorer doesn't support StringCache or Enum types, so I don't think there's anything I can do to address this warning.
- I think it's logged by Polars directly to stdout/stderr and doesn't go through the Elixir Logger. This means I don't get the normal metadata and log formatting that I configure through Elixir, and I'm not sure I could even filter it out with Logger configuration if I wanted to. (Not sure about this, though.)
So the warning is just noise to me and there's nothing I can do to quiet it.
Some potential ways that Explorer could help with this:
- Support enum types.
- Support polars string cache (or other ways to control the category encoding).
- Add a note to the docs near `:category` acknowledging this situation.
- Adjust how Polars logging gets handled by Elixir (is it possible to pass it through Logger?), or at least document what happens to Polars logs.
- Silence this specific warning from Polars. (This PR might be a clue as to how: "feat: Specific performance warnings from Rust to Python", pola-rs/polars#12802.)
But these would be pretty big changes to make just to quiet a log line. Also, I'm only using the Polars backend and haven't thought through how this would affect other backends.
So I'm not sure if there's anything to actually do here; feel free to close. I'm mostly opening this issue to record that the problem exists and to write down what I've learned in case others are looking for more info about it. I'm not requesting any particular fix.
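
For anyone landing here looking for a stopgap: one possible workaround sketch is to cast the category column to strings before concatenating and cast it back afterwards, so that Polars never has to merge two differently-encoded categoricals. This rests on the assumption that the warning is only raised on the categorical merge path; it has not been confirmed against Polars internals, and the round trip has its own cost on large frames. `CategoryConcat` and `concat_rows/2` are hypothetical names used for illustration and are not part of Explorer.

```elixir
# Assumes a script context; inside a Mix project, add :explorer to deps instead.
Mix.install([:explorer])

defmodule CategoryConcat do
  # Hypothetical helper, not part of Explorer.
  alias Explorer.DataFrame, as: DF
  alias Explorer.Series

  # Concatenates `dfs` row-wise, temporarily treating `column` as :string
  # so that no categorical re-encoding is needed during the concat.
  def concat_rows(dfs, column) do
    dfs
    # Cast the category column to plain strings in every input frame.
    |> Enum.map(fn df -> DF.put(df, column, Series.cast(df[column], :string)) end)
    |> DF.concat_rows()
    # Rebuild a single category encoding over the combined column.
    |> then(fn df -> DF.put(df, column, Series.cast(df[column], :category)) end)
  end
end

alias Explorer.DataFrame, as: DF

result =
  CategoryConcat.concat_rows(
    [
      DF.new(%{s: ["s1"]}, dtypes: [s: :category]),
      DF.new(%{s: ["s2"]}, dtypes: [s: :category])
    ],
    "s"
  )
```

The combined frame ends up with a single `:category` column built from the merged strings, which is the same result the original `concat_rows` call was after.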