Description
Describe the enhancement requested
Feature Request
There's currently no `utf8_zfill` kernel in `pyarrow.compute`, so Python's `str.zfill()` behavior can't be reproduced efficiently on Arrow arrays.
While fixing pandas-dev/pandas#61485, I noticed that `Series.str.zfill()` breaks on `ArrowDtype(pa.string())`. The Arrow backend relies on native kernels for string padding (such as `utf8_lpad`), but no such kernel exists for zfill. For now, it has to fall back to element-wise Python operations, which are slow.
Reproduction
```python
import pandas as pd
import pyarrow as pa

s = pd.Series(["A", "AB", "ABC"], dtype=pd.ArrowDtype(pa.string()))
s.str.zfill(3)  # Currently falls back to Python and works via the slow path
```
Expected behavior:
- `'A'` → `'00A'`
- `'AB'` → `'0AB'`
- `'ABC'` → `'ABC'` (unchanged, since it is already 3 characters)
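For reference, Python's `str.zfill` rules, including its handling of a leading `+` or `-`, can be restated as a small pure-Python function. `zfill_reference` is a hypothetical name used only to pin down the semantics a kernel would need to match; widths are counted in characters (code points), as `len()` does.

```python
def zfill_reference(s: str, width: int) -> str:
    """Pure-Python restatement of str.zfill semantics (hypothetical helper)."""
    if len(s) >= width:
        return s  # already wide enough: returned unchanged
    fill = "0" * (width - len(s))
    if s[:1] in ("+", "-"):
        # Zeros go after a leading sign, e.g. "-5" -> "-05" for width 3
        return s[0] + fill + s[1:]
    return fill + s
```

With `width=3`, this gives `'00A'`, `'0AB'` and `'ABC'` for the three inputs above, and `'-05'` for `'-5'`.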
What we need
A kernel like `pc.utf8_zfill(array, width)` that mimics Python's `str.zfill()`:
- Pad strings with `'0'` on the left to reach `width`
- Optional enhancement: handle a leading sign (`+`, `-`) the same way Python does, inserting the zeros after the sign (e.g. `'-5'.zfill(3)` → `'-05'`)
Why it matters
This will help pandas fully support `.str.zfill()` for Arrow-backed string arrays, in the same way that other string methods are already backed natively by kernels such as `utf8_lpad` and `binary_join`. It would avoid falling back to slower Python paths and ensure parity with standard Python string behavior.
Notes
I've temporarily added a `TODO` in the pandas code to switch over once this kernel is available.
Component(s)
Python