[Python] Missing utf8_zfill in pyarrow.compute to support str.zfill behavior #46683

Open
@iabhi4

Describe the enhancement requested

Feature Request

There’s currently no utf8_zfill kernel in pyarrow.compute, so Python’s str.zfill() behavior can't be reproduced efficiently on Arrow arrays.
While fixing pandas-dev/pandas#61485, I noticed that Series.str.zfill() breaks on ArrowDtype(pa.string()) because the backend expects a string-padding kernel like utf8_lpad, but nothing exists for zfill. For now, it has to fall back to element-wise Python operations, which is not ideal.

Reproduction

import pandas as pd
import pyarrow as pa

s = pd.Series(["A", "AB", "ABC"], dtype=pd.ArrowDtype(pa.string()))
s.str.zfill(3)  # Currently works only via the slow element-wise Python fallback

Expected behavior would be

'A' → '00A'
'AB' → '0AB'
'ABC' → 'ABC' (no change since it's already 3 chars)

What we need

A kernel like pc.utf8_zfill(array, width) that mimics Python’s str.zfill():

  • Pad strings with '0' from the left to reach width

  • Optional enhancement: handle a leading sign (+ or -) the same way Python does, inserting the zeros after the sign
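
To make the two points above concrete, here is a minimal pure-Python sketch of the per-element semantics such a kernel would need to match. The helper name zfill_reference is hypothetical and exists only for illustration; it counts characters the way Python's len() does, and a real kernel would have to decide how to count UTF-8 code points.

```python
def zfill_reference(value: str, width: int) -> str:
    """Hypothetical reference for sign-aware zero fill, mirroring str.zfill()."""
    pad = width - len(value)
    if pad <= 0:
        return value  # already at least `width` characters long
    if value[:1] in ("+", "-"):
        # Keep the sign in front and insert the zeros right after it.
        return value[0] + "0" * pad + value[1:]
    return "0" * pad + value


samples = ["A", "AB", "ABC", "-1", "+42", "", "-"]

# The sketch agrees with the built-in on every sample.
for s in samples:
    assert zfill_reference(s, 4) == s.zfill(4)

print([zfill_reference(s, 4) for s in samples])
# ['000A', '00AB', '0ABC', '-001', '+042', '0000', '-000']
```

The '-1' case shows where plain left-padding diverges: padding on the left alone would yield '00-1' rather than '-001'.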

Why it matters

This will help pandas fully support .str.zfill() for Arrow-backed string arrays, similar to how utf8_lpad, binary_join, etc., already work natively. It will avoid falling back to slower Python paths and ensure parity with standard Python string behavior.

Notes

I’ve temporarily added a TODO in the pandas code to switch over once this is available.

Component(s)

Python
