Description
Problem
Now that 95778 is in, the repr of an int can fail with a ValueError based on its size. Because repr and str are the same for int, huge values cannot have a repr.
We discussed this while working on that security fix but deemed that changing a repr was way beyond reason for a patch release bugfix. Raising the ValueError exception highlights the point in the code that potentially needs specific attention rather than allowing a new unexpected format of data to start showing up where it hadn't previously as a result of a patch release.
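A minimal sketch of the failure described above, assuming an interpreter that includes the digit limit. The helper name repr_fails is hypothetical and only used for illustration; the hasattr guard keeps the snippet harmless on interpreters that predate the limit.

```python
import sys

def repr_fails(value: int) -> bool:
    """Return True if repr(value) raises ValueError under the current digit limit."""
    try:
        repr(value)
    except ValueError:
        return True
    return False

# The limit only exists on interpreters that include the fix, so guard for it.
if hasattr(sys, "get_int_max_str_digits"):
    limit = sys.get_int_max_str_digits()  # 0 means the limit is disabled
    if limit:
        huge = 10 ** limit  # has limit + 1 decimal digits, one more than allowed
        print(repr_fails(huge))       # True: repr raises ValueError
        print(repr_fails(10 ** 100))  # False: small values are unaffected
```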
Enhancement Proposal
We could fix this annoyance if we are willing to change int's repr. For huge values we could automatically repr them as hexadecimal. str behavior would not change.
The auto-hex repr point needs to be at fewer bits than are required to represent a value with sys.int_info.str_digits_check_threshold decimal digits, so that there exists no scenario in which repr of an int could fail.
>>> import sys
>>> int('1'+('0'*(sys.int_info.str_digits_check_threshold-1))).bit_length()
2123
>>> int('1'+('0'*(sys.get_int_max_str_digits()-1))).bit_length()
14281
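The bound on the switch-over point can be worked out from bit lengths. The sketch below is one way to do it, not part of any existing API: max_always_safe_bits is a hypothetical helper, and the fallback of 640 is assumed to match the threshold on interpreters that lack the attribute.

```python
import sys

def max_always_safe_bits(threshold_digits: int) -> int:
    """Largest bit length for which every value has fewer than threshold_digits decimal digits."""
    # 10 ** (threshold_digits - 1) is the smallest value with threshold_digits digits.
    smallest = 10 ** (threshold_digits - 1)
    # Any int with fewer bits than `smallest` is strictly smaller than it,
    # so it has fewer decimal digits as well.
    return smallest.bit_length() - 1

# Assumption: fall back to 640 where the attribute does not exist.
threshold = getattr(sys.int_info, "str_digits_check_threshold", 640)
print(max_always_safe_bits(threshold))  # 2122 when the threshold is 640
```

Any hexadecimal cut-over at or below this bit length, such as 512, leaves no value whose decimal repr could hit the limit.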
Perhaps all integers >512 bits (to pick an arbitrary nice threshold) could repr to hexadecimal:
>>> 2**511
6703903964971298549787012499102923063739682910296196688861780721860882015036773488400937149083451713845015929093243025426876941405973284973216824503042048
>>> 2**513
0x200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> str(2**513)
'26815615859885194199148049996411692254958731641184786755447122887443528060147093953603748596333806855380063716372972101707507765623893139892867298012168192'
Effectively this behavior:
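def proposed_int_repr(value: int):
    # bit_length() is the number of bits needed to represent the value;
    # bit_count() would count only the set bits.
    if value.bit_length() <= 512:
        return repr(value)
    else:
        return hex(value)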
Potential wins
- We return to always being able to repr an int other than a MemoryError.
- Less hacky code is needed to see the actual value of an int when it is huge. Notebook users for example would see the result of their huge int computation instead of a ValueError. It'd just be in hex. (REPLs emit the repr)
  On the other hand, I expect notebooks may choose to implement this in their own REPL repr code long before it is released into a CPython version that they're run on top of.
- People don't need to check for int and implement their own specialized repr when they always want a value.
- Minor: People start using hexadecimal constants for huge values in code rather than decimal when they paste them in from a REPL. Faster parsing, shorter code.
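To illustrate the last point: hexadecimal text round-trips back to the same int without touching the digit limit, since the limit applies only to bases that are not powers of two. This is a small sketch; decimal_parse_is_limited is a hypothetical helper name.

```python
import ast

value = 10 ** 5000     # 5001 decimal digits
hex_text = hex(value)  # base-16 conversion is not subject to the digit limit

# Both the literal parser and int() accept the hexadecimal form.
assert ast.literal_eval(hex_text) == value
assert int(hex_text, 16) == value

def decimal_parse_is_limited(digits: int) -> bool:
    """Return True if int() rejects a decimal string with this many digits."""
    try:
        int("1" + "0" * (digits - 1))
    except ValueError:
        return True
    return False

# True on interpreters where the limit is active and below 5001 digits.
print(decimal_parse_is_limited(5001))
```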
Potential disruption
- Golden value tests comparing string form data.
- Code inadvertently using the repr expecting to always get a decimal value. Bug in user code: Should use str.
- Stored reprs of data consumed at a distance by other code where it previously contained decimal values. Bug in user code: repr is not a data storage and transmission format.
If we didn't choose a low limit, but instead tied the switch-over point to the largest binary value that fits within sys.get_int_max_str_digits() decimal digits, we'd be inconsistent between environments or programs that choose to change their digits limit, but would avoid emitting hexadecimal unless we had no other choice. This variant could be thought of as:
def digit_limit_tied_proposed_int_repr(value: int):
    try:
        return repr(value)
    except ValueError:
        return hex(value)
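The inconsistency between environments can be seen by running this variant under two different limits. The following is a sketch under stated assumptions: repr_under_limit is a hypothetical helper, the limits 4300 and 640 are picked for illustration, and the demonstration is skipped on interpreters without sys.set_int_max_str_digits.

```python
import sys

def digit_limit_tied_proposed_int_repr(value: int) -> str:
    try:
        return repr(value)
    except ValueError:
        return hex(value)

def repr_under_limit(value: int, max_digits: int) -> str:
    """Format value with the digit limit temporarily set to max_digits."""
    saved = sys.get_int_max_str_digits()
    try:
        sys.set_int_max_str_digits(max_digits)
        return digit_limit_tied_proposed_int_repr(value)
    finally:
        sys.set_int_max_str_digits(saved)  # always restore the previous limit

if hasattr(sys, "set_int_max_str_digits"):
    value = 10 ** 1000  # 1001 decimal digits
    print(repr_under_limit(value, 4300).startswith("0x"))  # False: decimal
    print(repr_under_limit(value, 640).startswith("0x"))   # True: hexadecimal
```

The same value prints in decimal in one program and in hexadecimal in another, which is the trade-off this variant accepts.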