
FR: Change int repr on huge values to automatically use hexadecimal #96601

Closed as not planned
@gpshead

Description

Problem

Now that 95778 is in, the repr of an int can fail with a ValueError based on its size. Because repr and str are the same for int, huge values cannot have a repr.

We discussed this while working on that security fix but deemed that changing a repr was way beyond reason for a patch release bugfix. Raising the ValueError exception highlights the point in the code that potentially needs specific attention rather than allowing a new unexpected format of data to start showing up where it hadn't previously as a result of a patch release.

Enhancement Proposal

We could fix this annoyance if we are willing to change int's repr. For huge values we could automatically repr them as hexadecimal. str behavior would not change.

The auto-hex repr point needs to be at fewer bits than are required to represent a value with sys.int_info.str_digits_check_threshold decimal digits, so that there is no scenario in which the repr of an int could fail.

>>> import sys
>>> int('1'+('0'*(sys.int_info.str_digits_check_threshold-1))).bit_length()
2123
>>> int('1'+('0'*(sys.get_int_max_str_digits()-1))).bit_length()
14281

Perhaps all integers >512 bits (to pick an arbitrary nice threshold) could repr to hexadecimal:

>>> 2**511
6703903964971298549787012499102923063739682910296196688861780721860882015036773488400937149083451713845015929093243025426876941405973284973216824503042048
>>> 2**513
0x200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> str(2**513)
'26815615859885194199148049996411692254958731641184786755447122887443528060147093953603748596333806855380063716372972101707507765623893139892867298012168192'

Effectively this behavior:

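def proposed_int_repr(value: int) -> str:
    # bit_length() is the number of bits needed to represent abs(value).
    if value.bit_length() <= 512:
        return repr(value)  # today's decimal repr
    else:
        return hex(value)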
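
A minimal runnable sketch of that behavior, assuming the threshold is measured with int.bit_length() and that 512 bits is the cutoff (both are only the arbitrary choices floated above, not a settled design). It uses str() for the decimal branch so it does not depend on what repr currently does:

```python
def proposed_int_repr(value: int) -> str:
    """Sketch of the proposed repr: decimal up to 512 bits, hex beyond."""
    if value.bit_length() <= 512:
        return str(value)
    return hex(value)


# A value far beyond the default decimal digit limit still gets a repr,
# because conversion to hexadecimal is not subject to that limit.
huge = 10**5000
text = proposed_int_repr(huge)

# The hex text round-trips back to the same integer.
assert int(text, 16) == huge
```

Negative values work the same way, since bit_length() ignores the sign and hex() emits a leading minus sign.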

Potential wins

  1. The repr of an int can once again never fail, other than with a MemoryError.
  2. Less hacky code is needed to see the actual value of an int when it is huge. Notebook users for example would see the result of their huge int computation instead of a ValueError. It'd just be in hex. (REPLs emit the repr)
    On the other hand, I expect notebooks may choose to implement this in their own REPL repr code long before it is released into a CPython version that they're run on top of.
  3. People don't need to check for int and implement their own specialized repr when they always want a value.
  4. Minor: People start using hexadecimal constants for huge values in code rather than decimal when they paste them in from a REPL. Faster parsing, shorter code.

Potential disruption

  1. Golden value tests comparing string form data.
  2. Code inadvertently using the repr expecting to always get a decimal value. Bug in user code: Should use str.
  3. Stored reprs of data consumed at a distance by other code where it previously contained decimal values. Bug in user code: repr is not a data storage and transmission format.
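
For code that does consume int reprs at a distance, one possible way to tolerate both forms is to parse with base 0, which makes int() honor a 0x prefix the way the Python parser does. The helper name parse_int_repr is hypothetical and only for illustration:

```python
def parse_int_repr(text: str) -> int:
    """Parse an int repr that may be decimal or 0x-prefixed hexadecimal.

    Base 0 tells int() to infer the base from the literal's prefix.
    """
    return int(text, 0)


decimal_form = parse_int_repr("12345")
hex_form = parse_int_repr(hex(2**513))
negative_hex = parse_int_repr("-0xff")
```

A decimal string parsed this way is still subject to the decimal digit limit; only the hexadecimal form is exempt.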

If we didn't choose a low limit, but instead tied the switchover point to the largest binary value that fits within sys.get_int_max_str_digits() decimal digits, we'd be inconsistent between environments or programs that choose to change their digits limit, but we would avoid emitting hexadecimal unless we had no other choice. This variant could be thought of as:

def digit_limit_tied_proposed_int_repr(value: int) -> str:
    try:
        return repr(value)  # today's decimal repr
    except ValueError:
        # Raised when the value exceeds the current decimal digit limit.
        return hex(value)

Metadata

Labels: 3.12 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)
