
Make chardet/charset_normalizer optional? #5871

Open
@akx

With a routine version bump of requirements, I noticed chardet had been switched out for charset_normalizer (which I had never heard of before) in #5797, apparently due to LGPL license concerns.

I agree with @sigmavirus24's comment #5797 (comment) that it's strange for something as central in the Python ecosystem as requests (45k stars, 8k forks, many contributors at the time of writing) to switch to such a relatively unknown and unproven library (132 stars, 5 forks, 2 contributors) as a hard dependency.

The release notes say you can use pip install "requests[use_chardet_on_py3]" to use chardet instead of charset_normalizer, but with that extra set, both libraries get installed.

I would imagine many users don't actually need the charset detection features in Requests; could we open a discussion on making both chardet and charset_normalizer optional, à la requests[chardet] or requests[charset_normalizer]?
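
To make the idea concrete, here is a rough sketch of how such extras could be declared with setuptools' `extras_require`. This is not the actual requests `setup.py`; the extra names are the ones suggested above, and version pins are left out on purpose since I don't want to guess at them.

```python
# Hypothetical fragment of a setup.py; names and layout are assumptions.
# Neither detection library is listed as a hard dependency here.
extras_require = {
    # pip install "requests[chardet]"
    "chardet": ["chardet"],
    # pip install "requests[charset_normalizer]"
    "charset_normalizer": ["charset_normalizer"],
}

# This dict would then be passed along as setup(..., extras_require=extras_require).
```

With something like this, a plain `pip install requests` would install neither detection library.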

AFAICS, the only place where chardet is actually used in requests is Response.apparent_encoding, which is used by Response.text when there is no determined encoding.

Maybe apparent_encoding could

  1. as a built-in first attempt, try decoding the content as UTF-8 (which would likely succeed in many cases)
  2. if neither chardet nor charset_normalizer is installed, warn the user ("No encoding detection library is installed. Falling back to XXXX. Please see YYYY for instructions" or somesuch) and return e.g. ascii
  3. otherwise, use whichever of chardet or charset_normalizer is installed, as per usual
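
The steps above could be sketched roughly as follows. This is a standalone illustration, not a patch against requests: `guess_encoding`, `_load_detector`, and the `_AUTO` sentinel are hypothetical names, the warning text is a placeholder, and `ascii` is just the example fallback from step 2. It relies only on the `detect()` function that both chardet and charset_normalizer expose.

```python
import importlib
import warnings

_AUTO = object()  # sentinel: "look for an installed detection library"


def _load_detector():
    """Return the first importable detection module, or None if neither is installed."""
    for name in ("chardet", "charset_normalizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def guess_encoding(content, detector=_AUTO, fallback="ascii"):
    """Guess the encoding of `content` (bytes) following the three steps above."""
    # Step 1: built-in first attempt, no third-party library needed.
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    if detector is _AUTO:
        detector = _load_detector()

    # Step 2: no detection library available, so warn and fall back.
    if detector is None:
        warnings.warn(
            "No encoding detection library is installed. "
            "Falling back to %s." % fallback
        )
        return fallback

    # Step 3: defer to the installed library as usual.
    return detector.detect(content)["encoding"]
```

Passing `detector=None` explicitly simulates an environment where neither library is installed, which makes the fallback path easy to exercise in a test.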
