Description
With a routine version bump of requirements, I noticed chardet had been switched out for charset_normalizer (which I had never heard of before) in #5797, apparently due to LGPL license concerns.
I agree with @sigmavirus24's comment #5797 (comment) that it's strange for something as central in the Python ecosystem as requests (45k stars, 8k forks, many contributors at the time of writing) to switch to such a relatively unknown and unproven library (132 stars, 5 forks, 2 contributors) for a hard dependency.
The release notes say you could use pip install "requests[use_chardet_on_py3]" to use chardet instead of charset_normalizer, but with that extra set both libraries get installed.
I would imagine many users don't actually need the charset detection features in Requests; could we open a discussion on making both chardet and charset_normalizer optional, à la requests[chardet] or requests[charset_normalizer]?
AFAICS, the only place where chardet is actually used in requests is Response.apparent_encoding, which is used by Response.text when there is no determined encoding.
Maybe apparent_encoding could:
- as a built-in first attempt, try decoding the content as UTF-8 (which would likely succeed in many cases)
- if neither chardet nor charset_normalizer is installed, warn the user ("No encoding detection library is installed. Falling back to XXXX. Please see YYYY for instructions" or somesuch) and return e.g. ascii
- otherwise, use whichever of the two detection libraries is installed, as per usual
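
To make the proposal concrete, here is a rough sketch of the fallback logic as a standalone function. It is not how requests implements apparent_encoding today; the helper name _load_detector, the fallback parameter, and the warning text are all hypothetical, and the docs link from the message above is left out because it does not exist yet. The only library call assumed is detect(), which both chardet and charset_normalizer expose.

```python
import importlib
import warnings


def _load_detector():
    """Return the first importable detection module, or None (hypothetical helper)."""
    for name in ("chardet", "charset_normalizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def apparent_encoding(content: bytes, fallback: str = "ascii") -> str:
    """Guess the encoding of `content` following the three steps proposed above."""
    # Step 1: built-in first attempt, no third-party library required.
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Step 2: no detection library available, so warn and fall back.
    detector = _load_detector()
    if detector is None:
        warnings.warn(
            "No encoding detection library is installed. "
            f"Falling back to {fallback}."
        )
        return fallback

    # Step 3: delegate to whichever library is installed, as per usual.
    return detector.detect(content)["encoding"] or fallback
```

With something like this, content that is valid UTF-8 never touches a detection library at all, so users who only deal with UTF-8 responses would not need either dependency installed.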