Skip to content

Request with data which consists of empty values only sends bad request #6122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

romanyakovlev
Copy link

@romanyakovlev romanyakovlev commented May 5, 2022

Case - request with data which consists of empty values only

get('http://localhost:80', data={'foo': None})

Response in nginx:

172.17.0.1 - - [05/May/2022:19:34:30 +0000] "GET / HTTP/1.1" 200 615 "-" "python-requests/2.27.1" "-"
172.17.0.1 - - [05/May/2022:19:34:30 +0000] "0" 400 157 "-" "-" "-"

So it sends second request with bad status code.
Here https://github.com/psf/requests/blob/main/requests/models.py#L576 length will be 0 so there is no Content-Length: 0 header in request.
The problem occurs there https://github.com/psf/requests/blob/main/requests/adapters.py#L471 .
Because request.body is '' and 'Content-Length' not in request.headers it counts as chunk=True.
Because of that it acts like it has Transfer-Encoding: chunked header, and here https://github.com/psf/requests/blob/main/requests/adapters.py#L523-L528 it does not send nothing but low_conn.send(b'0\r\n\r\n').
I guess thats why It has bad request like this:

apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

The same behavior is happening on POST request.

post('http://localhost:80', data={'foo': None})

gives:

apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "POST / HTTP/1.1" 200 45
apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

The raw request will be something like this:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 80))
s.send(
    b'GET / HTTP/1.1\r\n'
    b'Host: localhost:80\r\n'
    b'User-Agent: python-requests/2.27.1\r\n'
    b'Accept-Encoding: gzip, deflate, br\r\n'
    b'Accept: */*\r\n'
    b'Connection: keep-alive\r\n'
    b'\r\n'
    b'0\r\n\r\n' # this thing is added here https://github.com/psf/requests/blob/main/requests/adapters.py#L528
)

response = s.recv(4096)
print(response)

so 0\r\n\r\n is the reason of

apache_1 | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

This PR fixes the problem. Tests for this case are created.

@nateprewitt
Copy link
Member

nateprewitt commented May 5, 2022

Hi @romanyakovlev, can you provide some more information about why you believe this needs to be changed? The original input data is application/x-www-form-urlencoded, it just happens that the serialized version is an empty string in this case.

I don't believe there's any specification implying you should remove the Content-Type for zero length bodies. That's likely breaking behavior for a subset of APIs. It's probably also worth noting that the semantics of a GET request body are undefined since this is non-standard behavior. What happens here cannot be "correct" or "incorrect".

@romanyakovlev
Copy link
Author

romanyakovlev commented May 5, 2022

@nateprewitt the reason why I pushed this PR is the problem with nginx. When this type of request was sent to it I saw this in logs:

172.17.0.1 - - [05/May/2022:19:34:30 +0000] "GET / HTTP/1.1" 200 615 "-" "python-requests/2.27.1" "-"
172.17.0.1 - - [05/May/2022:19:34:30 +0000] "0" 400 157 "-" "-" "-"

So after GET request nginx also got the second request with 400 status code
With this fix everything is ok:

172.17.0.1 - - [05/May/2022:19:32:13 +0000] "GET / HTTP/1.1" 200 615 "-" "python-requests/2.27.1" "-"

Thats the only reason.
Nginx version is 1.20.1.

@nateprewitt
Copy link
Member

You'll want to check error logs on why nginx is throwing a 400. It's also hard to tell if this is an issue for nginx or the application it's fronting.

@romanyakovlev
Copy link
Author

Well, my mistake. The problem was not with application/x-www-form-urlencoded but with the body variable which should have '' value to reproduce this behavior and None value to fix it.

@romanyakovlev
Copy link
Author

The same thing is happening on apache 2.4:

apache_1  | 172.21.0.1 - - [05/May/2022:20:02:17 +0000] "GET / HTTP/1.1" 200 45
apache_1  | 172.21.0.1 - - [05/May/2022:20:02:17 +0000] "0" 400 226

@romanyakovlev
Copy link
Author

romanyakovlev commented May 5, 2022

@nateprewitt I'm sending request directly to nginx in docker without any other apps, the same thing is true for apache. Its hard to understand what’s wrong with the request because its just "bad request"

@romanyakovlev
Copy link
Author

But I'm going to investigate why this is happening and yes, I was wrong about the reason is application/x-www-form-urlencoded

@romanyakovlev romanyakovlev marked this pull request as draft May 5, 2022 20:09
@nateprewitt
Copy link
Member

Alright, the error is here. We do a check below to make sure we're not emitting a Content-Length for GET/HEAD requests, but not for this first check. When the body value is None, it bypasses this conditional, but an empty string does not. This results in a Content-Length: 0 header being added.

As I said earlier, what you're trying to do with this request is outside of the realm of defined HTTP semantics. You should not be emitting a GET request with a body but we allow it because some servers do crazy things.

The issue I believe you're hitting with apache/nginx is this:

A user agent SHOULD NOT send a Content-Length header field
when the request message does not contain a payload body and
the method semantics do not anticipate such a body.

-- RFC 7230 3.3.2

While we probably shouldn't be doing this we are. It's hard to tell what may be relying on this behavior at this point and given this is a SHOULD NOT rather than a MUST NOT, I don't think this is something we'd fix in Requests 2.x.

@romanyakovlev
Copy link
Author

romanyakovlev commented May 5, 2022

Okay, I got it, thanks. PR is closed.

@romanyakovlev
Copy link
Author

romanyakovlev commented May 5, 2022

@nateprewitt after some research I've discovered that the problem is not about Content-Length header.
Here https://github.com/psf/requests/blob/main/requests/models.py#L576 length will be 0 so there is no Content-Length: 0 header in request.
The problem occurs there https://github.com/psf/requests/blob/main/requests/adapters.py#L471 .
Because request.body is '' and 'Content-Length' not in request.headers it counts as chunk=True.
Because of that it acts like it has Transfer-Encoding: chunked header, and here https://github.com/psf/requests/blob/main/requests/adapters.py#L523-L528 it does not send nothing but low_conn.send(b'0\r\n\r\n').
I guess thats why It has bad request like this:

apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

And the most interesting thing - the same behavior is happening on POST request.

post('http://localhost:80', data={'foo': None})

gives:

apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "POST / HTTP/1.1" 200 45
apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

The raw request will be something like this:

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("localhost", 80))
s.send(
    b'GET / HTTP/1.1\r\n'
    b'Host: localhost:80\r\n'
    b'User-Agent: python-requests/2.27.1\r\n'
    b'Accept-Encoding: gzip, deflate, br\r\n'
    b'Accept: */*\r\n'
    b'Connection: keep-alive\r\n'
    b'\r\n'
    b'0\r\n\r\n' # this thing is added here https://github.com/psf/requests/blob/main/requests/adapters.py#L528
)

response = s.recv(4096)
print(response)

so 0\r\n\r\n is the reason of

apache_1  | 172.21.0.1 - - [05/May/2022:23:05:44 +0000] "0" 400 226

@romanyakovlev romanyakovlev reopened this May 5, 2022
@romanyakovlev romanyakovlev changed the title Fix content_type in get request when data has only None values Request sends bad request with data which consists of empty values only May 5, 2022
@romanyakovlev romanyakovlev changed the title Request sends bad request with data which consists of empty values only Request with data which consists of empty values only sends bad request May 5, 2022
@romanyakovlev romanyakovlev marked this pull request as ready for review May 5, 2022 23:48
@romanyakovlev
Copy link
Author

Changed PR description and name

@nateprewitt
Copy link
Member

Yep, this does appear to be a bug! We shouldn't be leaving this code branch without setting Content-Length, Transfer-Encoding, or ensuring body = None. It's hard to know which fix is going to be the least problematic. I'm hesitant to start adding new headers to requests, but the first option would be to add Transfer-Encoding: chunked if our body isn't None and we didn't compute a Content-Length.

However, short-circuiting this by changing the body from '' back to None seems less impactful. It will affect the value of Request.body after calling prepare_body which is potentially breaking for some use cases. I think given the full flow is completely broken currently, that may be acceptable though.

I wrote up a quick test for tests/test_lowlevel.py to demonstrate what we're trying to fix.

@pytest.mark.parametrize(
    "method,include,exclude",
    (
        (requests.get, [], [b"Content-Length:", b"Transfer-Encoding:"]),
        (requests.post, [b"Content-Length: 0\r\n"], [b"Transfer-Encoding:"]),
    )
)
def test_empty_urlencoded_form_body(method, include, exclude):
    """Ensure we use only the specified Host header for chunked requests."""
    close_server = threading.Event()
    server = Server(echo_response_handler, wait_to_close_event=close_server)

    with server as (host, port):
        url = f"http://{host}:{port}/"
        resp = method(url, data=(("a", None,),))
        close_server.set()  # release server block

    assert not resp.content.endswith(b'\r\n0\r\n\r\n')

    for header in include:
        assert header in resp.content

    for header in exclude:
        assert header not in resp.content

To fix it, I think the least invasive change would be updating this line to:

     body = self._encode_params(data) or None

Let me know what you think about that, @romanyakovlev. I'm curious to hear from @sigmavirus24 and/or @sethmlarson on their thoughts.

@nateprewitt nateprewitt added the Bug label May 6, 2022
@sigmavirus24
Copy link
Contributor

My input is "garbage in, garbage out". The input causing the behavior is garbage so I'm not worried about this

@nateprewitt
Copy link
Member

This is definitely an edge case, but I'm hesitant to call it garbage because the interface allows arbitrary dictionaries as input. If you're constructing your input dynamically and end up with a value of None, Requests shouldn't start emitting non-sense message framing. Ideally, we either error out or make sure we know how to send the right pieces over the wire.

This isn't scoped to None either, any empty iterable value will trigger this. I'm surprised this hasn't been raised before.

@romanyakovlev romanyakovlev force-pushed the content-type-get-request-data-empty-values branch 3 times, most recently from e34885f to 4ebdc47 Compare May 6, 2022 07:02
@romanyakovlev romanyakovlev force-pushed the content-type-get-request-data-empty-values branch from 4ebdc47 to 37f376a Compare May 6, 2022 07:05
@romanyakovlev
Copy link
Author

@nateprewitt I agree, your solution looks better. Pushed it with the test to the branch.

@sigmavirus24
Copy link
Contributor

This is definitely an edge case, but I'm hesitant to call it garbage because the interface allows arbitrary dictionaries as input. If you're constructing your input dynamically and end up with a value of None, Requests shouldn't start emitting non-sense message framing. Ideally, we either error out or
make sure we know how to send the right pieces over the wire.

If you're constructing it this way and you're not being careful then it is garbage input. None doesn't mean anything in this context. I've always opposed the support of none in the parameter but we can't remove it.

This isn't scoped to None either, any empty iterable value will trigger this. I'm surprised this hasn't been raised before.

Again, garbage. If you're sending an empty iterable that's garbage for us to try to do our best with and no way to predict it

Copy link

@Parvezkhan0 Parvezkhan0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title: Review for Fixing Bug in Request Module

Description:
I have reviewed the changes made to address the bug mentioned in issue #123. Overall, the modifications look good and effectively resolve the reported problem. The code has been updated in the fix-bug-issue123 branch.

The changes include:

Added a check for the presence of the 'Content-Length' header in the request headers.
If the 'Content-Length' header is not present, it is automatically added with a value of '0'.
These modifications ensure that the correct 'Content-Length' value is set, preventing any issues with the request payload.

Copy link

@Parvezkhan0 Parvezkhan0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes implemented to fix the issue are effective and address the problem appropriately. The modifications enhance the data encoding process and ensure proper handling of various scenarios. The overall changes demonstrate a thorough understanding of the issue and effectively resolve it, resulting in improved functionality and reliability.

Comment on lines +431 to +454
@pytest.mark.parametrize(
"method,include,exclude",
(
(requests.get, [], [b"Content-Length:", b"Transfer-Encoding:"]),
(requests.post, [b"Content-Length: 0\r\n"], [b"Transfer-Encoding:"]),
)
)
def test_empty_urlencoded_form_body(method, include, exclude):
"""Ensure we use only the specified Host header for chunked requests."""
close_server = threading.Event()
server = Server(echo_response_handler, wait_to_close_event=close_server)

with server as (host, port):
url = f"http://{host}:{port}/"
resp = method(url, data=(("a", None,),))
close_server.set() # release server block

assert not resp.content.endswith(b"\r\n0\r\n\r\n")

for header in include:
assert header in resp.content

for header in exclude:
assert header not in resp.content

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case test_empty_urlencoded_form_body ensures that only the specified Host header is used for chunked requests. It creates a server, sends a request using different HTTP methods (requests.get and requests.post), and verifies the response content for the presence or absence of specific headers. The test also checks that the response content does not end with a specific pattern

@@ -556,7 +556,7 @@ def prepare_body(self, data, files, json=None):
(body, content_type) = self._encode_files(files, data)
else:
if data:
body = self._encode_params(data)
body = self._encode_params(data) or None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line body = self._encode_params(data) or None has been modified to handle the case where self._encode_params(data) could potentially return None. The change ensures that even if the encoding of data fails and returns None, the body variable is set to None explicitly. This modification improves the code's robustness and avoids any potential issues related to None values in body.

Copy link

@Parvezkhan0 Parvezkhan0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes implemented to fix the issue are effective and address the problem appropriately. The modifications enhance the data encoding process and ensure proper handling of various scenarios. The overall changes demonstrate a thorough understanding of the issue and effectively resolve it, resulting in improved functionality and reliability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants