Add cgi_mode to parse_header to support LF in CGI headers #166

paulownia · 2025-03-25T00:21:14Z

fixes #165

Adds a cgi_mode option to the parse_header method to allow bare LF line breaks in CGI headers.

jeremyevans

This looks pretty good. I have a few suggestions, please let me know what you think.

lib/webrick/httputils.rb

lib/webrick/httpservlet/cgihandler.rb

jeremyevans · 2025-03-25T01:08:53Z

lib/webrick/httputils.rb

+      header_line = Regexp.new(/^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)#{line_break}\z/m)
+      continued_header_lines = Regexp.new(/^[ \t]+([^\r\n\0]*?)#{line_break}/m)


This allocates 2 regexps per call. Can we instead add 4 Regexp constants (2 for CGI mode and 2 for non-CGI mode), and use those constants?

Thanks for the suggestion. That makes a lot of sense, especially for a frequently called function like this. I'll update it to use constants.
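A minimal sketch of the constant-based approach suggested above. The module and constant names here are hypothetical, chosen for illustration, and are not the names used in the PR. It relies on the fact that a Ruby regexp literal without `#{...}` interpolation is compiled once, whereas an interpolated literal builds a new Regexp each time it is evaluated.

```ruby
# Hypothetical module; names are illustrative, not WEBrick's actual constants.
module HeaderPatterns
  # Strict mode: every line must end in CRLF.
  HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r\n\z/m
  CONTINUED_LINE = /^[ \t]+([^\r\n\0]*?)\r\n/m

  # CGI mode: a bare LF is accepted as well as CRLF.
  CGI_HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r?\n\z/m
  CGI_CONTINUED_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m

  # Pick the precompiled pair for the requested mode; nothing is allocated here
  # except the two-element array that carries the pair.
  def self.patterns_for(cgi_mode)
    if cgi_mode
      [CGI_HEADER_LINE, CGI_CONTINUED_LINE]
    else
      [HEADER_LINE, CONTINUED_LINE]
    end
  end
end

header_line, _continued = HeaderPatterns.patterns_for(true)
header_line.match?("Content-Type: text/html\n")   # => true

strict_line, _continued = HeaderPatterns.patterns_for(false)
strict_line.match?("Content-Type: text/html\n")   # => false
```

The trade-off is four near-duplicate patterns in exchange for zero regexp compilation on the hot path.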

jeremyevans · 2025-03-25T01:09:34Z

sig/httputils.rbs

@@ -26,7 +26,7 @@ module WEBrick

    HEADER_CLASSES: Hash[String, untyped]

-    def self?.parse_header: (String raw) -> Hash[String, Array[String]]
+    def self?.parse_header: (String raw, ?bool cgi_mode) -> Hash[String, Array[String]]


This would also need updating for the keyword argument, but I've never written RBS before, so I'm not sure how.

I have never written RBS either, but when I run rbs, it outputs the following:

def self?.parse_header: (untyped raw, ?cgi_mode: bool) -> untyped

So I think it would be written like this:

def self?.parse_header: (String raw, ?cgi_mode: bool) -> Hash[String, Array[String]]

use keyword argument (Co-authored-by: Jeremy Evans)

paulownia · 2025-03-25T11:49:49Z

I've applied the suggested changes.

jeremyevans

Looks good, thank you!

ioquatix · 2025-03-27T08:30:42Z

lib/webrick/httputils.rb

+    REGEXP_CONTINUED_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r\n/m
+    REGEXP_CONTINUED_CGI_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m
+
+    def parse_header(raw, cgi_mode: false)


Is this a public interface change? or is it internal to CGIHandler?

It is a public interface change. However, adding an optional keyword argument should be a backwards-compatible change.
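To illustrate why an optional keyword argument is expected to be backwards compatible, here is a small sketch. `split_header_lines` is a hypothetical stand-in and not WEBrick's parse_header; it only shows that call sites written before the keyword existed keep their old behaviour.

```ruby
# Hypothetical helper, not part of WEBrick: splits raw header text into lines.
# The keyword defaults to false, so existing callers are unaffected.
def split_header_lines(raw, cgi_mode: false)
  line_break = cgi_mode ? /\r?\n/ : /\r\n/
  raw.split(line_break)
end

# An existing call site, written before the keyword was added:
split_header_lines("A: 1\r\nB: 2\r\n")              # => ["A: 1", "B: 2"]

# A new caller opting in to bare LF handling:
split_header_lines("A: 1\nB: 2\n", cgi_mode: true)  # => ["A: 1", "B: 2"]
```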

If that's the case, I'd like to set the bar a little higher on the naming of cgi_mode & related documentation.

Considering the general state of WEBrick's documentation, lack of documentation hardly seems like a blocker (though documentation improvements are obviously welcomed). If you don't like the argument name, please pick a new one (allow_bare_lf?) and I'm sure we can switch to it.

Thanks for the feedback. I've updated the comment for parse_header.

The separate regexes are a performance optimization, so we don't need to allocate two regexps per call.

If these are an implementation detail, can we make them private?

Yes, but that's also true of many methods in Ruby, so I don't see why it should be a blocker.

It's not, it's an observation to explain my position.

I don't want to make structural changes when they aren't necessary to fix a bug.

Sometimes the shortest path from A to B is not the best one.

As this is a CGI-specific code path, my preference is for this code not to leak outside CGIHandler. I'd like to hear back from @paulownia, but Jeremy, I don't mind if you merge this after that. I am not planning on fixing WEBrick's design issues.

Thank you for the detailed explanation. I understand the point about separating line reading from header parsing, and keeping CGI-specific code within CGIHandler. I agree that a cleaner design would be ideal if possible.

Since no major design changes are required, I will move the CGI-related code. But would simply moving the two constants to CGIHandler be sufficient? I'm not sure this is the best approach—any suggestions?

Yes and marking them as private would also be a good idea.

I fixed the code, but it seems better to pass the Regexp itself instead of cgi_mode. This way, we can use private_constant to make the constants completely private.
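A rough sketch of that idea, assuming simplified patterns and hypothetical names (`HeaderParser`, `CgiLikeHandler`); it is not the code from the PR. The shared parser takes the pattern as an argument, so the lenient pattern can live in the handler as a private constant.

```ruby
# Hypothetical stand-in for the shared utility module.
module HeaderParser
  # Strict default: lines must end in CRLF. Simplified pattern for illustration.
  HEADER_LINE = /^([^:\r\n]+):[ \t]*([^\r\n\0]*?)\r\n/
  private_constant :HEADER_LINE

  # The caller may supply its own line pattern; the parser knows nothing about CGI.
  def self.parse(raw, header_line = HEADER_LINE)
    raw.scan(header_line).to_h { |name, value| [name.downcase, value] }
  end
end

# Hypothetical stand-in for CGIHandler.
class CgiLikeHandler
  # Lenient pattern that also accepts a bare LF; hidden from outside code.
  CGI_HEADER_LINE = /^([^:\r\n]+):[ \t]*([^\r\n\0]*?)\r?\n/
  private_constant :CGI_HEADER_LINE

  def parse(raw)
    HeaderParser.parse(raw, CGI_HEADER_LINE)
  end
end

CgiLikeHandler.new.parse("Content-Type: text/html\nStatus: 200\n")
# => {"content-type"=>"text/html", "status"=>"200"}
```

With `private_constant`, a reference such as `CgiLikeHandler::CGI_HEADER_LINE` from outside the class raises NameError, so the pattern stays an implementation detail of the handler.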

I realize I forgot to request a review. When you have time, could you take a look? I’d appreciate it!

jeremyevans

Still looks good, thank you for your patience.

ioquatix · 2025-04-06T23:29:48Z

Sorry, I have been travelling a lot. I'll review it either today or later this week.

allow bare LF in cgi header

47dd05b

jeremyevans reviewed Mar 25, 2025

paulownia and others added 4 commits March 25, 2025 11:40

Update lib/webrick/httputils.rb

ee1d7ba

Update lib/webrick/httpservlet/cgihandler.rb

84e865a

fix rbs

4ab4c2d

use constants for parsing headers

efa1987

jeremyevans approved these changes Mar 25, 2025

jeremyevans requested a review from ioquatix March 25, 2025 14:12

ioquatix reviewed Mar 27, 2025

paulownia added 3 commits March 28, 2025 11:23

Updated the comment for parse_header

32887fd

move CGI-specific constants to CGIHandler

b6dd139

pass regexp to parse_header instead of cgi_mode

b7b3d61

jeremyevans approved these changes Apr 6, 2025
