Add cgi_mode to parse_header to support LF in CGI headers #166

Open: wants to merge 8 commits into master

paulownia
Contributor

Fixes #165.

Adds a cgi_mode option to the parse_header method to allow bare LF line breaks in CGI headers.
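
For context, here is a minimal sketch of the failure mode (not WEBrick's code). The two constant names are hypothetical; both patterns mirror the header-line regexp quoted later in this review and differ only in the line terminator, strict CRLF versus an optional CR. A CGI script that ends its header lines with a bare LF only matches the lenient form.

```ruby
# Hypothetical constant names; the patterns mirror the header-line regexp
# quoted in the review, differing only in the line terminator.
STRICT_HEADER_LINE  = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r\n\z/m  # CRLF only
LENIENT_HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r?\n\z/m # CR optional

crlf_line = "Content-Type: text/html\r\n"
lf_line   = "Content-Type: text/html\n" # bare LF, as many CGI scripts emit

p STRICT_HEADER_LINE.match?(crlf_line)  # => true
p STRICT_HEADER_LINE.match?(lf_line)    # => false: bare LF is rejected
p LENIENT_HEADER_LINE.match?(lf_line)   # => true
```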

Contributor

jeremyevans left a comment

This looks pretty good. I have a few suggestions; please let me know what you think.

Comment on lines 176 to 177
header_line = Regexp.new(/^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)#{line_break}\z/m)
continued_header_lines = Regexp.new(/^[ \t]+([^\r\n\0]*?)#{line_break}/m)
Contributor

This allocates two regexps per call. Can we instead add four Regexp constants (two for CGI mode and two for non-CGI mode) and use those constants?
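
The allocation being pointed out can be observed through object identity. This is a minimal sketch with hypothetical helper and constant names, assuming the continuation-line pattern from the quoted lines: the interpolated regexp is rebuilt on every call, whereas the constants are compiled once when the file loads and are shared by every caller.

```ruby
# Hypothetical helper reproducing the per-call construction from the quoted lines.
def regexp_per_call(cgi_mode)
  line_break = cgi_mode ? "\r?\n" : "\r\n"
  # An interpolated regexp literal (plus Regexp.new) creates new objects each call.
  Regexp.new(/^[ \t]+([^\r\n\0]*?)#{line_break}/m)
end

# Hypothetical constants: compiled once, reused on every call.
CONTINUED_LINE     = /^[ \t]+([^\r\n\0]*?)\r\n/m
CONTINUED_CGI_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m

def regexp_from_constant(cgi_mode)
  cgi_mode ? CONTINUED_CGI_LINE : CONTINUED_LINE
end

p regexp_per_call(true).equal?(regexp_per_call(true))           # => false
p regexp_from_constant(true).equal?(regexp_from_constant(true)) # => true
```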

Contributor Author

Thanks for the suggestion. That makes a lot of sense, especially for a frequently called function like this. I'll update it to use constants.

@@ -26,7 +26,7 @@ module WEBrick

HEADER_CLASSES: Hash[String, untyped]

-def self?.parse_header: (String raw) -> Hash[String, Array[String]]
+def self?.parse_header: (String raw, ?bool cgi_mode) -> Hash[String, Array[String]]
Contributor

This would also need updating for the keyword argument, but I've never written RBS before, so I'm not sure how.

Contributor Author

I have never written RBS either, but when I run rbs, it outputs the following:

 def self?.parse_header: (untyped raw, ?cgi_mode: bool) -> untyped

So I think it would be written like this:

 def self?.parse_header: (String raw, ?cgi_mode: bool) -> Hash[String, Array[String]]
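
One quick way to cross-check that RBS line against the Ruby definition is Method#parameters. In this sketch, a stand-in method copies only the PR's parameter list. The optional keyword is reported as :key, which corresponds to ?cgi_mode: bool, and the required positional argument as :req, which corresponds to String raw.

```ruby
# Stand-in with the same parameter list as the PR's parse_header (body omitted).
def parse_header(raw, cgi_mode: false)
  # not needed for this illustration
end

p method(:parse_header).parameters  # => [[:req, :raw], [:key, :cgi_mode]]
```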

paulownia and others added 4 commits March 25, 2025 11:40
@paulownia
Contributor Author

I've applied the suggested changes.

Contributor

jeremyevans left a comment

Looks good, thank you!

jeremyevans requested a review from ioquatix on March 25, 2025 14:12
REGEXP_CONTINUED_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r\n/m
REGEXP_CONTINUED_CGI_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m

def parse_header(raw, cgi_mode: false)
Member

ioquatix, Mar 27, 2025

Is this a public interface change, or is it internal to CGIHandler?

Contributor

It is a public interface change. However, adding an optional keyword argument should be a backwards compatible change.
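
Here is a toy sketch of why the change is backwards compatible. The parser below is a hypothetical, heavily simplified stand-in (it only splits lines and does no validation or continuation-line handling), but it has the same signature as the PR's method. Existing one-argument calls need no changes, and new callers opt in with cgi_mode: true.

```ruby
# Hypothetical, simplified stand-in for parse_header; not WEBrick's implementation.
def parse_header(raw, cgi_mode: false)
  line_break = cgi_mode ? /\r?\n/ : /\r\n/ # CGI mode also accepts a bare LF
  raw.split(line_break).to_h { |line| line.split(/:\s*/, 2) }
end

# Existing callers pass one argument and keep working unchanged.
old_style = parse_header("Host: example.com\r\n")
# New callers opt in explicitly with the keyword.
new_style = parse_header("Status: 200 OK\n", cgi_mode: true)

p old_style == { "Host" => "example.com" } # => true
p new_style == { "Status" => "200 OK" }    # => true
```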

Member

If that's the case, I'd like to set the bar a little higher on the naming of cgi_mode and the related documentation.

Contributor

Considering the general state of WEBrick's documentation, lack of documentation hardly seems like a blocker (though documentation improvements are obviously welcomed). If you don't like the argument name, please pick a new one (allow_bare_lf?) and I'm sure we can switch to it.

Contributor Author

Thanks for the feedback. I've updated the comment for parse_header.

Member

> The separate regexes are a performance optimization, so we don't need to allocate two regexps per call.

If these are an implementation detail, can we make them private?

> Yes, but that's also true of many methods in Ruby, so I don't see why it should be a blocker.

It's not; it's an observation to explain my position.

> I don't want to make structural changes when they aren't necessary to fix a bug.

Sometimes the shortest path from A to B is not the best one.

As this is a CGI-specific code path, my preference is for this code not to leak outside CGIHandler. I'd like to hear back from @paulownia, but Jeremy, I don't mind if you merge this after that. I am not planning on fixing WEBrick's design issues.

Contributor Author

Thank you for the detailed explanation. I understand the point about separating line reading from header parsing, and keeping CGI-specific code within CGIHandler. I agree that a cleaner design would be ideal if possible.

Since no major design changes are required, I will move the CGI-related code. But would simply moving the two constants to CGIHandler be sufficient? I'm not sure this is the best approach; any suggestions?

Member

Yes, and marking them as private would also be a good idea.
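
This sketch shows what marking the moved constants private could look like, using Module#private_constant. ExampleHandler and CGI_HEADER_LINE are hypothetical stand-ins for CGIHandler and its constants. Methods defined inside the class can still use the constant, but a qualified reference from outside raises NameError.

```ruby
# Hypothetical class standing in for CGIHandler; the constant name is illustrative.
class ExampleHandler
  CGI_HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r?\n\z/m
  private_constant :CGI_HEADER_LINE

  # Code inside the class resolves the private constant lexically, as usual.
  def self.header_line?(line)
    CGI_HEADER_LINE.match?(line)
  end
end

p ExampleHandler.header_line?("Status: 200 OK\n") # => true

begin
  puts ExampleHandler::CGI_HEADER_LINE.source
rescue NameError => e
  # Code outside the class cannot reach the private constant.
  puts "NameError: #{e.message}"
end
```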

Contributor Author

I fixed the code, but it seems better to pass the Regexp itself instead of cgi_mode. This way, we can use private_constant to make the constants completely private.
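
Here is one possible shape of the "pass the Regexp itself" approach, as a hypothetical sketch. HeaderParser.parse, its header_line: keyword, and ExampleCGIHandler are illustrative names, not WEBrick's API. The shared parser defaults to the strict CRLF pattern, and only the CGI side knows about, and privately owns, the lenient pattern. The sketch assumes raw holds only complete header lines and ignores continuation lines.

```ruby
# Hypothetical shared parser: the line pattern is injected, defaulting to strict CRLF.
module HeaderParser
  HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r\n\z/m

  def self.parse(raw, header_line: HEADER_LINE)
    header = Hash.new { |h, k| h[k] = [] }
    raw.each_line do |line|
      m = header_line.match(line)
      raise ArgumentError, "bad header line: #{line.inspect}" unless m
      header[m[1].downcase] << m[2].strip
    end
    header
  end
end

# Hypothetical stand-in for CGIHandler: the lenient pattern never leaves this class.
class ExampleCGIHandler
  CGI_HEADER_LINE = /^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)\r?\n\z/m
  private_constant :CGI_HEADER_LINE

  def self.parse_cgi_header(raw)
    HeaderParser.parse(raw, header_line: CGI_HEADER_LINE)
  end
end

result = ExampleCGIHandler.parse_cgi_header("Content-Type: text/html\nStatus: 200 OK\n")
p result == { "content-type" => ["text/html"], "status" => ["200 OK"] } # => true
```

Because the shared parser never sees a cgi_mode flag, the lenient pattern stays an implementation detail of the CGI handler, which is the encapsulation asked for earlier in the thread.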

Contributor Author

I realize I forgot to request a review. When you have time, could you take a look? I’d appreciate it!

Contributor

jeremyevans left a comment

Still looks good, thank you for your patience.

@ioquatix
Member

ioquatix commented Apr 6, 2025

Sorry, I have been travelling a lot; I'll review it either today or later this week.

Successfully merging this pull request may close these issues.

CGI script using LF line breaks in headers causes error
3 participants