-
Notifications
You must be signed in to change notification settings - Fork 102
Add cgi_mode to parse_header to support LF in CGI headers #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. I have a few suggestions, please let me know what you think.
lib/webrick/httputils.rb
Outdated
header_line = Regexp.new(/^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)#{line_break}\z/m) | ||
continued_header_lines = Regexp.new(/^[ \t]+([^\r\n\0]*?)#{line_break}/m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates 2 regexps per call. Can we instead add 4 Regexp constants (2 for CGI mode and 2 for non-CGI mode), and use those constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. That makes a lot of sense, especially for a frequently called function like this. I'll update it to use constants.
sig/httputils.rbs
Outdated
@@ -26,7 +26,7 @@ module WEBrick | |||
|
|||
HEADER_CLASSES: Hash[String, untyped] | |||
|
|||
def self?.parse_header: (String raw) -> Hash[String, Array[String]] | |||
def self?.parse_header: (String raw, ?bool cgi_mode) -> Hash[String, Array[String]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also need updating for the keyword argument, but I've never written RBS before, so I'm not sure how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never written RBS either, but when I run rbs, it outputs the following:
def self?.parse_header: (untyped raw, ?cgi_mode: bool) -> untyped
So I think it would be written like this:
def self?.parse_header: (String raw, ?cgi_mode: bool) -> Hash[String, Array[String]]
use keyword argument Co-authored-by: Jeremy Evans <[email protected]>
use keyword argument Co-authored-by: Jeremy Evans <[email protected]>
I've applied the suggested changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
lib/webrick/httputils.rb
Outdated
REGEXP_CONTINUED_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r\n/m | ||
REGEXP_CONTINUED_CGI_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m | ||
|
||
def parse_header(raw, cgi_mode: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a public interface change? or is it internal to CGIHandler
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a public interface change. However, adding an optional keyword argument should be a backwards compatible change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, I'd like to set the bar a little higher on the naming of cgi_mode
& related documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the general state of WEBrick's documentation, lack of documentation hardly seems like a blocker (though documentation improvements are obviously welcomed). If you don't like the argument name, please pick a new one (allow_bare_lf
?) and I'm sure we can switch to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I've updated the comment for parse_header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate regexes are a performance optimization, so we don't need to allocate 2 regex per call.
If these are an implementation detail, can we make them private?
Yes, but that's also true of many methods in Ruby, so I don't see why it should be a blocker.
It's not, it's an observation to explain my position.
I don't want to make structural changes when they aren't necessary to fix a bug.
Sometimes the shortest path from A to B is not the best one.
As this is a CGI specific code path, my preference is for this code not to leak outside CGIHandler
. I'd like to hear back from @paulownia but Jeremy I don't mind if you merge this after that. I am not planning on fixing WEBRick's design issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed explanation. I understand the point about separating line reading from header parsing, and keeping CGI-specific code within CGIHandler. I agree that a cleaner design would be ideal if possible.
Since no major design changes are required, I will move the CGI-related code. But would simply moving the two constants to CGIHandler be sufficient? I'm not sure this is the best approach—any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and marking them as private would also be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the code, but it seems better to pass the Regexp itself instead of cgi_mode. This way, we can use private_constant to make the constants completely private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I forgot to request a review. When you have time, could you take a look? I’d appreciate it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good, thank you for your patience.
Sorry I have been travelling a lot, I'll review it either today or later this week. |
fixes #165
Adds
cgi_mode
option toparse_header
method to allow bare LF line breaks in CGI headers.