Optimize AppendTokens() by adding Tokenizer::prefixUntil() #2003
I have tried to replace the new { findFirstOf, findFirstNotOf } enum with SBuf method pointers, but the result was no more readable and more error-prone than the current PR code, because a prefix_() caller could supply a pointer to a method that compiles fine but is not compatible with prefix_() logic. The current/proposed "findFirstNotOf, tokenChars" and "findFirstOf, delimiters" calls are readable and safer.

Moving findFirstOf() and findFirstNotOf() calls outside prefix_() does not work well either, for several reasons.
This comment does not request any changes.
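For readers following along, here is a minimal self-contained sketch of the two dispatch designs being compared, using simplified stand-ins for SBuf and CharacterSet; all names in it are illustrative rather than actual PR code.

```cpp
#include <cstddef>
#include <string>

// stand-in for Squid's CharacterSet: a predicate over characters
using CharSet = bool (*)(char);

// Enum-based design (as in this PR): the helper itself owns the only two
// supported searches, so a caller cannot request an incompatible one.
enum class SearchFor { firstOf, firstNotOf };

static std::size_t
findBoundary(const std::string &buf, const SearchFor method, const CharSet chars)
{
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const bool inSet = chars(buf[i]);
        if ((method == SearchFor::firstOf) ? inSet : !inSet)
            return i; // first delimiter (firstOf) or first non-token character (firstNotOf)
    }
    return buf.size();
}

// Rejected alternative: accept any search function matching this signature.
// Anything of this type compiles, including searches whose semantics the
// extraction logic cannot handle; that is the safety problem described above.
using SearchFunction = std::size_t (*)(const std::string &, CharSet);
```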
FYI, that trouble is a "red flag" for this alteration being a bad design change.
IMHO, the use case I assume you are trying to achieve would be better reached by fixing the problems we have with the token() method design that are currently forcing some weird uses of prefix().
I do not see "bad design change" signs here. My comment was describing troubles with rejected solutions, not with the proposed one, so "that trouble" does not really apply to the proposed code changes. Moreover, I do not think it is accurate to describe the proposed code changes as a "design change". The overall Tokenizer design remains the same. We are just adding another useful method.
```diff
- const auto tokenCharacters = delimiters.complement("non-delimiters");
```
Existing AppendTokens() always performs expensive CharacterSet negation and copying. It is definitely not "fine"! I have now added an explicit statement to the PR description to detail the problem solved by this PR.
This PR avoids expensive CharacterSet negation and copying on every AppendTokens() call. This optimization was planned earlier and does not preclude any Tokenizer::token() method redesign. If you would like to propose Tokenizer::token() design improvements, please do so, but please do not block this PR even if you think those future improvements are going to make this optimization unnecessary.
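A simplified before/after sketch of the AppendTokens() hot spot described above. It is paraphrased from this discussion rather than copied from the diff, so the exact signatures and loop details may differ from the real code; prefixUntil() here stands for the method this PR adds.

```cpp
#include "base/CharacterSet.h"
#include "parser/Tokenizer.h"
#include "sbuf/SBuf.h"

#include <vector>

// before: each call negates the delimiter set, building and copying a whole
// new CharacterSet just to describe "token characters"
static void
AppendTokensBefore(std::vector<SBuf> &tokens, Parser::Tokenizer &tok, const CharacterSet &delimiters)
{
    const auto tokenCharacters = delimiters.complement("non-delimiters"); // negation + copy on every call
    SBuf token;
    while (tok.prefix(token, tokenCharacters)) {
        tokens.push_back(token);
        tok.skipAll(delimiters); // consume delimiter(s) between tokens
    }
}

// after: prefixUntil() searches for the first delimiter directly, so no
// complement set is ever constructed
static void
AppendTokensAfter(std::vector<SBuf> &tokens, Parser::Tokenizer &tok, const CharacterSet &delimiters)
{
    SBuf token;
    while (tok.prefixUntil(token, delimiters)) {
        tokens.push_back(token);
        tok.skipAll(delimiters);
    }
}
```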
Oh, okay, I see where you are coming from now regarding the need for change.
I still think fixing Tokenizer::token would be better overall. What you are calling prefixUntil in this PR is what I have recently been thinking token should be doing. Would you mind making this PR do that, along with the related update of the existing token caller(s)?
As I am sure you are aware, passing a function pointer and calling it should be a far more efficient (and more easily coded) solution than enumerating all the cases individually with hard-coded calls to those same functions/methods. That is the red flag to me: something has gone wrong with the attempts to implement that should-be-better logic.
Not requesting a change on this point, but FTR it smells bad to "optimize" code by replacing it with a known sub-optimal implementation.
I do not think we should change token(), especially in this PR. That old method extracts and forgets leading and trailing delimiters; it should continue to do that. The new prefixUntil() method is similar to the existing prefix() method with regard to delimiter handling; prefix*() methods should continue to treat delimiters differently than token() does because their use cases are different.
Any speed difference between the two prefix_() designs is negligible/immaterial in this context; this PR optimizes a completely different (and actually significant!) expense. This design decision should be based on other factors. Since I have actually implemented a function-pointer design (before rejecting it for reasons not covered in your analysis), I still believe I made the right call. Thank you for not insisting on changing that.
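To illustrate the delimiter-handling difference described above, here is a hedged sketch using Squid's Parser::Tokenizer API; the token() behavior shown follows this discussion, and prefixUntil() is the method this PR proposes, so details may differ from the final code.

```cpp
#include "base/CharacterSet.h"
#include "parser/Tokenizer.h"
#include "sbuf/SBuf.h"

static void
IllustrateDelimiterHandling()
{
    const CharacterSet delimiters("space-or-comma", " ,");
    SBuf word;

    // token(): extracts the token and forgets leading and trailing delimiters
    Parser::Tokenizer a(SBuf("  alpha, beta"));
    if (a.token(word, delimiters)) {
        // word is "alpha"; the delimiters around it were consumed
    }

    // prefixUntil() (this PR): extracts up to the first delimiter and leaves
    // that delimiter in the buffer for the caller to handle
    Parser::Tokenizer b(SBuf("alpha, beta"));
    if (b.prefixUntil(word, delimiters)) {
        // word is "alpha"; b's remaining input still begins with ","
    }
}
```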