Skip to content

[IMPROVED] Route sid parse performance #478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 9, 2017

Conversation

miraclesu
Copy link
Contributor

@miraclesu miraclesu commented Apr 27, 2017

  • Link to issue
  • Documentation added (if applicable)
  • Tests added
  • Branch rebased on top of current master (git pull --rebase origin master)
  • Changes squashed to a single commit (described here)
  • Build is green in Travis CI
  • You have certified that the contribution is your original work and that you license the work to the project under the MIT license

Changes proposed in this pull request:

Base on func (s *Server) routeSidQueueSubscriber(rsid []byte) (*subscription, bool) rsid comes from func routeSid(sub *subscription) string, I think we can replace regexp.FindSubmatch with a simple function.

The Benchmark result:

go test -bench=. -benchmem
BenchmarkRegexp-4   	 1000000	      1124 ns/op	     144 B/op	       2 allocs/op
BenchmarkParse-4    	50000000	        26.2 ns/op	       0 B/op	       0 allocs/op

The Benchmark code:

const (
	RSID  = "RSID"
	QRSID = "QRSID"

	QRSID_LEN = len(QRSID)

	RSID_CID_INDEX   = 1
	RSID_SID_INDEX   = 2
	EXPECTED_MATCHES = 3
)

var qrsidRe = regexp.MustCompile(`QRSID:(\d+):([^\s]+)`)
var _rsid = []byte("QRSID:1:hello")

func BenchmarkRegexp(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if !bytes.HasPrefix(_rsid, []byte(QRSID)) {
			continue
		}
		matches := qrsidRe.FindSubmatch(_rsid)
		if matches == nil || len(matches) != EXPECTED_MATCHES {
			continue
		}
		_, _ = matches[RSID_CID_INDEX], matches[RSID_SID_INDEX]
	}
}

func BenchmarkParse(b *testing.B) {
	for i := 0; i < b.N; i++ {
		cid, sid, ok := parseRouteSid(_rsid)
		if !ok {
			continue
		}

		_, _ = cid, sid
	}
}

func parseRouteSid(rsid []byte) (uint64, []byte, bool) {
	if !bytes.HasPrefix(rsid, []byte(QRSID)) {
		return 0, nil, false
	}

	// We don't care what's char of rsid[QRSID_LEN+1], it should be ':'
	for i, count := QRSID_LEN+1, len(rsid); i < count; i++ {
		switch rsid[i] {
		case ':':
			return uint64(parseInt64(rsid[QRSID_LEN+1 : i])), rsid[i+1:], true
		}
	}
	return 0, nil, true
}

// Ascii numbers 0-9
const (
	asciiZero = 48
	asciiNine = 57
)

// parseInt64 expects decimal positive numbers. We
// return -1 to signal error
func parseInt64(d []byte) (n int64) {
	if len(d) == 0 {
		return -1
	}
	for _, dec := range d {
		if dec < asciiZero || dec > asciiNine {
			return -1
		}
		n = n*10 + (int64(dec) - asciiZero)
	}
	return n
}

/cc @nats-io/core

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! If removing regex, I would completely do without the RSID_CID_INDEX and the like and simply have your parse function returns []byte, []byte, bool.
If not, at least maybe make QRSID+":" a constant.

server/route.go Outdated
@@ -491,6 +484,24 @@ func routeSid(sub *subscription) string {
return fmt.Sprintf("%s%s:%d:%s", qi, RSID, sub.client.cid, sub.sid)
}

func parseRouteSid(rsid []byte) (matches [EXPECTED_MATCHES][]byte, ok bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you get rid of regex, why don't return []byte, []byte, bool instead? Why carry the array of matches that is specific to regex?

server/route.go Outdated
@@ -491,6 +484,24 @@ func routeSid(sub *subscription) string {
return fmt.Sprintf("%s%s:%d:%s", qi, RSID, sub.client.cid, sub.sid)
}

func parseRouteSid(rsid []byte) (matches [EXPECTED_MATCHES][]byte, ok bool) {
if !bytes.HasPrefix(rsid, []byte(QRSID+":")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make QRSID+":" a constant then, for example QRSID_PREFIX

server/route.go Outdated
for i := start; i < len(rsid); i++ {
switch rsid[i] {
case ':':
matches[RSID_CID_INDEX], matches[RSID_SID_INDEX] = rsid[start:i], rsid[i+1:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change to return individual []byte, I would then do return rsid[start:i], rsid[i+1:], true.

server/route.go Outdated
}
}

return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil, nil, false

server/route.go Outdated
return
}

start := len(QRSID + ":")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even this offset could be made a constant.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 3077dab on miraclesu:improve_route_sid_parse into ** on nats-io:master**.

@miraclesu
Copy link
Contributor Author

@kozlovic Thanks for your response! How about the new codes?

I can't repeat the test fail on go1.6.4 on my local environment.

server/route.go Outdated
@@ -491,6 +481,20 @@ func routeSid(sub *subscription) string {
return fmt.Sprintf("%s%s:%d:%s", qi, RSID, sub.client.cid, sub.sid)
}

func parseRouteSid(rsid []byte) (cid uint64, sid []byte, ok bool) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more clearly with use named function parameters :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is a matter of taste ;-) I know that @derekcollison (the author of the project) does not like it. I have to say that I myself do not like the return without the values since I always have to do a double take and think of what (and what value) is actually been returned.
The plus that I see of naming the returned parameters are when documenting the function, or if you want to use a defer to act upon the returned value.

server/route.go Outdated
for i := QRSID_PREFIX_LEN; i < len(rsid); i++ {
switch rsid[i] {
case ':':
return uint64(parseInt64(rsid[QRSID_PREFIX_LEN:i])), rsid[i+1:], true
Copy link
Member

@kozlovic kozlovic Apr 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the server would receive a corrupted message, like QRSID:1:, the server would crash. You need to check that there is actually something past the : character. I believe that the previous use of the regex would have returned false and so server would not crash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test for this as well.

Copy link
Contributor Author

@miraclesu miraclesu May 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the server would crash when receive a corrupted message like QRSID:1:, rsid[i+1:] is reslicing and never out bound of the array.
I had updated the function and added some tests for invalid queue sid, the expects as same with use the regex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize, you are right, since i will always be <len(rsid), reslicing at i+1 would simply return an empty slice in that case.

Although your code is now equivalent in behavior with the regex, I now realize that we may have a problem and need to discuss this with @derekcollison. It looks to me that as long as we have a QRSID prefix, we should always return ok, no matter if we are able to decode cid and/or sid.

My reasoning is that if we return !ok when for instance getting QRSID:1 2\r\nok\r\n like in one of your test, the server ends up sending to a regular subscriber. I am not sure this is the original intent. As you can see in here, it looks like if the message was for a queue subscription, the server should not attempt to deliver to other subs, even if no actual queue subscriber is found (line 1124).

Let me check with Derek and once we agree we will get back to you (if parseRouteSid should return ok as long as the prefix is found).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Usually the server wouldn't send invalid queue MSG,but everything is possible on Internet :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look sometime today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct, we should always return true if we think it marks a queue subscription via QRSID prefix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And by prefix, you mean the presence of QRSID, not necessarily QRSID:?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct.

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix the check for array boundaries.

@miraclesu miraclesu force-pushed the improve_route_sid_parse branch 2 times, most recently from 8ae8466 to f520651 Compare May 6, 2017 04:35
@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 90.884% when pulling f520651 on miraclesu:improve_route_sid_parse into 11d3eed on nats-io:master.

Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on latest conversation, parseRouteSid should return true as long as sid starts with QRSID, which means that messages will not be delivered to regular subscribers.

server/route.go Outdated
RSID_CID_INDEX = 1
RSID_SID_INDEX = 2
EXPECTED_MATCHES = 3
QRSID_PREFIX = QRSID + ":"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on latest discussion, let's get rid of QRSID_PREFIX and if you want, you could keep QRSID_LEN = len(QRSID).

server/route.go Outdated
return 0, nil, false
}

for i := QRSID_PREFIX_LEN; i < len(rsid); i++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add back:

start := QRSID_LEN+1
for i:=start; i<len(rsid); i++ {
...

server/route.go Outdated
for i := QRSID_PREFIX_LEN; i < len(rsid); i++ {
switch rsid[i] {
case ':':
return uint64(parseInt64(rsid[QRSID_PREFIX_LEN:i])), rsid[i+1:], len(rsid[i+1:]) > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change back to rsid[start:i] and change last returned value to true

server/route.go Outdated
return uint64(parseInt64(rsid[QRSID_PREFIX_LEN:i])), rsid[i+1:], len(rsid[i+1:]) > 0
}
}
return 0, nil, false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with true

@@ -390,17 +390,26 @@ func TestRouteQueueSemantics(t *testing.T) {
routeSend("MSG foo RSID:1:1 2\r\nok\r\n")
// Queue group one.
routeSend("MSG foo QRSID:1:2 2\r\nok\r\n")
// Invlaid queue sid.
routeSend("MSG foo QRSID 2\r\nok\r\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes, those invalid SID, since they all start with QRSID, should not translate into a message to regular subscribers.


// Use ping roundtrip to make sure its processed.
routeSend("PING\r\n")
routeExpect(pongRe)

// Should be 2 now, 1 for all normal, and one for specific queue subscriber.
matches = clientExpectMsgs(2)
// Should be 3 now, 1 for all normal, and one for specific queue subscriber,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As explained above, that should not change the number of messages delivered to regular subscribers, so you would need to revert those changes.

@miraclesu miraclesu force-pushed the improve_route_sid_parse branch from f520651 to 22e05cc Compare May 9, 2017 08:06
@miraclesu miraclesu force-pushed the improve_route_sid_parse branch from 22e05cc to 26ef3f8 Compare May 9, 2017 08:17
@miraclesu
Copy link
Contributor Author

I had updated the codes.

@kozlovic
Copy link
Member

kozlovic commented May 9, 2017

LGTM

@derekcollison derekcollison merged commit 01d2f1d into nats-io:master May 9, 2017
@miraclesu miraclesu deleted the improve_route_sid_parse branch May 10, 2017 01:53
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 90.996% when pulling 22e05cc on miraclesu:improve_route_sid_parse into 6a12c21 on nats-io:master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants