Skip to content

rg doesn't stop after first match when -q is used #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
0xmohit opened this issue Sep 25, 2016 · 9 comments
Closed

rg doesn't stop after first match when -q is used #77

0xmohit opened this issue Sep 25, 2016 · 9 comments
Labels
bug A bug.

Comments

@0xmohit
Copy link

0xmohit commented Sep 25, 2016

Example:

$ seq 100000 > t
$ time grep -q 1 t

real    0m0.003s
user    0m0.000s
sys     0m0.000s
$ time rg -q 1 t

real    0m0.139s
user    0m0.124s
sys     0m0.012s
$ time grep -q a t

real    0m0.003s
user    0m0.000s
sys     0m0.004s
$ rg -q a t

real    0m0.017s
user    0m0.008s
sys     0m0.008s

The same was observed while searching in real world code:

~/go/src/cmd/compile/internal/gc$ time grep OLITERAL sinit.go 
        case OLITERAL:
                        if e.Expr.Op == OLITERAL {
        case OLITERAL:
                if l.Class == PEXTERN && r.Left.Op == OLITERAL {
                        if e.Expr.Op == OLITERAL {
        return n.Op == OLITERAL && n.Val().Ctype() != CTNIL
                        if n.Op == OARRAYLIT && index.Op != OLITERAL {
        case OLITERAL:
        case OLITERAL:
        case OLITERAL:

real    0m0.004s
user    0m0.004s
sys     0m0.000s
~/go/src/cmd/compile/internal/gc$ time rg OLITERAL sinit.go 
302:    case OLITERAL:
345:                    if e.Expr.Op == OLITERAL {
381:    case OLITERAL:
416:            if l.Class == PEXTERN && r.Left.Op == OLITERAL {
450:                    if e.Expr.Op == OLITERAL {
583:    return n.Op == OLITERAL && n.Val().Ctype() != CTNIL
645:                    if n.Op == OARRAYLIT && index.Op != OLITERAL {
654:    case OLITERAL:
1278:   case OLITERAL:
1391:   case OLITERAL:

real    0m0.028s
user    0m0.028s
sys     0m0.000s
@BurntSushi
Copy link
Owner

Your first case is pretty pathological, but interesting. The problem is that you're searching for a single byte in a file where that byte is extremely common. It's worth optimizing, but pretty low proirity.

Your second case looks like it's on a tiny sample. There's another issue for looking at making startup time faster, since that's almost assuredly what you're observing.

Certainly, I don't see any relationship in terms of performance between your examples.

@0xmohit
Copy link
Author

0xmohit commented Sep 25, 2016

Certainly, I don't see any relationship in terms of performance between your examples.

The intent was to illustrate the performance issue when searching in a single file; those were intended to be different.

Moreover, the first one specified -q; the only byte being searched for was the first one in the input.

@BurntSushi
Copy link
Owner

@0xmohit Oh! I missed the -q flag and jumped straight to unrelated things. :-) Thanks for pointing that out. I don't think rg is quitting after it finds the first match.

@BurntSushi
Copy link
Owner

But yeah, in the second case, the issue is definitely startup time. I was able to reproduce it.I'm going to lump that part of your issue in with #33, and change the first part to "fix the -q flag."

@BurntSushi BurntSushi changed the title rg is slow when operating on a single file rg doesn't stop after first match when -q is used Sep 25, 2016
@BurntSushi
Copy link
Owner

BurntSushi commented Sep 25, 2016

For reference, if we remove the bug from the benchmark, rg does beat grep:

$ seq 100000000 > t
$ ls -lh
total 848M
-rw-r--r-- 1 andrew users 848M Sep 25 10:09 t
$ time rg 1 t | wc -l
56953280

real    0m3.381s
user    0m3.790s
sys     0m0.370s
$ time grep 1 t | wc -l
56953280

real    0m4.011s
user    0m4.293s
sys     0m0.707s

This suggests rg isn't actually suffering pathological behavior, and this is just a plain ol' bug.

@BurntSushi BurntSushi added the bug A bug. label Sep 25, 2016
@BurntSushi
Copy link
Owner

Clarification, grep in ASCII mode is roughly the same as rg in Unicode mode:

$ time LC_ALL=C grep 1 t | wc -l
56953280

real    0m3.514s
user    0m3.547s
sys     0m0.633s
$ time LC_ALL=en_US.UTF-8 grep 1 t | wc -l
56953280

real    0m3.942s
user    0m3.997s
sys     0m0.627s

Benchmarks are hard. :-)

@0xmohit
Copy link
Author

0xmohit commented Sep 25, 2016

Benchmarks are hard. :-)

Oh, yes! Quoting Benchmark_(computing)#Challenges:

Benchmarking is not easy and often involves several iterative rounds in order to arrive at predictable, useful conclusions. Interpretation of benchmarking data is also extraordinarily difficult.

@BurntSushi BurntSushi mentioned this issue Sep 25, 2016
4 tasks
@0xmohit
Copy link
Author

0xmohit commented Sep 27, 2016

It seems that this would only stop searching a file after a matched; when searching in a directory, it'll probably still scan all files.


Edit: To elaborate:

rg -q . dir_with_lots_of_files

should probably exit almost immediately.

@BurntSushi
Copy link
Owner

@0xmohit Oh... Nice! That should be an easy one to fix. I created #116. Thanks!

amsharma91 added a commit to amsharma91/ripgrep that referenced this issue Sep 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

No branches or pull requests

2 participants