Missing reports from large sites #134
Comments
My gut intuition says this is because the crawler is finding items on the internet that it shouldn't be passing to Lighthouse (but is). However, it could also be the size of the queue or something else entirely; I'm not sure where to look for the root cause of this.
I'm unable to reproduce this error. When running auto-lighthouse on ...
Sorry for the confusion.
As you mentioned in PR #133, it seems possible that this issue is the result of this upstream Lighthouse issue. (Or maybe I'm misunderstanding?) Having discovered this, I'm curious what the implications are for the current architecture of this tool. It sounds like we're not supposed to run Lighthouse a bunch of times inside a single process.
From my understanding, and this comment from Patrick, it sounds like running Lighthouse in parallel is a valid use case if you're okay with a loss of accuracy in the performance metrics. I don't know your or your company's use case, but if you're using Lighthouse to audit the other metrics, maybe I can create some way to handle that. I'd have to do some timing tests to see how fast Lighthouse can run when only auditing the performance category, though, to justify my first thought at a solution. For context, I'm thinking of a parallel run of the categories that aren't performance-based, then a sequential run of the performance category. However, that means running Lighthouse four times on each page, which is why I'd need a quick check of how fast the auditing can be done with different categories.
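As a rough illustration of that timing check (not auto-lighthouse's actual code), here is a minimal sketch using Lighthouse's programmatic API with the onlyCategories flag. It assumes the lighthouse and chrome-launcher npm packages; the URL and the category split are placeholders:

```js
const lighthouse = require('lighthouse');
const chromeLauncher = require('chrome-launcher');

// Time a single audit restricted to the given categories.
async function timeAudit(url, categories) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const start = Date.now();
  await lighthouse(url, { port: chrome.port, onlyCategories: categories });
  await chrome.kill();
  return Date.now() - start;
}

(async () => {
  const url = 'https://example.com'; // placeholder from the repro steps
  const nonPerf = ['accessibility', 'best-practices', 'seo', 'pwa'];
  console.log('non-performance:', await timeAudit(url, nonPerf), 'ms');
  console.log('performance only:', await timeAudit(url, ['performance']), 'ms');
})();
```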
I support the idea of just adding the child process. One way to offset the potential inaccuracy (resulting from resource limitations) might be to add an option controlling the amount of concurrency, so the user can choose the balance between accuracy and speed. If you set the concurrency to 1, I don't think we should expect to hit resource limits, because that is the exact use case the tool was designed for. Users with more horsepower, or less concern for accuracy, could turn up the concurrency to run more tests in parallel. Based on my limited understanding of the relevant code, both of these seem pretty straightforward to do.
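To make the concurrency idea concrete, here is a minimal sketch of a promise-pool limiter, not the actual auto-lighthouse implementation; runLighthouseFor stands in for whatever launches a single audit (e.g. a child process, per the comment above), and concurrency for a hypothetical CLI option:

```js
// Run one audit per URL, with at most `concurrency` audits in flight.
async function auditAll(urls, runLighthouseFor, concurrency = 1) {
  const queue = [...urls];
  const worker = async () => {
    while (queue.length > 0) {
      await runLighthouseFor(queue.shift());
    }
  };
  // concurrency = 1 degenerates to the fully sequential run the tool
  // was originally designed for; higher values trade accuracy for speed.
  await Promise.all(Array.from({ length: concurrency }, worker));
}
```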
Issue is looking mighty stale
Describe the bug
I'm trying to generate reports for a large site (~6760 pages) and it is only producing 437 report files.
To Reproduce
Steps to reproduce the behavior:
git clone ...
cd ./auto-lighthouse
npm install
npm run start -- -u https://example.com --format=csv --respectRobots=false
Expected behavior
I expect the crawler to find ~6760 pages and then generate 13522 report files (two per page, plus two aggregated reports).
Instead, I find ~437 report files and an error in the console.
The console printed "Pushed: ..." 6760 times, then "Generating 13522 reports!" (so far so good), then "Wrote ..." 437 times, followed by an error. It appears that the script is choking on something before it finishes writing all the files. It may have something to do with the race condition mentioned in this unmerged PR.
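For what it's worth, one speculative way such a race could lose files is firing off report writes without awaiting them, so a failure partway through drops the rest. A minimal sketch of the awaited alternative, with reports and its fields entirely hypothetical:

```js
const fs = require('fs').promises;

// Await every write so a failure surfaces before the process moves on.
async function writeReports(reports) {
  await Promise.all(
    reports.map(({ filePath, html }) =>
      fs.writeFile(filePath, html).then(() => console.log('Wrote:', filePath))
    )
  );
}
```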
Here's an abridged version of the full transcript: