Enhance scraper log archiving #1449
Conversation
Note this is in scope for Denise. I have tested it on Linux and it works well. Testing on Windows (with the wide string stuff) now.
As usual, doesn't work on Windows. I must need an imbue... Nope, that is not it. The archivescraperlog command to force immediate archiving is not working on Windows. And I checked RC2, and it doesn't work there either...
Tested, and it created the logarchive directory as expected. I dumped a bunch of old scraper log archives into it. The next housekeeping cycle didn't do anything because PrevArchiveCheckDate (which seems to initialize with not_a_date_time on boot) starts with today--fine. I ran archivescraperlog, and it deleted all but the 14 most recent archives. Great. I ran it again and it deleted the rest of the archives from the directory. Is that expected behavior?
Hmm... not supposed to do that.
(Force-pushed from 44f39dc to 94cb95a.)
For Windows, this part, I think, doesn't work when another instance of the logger has a file handle open, for the same reason we were having the problem with the dang client_state.xml file.
I will try to troubleshoot it tonight. The Linux build is working correctly now.
…Sent from my iPhone
On Apr 30, 2019, at 7:34 PM, @cyrossignol commented on this pull request, in src/scraper/scraper.cpp:
> +
+ std::set<fs::directory_entry, std::greater <fs::directory_entry>> SortedDirEntries;
+
+ // Iterate through the log archive directory and delete the oldest files beyond the retention rule
+ // The names are in format scraper-YYYYMMDDHHMMSS for the scraper logs, so filter by containing scraper
+ // The greater than sort in the set should then return descending order by datetime.
+ for (fs::directory_entry& DirEntry : fs::directory_iterator(LogArchiveDir))
+ {
+ std::string sFilename = DirEntry.path().filename().string();
+ size_t FoundPos = sFilename.find("scraper");
+
+ if (FoundPos != string::npos) SortedDirEntries.insert(DirEntry);
+ }
+
+ // Now iterate through set of filtered filenames. Delete all files greater than retention count.
+ unsigned int i;
I'll play with it a bit when I get home...don't have a Windows box to try it on at the moment.
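The quoted hunk above cuts off right after the declaration of unsigned int i;. For context, here is a sketch (not the PR's actual code) of how such a retention pass could finish; nRetentionCount is illustrative and would come from the logarchiveretainnumfiles setting.

```cpp
// Continuation sketch for the truncated hunk above. Assumes the
// SortedDirEntries set and the fs alias from the quoted code.
unsigned int i = 0;

for (const fs::directory_entry& DirEntry : SortedDirEntries)
{
    ++i;

    // The set sorts descending by filename, and the filenames embed
    // YYYYMMDDHHMMSS, so the first nRetentionCount entries are the newest
    // archives; everything after them gets deleted.
    if (i > nRetentionCount)
        fs::remove(DirEntry.path());
}
```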
I used Sysinternals Process Explorer to manually close the single file handle held open by the logger, and then ran archivescraperlog on my Windows testnet wallet. It worked. So this confirms the issue is trying to rename the file while another handle is open on it. I think I have a solution. Will post new commit soon.
Inelegant, but works. Tested on both Linux and Windows.
src/scraper/scraper.cpp (Outdated)
@@ -304,6 +365,8 @@ void _log(logattribute eType, const std::string& sCall, const std::string& sMess
     log.output(sOut);
+
+    log.closelogfile();
I'm not sure I follow. Do we want to close the file each time we log?
We have to. There is no other way to do it. The performance loss is minimal. I already checked.
Don't we only need to close the file when we want to rotate?
Two threads are writing to that file. If one thread closes it before rotating, the other still has an open handle, so the rename won't work.
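To make the failure mode concrete, here is a minimal standalone sketch (not from the PR) assuming Boost.Filesystem: on Windows, a rename fails while any other handle on the file remains open, whereas POSIX systems allow renaming an open file.

```cpp
#include <boost/filesystem.hpp>
#include <fstream>
#include <iostream>

namespace fs = boost::filesystem;

int main()
{
    // Simulate two logger instances holding handles on the same file.
    std::ofstream handle1("scraper.log", std::ios::app);
    std::ofstream handle2("scraper.log", std::ios::app);

    handle1.close(); // one "thread" closes its handle before rotating...

    boost::system::error_code ec;
    fs::rename("scraper.log", "scraper-20190501000000.log", ec);

    // ...but handle2 is still open, so on Windows the rename fails with a
    // sharing violation. On Linux it succeeds.
    if (ec) std::cout << "rename failed: " << ec.message() << std::endl;

    return 0;
}
```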
"Windows"
src/scraper/scraper.cpp (Outdated)
@@ -304,6 +365,8 @@ void _log(logattribute eType, const std::string& sCall, const std::string& sMess
     log.output(sOut);
+
+    log.closelogfile();
I disagree with closing the log file after each message. The file should be closed WHEN archiving it. Two threads writing to the same file with two handles and two logger objects is bad design. The file should be opened by one logger instance only, and that instance must hold a mutex to lock when archiving.
If desired, the logging class can be split into a backend and a frontend: the backend holding only the mutex and file handle, and the frontend doing the formatting and holding a reference to the backend. There can then be multiple frontends pointing to the same file, with the archiving operation performed on the backend.
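As a concrete illustration, here is a minimal sketch (not the PR's code; all names are illustrative) of that backend/frontend split, assuming Boost.Filesystem as used by the scraper:

```cpp
#include <boost/filesystem.hpp>
#include <fstream>
#include <mutex>
#include <string>

namespace fs = boost::filesystem;

// Backend: sole owner of the file handle and the mutex.
class LogBackend
{
public:
    explicit LogBackend(const fs::path& path)
        : m_path(path), m_file(path.string(), std::ios::app) {}

    void Write(const std::string& sLine)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_file << sLine << std::endl;
    }

    // Close-rename-reopen under the same lock that serializes writes, so
    // no frontend can hold a stale handle during the rename.
    void Archive(const fs::path& archivePath)
    {
        std::lock_guard<std::mutex> lock(m_lock);
        m_file.close();
        fs::rename(m_path, archivePath);
        m_file.open(m_path.string(), std::ios::app);
    }

private:
    fs::path m_path;
    std::ofstream m_file;
    std::mutex m_lock;
};

// Frontend: does the formatting and forwards finished lines to the backend.
// Multiple frontends can share one backend.
class LogFrontend
{
public:
    explicit LogFrontend(LogBackend& backend) : m_backend(backend) {}

    void Log(const std::string& sCall, const std::string& sMessage)
    {
        m_backend.Write("[" + sCall + "] " + sMessage);
    }

private:
    LogBackend& m_backend;
};
```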
What I don't like about that is that we are not just renaming the file; we are also compressing it for archiving, and the compression could take a while. The advantage of the approach in the original scraper (before this PR) and in this PR is that renaming the file first lets the archive process work on the renamed file. Normal logging then immediately opens a new scraper.log file and continues writing, unblocked by the compression, which acts on the renamed file.
Applying a singleton pattern here will put the compression call in line with the rename, and the mutex will not be released for another use of the logger until the whole compression process is over.
So to recover that, we would have to spawn a separate thread to do the compression. This is getting too complicated at the last minute to get Denise out the door.
I obviously am not suggesting to complicate matters THAT much. Only the close-rename-reopen needs to be under the mutex lock.
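A sketch of that narrower locking scheme (names hypothetical, assuming Boost.Filesystem): only the close-rename-reopen holds the mutex, and the slow compression runs afterwards on the renamed file, so normal logging is never blocked by it.

```cpp
#include <boost/filesystem.hpp>
#include <fstream>
#include <mutex>
#include <thread>

namespace fs = boost::filesystem;

std::mutex cs_log;                                    // guards the live handle
std::ofstream logfile("scraper.log", std::ios::app);  // the live scraper.log

// Hypothetical compression helper; body omitted in this sketch.
void CompressFile(fs::path archivePath) { /* gzip the file */ }

void ArchiveLog(const fs::path& logPath, const fs::path& archivePath)
{
    {
        std::lock_guard<std::mutex> lock(cs_log);

        logfile.close();                  // release the handle (Windows needs this)
        fs::rename(logPath, archivePath); // safe now: no open handles remain
        logfile.open(logPath.string(), std::ios::app); // logging resumes
    } // mutex released here

    // Compression acts on the renamed file, outside the lock (or on a
    // worker thread), so writers never wait on it.
    std::thread(CompressFile, archivePath).detach();
}
```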
This is why I did not want to get into this at this stage in Denise. Now we are delaying the release for major surgery on the scraper logger. I am thinking about just removing the ability to instantly archive, which eliminates the second instance without major surgery. :|
Retested the archivescraperlog command--works on Windows and Linux now 👍 Missed the window for the daily rotation last night, I think, so I'm still waiting to observe that, but I don't see any obvious issues.
I'm not a big fan of reopening and closing the log file after each write either, but if we're talking about refactoring application-wide logging, maybe this problem will disappear in the next release.
Tomas, unless you really object, I would like to go with this as is. It works, and performs reasonably well. We need to overhaul the entire logging utility and combine the scraper and main logs together and implement Bitcoin's logging. This belongs in Elizabeth. It is too much to do at the last minute for Denise.
@jamescowens I'd remove the RPC commands for manipulating the logs. It's a very niche use case and I'd be surprised if we ever need them. If it simplifies the implementation, then all the better.
I just tested on Linux and Windows, and that looks like it solved it. Nope: first execution good on Windows, second failed.
(Force-pushed from 63175c9 to 2fc6ca7.)
I forgot to put the file reopen in the block containing the rename...
(Force-pushed from 2fc6ca7 to 7dc3fa2.)
(Force-pushed from 7dc3fa2 to 85274ea.)
Ok. Looks like it is fully working now. Please take a look. I would like to get this merged. I tested on both Linux and Windows...
utACK as-is and we'll rewrite/fine-tune for Elizabeth.
Changing it later sounds good. I am unblocking this, but I can't test it.
@tomasbrod I tested on both Linux and Windows with the singleton pattern and it is working well... Merging.
Late to the party... FWIW I was building...just tested and looks good!
Thanks @cyrossignol.
Added:
- Replace NeuralNetwork with portable C++ scraper #1387 (@jamescowens, @tomasbrod, @Cycy, @TheCharlatan, @denravonska).
- Allow compile flags to be used for depends #1423 (@G-UK).
- Add stake splitting and side staking info to getmininginfo #1424 (@jamescowens).
- Add freedesktop.org desktop file and icon set #1438 (@a123b).

Changed:
- Disable Qt for windows Travis builds #1276 (@TheCharlatan).
- Replace use of AppCache PROJECT section with strongly-typed structures #1415 (@cyrossignol).
- Change dumpwallet to use appropriate data directory #1416 (@jamescowens).
- Optimize ExtractXML() calls by avoiding unnecessary string copies #1419 (@cyrossignol).
- Change signature of IsLockTimeWithinMinutes #1422 (@jamescowens).
- Restore old poll output for getmininginfo RPC #1437 (@a123b).
- Prevent segfault when using rpc savescraperfilemanifest #1439 (@jamescowens).
- Improve miner status messages for ineligible staking balances #1447 (@cyrossignol).
- Enhance scraper log archiving #1449 (@jamescowens).

Fixed:
- Re-enable full GUI 32-bit Windows builds - part of #1387 (@jamescowens).
- Re-activate Windows Installer #1409 (@TheCharlatan).
- Fix Depends and Travis build issues for ARM #1417 (@jamescowens).
- Fix syncupdate icons #1421 (@jamescowens).
- Fix potential BOINC crash when reading projects #1426 (@cyrossignol).
- Fix freeze when unlocking wallet #1428 (@denravonska).
- Fix RPC after high priority alert #1432 (@denravonska).
- Fix missing poll in GUI when most recent poll expired #1455 (@cyrossignol).

Removed:
- Remove old, rudimentary side staking implementation #1381 (@denravonska).
- Remove auto unlock #1402 (@denravonska).
- Remove superblock forwarding #1430 (@denravonska).
This commit implements putting the log archive files in the
logarchive subdirectory to prevent clutter.
It also implements deleting old log files per the following
configuration file (or command line) parameters:
deleteoldlogarchives=0|1 (default true)
logarchiveretainnumfiles=x (default 14)
deleteoldlogarchives defaults to true, which then means that
every time the log archiver is triggered (typically at midnight UTC)
it will archive the current scraper log file into the logarchive
directory, and then apply the rule to retain the x most recent files,
where x is specified by logarchiveretainnumfiles.
An immediate archive (and directory cleanup) can be triggered by issuing
the rpc command archivescraperlog.
If deleteoldlogarchives is set to false, then the scraper.log files are
archived into the logarchive directory, but never deleted.
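For reference, this is how those settings might look in the wallet configuration file (gridcoinresearch.conf), shown here with the stated defaults; the rpc command archivescraperlog then forces an immediate archive and cleanup:

```
deleteoldlogarchives=1
logarchiveretainnumfiles=14
```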