Skip to content

[Work In Progress] Gracefully reload the ZIM Library #734

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Optimus-NP
Copy link

@Optimus-NP Optimus-NP commented Feb 22, 2025

Fixes #729

Refactor and Improve Code Formatting in kiwix-serve.cpp

  • Introduced listenDirectoryChanges() function (Linux-only) to monitor directory changes and reload the library dynamically.
  • Used inotify to detect modifications in watched directories and trigger library updates.
  • Implemented a dedicated monitor thread to watch for changes in the specified directory.
  • Ensured proper synchronization and thread safety while handling directory monitoring.
  • Reformatted and improved code readability by adjusting indentation, spacing, and alignment.
  • Fixed inconsistent whitespace in function definitions, loops, and conditions.
  • Standardized #include ordering for better organization.
  • Reordered #include statements to maintain consistency across platforms.
  • Improved error handling by adding better structured try-catch blocks.
  • Standardized macro definitions and improved formatting for readability.
  • Improved logging messages for better debugging and clarity.

Testing

 /usr/local/bin/kiwix-serve --port=8080 --library /tmp/zimlibrary/linux_library.xml --monitorLibrary

After running the specified command, the console logs indicated a successful configuration reload following modifications to the file /tmp/zimlibrary/linux_library.xml. To verify the changes, I accessed the web interface at http://localhost:8080/ and confirmed that the library updates were applied as expected.

Refactor and Improve Code Formatting in kiwix-serve.cpp

- Introduced `listenDirectoryChanges()` function (Linux-only) to monitor directory changes and reload the library dynamically.
- Used `inotify` to detect modifications in watched directories and trigger library updates.
- Implemented a dedicated **monitor thread** to watch for changes in the specified directory.
- Ensured proper synchronization and thread safety while handling directory monitoring.
- Reformatted and improved code readability by adjusting indentation, spacing, and alignment.
- Fixed inconsistent whitespace in function definitions, loops, and conditions.
- Standardized `#include` ordering for better organization.
- Reordered `#include` statements to maintain consistency across platforms.
- Improved error handling by adding better structured `try-catch` blocks.
- Standardized macro definitions and improved formatting for readability.
- Improved logging messages for better debugging and clarity.
@kelson42
Copy link
Contributor

kelson42 commented Mar 4, 2025

@Optimus-NP Thank you for your PR. This is really a lot of code with different purposes. Can you please put all the cosmetic changes in a dedicated PR?

@kelson42 kelson42 requested a review from veloman-yunkan March 4, 2025 04:20
@kelson42
Copy link
Contributor

kelson42 commented Mar 4, 2025

@Optimus-NP Thank younfor your PR. This is really a lot of code with different purposes. Can you please put all the cosmetocs changes in a dedicated PR?

Regarding your solution, I really don't know if this is the good approach and not even sure this is what is requested here. To me, it seems to fix #735 with a linux specific approach (so won't work on Windows).

@kelson42 kelson42 self-requested a review March 4, 2025 04:26
@kelson42 kelson42 marked this pull request as draft March 4, 2025 04:27
@Optimus-NP
Copy link
Author

Optimus-NP commented Mar 9, 2025

Hi @kelson42 @rgaudin ,

Thank you for reviewing the PR I shared. I initially misunderstood the ask in the issue: #729, but I now have a clearer understanding. Please take a look at my revised approach below.

Identifying Keys to Purge

Currently, when we reload the library via the Kiwix Manager from KiwixServer::reload, Kiwix Manager does not return any data to Kiwix Server. Moving forward, I plan to modify this behavior so that Kiwix Manager returns information about books that have been added, deleted, or modified. Using this data, we can then call the relevant purge APIs in the Varnish cache to update only the affected entries.

Purging in the Varnish Cache

Although I’m not fully familiar with the data structures used in the Varnish cache, if the data is stored per book, updating the signature of the KiwixManager::reload function should allow us to determine which cache keys need to be purged. Based on my research into Varnish cache documentation—particularly the CLI tool varnishadm—it appears possible to purge specific books from the cache rather than clearing the entire catalog for library.kiwix.org. This would enable more efficient cache management.

Request for Input from Kiwix Maintainers

I’d love to hear your thoughts on this approach. Additionally, I would appreciate any insights into how Varnish cache is currently implemented in front of the servers for library.kiwix.org.

I’m excited about working on this task and would love to implement this solution or explore better alternatives you might suggest.

@evrial
Copy link

evrial commented Mar 9, 2025

Honestly I think the most graceful way is keeping things simple and accept a path as single valid input to search zim files. This way you don't need any monitor thread and platform specific code.

@rgaudin
Copy link
Member

rgaudin commented Mar 10, 2025

@Optimus-NP there is not and should not be anything about varnish in kiwix. Varnish is an example of the need we have to be aware when the library is reloaded. We have a featuere that automatically refreshes the library ; we should have a way to be informed when this happens. That's the core of #729. The rest is context to justify the new feature request.

@Optimus-NP
Copy link
Author

Optimus-NP commented Mar 15, 2025

Thank you for your response @rgaudin

"Could you please point me to the part of the code where you mentioned, 'To work around this, we are now waiting 10s after writing the XML file to disk and purging the cache. Of course, that's arbitrary, ugly, and fragile.'?

Thank you for your help!"

@rgaudin
Copy link
Member

rgaudin commented Mar 15, 2025

@Optimus-NP kiwix/operations@29f06d6 but that is not and should not be relevant.

@Optimus-NP
Copy link
Author

@rgaudin

Thank you for Code link

Would it be safe to assume that the Kiwix server and library-maintain.py script are executing on the same host?

If they do, I have a proposal to streamline the process:
1️⃣ When the Kiwix server successfully reloads the library path, it writes a confirmation message to a file.
2️⃣ A daemon process monitors this file and automatically purges the cache when an update is detected.
3️⃣ This way, we can eliminate the need to call the purge API from the library-maintain.py script, making the workflow more efficient.

Let me know your thoughts on this approach!

@rgaudin
Copy link
Member

rgaudin commented Mar 17, 2025

No, we definitely dont want to do that

@Optimus-NP
Copy link
Author

Thanks for the feedback, @rgaudin.

I’ve made a small adjustment to my proposal.

Rather than using a file for message passing, we could leverage Apache Kafka to manage communication more effectively. When the Kiwix server successfully reloads the library, it could publish a message to a Kafka topic. A daemon could then subscribe to this topic and trigger the cache purge automatically. This would remove the need for the purge API in the library-maintain.py script and make the process more scalable.

Thanks for your esteemed guidance so far.

@Optimus-NP
Copy link
Author

@rgaudin

Requesting traction on this issue.

@rgaudin
Copy link
Member

rgaudin commented Mar 24, 2025

@Optimus-NP I am not the maintainer of kiwix-tools nor libkiwix (where this might lend depending on how it's implemented).
It's a decision for @veloman-yunkan and @kelson42 to make I believe but proposal and discussion are to happen in the ticket, not in a PR IMO.

@Optimus-NP
Copy link
Author

@Optimus-NP I am not the maintainer of kiwix-tools nor libkiwix (where this might lend depending on how it's implemented). It's a decision for @veloman-yunkan and @kelson42 to make I believe but proposal and discussion are to happen in the ticket, not in a PR IMO.

Thanks for your response, @rgaudin. I have taken the conversation back into the ticket by design proposal through this comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Knowing when library has been reloaded
4 participants