Add automated link validation to GitHub workflow #295

Open
Karrenbelt opened this issue Jun 20, 2024 · 6 comments
Comments

@Karrenbelt
Contributor

What content is missing?

There is currently no automated process to check the validity of links in the markdown files within the repository. This leads to potential broken links, affecting the user experience and the reliability of the documentation.

Related content in the wiki

Links break, such as this link here.

> Check this documentary with Dennis Ritchie and Ken Thompson, which perfectly captures spirit and ideas behind UNIX: https://yewtu.be/watch?v=tc4ROCJYbm0

Other relevant resources

N/A


Proposal

I propose to add a test as part of the workflow to automatically check the validity of all links in the markdown files within the repository. This workflow should:

  1. Gather all links, both to local documents/sections and to external resources.
  2. Verify that the links are valid by checking:
    • The path to a local document exists.
    • HTTP requests to external resources return a status code 200 (OK).

To optimize the process:

  • Return a complete report; do not error on first failure.
  • A retry mechanism with a backoff strategy is recommended.
  • A hashset, hashmap, or caching mechanism can be used to prevent redundant checks.
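A rough sketch of the proposal above, in Python with only the standard library. Everything here is an assumption for illustration rather than an existing tool: the function names (`extract_links`, `check_links`, and so on), the retry count, and the backoff values are hypothetical, and the regex only catches inline `[text](target)` links. Section anchors are not validated in this sketch.

```python
import re
import time
from pathlib import Path
from typing import Callable, Dict, List, Tuple
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Matches inline markdown links such as [text](target); image links match too.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)")


def extract_links(markdown: str) -> List[str]:
    """Return every inline link target found in a markdown string."""
    return LINK_RE.findall(markdown)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL and return its HTTP status code (0 on network failure)."""
    request = Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
    except OSError:  # covers URLError, timeouts, connection resets
        return 0


def check_external(
    url: str,
    fetch: Callable[[str], int] = http_status,
    retries: int = 3,
    backoff: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry with exponential backoff; only a 200 counts as valid."""
    for attempt in range(retries):
        if fetch(url) == 200:
            return True
        if attempt < retries - 1:
            sleep(backoff * 2 ** attempt)
    return False


def check_links(
    root: Path,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Check every link in every .md file under root; return all failures."""
    cache: Dict[str, bool] = {}  # each external URL is requested only once
    failures: List[Tuple[str, str]] = []
    for md_file in sorted(root.rglob("*.md")):
        for link in extract_links(md_file.read_text(encoding="utf-8")):
            if link.startswith(("http://", "https://")):
                if link not in cache:
                    cache[link] = check_external(link, fetch, sleep=sleep)
                ok = cache[link]
            elif link.startswith(("#", "mailto:")):
                continue  # same-page anchors and mail links are skipped here
            else:
                # Local link: drop any #section suffix and test the file path.
                target = link.split("#", 1)[0]
                ok = (md_file.parent / target).exists()
            if not ok:
                failures.append((str(md_file), link))  # keep going, report later
    return failures
```

Calling `check_links(Path("docs"))` would return a list of `(file, link)` pairs for the whole tree, so a workflow step could print the complete report and only then exit non-zero if the list is not empty.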

@raxhvl
Member

raxhvl commented Jun 22, 2024

I agree with the sentiment.

Docsify uses client-side rendering, which could make tracking internal pages challenging. The Web Archive could rate-limit us if we send too many requests. I think the effort required for the workflow to work reliably will be non-trivial.

There are some GitHub Actions that check for broken links. Maybe test them out on a fork and let us know how well that works out.

@raxhvl
Member

raxhvl commented Feb 26, 2025

@taxmeifyoucan This sounds like overkill for now. Should we close this issue?

@taxmeifyoucan
Contributor

I think we could add actions for this. People are still finding broken links, so an automated check that helps us keep the docs up to date would be great. I didn't have time to try any solution, but I am sure there are many simple link-check actions. The markdown linter might include one as well.

@raxhvl
Member

raxhvl commented Feb 27, 2025

Let me take a look this weekend.

@raxhvl
Member

raxhvl commented Mar 3, 2025

@taxmeifyoucan I tried a few plugins. I found lychee to be the fastest.

Findings:

  1. The Web Archive always errors out.
  2. GNU's website rate-limits, so we always get a 429 after a few requests.
  3. There are a few 403 false positives (see the errors in docs/wiki/research/PBS/PEPC.md in the full report).
  4. It throws errors for some relative links; not sure why.

It did catch some missing links; check the full report here.
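
One possible way to deal with findings 1 to 3 is to triage results instead of failing on every non-200 response. The sketch below is a Python illustration only, not something lychee does or something tried in this thread: the hostnames in `SKIP_HOSTS` are guesses for "Web Archive" and "GNU's website", and the `triage` function and bucket names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlparse

# Assumed hostnames for the two sites named in the findings; adjust as needed.
SKIP_HOSTS = {"web.archive.org", "www.gnu.org"}
# Status codes that often mean "blocked or throttled" rather than "page is gone".
UNCERTAIN_STATUSES = {403, 429}


def triage(results: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Sort (url, status) pairs into ok / broken / review / skipped buckets."""
    buckets: Dict[str, List[str]] = {
        "ok": [], "broken": [], "review": [], "skipped": [],
    }
    for url, status in results:
        host = urlparse(url).hostname or ""
        if host in SKIP_HOSTS:
            buckets["skipped"].append(url)  # known to error out or rate-limit
        elif status == 200:
            buckets["ok"].append(url)
        elif status in UNCERTAIN_STATUSES:
            buckets["review"].append(url)  # likely false positive, check by hand
        else:
            buckets["broken"].append(url)
    return buckets
```

With this split, a workflow could fail only when the `broken` bucket is non-empty and list the `review` bucket as warnings.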

@raxhvl
Member

raxhvl commented Apr 17, 2025

Bumping this, @taxmeifyoucan.

Development

No branches or pull requests

3 participants