
chrisodicho/article-archiver


Article Archiver

This library converts online articles and blog posts into local Markdown, preserving only:

  • article content
  • media assets
  • metadata

Cypress does the heavy lifting of scraping, and Mozilla Readability enhances the extracted content.


Getting Started

⚠️ This library is under development and is not expected to work until the TODOs below are completed ⚠️

Installation

npm install -g article-archiver

Usage

npx article-archiver <urls>
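
The README does not say how `<urls>` is parsed. As a minimal sketch, assuming the arguments are space-separated http(s) URLs, a command-line entry point might validate them like this. `parseUrlArgs` and `ParsedArgs` are hypothetical names used for illustration, not part of the library's API.

```typescript
// Hypothetical helper: NOT part of article-archiver's published API.
interface ParsedArgs {
  urls: string[];     // valid http(s) URLs, de-duplicated, in input order
  rejected: string[]; // arguments that could not be used
}

function parseUrlArgs(args: string[]): ParsedArgs {
  const urls: string[] = [];
  const rejected: string[] = [];

  for (const arg of args) {
    let parsed: URL;
    try {
      // The URL constructor throws on strings that are not absolute URLs.
      parsed = new URL(arg);
    } catch {
      rejected.push(arg);
      continue;
    }

    // Assumption: only web pages are archivable, so other schemes are rejected.
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      rejected.push(arg);
      continue;
    }

    // Store the normalized form and skip exact duplicates.
    if (!urls.includes(parsed.href)) {
      urls.push(parsed.href);
    }
  }

  return { urls, rejected };
}
```

In a Node CLI, the input would typically be `process.argv.slice(2)`. Reporting the rejected arguments, rather than failing on the first bad one, lets the remaining URLs still be archived.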

Architecture

[Architecture diagram]

TODO

  • set up Cypress
  • configure Cypress to scrape URLs
  • implement code cleaner and enhancer
  • implement Readability
  • wire up scraper to enhancer
  • set up HTTP server for tmp files
  • set up website-scraper
  • wire up archiver to save local assets to tmp folder
  • set up UTF-8 and Turndown transformers
  • wire up transformer to merge metadata and write to output
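
The last item, merging metadata into the written output, could be sketched as follows. The README does not specify an output format, so this assumes YAML front matter at the top of the Markdown file and a file name derived from the title. `ArticleMeta`, `withFrontMatter`, and `slugify` are hypothetical names, and the metadata fields are illustrative.

```typescript
// Hypothetical types and helpers; the README does not define an output format.
interface ArticleMeta {
  title: string;
  url: string;
  author?: string;
  published?: string; // ISO 8601 date string, if known
}

// Prepend the metadata as YAML front matter (an assumed format) to the Markdown body.
function withFrontMatter(meta: ArticleMeta, markdown: string): string {
  const keys: (keyof ArticleMeta)[] = ["title", "url", "author", "published"];
  const lines: string[] = ["---"];

  for (const key of keys) {
    const value = meta[key];
    if (value !== undefined) {
      // JSON string literals are valid YAML double-quoted scalars,
      // so JSON.stringify handles quoting and escaping.
      lines.push(`${key}: ${JSON.stringify(value)}`);
    }
  }

  // Blank line after the front matter, and a single trailing newline.
  lines.push("---", "", markdown.trim(), "");
  return lines.join("\n");
}

// Derive a file-system-friendly name from the article title.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one dash
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
}
```

Under these assumptions, an article titled "Hello, World!" would be written to `hello-world.md`, with its metadata in the front matter and the converted Markdown below it.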

About

Node tool to scrape and transform articles into Markdown for local reading
