New Extractor Idea: `scihub-dl` to auto-detect inline DOI numbers and download academic paper PDFs #720

New extractor idea: SCIHUB

Take this academic paper as an example: https://www.cell.com/current-biology/fulltext/S0960-9822(19)31469-1

If a full paper PDF is available on Sci-Hub (e.g. https://sci-hub.se/https://www.cell.com/current-biology/fulltext/S0960-9822(19)31469-1), it could be downloaded to a `./archive/<timestamp>/scihub/` output folder.

```bash
# try downloading via verbatim URL first
$ scihub.py -d 'https://www.cell.com/current-biology/fulltext/S0960-9822(19)31469-1'
DEBUG:Sci-Hub:Successfully downloaded file with identifier https://www.cell.com/current-biology/fulltext/S0960-9822(19)31469-1
```

We could also use a regex to look for a DOI in the page URL or the page HTML contents (e.g. `10.1016/j.cub.2019.11.030`) and try downloading that if the verbatim URL fails.

```bash
# otherwise try downloading via any regex-extracted bare DOI numbers on the page or in the URL
$ scihub.py -d '10.1016/j.cub.2019.11.030'
DEBUG:Sci-Hub:Successfully downloaded file with identifier 10.1016/j.cub.2019.11.030

$ ls
c28dc1242df6f931c29b9cd445a55597-.cub.2019.11.030.pdf
```
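
The regex step above could be sketched roughly as follows. This is only an illustration, not a finished extractor: `DOI_PATTERN` and `extract_dois` are hypothetical names, and the pattern is a commonly used approximation for modern DOIs (a `10.` prefix, a 4-9 digit registrant code, a slash, then a suffix), so it may miss unusual or legacy DOIs.

```python
import re

# Assumption: an approximate pattern for modern DOIs, not an official grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')


def extract_dois(text):
    """Return the unique DOI-like strings in `text`, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Sentence punctuation right after a DOI gets swallowed by the suffix class.
        doi = match.group(0).rstrip('.,;:')
        # Drop a closing parenthesis that belongs to the surrounding prose.
        while doi.endswith(')') and doi.count(')') > doi.count('('):
            doi = doi[:-1]
        if doi not in found:
            found.append(doi)
    return found


if __name__ == '__main__':
    html = '<a href="https://doi.org/10.1016/j.cub.2019.11.030">DOI</a>'
    print(extract_dois(html))
```

Note that the example article URL itself contains no DOI, so in that case the DOI would have to come from the page HTML.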

**New Dependencies:**
- https://github.com/zaytoun/scihub.py

**New Extractors:**
- `extractors/scihub.py`

**New Config Options:**
- `SAVE_SCIHUB=True`
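
To make the proposal a bit more concrete, here is a minimal sketch of how the pieces listed above might fit together. All function names are hypothetical, reading `SAVE_SCIHUB` from the environment is an assumption about how the option would be exposed, and the actual download call (via `scihub.py`) is deliberately left out.

```python
import os


def scihub_enabled(env=None):
    """Check the proposed SAVE_SCIHUB option (assumed to default to True)."""
    env = os.environ if env is None else env
    return env.get('SAVE_SCIHUB', 'True').strip().lower() in ('true', '1', 'yes')


def scihub_identifiers(url, dois=()):
    """Order the identifiers to try: the verbatim URL first, then any extracted DOIs."""
    candidates = [url]
    for doi in dois:
        if doi not in candidates:
            candidates.append(doi)
    return candidates


def scihub_output_dir(archive_root, timestamp):
    """Build the proposed <archive_root>/<timestamp>/scihub output folder path."""
    return os.path.join(archive_root, timestamp, 'scihub')


if __name__ == '__main__':
    if scihub_enabled():
        url = 'https://www.cell.com/current-biology/fulltext/S0960-9822(19)31469-1'
        for identifier in scihub_identifiers(url, ['10.1016/j.cub.2019.11.030']):
            print('would try:', identifier)
```

The extractor would stop at the first identifier that yields a PDF, so the cheaper verbatim-URL attempt always runs before any DOI-based retry.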
