
Commit c4482fd

pull merge (#1)

* fix test: 1.1.1.1 taken by Cloudflare
* Fixed db inconsistency (binux#779)
* Fixed db creation inconsistency in taskdb, projectdb and resultdb
* Fixed typo
* using reserved ip address for testing
* rolling out version 0.3.10
* Fix mysql return bytes as field name type (binux#787)
* use pip version of mysql-connector-python for testing
* fix mysql return bytes as field names type
* fix raise "Unread result found" error; this error is raised by mysql-connector-python with the C extension
* fix test: the pure version raises InterfaceError, but the C extension version raises DatabaseError
* fix binux#799
* optimise scheduler dynamic select limit and improve task queue (binux#796)
* optimise scheduler select-limit and task queue
* fix test case in python2.6
* fix: time priority queue only compares exetime
* update: add test case for time priority queue
* optimise: add globally auto-increasing value for task to keep priority queue in order
* change async to async_ (binux#803)
* change async to async_
* change async to async_ in tests
* change async_ to async_mode
* modify async to async_mode to support python3.7
* add python3.7 CI test
* add python3.7 CI test
* add python3.7 CI test
* add python3.7 CI test
* remove python3.7 CI test
* add py3.7-dev CI test
* add support py3.7-dev CI test
* removed 2.6 due to lack of support, changed pip install for 3.5 due to pip versioning
* feature: puppeteer js engine
* features: add opened pages maximum limit, default 5
* fix: python3.5 install lxml error
* add puppeteer fetcher
* update
* fix bugs: 1. some args "async" haven't been replaced completely yet; 2. delete Python 3.3 in .travis.yml because the current version of lxml is not supported by Python 3.3
* use suggested python3.7 build
* fix build for 3.3
* 1. python2.7 image is different when using the matrix; 2. pip install just works nowadays
* sudo not required any more?
* try not to specify a version for apt-get
* fix setup.py test for py3.3
* try manually install
* try again
* fix for 3.7
* try install librt
* try again
* allow fail
* updated requirements.txt to fixed package versions
* port to python 3.6
* upgrade python-six
* updated travis.yml
* fixed "connect to scheduler rpc error: error(111, Connection refused)" error
* fixed phantomjs libssl_conf.so error
* travis test
* another Travis test
* trying to trace "cannot find module express" error in Travis
* using NODE_PATH env var
* moved NODE_PATH assignment after install
* making symlink to node_modules
* travis test
* node modules are currently missing from travis
* added npm install to travis.yml
* fixed travis node dependency issues
* using run_in_thread for scheduler and fetcher dispatch again
* accommodate changes made in run.py to tests
* changed test_90_docker_scheduler
* added extra asserts to tests
* test
* upgraded sqlAlchemy
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade fix
* sqlalchemy upgrade
* added extra assertions
* sqlalchemy upgrade
* sqlalchemy upgrade
* sqlalchemy upgrade
* undo previous
* tracing errors
* fix sqlalchemy data encoding
* sqlalchemy changed dict encoding to pure json string
* test_10_save mongodb fix
* undo previous
* tracing test_10_save mongodb bug
* tracing test_10_save mongodb bug
* upgraded pymongo
* mongo tests now passing
* fixed test_a110_one failing with "fetcher() got an unexpected keyword argument xmlrpc"
* upgraded pika
* tracing RabbitMQ ConnectionRefusedError: [Errno 111] Connection refused
* fixed typo
* tracing RabbitMQ ConnectionRefusedError: [Errno 111] Connection refused
* tracing RabbitMQ ConnectionRefusedError: [Errno 111] Connection refused
* tracing RabbitMQ ConnectionRefusedError: [Errno 111] Connection refused
* switching to Pika for RabbitMQ
* skip TestAmqpRabbitMQ
* travis test
* travis build failing with 0 errors and 0 failures, 40 "unexpected successes"
* added updated docker-compose.yaml
* cleanup
* initial couchdb projectdb implementation
* test url parser
* fix couchdb connect url
* fix couchdb connect url
* fix couchdb json encoding
* fix couchdb json encoding
* fix couchdb url encoding
* fix couchdb urls
* fixed couchdb request headers
* travis upgrade couchdb
* travis upgrade couchdb
* travis upgrade couchdb
* travis upgrade couchdb
* travis upgrade couchdb
* fixed "Fields must be an array of strings, not: null" error
* fixed responses
* fixed drop database
* tracing insertion issue
* fixed default values
* tracing update bug
* fixed update bug
* fixed drop bug
* changed default fields
* fixed drop bug
* fixed _default_fields usage
* fixed update bug
* fixed update bug
* fixed drop bug
* tracing update bug
* fixed drop bug
* tracing drop bug
* fixed drop bug
* fixed db naming issue
* fixed drop bug
* initial resultdb implementation
* added resultdb tests
* fix resultdb tests
* fix resultdb init
* fix resultdb init
* fix missing class var
* fixed get_docs
* fixed db naming
* fixed db naming
* fixed db naming
* fixed get_docs
* minor fixes
* fixed update_doc
* fixed update_doc
* fixed get_doc
* fixed get_docs
* fixed get_docs
* fixed parse
* fixed get_all_docs
* fixed get_doc
* fixed update_doc
* minor fixes
* fixed select
* initial taskdb implementation
* added debug prints
* added collection_prefix
* minor fixes
* minor fixes
* fixed update
* fixed test_25_get_task
* fixed status_count selector
* fixed update
* tracing test_create_project bug
* fixed collection naming
* Revert "fixed collection naming" (this reverts commit 0d89a0d)
* fixed collection naming
* minor fixes
* minor fixes
* fixed test_z10_drop
* fixed test_50_load_tasks
* fixed get_docs
* fixed get methods
* cleanup
* removed python 3.3 and added 3.7 and 3.8
* added index
* tracing index create bug
* fixed index create bug
* fixed index create bug
* fixed index create bug
* minor test fixes
* added couchdb test run
* added couchdb test run
* full working example
* fixed test setup
* fixed test setup
* updated travis file for couchdb auth
* updated travis file for couchdb auth
* added credentials exception
* fixed credentials
* fixed test auth
* fixed test auth
* tracing auth issue
* tracing auth issue
* fixed test auth issue
* fixed test test_60a_docker_couchdb
* fixed test test_60a_docker_couchdb
* cleanup
* attempting to remove "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* Revert "tracing "unexpected successes"" (this reverts commit 829da8c)
* tracing "unexpected successes"
* tracing "unexpected successes" in crawl
* tracing "unexpected successes" in crawl
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* tracing "unexpected successes"
* fixed "unexpected successes"
* fixed TestFetcherProcessor
* fixed TestFetcherProcessor
* fixed TestFetcherProcessor
* fix BaseHandler
* fix BaseHandler
* fix BaseHandler
* fix BaseHandler
* fix BaseHandler
* fix BaseHandler
* fix BaseHandler
* removed beanstalkc
* cleanup
* removed 3.8 from travis
* removed python 3.8 from setup.py
* fixed test_60_relist_projects change
* fixed .travis
* added https to couchdb + cleanup + added couchdb to docs
* added extra comment on top of docker-compose example
* fixed docker-compose issue
* improve docker-compose sample
* remove demo link
* fix test break because couchdb failing to start
* try to use non-auth for CouchDB test
* more couchdb_password
* improve couchdb allow empty username password
* drop support for couchdb

Co-authored-by: Roy Binux <[email protected]>
Co-authored-by: jxltom <[email protected]>
Co-authored-by: binux <[email protected]>
Co-authored-by: sdvcrx <[email protected]>
Co-authored-by: Lucas <[email protected]>
Co-authored-by: vibiu <[email protected]>
Co-authored-by: farmercode <[email protected]>
Co-authored-by: Phillip <[email protected]>
Co-authored-by: feiyang <[email protected]>
Co-authored-by: clchen <[email protected]>
Co-authored-by: v1nc3nt <[email protected]>
Co-authored-by: Keith Tunstead <[email protected]>
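A recurring thread in the log above is the `async` to `async_` to `async_mode` rename. Python 3.7 promoted `async` to a reserved keyword, so any function that used it as a parameter name stopped parsing at all; renaming the parameter is the only fix. A minimal illustration (the function name and body here are hypothetical, not pyspider's API):

```python
# On Python <= 3.6 this was legal; on 3.7+ it is a SyntaxError,
# because `async` became a reserved keyword:
#
#     def connect_rpc(url, async=True): ...
#
# The rename keeps the same behaviour under a legal name:
def connect_rpc(url, async_mode=True):
    """Hypothetical sketch of how an `async` kwarg gets renamed."""
    return {"url": url, "non_blocking": async_mode}

print(connect_rpc("http://localhost:23333"))
```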
1 parent c8d4558 commit c4482fd

File tree: 128 files changed (+4991, −2460 lines)


.github/ISSUE_TEMPLATE.md
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+<!--
+Thanks for using pyspider!
+
+If you need to ask your question in Chinese, please post it to https://segmentfault.com/t/pyspider
+
+-->
+
+* pyspider version:
+* Operating system:
+* Start up command:
+
+### Expected behavior
+
+<!-- What do you think should happen? -->
+
+### Actual behavior
+
+<!-- What actually happens? -->
+
+### How to reproduce
+
+<!--
+
+The best chance of getting help is providing enough information that the issue you have can be reproduced.
+
+If it's related to API or extraction behavior, please paste the script of your project.
+If it's related to scheduling of whole project, please paste the screenshot of queue status on the top in dashboard.
+
+-->

.gitignore
Lines changed: 2 additions & 1 deletion

@@ -1,6 +1,7 @@
 *.py[cod]
 data/*
-
+.venv
+.idea
 # C extensions
 *.so
 

.travis.yml
Lines changed: 24 additions & 12 deletions

@@ -1,29 +1,41 @@
 language: python
+cache: pip
 python:
-  - "2.6"
-  - "2.7"
-  - "3.3"
-  - "3.4"
+  - 3.5
+  - 3.6
+  - 3.7
+  #- 3.8
 services:
+  - docker
   - mongodb
   - rabbitmq
-  - redis-server
-  - elasticsearch
+  - redis
+  - mysql
+  # - elasticsearch
+  - postgresql
 addons:
-  postgresql: "9.4"
+  postgresql: "9.4"
+  apt:
+    packages:
+      - rabbitmq-server
+env:
+  - IGNORE_COUCHDB=1
+
 before_install:
   - sudo apt-get update -qq
-  - sudo apt-get install -y beanstalkd
-  - echo "START=yes" | sudo tee -a /etc/default/beanstalkd > /dev/null
-  - sudo service beanstalkd start
+  - curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.4.0/elasticsearch-2.4.0.deb && sudo dpkg -i --force-confnew elasticsearch-2.4.0.deb && sudo service elasticsearch restart
+  - npm install express puppeteer
+  - sudo docker pull scrapinghub/splash
+  - sudo docker run -d --net=host scrapinghub/splash
 before_script:
   - psql -c "CREATE DATABASE pyspider_test_taskdb ENCODING 'UTF8' TEMPLATE=template0;" -U postgres
   - psql -c "CREATE DATABASE pyspider_test_projectdb ENCODING 'UTF8' TEMPLATE=template0;" -U postgres
   - psql -c "CREATE DATABASE pyspider_test_resultdb ENCODING 'UTF8' TEMPLATE=template0;" -U postgres
   - sleep 10
 install:
-  - pip install http://cdn.mysql.com/Downloads/Connector-Python/mysql-connector-python-2.0.4.zip#md5=3df394d89300db95163f17c843ef49df
-  - pip install --allow-all-external -e .[all,test]
+  - pip install https://github.com/marcus67/easywebdav/archive/master.zip
+  - sudo apt-get install libgnutls28-dev
+  - pip install -e .[all,test]
   - pip install coveralls
 script:
   - coverage run setup.py test

Dockerfile
Lines changed: 25 additions & 10 deletions

@@ -1,16 +1,28 @@
-FROM cmfatih/phantomjs
+FROM python:3.6
 MAINTAINER binux <[email protected]>
 
-# install python
-RUN apt-get update && \
-    apt-get install -y python python-dev python-distribute python-pip && \
-    apt-get install -y libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml python-mysqldb libpq-dev
+# install phantomjs
+RUN mkdir -p /opt/phantomjs \
+    && cd /opt/phantomjs \
+    && wget -O phantomjs.tar.bz2 https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 \
+    && tar xavf phantomjs.tar.bz2 --strip-components 1 \
+    && ln -s /opt/phantomjs/bin/phantomjs /usr/local/bin/phantomjs \
+    && rm phantomjs.tar.bz2
+# Fix Error: libssl_conf.so: cannot open shared object file: No such file or directory
+ENV OPENSSL_CONF=/etc/ssl/
+
+# install nodejs
+ENV NODEJS_VERSION=8.15.0 \
+    PATH=$PATH:/opt/node/bin
+WORKDIR "/opt/node"
+RUN apt-get -qq update && apt-get -qq install -y curl ca-certificates libx11-xcb1 libxtst6 libnss3 libasound2 libatk-bridge2.0-0 libgtk-3-0 --no-install-recommends && \
+    curl -sL https://nodejs.org/dist/v${NODEJS_VERSION}/node-v${NODEJS_VERSION}-linux-x64.tar.gz | tar xz --strip-components=1 && \
+    rm -rf /var/lib/apt/lists/*
+RUN npm install puppeteer express
 
 # install requirements
-RUN pip install http://cdn.mysql.com/Downloads/Connector-Python/mysql-connector-python-2.0.4.zip#md5=3df394d89300db95163f17c843ef49df
-ADD requirements.txt /opt/pyspider/requirements.txt
+COPY requirements.txt /opt/pyspider/requirements.txt
 RUN pip install -r /opt/pyspider/requirements.txt
-RUN pip install -U pip
 
 # add all repo
 ADD ./ /opt/pyspider

@@ -19,7 +31,10 @@ ADD ./ /opt/pyspider
 WORKDIR /opt/pyspider
 RUN pip install -e .[all]
 
-VOLUME ["/opt/pyspider"]
+# Create a symbolic link to node_modules
+RUN ln -s /opt/node/node_modules ./node_modules
+
+#VOLUME ["/opt/pyspider"]
 ENTRYPOINT ["pyspider"]
 
-EXPOSE 5000 23333 24444 25555
+EXPOSE 5000 23333 24444 25555 22222
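Two details worth noting in this Dockerfile: `OPENSSL_CONF=/etc/ssl/` works around PhantomJS failing to load `libssl_conf.so` against newer OpenSSL layouts (as the inline comment says), and the `node_modules` symlink relies on Node's resolution rule of searching for a `node_modules` folder starting from the current directory, which makes the puppeteer and express packages installed under `/opt/node` resolvable from `/opt/pyspider`.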

README.md
Lines changed: 6 additions & 17 deletions

@@ -1,14 +1,14 @@
-pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][Demo]
+pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage]
 ========
 
-A Powerful Spider(Web Crawler) System in Python. **[TRY IT NOW!][Demo]**
+A Powerful Spider(Web Crawler) System in Python.
 
 - Write script in Python
 - Powerful WebUI with script editor, task monitor, project manager and result viewer
 - [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch); [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
-- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.io/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
+- [RabbitMQ](http://www.rabbitmq.com/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
 - Task priority, retry, periodical, recrawl by age, etc...
-- Distributed architecture, Crawl Javascript pages, Python 2&3, etc...
+- Distributed architecture, Crawl Javascript pages, Python 2.{6,7}, 3.{3,4,5,6} support, etc...
 
 Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)
 Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)

@@ -41,15 +41,15 @@ class Handler(BaseHandler):
     }
 ```
 
-[![Demo][Demo Img]][Demo]
-
 
 Installation
 ------------
 
 * `pip install pyspider`
 * run command `pyspider`, visit [http://localhost:5000/](http://localhost:5000/)
 
+**WARNING:** WebUI is open to the public by default, it can be used to execute any command which may harm your system. Please use it in an internal network or [enable `need-auth` for webui](http://docs.pyspider.org/en/latest/Command-Line/#-config).
+
 Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)
 
 Contribute

@@ -66,18 +66,9 @@ TODO
 
 ### v0.4.0
 
-- [x] local mode, load script from file.
-- [x] works as a framework (all components running in one process, no threads)
-- [x] redis
-- [x] shell mode like `scrapy shell`
 - [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)
 
 
-### more
-
-- [x] edit script with vim via [WebDAV](http://en.wikipedia.org/wiki/WebDAV)
-
-
 License
 -------
 Licensed under the Apache License, Version 2.0

@@ -88,7 +79,5 @@ Licensed under the Apache License, Version 2.0
 [Coverage Status]: https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
 [Coverage]: https://coveralls.io/r/binux/pyspider
 [Try]: https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
-[Demo]: http://demo.pyspider.org/
-[Demo Img]: https://github.com/binux/pyspider/blob/master/docs/imgs/demo.png
 [Issue]: https://github.com/binux/pyspider/issues
 [User Group]: https://groups.google.com/group/pyspider-users

config_example.json
Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+{
+  "taskdb": "couchdb+taskdb://user:password@couchdb:5984",
+  "projectdb": "couchdb+projectdb://user:password@couchdb:5984",
+  "resultdb": "couchdb+resultdb://user:password@couchdb:5984",
+  "message_queue": "amqp://rabbitmq:5672/%2F",
+  "webui": {
+    "username": "username",
+    "password": "password",
+    "need-auth": true,
+    "scheduler-rpc": "http://scheduler:23333",
+    "fetcher-rpc": "http://fetcher:24444"
+  }
+}
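The database strings above follow an `engine+role://user:password@host:port` convention (here CouchDB serving as taskdb, projectdb and resultdb). A rough sketch of how such a URL decomposes, using only the standard library; pyspider's real parser lives in its database package, this stand-in is for illustration only:

```python
from urllib.parse import urlparse

def split_db_url(url):
    """Decompose an engine+role://user:password@host:port connection URL."""
    parsed = urlparse(url)
    engine, _, role = parsed.scheme.partition("+")  # e.g. "couchdb", "taskdb"
    return {
        "engine": engine,
        "role": role,
        "username": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
    }

print(split_db_url("couchdb+taskdb://user:password@couchdb:5984"))
# {'engine': 'couchdb', 'role': 'taskdb', 'username': 'user',
#  'password': 'password', 'host': 'couchdb', 'port': 5984}
```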

docker-compose.yaml
Lines changed: 105 additions & 0 deletions

@@ -0,0 +1,105 @@
+version: "3.7"
+
+# replace /path/to/dir/ to point to config.json
+
+# The RabbitMQ and CouchDB services can take some time to startup.
+# During this time most of the pyspider services will exit and restart.
+# Once RabbitMQ and CouchDB are fully up and running everything should run as normal.
+
+services:
+  rabbitmq:
+    image: rabbitmq:alpine
+    container_name: rabbitmq
+    networks:
+      - pyspider
+    command: rabbitmq-server
+  mysql:
+    image: mysql:latest
+    container_name: mysql
+    volumes:
+      - /tmp:/var/lib/mysql
+    environment:
+      - MYSQL_ALLOW_EMPTY_PASSWORD=yes
+    networks:
+      - pyspider
+  phantomjs:
+    image: pyspider:latest
+    container_name: phantomjs
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json phantomjs
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped
+  result:
+    image: pyspider:latest
+    container_name: result
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json result_worker
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped # Sometimes we'll get a connection refused error because couchdb has yet to fully start
+  processor:
+    container_name: processor
+    image: pyspider:latest
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json processor
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped
+  fetcher:
+    image: pyspider:latest
+    container_name: fetcher
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json fetcher
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped
+  scheduler:
+    image: pyspider:latest
+    container_name: scheduler
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json scheduler
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped
+  webui:
+    image: pyspider:latest
+    container_name: webui
+    ports:
+      - "5050:5000"
+    networks:
+      - pyspider
+    volumes:
+      - ./config_example.json:/opt/pyspider/config.json
+    command: -c config.json webui
+    depends_on:
+      - couchdb
+      - rabbitmq
+    restart: unless-stopped
+
+networks:
+  pyspider:
+    external:
+      name: pyspider
+  default:
+    driver: bridge
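A note on running this file: the `pyspider` network is declared `external`, so it must exist before the stack starts (for example, created once with `docker network create pyspider`), and every service references a local `pyspider:latest` image, presumably built beforehand from the Dockerfile above with `docker build -t pyspider:latest .`.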

docs/About-Projects.md
Lines changed: 14 additions & 12 deletions

@@ -1,24 +1,26 @@
 About Projects
 ==============
 
-In most case, a project is one script you write for one website.
+In most cases, a project is one script you write for one website.
 
-* Projects are independent, but you can import another project as module with `from projects import other_project`
-* project has 5 status: `TODO`, `STOP`, `CHECKING`, `DEBUG`, `RUNNING`
+* Projects are independent, but you can import another project as a module with `from projects import other_project`
+* A project has 5 status: `TODO`, `STOP`, `CHECKING`, `DEBUG` and `RUNNING`
   - `TODO` - a script is just created to be written
-  - `STOP` - you can mark a project `STOP` if you want it STOP (= =).
-  - `CHECKING` - when a running project is modified, to prevent incomplete modification, project status will set as `CHECKING` automatically.
-  - `DEBUG`/`RUNNING` - these two status have on difference to spider. But it's good to mark as `DEBUG` when it's running the first time then change to `RUNNING` after checked.
+  - `STOP` - you can mark a project as `STOP` if you want it to STOP (= =).
+  - `CHECKING` - when a running project is modified, to prevent incomplete modification, project status will be set as `CHECKING` automatically.
+  - `DEBUG`/`RUNNING` - these two status have no difference to spider. But it's good to mark it as `DEBUG` when it's running the first time then change it to `RUNNING` after being checked.
 * The crawl rate is controlled by `rate` and `burst` with [token-bucket](http://en.wikipedia.org/wiki/Token_bucket) algorithm.
-  - `rate` - how many requests in one seconds
-  - `burst` - consider this situation, `rate/burst = 0.1/3`, it means spider scrawl 1 page every 10 seconds. All tasks are finished, project is checking last updated items every minute. Assume that 3 new items are found, pyspider will "burst" and crawl 3 tasks without waiting 3*10 seconds. However, the fourth task needs wait 10 seconds.
-* to delete a project, set `group` to `delete` and status to `STOP`, wait 24 hours.
+  - `rate` - how many requests in one second
+  - `burst` - consider this situation, `rate/burst = 0.1/3`, it means that the spider crawls 1 page every 10 seconds. All tasks are finished, project is checking last updated items every minute. Assume that 3 new items are found, pyspider will "burst" and crawl 3 tasks without waiting 3*10 seconds. However, the fourth task needs to wait 10 seconds.
+* To delete a project, set `group` to `delete` and status to `STOP`, wait 24 hours.
 
 
 `on_finished` callback
 --------------------
 You can override `on_finished` method in the project, the method would be triggered when the task_queue goes to 0.
 
-Example 1: when you starts a project to crawl a website with 100 pages, the `on_finished` callback will be fired when 100 pages success crawled or failed after retries.
-Example 2: A project with `auto_recrawl` tasks will **NEVER** trigger the `on_finished` callback, because time queue will never become 0 when auto_recrawl tasks in it.
-Example 3: A project with `@every` decorated method will trigger the `on_finished` callback every time when the new submitted tasks finished.
+Example 1: When you start a project to crawl a website with 100 pages, the `on_finished` callback will be fired when 100 pages are successfully crawled or failed after retries.
+
+Example 2: A project with `auto_recrawl` tasks will **NEVER** trigger the `on_finished` callback, because time queue will never become 0 when there are auto_recrawl tasks in it.
+
+Example 3: A project with `@every` decorated method will trigger the `on_finished` callback every time when the newly submitted tasks are finished.
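The `rate`/`burst` behaviour documented above maps directly onto a textbook token bucket. A minimal sketch of the algorithm (an illustration, not pyspider's actual implementation):

```python
import time

class TokenBucket:
    """Textbook token bucket: `rate` tokens refill per second, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst          # start full, so an idle project can burst
        self.last = time.time()

    def consume(self):
        now = time.time()
        # refill proportionally to the time elapsed since the last check
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True              # a crawl may fire now
        return False                 # caller must wait for more tokens

bucket = TokenBucket(rate=0.1, burst=3)
print([bucket.consume() for _ in range(4)])   # -> [True, True, True, False]
```

With `rate=0.1, burst=3`, three calls succeed immediately and the fourth has to wait for a token to accumulate, matching the burst example in the doc. Likewise, a hedged sketch of overriding `on_finished` in a project script; `BaseHandler` and `self.crawl` are pyspider's real API, but the page logic here is made up:

```python
from pyspider.libs.base_handler import BaseHandler

class Handler(BaseHandler):
    def on_start(self):
        self.crawl('http://example.com/', callback=self.index_page)

    def index_page(self, response):
        return {"title": response.doc('title').text()}

    def on_finished(self, response, task):
        # triggered when this project's task queue drains to 0
        print("all tasks finished")
```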
