ops: jobs: new jobs docs #784

Open
wants to merge 11 commits into
base: master
from

yashlamba
Member

Creating this PR to collaborate on Jobs documentation

@yashlamba yashlamba marked this pull request as ready for review June 20, 2025 09:32
@carlinmack carlinmack moved this from In progress to In review 🔍 in Sprint Q2/2025 🌻 Jun 20, 2025
Co-authored-by: Carlin MacKenzie <[email protected]>
Contributor

@carlinmack carlinmack left a comment

Looks good, a couple things to fix :)

@carlinmack carlinmack moved this from In review 🔍 to In progress in Sprint Q2/2025 🌻 Jun 20, 2025

### Scheduler
Invenio Job System uses a custom celery task scheduler which requires you to run a separate - celery beat. You can run the custom beat in a separate container/shell by running:
Contributor

Suggested change
Invenio Job System uses a custom celery task scheduler which requires you to run a separate - celery beat. You can run the custom beat in a separate container/shell by running:
The Invenio job system uses a custom celery task scheduler which requires a separate celery beat. You can run the custom beat in a separate container/shell like so:

@yashlamba yashlamba moved this from In progress to In review 🔍 in Sprint Q2/2025 🌻 Jun 20, 2025

## Why a Job System?

Traditionally, Celery tasks in InvenioRDM were launched programmatically or via CLI. This created challenges:
Contributor

this sentence reads a bit strangely to me, because we usually didn't run Celery tasks via the CLI; in the majority of cases Celery tasks were launched by the application or by cron

Contributor

I agree for the majority of cases, but I have triggered the user_sync task via the CLI a few times, because there was no other way.

replace user with instance manager/developer.

Co-authored-by: Karolina <[email protected]>

## Logging & Auditability

If a Celery task logs via `current_app.logger`, and is invoked as a job, its logs are captured and stored with the run metadata. These logs are shown in the admin UI. All logs are stored in OpenSearch and displayed via the job run UI and API.
Contributor

this part looks to me more like admin panel user documentation than a maintenance description of the jobs module. For maintenance it would be more relevant to explain how a log line ends up in the report (how and when to call the logger, for the different log levels)
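
To make the request above concrete, here is a minimal sketch of how log lines emitted at different levels could be collected per run. It uses only the standard `logging` module as a stand-in: `RunLogCollector`, `sync_users`, and the `run-1` id are hypothetical names for illustration, and nothing here is the actual invenio-jobs implementation. In a real task, `logger` would be `current_app.logger`, as the quoted paragraph describes.

```python
import logging


class RunLogCollector(logging.Handler):
    """Hypothetical handler: collects records emitted during one job run."""

    def __init__(self, run_id):
        super().__init__(level=logging.DEBUG)
        self.run_id = run_id
        self.entries = []

    def emit(self, record):
        # Keep a plain dict per record; a real system would index it instead.
        self.entries.append(
            {
                "run_id": self.run_id,
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )


def sync_users(logger):
    """Hypothetical task body with one call per log level."""
    logger.info("Starting user sync")
    logger.warning("Skipped %d users without an email", 2)
    logger.error("Could not reach identity provider for user %s", "u-42")


logger = logging.getLogger("demo.jobs")  # stand-in for current_app.logger
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger

collector = RunLogCollector(run_id="run-1")
logger.addHandler(collector)
try:
    sync_users(logger)
finally:
    # Detach the handler so later runs do not write into this run's entries.
    logger.removeHandler(collector)

levels = [entry["level"] for entry in collector.entries]
```

The point of the sketch is only the flow: the task calls the logger at the level that matches the severity, and a handler attached for the duration of the run decides where those records are stored.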

- ⚠️ Partial Success (`TaskExecutionError`).
- ❌ Failure (any other exception).

Use `TaskExecutionError` when the task completes but with non-fatal issues (e.g., partial data processed).
Contributor

general comment: when I read `TaskExecutionError` I expect a failure, but the description in this sentence says it is used for non-fatal issues. I think that is semantically confusing
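
For readers following this thread, a small sketch of the semantics described in the quoted lines may help: a dedicated exception marks partial success, and any other exception marks failure. The `TaskExecutionError` class below is a local stand-in defined only for this example (not imported from invenio-jobs), and the status labels, `classify_run`, `import_records`, and `crash` are hypothetical.

```python
class TaskExecutionError(Exception):
    """Local stand-in for the exception named in the docs (not the real class)."""


def classify_run(task):
    """Run ``task`` and map its outcome to a (status, message) pair."""
    try:
        task()
    except TaskExecutionError as exc:
        # The task finished, but reported non-fatal issues.
        return "PARTIAL_SUCCESS", str(exc)
    except Exception as exc:
        # Anything else counts as a failed run.
        return "FAILED", str(exc)
    return "SUCCESS", ""


def import_records():
    """Hypothetical task: processes what it can, then reports what it skipped."""
    records = [{"id": 1}, {"id": 2, "broken": True}, {"id": 3}]
    skipped = [r["id"] for r in records if r.get("broken")]
    if skipped:
        raise TaskExecutionError(f"skipped records: {skipped}")


def crash():
    """Hypothetical task that fails outright."""
    raise ValueError("boom")
```

Note that the order of the `except` clauses matters: the specific exception must be caught before the generic `Exception`, otherwise every partial success would be reported as a failure.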

- Run immediately from the UI.
- Scheduled with CRON or interval-based configurations via the admin panel.

**Note:** There is no built-in concurrency lock. If the same job is triggered twice in parallel, both will run. Developers must ensure idempotency in task logic if required.
Contributor

@kpsherva kpsherva Jun 20, 2025

this should be highlighted in red, not only as a note
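
As an illustration of what "ensure idempotency in task logic" can mean in practice, here is a minimal sketch in which a task upserts records by id, so running the same job twice leaves the data unchanged. `apply_sync` and the in-memory `store` dict are hypothetical; truly parallel runs in separate workers would additionally need atomic database operations or a distributed lock, which this sketch does not cover.

```python
def apply_sync(store, incoming):
    """Upsert records by id so that repeating the same run changes nothing."""
    changed = 0
    for record in incoming:
        key = record["id"]
        # Write only when the stored value differs from the incoming one.
        if store.get(key) != record:
            store[key] = dict(record)
            changed += 1
    return changed


store = {}
batch = [{"id": "a", "name": "Ada"}, {"id": "b", "name": "Bob"}]

first = apply_sync(store, batch)   # first run writes both records
second = apply_sync(store, batch)  # repeated run finds nothing to change
```

Keying writes on a stable identifier, rather than appending, is what makes the repeated run harmless.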

Comment on lines +145 to +148
Jobs can be:

- Run immediately from the UI.
- Scheduled with CRON or interval-based configurations via the admin panel.
Contributor

this also seems to me like it could be part of the admin user documentation, not maintenance


---

## Summary
Contributor

do we need a summary?


---

## Error Handling and Logging
Contributor

this section should be next to https://github.com/inveniosoftware/docs-invenio-rdm/pull/784/files#diff-f746076d7b34f08dba7552507cae91027d192ff10bd6118a49b7e97d5261e205R17. I think it is more relevant to mention it there, so that people remember to log things while creating the task


- **Since:** Optional timestamp in `YYYY-MM-DD HH:mm` format to continue processing from that particular timestamp. _Uses the last successful run if left empty._

#### Advanced Configuration
Contributor

this part only relates to the datastream jobs, right? That is not made really clear IMO
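
On the `Since` field quoted above, a short sketch of the described behaviour: parse a `YYYY-MM-DD HH:mm` value, and fall back to the last successful run when the field is left empty. `resolve_since` and `SINCE_FORMAT` are hypothetical names, and mapping `HH:mm` to a 24-hour `%H:%M` is an assumption; the actual parsing in invenio-jobs may differ.

```python
from datetime import datetime

# Assumed Python equivalent of the documented `YYYY-MM-DD HH:mm` format.
SINCE_FORMAT = "%Y-%m-%d %H:%M"


def resolve_since(since, last_successful_run):
    """Return the timestamp a run should continue processing from."""
    if since is None or not since.strip():
        # Empty field: continue from the last successful run, if there is one.
        return last_successful_run
    # Raises ValueError when the value does not match the expected format.
    return datetime.strptime(since.strip(), SINCE_FORMAT)
```

For example, `resolve_since("2025-06-20 09:32", None)` yields a `datetime` for 20 June 2025 at 09:32, while an empty string returns whatever was passed as the last successful run.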

@@ -0,0 +1,5 @@
## Jobs

A powerful new job management system, giving administrators control over background tasks directly from the administration interface.
Contributor

Suggested change
A powerful new job management system, giving administrators control over background tasks directly from the administration interface.
A powerful asynchronous tasks management interface called Invenio Jobs, giving administrators control over background tasks directly from the administration interface.

Co-authored-by: Karolina <[email protected]>
Labels
None yet
Projects
Status: In review 🔍
Development

Successfully merging this pull request may close these issues.

Document and explain the new Jobs feature introduced in invenio-jobs
5 participants