v13: add curation checks documentation #796

35 changes: 35 additions & 0 deletions docs/maintenance/architecture/curation.md
@@ -0,0 +1,35 @@
# Curation checks

**Intended audience**

This guide is intended for maintainers and developers of InvenioRDM itself.

**Scope**

The guide provides a high-level architectural overview of checks in InvenioRDM.

## Overview

Checks provide a way to run automated verification on draft and inclusion requests to communities. As such, checks require both:
Collaborator

Very minor wording:

Suggested change
Checks provide a way to run automated verification on draft and inclusion requests to communities. As such, checks require both:
Checks provide a way to run automated verification on draft review requests and record inclusion requests for a given community. As such, checks require both:


* A **community** that has at least on check config defined
Collaborator

Suggested change
* A **community** that has at least on check config defined
* A **community** that has at least one check configuration (config) defined

or keep with "configuration" throughout.

* A draft review or record inclusion **request**

Checks, as designed, cannot be run on a draft without both a community and a request.

## Check Config

A check config defines the parameters for a check in a community. Each [type of check](#check-component) requires its own config, so a community can have several. See the [Operate an Instance](../../operate/customize/curation-checks.md) documentation for usage details.

## Check Run

A check run is the result of running the check rules against a draft or a record.

## Check Component

A check component is the code that executes a check on a record according to the `params` defined in the database. Currently, two check components are defined:

* MetadataCheck - uses the [metadata check config schema](../../reference/checks_config.md) to verify the metadata of a record
* FileFormatsCheck - verifies the extensions of the records files to check if they are open.
Collaborator

Minor, just for extra clarity:

Suggested change
* FileFormatsCheck - verifies the extensions of the records files to check if they are open.
* FileFormatsCheck - verifies the extensions of the records files to check if they adhere to an open standard.


Check components are designed so that future checks can interact with third-party systems.
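
To make the idea behind `FileFormatsCheck` concrete, here is a minimal standalone sketch of an extension check. It is not the InvenioRDM implementation: the function name `files_not_open` and the `OPEN_EXTENSIONS` set are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list; a real deployment would define its own set.
OPEN_EXTENSIONS = {"csv", "json", "txt", "xml", "png"}


def files_not_open(filenames):
    """Return the file names whose extension is not in the allow-list."""
    flagged = []
    for name in filenames:
        # Normalise ".CSV" to "csv"; files without an extension yield "".
        extension = PurePosixPath(name).suffix.lower().lstrip(".")
        if extension not in OPEN_EXTENSIONS:
            flagged.append(name)
    return flagged


print(files_not_open(["data.CSV", "report.docx", "README"]))
# ['report.docx', 'README']
```

In this sketch, files without any extension are flagged as well. Whether that matches the behaviour of the actual component is an assumption.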
43 changes: 43 additions & 0 deletions docs/operate/customize/curation-checks.md
@@ -0,0 +1,43 @@
# Curation checks

_Introduced in v13_

> For the mental model of how checks are structured conceptually, refer to the [Maintain and Develop](../../maintenance/architecture/curation.md) documentation.

## Enabling checks

To enable checks, you need to set the following configuration in your `invenio.cfg` file:

```python
# Hook into community request actions
from invenio_rdm_records.checks import requests as checks_requests
RDM_COMMUNITY_SUBMISSION_REQUEST_CLS = checks_requests.CommunitySubmission
RDM_COMMUNITY_INCLUSION_REQUEST_CLS = checks_requests.CommunityInclusion

# Enable the feature flag
CHECKS_ENABLED = True
```

## Configuring checks

!!! warning "Community settings for checks"
    The UI for managing checks is not yet available in InvenioRDM
    v13. Checks are currently managed programmatically via the Python shell.

Checks are added to a community like so:

```python
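from invenio_checks.models import CheckConfig, Severity
from invenio_db import db

check_config = CheckConfig(
    community_id="<community-uuid>",  # Replace with the UUID of your community
    check_id="metadata",
    params={...},  # Replace with the rule configuration for this check
    severity=Severity.INFO,
    enabled=True,
)
db.session.add(check_config)
db.session.commit()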
```

To run the checks, submit a draft or record to a community and open the corresponding request.
214 changes: 214 additions & 0 deletions docs/reference/checks_config.md
@@ -0,0 +1,214 @@
# Metadata Checks Configuration Schema
Collaborator

Placement discussion: I would maybe place this in "Maintain and Develop" (even if it is a reference). It's not a general reference usable by many different kinds of users, right? Only maintainers and maybe operators. Keeping it close to the other relevant docs makes it easier to grasp the concept in a one-stop-shop location.


This documentation explains the schema and configuration format for creating metadata validation rules in the checks system.

## Overview

The system provides a flexible way to define validation rules for record metadata using JSON stored in the database. Rules can check individual fields, compare values, and validate lists of items with complex conditions. This schema provides a powerful way to define complex validation logic while staying human-readable and maintainable.

## Rule Configuration

A rule configuration is a dictionary with the following structure:

```python
{
    "id": "unique-rule-id",  # Required
    "title": "Human-readable title",
    "message": "Error message shown when rule fails",
    "description": "Detailed description of the rule",
    "level": "info|warning|failure",  # Default: "info"
    "condition": {  # Optional condition expression
        "type": "expression_type",
        # ... expression configuration
    },
    "checks": [  # List of check expressions
        {
            "type": "expression_type",
            # ... expression configuration
        }
    ]
}
```

This object is stored in the `params` section of the `CheckConfig` model in the database.
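
As a rough illustration of the required and default values listed above, the following sketch fills in defaults for a rule dictionary. The helper `normalize_rule` is hypothetical and not part of the checks system. It relies only on what the structure above states: `id` is required and `level` defaults to `"info"`. Defaulting `checks` to an empty list is an additional assumption.

```python
def normalize_rule(rule: dict) -> dict:
    """Return a copy of `rule` with defaults applied (hypothetical helper)."""
    if not rule.get("id"):
        raise ValueError("A rule needs a non-empty 'id'")
    normalized = dict(rule)  # Shallow copy so the caller's dict is untouched
    normalized.setdefault("level", "info")  # Documented default
    normalized.setdefault("checks", [])  # Assumption: no checks means empty list
    return normalized


rule = normalize_rule({"id": "license:exists", "title": "Record license"})
print(rule["level"])  # info
```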

## Expression Types

To describe the rules that make up a check or condition, we have created a grammar of **expressions** that can be composed in JSON. An expression is a structured, logical unit that (1) can contain other expressions and (2) can be evaluated. It is comparable to an Excel cell formula, where each cell reference or function is an expression. We start from the most basic expression, the `field`.

### 1. Field Expression

Accesses a field from the metadata using dot notation, returning that value.

```python
{
    "type": "field",
    "path": "metadata.title"  # Path to the field
}
```
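
Dot-notation access can be pictured with a small standalone sketch. The function `resolve_path` is a hypothetical name, and returning `None` for a missing field is an assumption rather than documented behaviour.

```python
from typing import Any


def resolve_path(data: dict, path: str, default: Any = None) -> Any:
    """Walk `data` along a dot-separated path; return `default` if a step is missing."""
    current: Any = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return default
    return current


record = {"metadata": {"title": "My dataset", "creators": []}}
print(resolve_path(record, "metadata.title"))      # My dataset
print(resolve_path(record, "metadata.publisher"))  # None
```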

### 2. Comparison Expression

Compares two values using one of the following operators:

- `==` - **Equal to**
- `!=` - **Not equal to**
- `in` - Value is **in** list/dict/string
- `not in` - Value is **not in** list/dict/string
- `~=` - List/dict/string **contains** value
- `!~=` - List/dict/string does **not contain** value
- `^=` - **Starts with**
- `!^=` - Does **not start with**
- `$=` - **Ends with**
- `!$=` - Does **not end with**

```python
{
    "type": "comparison",
    "left": {  # Left operand (can be another expression)
        "type": "field",
        "path": "metadata.title"
    },
    "operator": "==",  # Comparison operator
    "right": "Expected Value"  # Right operand (literal value)
}
```
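
One way to read the operator list above is as a lookup table from operator symbol to a two-argument function. The sketch below is an interpretation of the descriptions, not the library's code: `OPERATORS` and `compare` are hypothetical names, and the operand order for `in` versus `~=` is inferred from the wording "value is in" and "contains value".

```python
from typing import Any, Callable

# Each entry receives the evaluated left operand and the right operand.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda left, right: left == right,
    "!=": lambda left, right: left != right,
    "in": lambda left, right: left in right,          # left is a member of right
    "not in": lambda left, right: left not in right,
    "~=": lambda left, right: right in left,          # left contains right
    "!~=": lambda left, right: right not in left,
    "^=": lambda left, right: left.startswith(right),
    "!^=": lambda left, right: not left.startswith(right),
    "$=": lambda left, right: left.endswith(right),
    "!$=": lambda left, right: not left.endswith(right),
}


def compare(left: Any, operator: str, right: Any) -> bool:
    """Apply a comparison operator from the table to two values."""
    if operator not in OPERATORS:
        raise ValueError(f"Unknown operator: {operator!r}")
    return OPERATORS[operator](left, right)


print(compare("Expected Value", "==", "Expected Value"))  # True
print(compare("cc-by-4.0", "^=", "cc-"))                  # True
```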

### 3. Logical Expression

Combines multiple expressions with logical operators.

```python
{
    "type": "logical",
    "operator": "and",  # "and" or "or"
    "expressions": [  # List of expressions to combine
        {
            "type": "field",
            "path": "metadata.title"
        },
        {
            "type": "comparison",
            # ... comparison config
        }
    ]
}
```
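
A minimal sketch of how sub-expression results might be combined follows. It assumes that the value returned by a `field` expression is judged by its truthiness (an empty or missing title counts as false), which the text does not state explicitly. `combine` is a hypothetical helper.

```python
def combine(operator: str, results: list) -> bool:
    """Combine already-evaluated sub-expression results with "and" or "or"."""
    if operator == "and":
        return all(results)
    if operator == "or":
        return any(results)
    raise ValueError(f"Unsupported logical operator: {operator!r}")


title = "My dataset"
print(combine("and", [title, title.startswith("My")]))  # True
print(combine("or", ["", False]))                       # False
```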

### 4. List Expression

Validates lists of items using one of three operators:

- `exists` - Checks if the list exists and is not empty
- `any` - At least one item must match the predicate
- `all` - All items must match the predicate

```python
{
    "type": "list",
    "operator": "any",  # "any", "all", or "exists"
    "path": "metadata.creators",  # Path to the list field
    "predicate": {  # Condition to apply to list items (required for "any" and "all")
        "type": "expression_type",
        # ... expression configuration
    }
}
```
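
The three list operators can be sketched as follows, with the predicate modelled as a plain Python callable instead of a nested expression. `evaluate_list` is a hypothetical name, and the handling of a missing list (treated like an empty one) is an assumption.

```python
from typing import Any, Callable, Optional


def evaluate_list(
    operator: str,
    items: Optional[list],
    predicate: Optional[Callable[[Any], bool]] = None,
) -> bool:
    """Evaluate "exists", "any" or "all" over a list of items."""
    if operator == "exists":
        return bool(items)  # False for None and for an empty list
    if operator not in ("any", "all"):
        raise ValueError(f"Unsupported list operator: {operator!r}")
    if predicate is None:
        raise ValueError(f"Operator {operator!r} requires a predicate")
    matches = (predicate(item) for item in items or [])
    return any(matches) if operator == "any" else all(matches)


creators = [
    {"person_or_org": {"identifiers": [{"identifier": "0000-0001"}]}},
    {"person_or_org": {}},
]


def has_identifier(creator: dict) -> bool:
    return bool(creator.get("person_or_org", {}).get("identifiers"))


print(evaluate_list("any", creators, has_identifier))  # True
print(evaluate_list("all", creators, has_identifier))  # False
```

Note that Python's `all` returns `True` for an empty list, so in this sketch an `all` check passes when there are no items. Pairing it with an `exists` condition avoids that case.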

## Complete Example

Here's a complete example showing multiple rules with different expression types:

```python
{
    "id": "metadata-validation",
    "title": "Metadata Validation",
    "description": "Validates basic metadata requirements",
    "rules": [
        {
            "id": "license:exists",
            "level": "error",
            "title": "Record license",
            "message": "Licenses are required.",
            "description": "All submissions must specify the licensing terms.",
            "checks": [
                {
                    "path": "metadata.rights",
                    "type": "list",
                    "operator": "exists",
                    "predicate": {}
                }
            ]
        },
        {
            "id": "creators:identifier",
            "level": "info",
            "title": "Creator Identifiers",
            "message": "Affiliations are recommended for all creators",
            "description": "All creators should have a persistent identifier (e.g. an ORCID)",
            "condition": {
                "path": "metadata.creators",
                "type": "list",
                "operator": "exists",
                "predicate": {}
            },
            "checks": [
                {
                    "path": "metadata.creators",
                    "type": "list",
                    "operator": "all",
                    "predicate": {
                        "type": "logical",
                        "operator": "and",
                        "expressions": [
                            {
                                "path": "person_or_org.identifiers",
                                "type": "field"
                            },
                            {
                                "path": "person_or_org.identifiers",
                                "type": "list",
                                "operator": "any",
                                "predicate": {
                                    "path": "identifier",
                                    "type": "field"
                                }
                            }
                        ]
                    }
                }
            ]
        },
    ]
}
```

If these checks fail for a draft or record, the following error messages will be returned in the draft:

```javascript
"errors": [
    {
        "field": "metadata.rights",
        "messages": [
            "Licenses are required."
        ],
        "description": "All submissions must specify the licensing terms.",
        "severity": "error",
        "context": {
            "community": "<community-uuid>"
        }
    },
    {
        "field": "metadata.creators",
        "messages": [
            "Affiliations are recommended for all creators"
        ],
        "description": "All creators should have a persistent identifier (e.g. an ORCID)",
        "severity": "info",
        "context": {
            "community": "<community-uuid>"
        }
    },
],
```
4 changes: 4 additions & 0 deletions docs/releases/v13/version-v13.0.0.md
@@ -179,6 +179,10 @@ An overview of all the collections can be found in the community browse page (if

- https://127.0.0.1:5000/communities/<community_slug>/browse

### Curation

It is now possible to configure automated **checks** in your communities to provide instant feedback on draft review and record inclusion requests. Checks tell both the submitter and the reviewer whether a submission complies with your community's curation policy. For example, you can require that submissions be preprints, be funded by a specific grant, or meet any other requirement on the metadata or files.

### Helm charts

To be announced?
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -111,6 +111,7 @@ nav:
- Overview: "operate/customize/configuration.md"
- Audit logs: "operate/customize/audit-logs.md"
- Authentication: "operate/customize/authentication.md"
- Curation checks: "operate/customize/curation-checks.md"
- DOI registration: "operate/customize/dois.md"
- DNB URNs registration: "operate/customize/urns.md"
- Emails:
@@ -173,6 +174,7 @@ nav:
- Records: maintenance/architecture/records.md
- Communities: maintenance/architecture/communities.md
- Requests: maintenance/architecture/requests.md
- Curation: maintenance/architecture/curation.md
- Event handling: maintenance/architecture/event_handling.md
- Notifications: maintenance/architecture/notifications.md
- Recommended reading: maintenance/architecture/reading.md
@@ -255,6 +257,7 @@ nav:
- Invenio-cli commands: reference/cli.md
- Invenio commands: reference/cli_invenio.md
- Metadata: reference/metadata.md
- Checks config schema: reference/checks_config.md
- OAI-PMH: reference/oai_pmh.md
- REST API:
- Overview: "reference/rest_api_index.md"