[RFC] [Semantic Convention] "Job" traces #1582
I think we also need to do some modeling work. E.g. to answer the following questions:
For DelayedJob, Rake, Airflow, etc. it would be Any code that enqueues (and does not execute) the jobs would be
In practice, child jobs are typically enqueued as new jobs with new
A process which listens/consumes from Kafka and then performs a job synchronously is probably traced as a |
I am more familiar with https://spring.io/projects/spring-batch which, it seems to me, follows a different approach, hence my questions.
Btw, @johnnyshields have you seen open-telemetry/semantic-conventions#1640?
@iNikem No, I had not seen that, which is funny because I searched a bit. I think it is roughly compatible with this one.
This roughly corresponds to the first two spans for spring-batch that I listed here.
This is rather different from what spring batch jobs look like. Spring batch jobs are hierarchical: every job is composed of steps, each step may be split into chunks (roughly equivalent to DB transactions), and each chunk may read, process and write several items. Everything in the same process. |
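To make the hierarchy described above concrete, here is a minimal sketch of how nested job, step, and chunk spans might look within a single process. It deliberately uses a toy in-memory recorder rather than any real tracing API. The names `SpanRecorder` and `run_job`, the span names, and the choice to wrap read/process/write once per chunk are all assumptions for illustration, not something Spring Batch or this RFC defines.

```python
from contextlib import contextmanager


class SpanRecorder:
    """Toy stand-in for a tracer: records (depth, name) for each span started."""

    def __init__(self):
        self.spans = []
        self._depth = 0

    @contextmanager
    def span(self, name):
        self.spans.append((self._depth, name))
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1


def run_job(recorder, items, chunk_size):
    """Run one job with a single step, split into chunks of `chunk_size` items."""
    written = []
    with recorder.span("job"):
        with recorder.span("step"):
            for start in range(0, len(items), chunk_size):
                with recorder.span("chunk"):
                    with recorder.span("read"):
                        chunk = items[start:start + chunk_size]
                    with recorder.span("process"):
                        processed = [item * 2 for item in chunk]
                    with recorder.span("write"):
                        written.extend(processed)
    return written


recorder = SpanRecorder()
result = run_job(recorder, [1, 2, 3], chunk_size=2)
for depth, name in recorder.spans:
    print("  " * depth + name)
```

Every span here is a descendant of the single `job` span, which is the contrast with queue-based systems where child work is usually enqueued as a separate job.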
- id: scheduled_run_time
  type: string
  brief: >
    A string containing the time when the job was scheduled to run, specified in
    [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html)
    format expressed in [UTC](https://www.w3.org/TR/NOTE-datetime).
    For job attempts which are being retried, this should reflect when the current
    attempt was scheduled (e.g. using a back-off timer.)
  examples: "2021-03-23T13:47:06Z"
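As a rough illustration of the value this attribute would carry, the sketch below formats a retry attempt's scheduled time as an ISO 8601 string in UTC. The function names, the exponential back-off policy, and its `base_seconds` parameter are assumptions made up for this example; the proposed convention does not prescribe any of them.

```python
from datetime import datetime, timedelta, timezone


def format_scheduled_run_time(scheduled_at: datetime) -> str:
    """Render a timezone-aware datetime as ISO 8601 in UTC with a 'Z' suffix."""
    if scheduled_at.tzinfo is None:
        raise ValueError("scheduled_at must be timezone-aware")
    return scheduled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def next_attempt_time(failed_at: datetime, attempt: int, base_seconds: int = 5) -> datetime:
    """Hypothetical back-off: wait base_seconds * 2**(attempt - 1) after a failure."""
    return failed_at + timedelta(seconds=base_seconds * 2 ** (attempt - 1))


# For a retried attempt, the attribute reflects when the current attempt was scheduled.
failed_at = datetime(2021, 3, 23, 13, 47, 1, tzinfo=timezone.utc)
retry_at = next_attempt_time(failed_at, attempt=1)
print(format_scheduled_run_time(retry_at))  # 2021-03-23T13:47:06Z
```

Converting to UTC before formatting means a scheduler running in a local time zone still emits the same string for the same instant.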
I wonder if we need a small, separate semantic convention for scheduled operations: you can run a cron job (or use a ScheduledExecutorService
in Java, or Quartz, or...) that runs any code, not necessarily a batch job or a FAAS function.
I think this is the key question remaining for this RFC.
I believe Apache Airflow is similar to Spring Batch in this regard. I guess there is a distinction between "Batch/Scheduled Jobs" and "Queued Jobs". |
Co-authored-by: Francis Bogsanyi <[email protected]>
Re: #1582 (comment)
How do we decide when to use the messaging convention and when to use this new job convention? E.g. a Kafka consumer listening to a topic and then doing some job: is it messaging or job? Why?
A process which listens/consumes from Kafka and then performs a job synchronously is probably traced as a Messaging consumer because its Kafka listening library will be instrumented. Theoretically you could also add in a Job span, but it would be redundant with the message consumer.
I also think this is an important question. We probably want to avoid cases where the sender uses messaging spans and the receiver uses job spans, or vice versa. After looking at the attributes, I think that we should remove these:
- job.queue_name: messaging.destination
- net.peer.*: Also referenced by messaging.
If a job system uses queuing with a separate sender/producer of the jobs, messaging semantic conventions should be used additionally if further description of the queuing is desired.
Important editorial note: Please generate a markdown table in a new file under specification/trace/semantic_conventions/
and add a link in the README in that directory. Alternatively, you can add it to the same markdown file as messaging and change the title.
Adding a markdown file also gives you room for additional general explanation text.
@Oberon00 I disagree that Job systems that have "queues" should use Messaging semantics; that's not a good litmus test. "Queue" in the context of Jobs is usually an optional feature. For example, Ruby's Delayed Job will work all jobs in FIFO order, but it optionally allows you to set named "queues" and then assign worker resources to specific queues. Unlike a Messaging system, where the queue/topic is the primary feature, in a Job system the queue is an optional/secondary attribute. A better litmus test is, "does work happen based on the contents/payload of the message?" A Messaging system delivers the payload without looking inside; its job is simply to forward the message. A Job system does work based on the payload; a successfully worked Job is usually the end of the chain (unless a new/different message is generated as a result, e.g. an email sending job.) In addition,
Ping so that this doesn't become stale, this looks like a useful addition |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
We can't merge this without more reviews, and I believe there is more work to do. |
@johnnyshields I can help with the documentation if needed - is there a preferred way you'd like to collaborate? |
…hnnyshields/opentelemetry-specification into semantic-convention-trace-job
@ahayworth I've naively committed all my work in progress on the docs. If you can just fork my PR and clean everything up that will be fine. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Hi @johnnyshields, are you planning to continue this work in the future?
I'd be glad if someone can help take over this work to get to the finish line. Unfortunately I don't have the bandwidth for this at the moment. |
This PR defines a semantic convention for "Job" traces.
A "Job" models a batch job or task, which can be either enqueued, scheduled, or run on-demand.
Examples of job-related systems are:
The spec for "Job" would be somewhere between "Messaging" and "FAAS". It would cover both the producer and consumer aspects of running jobs.
As noted above, OTEL instrumentation libraries are using the "Messaging" convention to represent Jobs today. However "Messaging", which is intended for Kafka, RabbitMQ, etc. is not a great fit:
FAAS is also not ideal, as it is typically for serverless providers such as AWS Lambda, where the infrastructure running the job has been abstracted away and the primary focus is on the "function".