Skip to content

Trigger without a broker does not correctly reflect ready status #2996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matzew opened this issue Apr 15, 2020 · 16 comments · Fixed by #3144
Closed

Trigger without a broker does not correctly reflect ready status #2996

matzew opened this issue Apr 15, 2020 · 16 comments · Fixed by #3144
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@matzew
Copy link
Member

matzew commented Apr 15, 2020

Describe the bug

apply a trigger with a missing broker, the READY and REASON status fields have no info

Expected behavior

It should be stated that the trigger is not ready and give a reason

To Reproduce

apply a trigger, but have no broker

Knative release version

  • 0.13
  • 0.14
  • HEAD

Additional context

From the controller log:

{
  "level": "error",
  "ts": "2020-04-15T14:28:48.706Z",
  "logger": "broker-controller",
  "caller": "broker/reconciler.go:144",
  "msg": "resource \"default/default\" no longer exists",
  "commit": "02bc516",
  "knative.dev/traceid": "82fc04a2-31dc-476f-a3cf-abb1613925b1",
  "knative.dev/key": "default/default",
  "stacktrace": "knative.dev/eventing/pkg/client/injection/reconciler/eventing/v1alpha1/broker.(*reconcilerImpl).Reconcile\n\tknative.dev/eventing/pkg/client/injection/reconciler/eventing/v1alpha1/broker/reconciler.go:144\nknative.dev/eventing/vendor/knative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/eventing/vendor/knative.dev/pkg/controller/controller.go:394\nknative.dev/eventing/vendor/knative.dev/pkg/controller.(*Impl).Run.func2\n\tknative.dev/eventing/vendor/knative.dev/pkg/controller/controller.go:343"
}

and the output of the trigger:

k get trigger
NAME                  READY   REASON   BROKER    SUBSCRIBER_URI   AGE
testevents-trigger0                    default                    2m6s

Also, there is no status on the trigger...

@matzew matzew added the kind/bug Categorizes issue or PR as related to a bug. label Apr 15, 2020
@grantr
Copy link
Contributor

grantr commented Apr 15, 2020

This happens because triggers are not reconciled when their broker doesn't exist.

One possible solution is to alter the reconciler so that instead of immediately returning when encountering a broker key that doesn't exist, it instantiates a new broker and reconciles anyway. The reconciler would need to be aware that if it receives a broker that has not been created yet, it should only reconcile the broker's triggers and not update the broker's status. The code handling this is generated by genreconciler so it would need to be overridden.

I think @n3wscott also had thoughts about this

@vaikas vaikas added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 20, 2020
@vaikas vaikas added this to the v0.15.0 milestone Apr 20, 2020
@grantr
Copy link
Contributor

grantr commented Apr 27, 2020

One possible solution is to alter the reconciler so that instead of immediately returning when encountering a broker key that doesn't exist, it instantiates a new broker and reconciles anyway.

It's actually more complicated than I originally thought. Triggers created before their brokers exist cannot be reconciled by any broker-classed reconciler because they aren't classified on creation. They only get a broker class indirectly by being related to an existing classified broker. In the quote above, no broker-classed reconciler can correctly be "the reconciler" that reconciles the trigger with an unsaved broker.

@vaikas vaikas self-assigned this May 4, 2020
@markfisher
Copy link
Contributor

markfisher commented May 12, 2020

I stumbled upon this one also. The trigger reconciler currently checks for the existence of a Trigger's Broker, so in the case that Broker does not exist, it should also be setting the BrokerReady status to "False" with a "BrokerDoesNotExist" reason. It will not interfere with other Broker reconcilers since those will not be managing it until the Broker does exist.

@vaikas
Copy link
Contributor

vaikas commented May 12, 2020

yep, on my todo list, hoping to get this banged out today or tmw

@vaikas
Copy link
Contributor

vaikas commented May 12, 2020

Ho hum. As discussed a couple of weeks ago, we were going to look at defaulting the status field, so started to play with that. I'm either screwing something up (most likely), or you can't update the status in the defaulter. Anyways, after trying to add that logic in the defaulter, I see the patch bytes being properly set, but they are not making it into the storage. You can see it's defaulting the spec.broker and that works. The status.observedgeneration is just a test in case I was screwing up the structs somehow. Anyways, just jotting this down here.

[
  {
    "op": "add",
    "path": "/status",
    "value": {}
  },
  {
    "op": "add",
    "path": "/spec/broker",
    "value": "default"
  },
  {
    "op": "add",
    "path": "/spec/filter",
    "value": {}
  },
  {
    "op": "add",
    "path": "/spec/subscriber/ref/namespace",
    "value": "triggertest"
  },
  {
    "op": "add",
    "path": "/status/observedGeneration",
    "value": 99999
  },
  {
    "op": "add",
    "path": "/status/conditions",
    "value": [
      {
        "lastTransitionTime": "2020-05-12T21:34:44Z",
        "message": "Broker default does not exist or there is no matching BrokerClass for it",
        "reason": "BrokerDoesNotExist",
        "status": "False",
        "type": "BrokerReady"
      },
      {
        "lastTransitionTime": "2020-05-12T21:34:44Z",
        "message": "Broker default does not exist or there is no matching BrokerClass for it",
        "reason": "BrokerDoesNotExist",
        "status": "False",
        "type": "Ready"
      }
    ]
  },
  {
    "op": "add",
    "path": "/metadata/labels",
    "value": {
      "eventing.knative.dev/broker": "default"
    }
  },
  {
    "op": "add",
    "path": "/metadata/annotations/eventing.knative.dev~1creator",
    "value": "[email protected]"
  },
  {
    "op": "add",
    "path": "/metadata/annotations/eventing.knative.dev~1lastModifier",
    "value": "[email protected]"
  }
]```

@scothis
Copy link
Contributor

scothis commented May 12, 2020

@vaikas even if you get the defaulter working, writing an inaccurate status condition is a bad idea. Even if the status lives for only a few milliseconds until the reconciler sets the correct status, other tools watching the resource will see the false status and be confused.

A common pattern we've used in CLIs is to create a resource and then watch for its Ready condition to become True or False. If we see a the condition flip false, we report the error to the user. With this change, the trigger creation would always report as failed.

@vaikas
Copy link
Contributor

vaikas commented May 12, 2020

Well, we could set the status to unknown if that would make it better.

@scothis
Copy link
Contributor

scothis commented May 12, 2020

Well, we could set the status to unknown if that would make it better.

That resolves my objection, but doesn't do anything in the context of this issue

@vaikas
Copy link
Contributor

vaikas commented May 12, 2020

I think the fundamental issue is that the user gets no feedback, and setting the status to Unknown with the message saying why that might not be the case seems to do something about this issue?
But I'm all open for better suggestions.

@markfisher
Copy link
Contributor

right here it is valid to set the Trigger's BrokerReady status to "False": https://github.com/knative/eventing/blob/master/pkg/reconciler/trigger/trigger.go#L55

why not do it there?

@vaikas
Copy link
Contributor

vaikas commented May 12, 2020

We talked about that (having a 'catch-all' reconciler for Trigger resources) and IIRC we tried this in the past with some other resource, maybe channels? where there was a controller that would modify a resource that it was not responsible for and it lead to race conditions and hilarity. @Harwayne you recall the deets here?
So, in this case, the reconciler does not actually modify the resource but patches the label on the namespace, changing it to modify triggers directly mosdef "should" work, but again, when we discussed this we decided against that because of previous problems with this approach.
If we set it to false where you say, it still has the problem with @scothis because the trigger will actually reconcile since the default broker gets created (yeah, another issue about default brokers).

@grantr
Copy link
Contributor

grantr commented May 12, 2020

I think it makes sense to use Unknown in the webhook with the message proposed above or something similar. That gives the user some idea of what the current state of the trigger is (waiting for a broker to show up).

@scothis I think it mostly resolves this issue because it gives the user feedback about what would help the trigger progress to a ready state. As far as whether the trigger's status is more accurately false or unknown, we've had similar discussions before in other contexts. It seems to boil down to: is waiting for another object to exist a terminal state or an unknown state? If the object is never created, the status is effectively terminal but there's no way for the reconciler to know that. I haven't seen an answer yet that satisfies all stakeholders.

@markfisher
Copy link
Contributor

If the catch-all Trigger reconciler is only taking responsibility when there is no Broker, then it shouldn't be competing with anything, since based on the current design of relying on the broker's annotation, the Trigger will only be managed by that Broker's reconciler once the Broker itself is available and the associated reconciler then takes over control with no overlap.

As far as creating brokers by default, I've already explained why I'm not in favor here: #2970 (comment)

@grantr
Copy link
Contributor

grantr commented May 12, 2020

We talked about that (having a 'catch-all' reconciler for Trigger resources) and IIRC we tried this in the past with some other resource, maybe channels? where there was a controller that would modify a resource that it was not responsible for and it lead to race conditions and hilarity.

There was a controller that would update a channel's status if it didn't get a provisioner after some time. Might be worth revisiting since I think we have more sophisticated ways to patch an object's status now (there was recent conversation about this somewhere). If we're using it to flip from unknown to false, we still need to decide at least part of the "when is non-existence terminal" question above.

@vaikas
Copy link
Contributor

vaikas commented May 13, 2020

So, it's impossible to set the Status in defaulting webhook today, live and learn ✅

Ok, so we have couple of options here for where to set this (decouple to what we set it to):

  1. With V1 CRDs, we can set this (not dynamically, but for this use case I reckon it's fine) to be upon creation to a known good state. say, unknown, no broker
  2. We can set the status in the catch-all, which should be eventually consistent with some hiccups (races?)

So, I'll change track and set the Condition to Unknown with the message like I had before, sound reasonable?
And @markfisher yes, only taking the responsibility when there's no broker is just a bit racy, but should not be trashy.

@vaikas
Copy link
Contributor

vaikas commented May 13, 2020

@markfisher yeah, knew there was another issue tracking that, so didn't want to lump that into here :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants