Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
$ terraform -v
Terraform v0.13.4
+ provider registry.terraform.io/hashicorp/google v3.59.0
+ provider registry.terraform.io/hashicorp/google-beta v3.59.0
Affected Resource(s)
- google_dataflow_job
Terraform Configuration Files
resource "google_pubsub_topic" "test-topic" {
name = "test-metrics-sink"
}
resource "google_pubsub_subscription" "test-sub" {
name = "test-metrics-source"
topic = "projects/myproject/topics/sourcetopic" // this should be a topic with some traffic
expiration_policy { ttl = "86400s" }
message_retention_duration = "600s"
}
resource "google_dataflow_job" "test_job" {
name = "test-ps2ps-tf"
template_gcs_path = "gs://dataflow-templates/2021-02-15-00_RC00/Cloud_PubSub_to_Cloud_PubSub"
temp_gcs_location = "gs://mybucket/temp"
zone = "us-east1-b"
max_workers = 2
machine_type = "n1-standard-2"
on_delete = "drain"
additional_experiments = [
"enable_windmill_service",
"enable_streaming_engine",
]
labels = {
# These labels get auto-magically set in dataflow when it detects you're using a template that
# the gcloud team wrote. If you don't manually specify them then terraform thinks you've
# removed them and redeploys the job every time you apply regardless if you changed anything.
goog-dataflow-provided-template-name = "cloud_pubsub_to_cloud_pubsub"
goog-dataflow-provided-template-version = "2021-02-15-00_rc00"
}
parameters = {
inputSubscription = google_pubsub_subscription.test-sub.id
outputTopic = google_pubsub_topic.test-topic.id
}
}
Debug Output
https://gist.github.com/n-oden/d5fd36c7b54fb68a50afce095a9a591b
Expected Behavior
Terraform should launch a job using the Google Pub/Sub-to-Pub/Sub template, and the streaming engine feature should be enabled for the job.
It's not so much that Terraform is misbehaving per se: the API request it makes to dataflow.googleapis.com is correct for the configuration above. The problem is that there is no way to set an important boolean in the JSON document that gets posted to /v1b3/projects/myproject/locations/us-east1/templates. Read on below.
Actual Behavior
The job created by Terraform does not have streaming engine enabled, and worse yet does not actually process any data.
The issue appears to be that streaming engine can no longer be enabled via the additional_experiments list: there is now a first-class configuration option in the environment section of the JSON document that is posted to Google when creating a new job.
If you create a Dataflow job from a Google-provided template with the gcloud CLI tool, the --enable-streaming-engine flag causes an enableStreamingEngine key to be added to the environment object in the POST data.
There is presently no way to do this with Terraform: there is no enable_streaming_engine argument on the google_dataflow_job resource, and passing enable_streaming_engine as a string inside the additional_experiments list (as in the configuration above) produces a broken job.
Steps to Reproduce
terraform apply
To see what should happen, you can use the gcloud CLI tool:
gcloud --log-http dataflow jobs run test-ps2ps \
--enable-streaming-engine \
--gcs-location gs://dataflow-templates/latest/Cloud_PubSub_to_Cloud_PubSub \
--parameters=inputSubscription=projects/myproject/subscriptions/test-metrics-source,outputTopic=projects/myproject/topics/test-metrics-sink \
--staging-location=gs://mybucket/staging/
You'll see in the --log-http output that the CLI makes the following API call:
==== request start ====
uri: https://dataflow.googleapis.com/v1b3/projects/myproject/locations/us-central1/templates?alt=json
method: POST
== headers start ==
accept: application/json
accept-encoding: gzip, deflate
authorization: --- Token Redacted ---
content-length: 385
content-type: application/json
== headers end ==
== body start ==
{
"environment": {
"enableStreamingEngine": true,
"tempLocation": "gs://mybucket/staging/"
},
"gcsPath": "gs://dataflow-templates/latest/Cloud_PubSub_to_Cloud_PubSub",
"jobName": "test-ps2ps2",
"location": "us-central1",
"parameters": {
"inputSubscription": "projects/myproject/subscriptions/test-metrics-source",
"outputTopic": "projects/myproject/topics/test-metrics-sink"
}
}
== body end ==
Important Factoids
To my intense aggravation, the enableStreamingEngine key is not documented in Google's official docs for the environment object (https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#environment), but the gcloud tool is absolutely using it. :(