Add google_dataproc_job resource #253


Merged: 5 commits, Nov 21, 2017

Conversation

@nickithewatt (Contributor) commented Jul 26, 2017:

@danawillow As requested ...
This is PR 2 of 2 (splitting #231 up), looking to address #31 (Add support for Google Cloud Dataproc) by adding the google_dataproc_job resource specifically.

!! NOTE: This PR is dependent on (rebased on) PR #252 !!

To recap:
The jobs are pretty much fleshed out. There is one google_dataproc_job resource, with different xxx_config blocks for the different job types:

  • google_dataproc_job
    • support for PySpark jobs via pyspark_config
    • support for Spark jobs via spark_config
    • support for Spark SQL jobs via sparksql_config
    • support for Hadoop jobs via hadoop_config
    • support for Hive jobs via hive_config
    • support for Pig jobs via pig_config

Specifically for google_dataproc_job, create essentially submits a job to the cluster and lets it run; it does not wait for the job to finish. Updating jobs doesn't really make much sense, and for delete I am genuinely deleting the job. Under normal circumstances, Dataproc won't let you delete active jobs, so I have added a force_destroy option which, if true, will first cancel the job before deleting it.
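For illustration, here is a minimal sketch of that delete flow using the Cancel and Delete calls from the dataproc/v1 Go client. This is not necessarily the PR's exact code: the function name and field names are assumptions, and it assumes the provider's usual Config/meta plumbing.

// Sketch only: if force_destroy is set, cancel the active job first,
// since Dataproc refuses to delete jobs that are still running.
func resourceDataprocJobDelete(d *schema.ResourceData, meta interface{}) error {
	config := meta.(*Config)
	project := d.Get("project").(string)
	region := d.Get("region").(string)
	jobId := d.Id()

	if d.Get("force_destroy").(bool) {
		// A real implementation would also wait for the cancellation
		// to take effect before attempting the delete.
		_, err := config.clientDataproc.Projects.Regions.Jobs.Cancel(
			project, region, jobId, &dataproc.CancelJobRequest{}).Do()
		if err != nil {
			return err
		}
	}

	_, err := config.clientDataproc.Projects.Regions.Jobs.Delete(
		project, region, jobId).Do()
	return err
}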

@nickithewatt force-pushed the dataproc_job branch 2 times, most recently from 2d632d2 to bb3f879 on August 3, 2017, 22:30.
@danawillow (Contributor) left a comment:

Thanks again for helping out with dataproc, @nickithewatt!

},

Schema: map[string]*schema.Schema{

@danawillow:

(for this particular case I don't mind you top-leveling project so we can easily use the getProject helper)
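For reference, this is the pattern that helper enables once project is a top-level field (a sketch of the provider's standard usage):

// With "project" top-level, the CRUD functions can resolve it
// consistently from either the resource or the provider config:
project, err := getProject(d, config)
if err != nil {
	return err
}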


Schema: map[string]*schema.Schema{

"cluster": {
@danawillow:

Do you think people might want access to clusterUuid also? If so let's just have a block for https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#JobPlacement
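One possible shape for such a block, mirroring the JobPlacement message from the linked API page (a sketch; the Required/ForceNew choices here are assumptions):

"placement": {
	Type:     schema.TypeList,
	Required: true,
	MaxItems: 1,
	ForceNew: true,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			// JobPlacement.clusterName: the cluster to run the job on.
			"cluster_name": {
				Type:     schema.TypeString,
				Required: true,
				ForceNew: true,
			},
			// JobPlacement.clusterUuid: output only.
			"cluster_uuid": {
				Type:     schema.TypeString,
				Computed: true,
			},
		},
	},
},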

Elem: &schema.Schema{Type: schema.TypeString},
},

"status": {
@danawillow:

Do you think it would also make sense to include the rest of the status fields in https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#JobStatus?
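For example, a computed block covering the remaining JobStatus fields could look like this (a sketch; the field list is taken from the linked API page):

"status": {
	Type:     schema.TypeList,
	Computed: true,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			// Mirrors JobStatus.state, .details, .stateStartTime
			// and .substate from the REST API.
			"state":            {Type: schema.TypeString, Computed: true},
			"details":          {Type: schema.TypeString, Computed: true},
			"state_start_time": {Type: schema.TypeString, Computed: true},
			"substate":         {Type: schema.TypeString, Computed: true},
		},
	},
},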

Computed: true,
},

"outputUri": {
@danawillow:

Want to just call this driver_output_resource_uri to match the API in case another type of output URI comes along in the future?

Optional: true,
Computed: true,
},

@danawillow:

Not sure if it's super necessary but it probably would be very little effort to also add driver_control_files_uri

}
}

func getPySparkJob(config map[string]interface{}) *dataproc.PySparkJob {
@danawillow:

usually we call these functions expand[object], so this would be called expandPySparkJob
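The renamed function would look something like this (a sketch; the schema key shown is an assumption, and the body is meant to match whatever getPySparkJob does today):

// Provider convention: expand[Object] builds the API object
// from the flattened schema data.
func expandPySparkJob(config map[string]interface{}) *dataproc.PySparkJob {
	job := &dataproc.PySparkJob{
		MainPythonFileUri: config["main_python_file_uri"].(string),
	}
	// ... populate the remaining fields as getPySparkJob does ...
	return job
}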


// ---- Spark Job ----

func sparkTFSchema() *schema.Schema {
@danawillow:

same comments as above apply here and for the rest of the types of jobs

Type: schema.TypeString,
Optional: true,
ForceNew: true,
ConflictsWith: []string{"hadoop_config.main_jar"},
@danawillow:

because of the way lists are represented in state, this should (I believe) be hadoop_config.0.main_jar
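That is, the corrected reference would be:

// Nested list attributes are addressed by index in state,
// hence the extra ".0." segment:
ConflictsWith: []string{"hadoop_config.0.main_jar"},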

Manages a job resource within a Dataproc cluster within GCE. For more information see
[the official dataproc documentation](https://cloud.google.com/dataproc/).

!> **Note:** This resource does not really support 'update' functionality. Once created
@danawillow:

can you be more definitive in these statements? This resource does not support 'update' and changing any attributes will cause the creation of a new resource


* `labels` - (Optional) The list of labels (key/value pairs) to add to the job.

The **pyspark_config** supports:
@danawillow:

The syntax we've been standardizing on for these would be:

The `pyspark_config` block supports:

@nickithewatt (Contributor, Author) commented Aug 11, 2017 via email.

@danawillow (Contributor):

Hey @nickithewatt, think you'll have time for this one? I'm happy to take it over, I understand how much of a pain I was during the other review (though I'm quite confident this one will be less painful :) )

@nickithewatt (Contributor, Author):

Hi @danawillow, no worries, appreciate the time taken by you too to review from your side :) Happy to try and get this one going again.

@nickithewatt (Contributor, Author):

@danawillow I have a few other items on my plate but hope to be able to get to this in just over a week or so.

@nickithewatt (Contributor, Author):

Hi @danawillow, I have now done the changes for dataproc_job as well. Thanks

@nickithewatt (Contributor, Author):

@danawillow Do you want me to squash and rebase?

@danawillow (Contributor):

Only if you want; I usually squash PRs when they get merged anyway, so it doesn't make a difference to me. Thanks @nickithewatt! I'll make sure to get to this this week.

@danawillow (Contributor) left a comment:

Thanks @nickithewatt! Looks good overall, just a few things I had comments on.

@@ -84,6 +85,21 @@ func TestProvider_getRegionFromZone(t *testing.T) {
}
}

func TestConvertStringMap(t *testing.T) {
@danawillow:

this should probably be moved to utils_test.go

Type: schema.TypeInt,
Description: "Maximum number of times per hour a driver may be restarted as a result of driver terminating with non-zero code before job is reported failed.",
Optional: true,
ForceNew: true,
@danawillow:

Let's add a validation.IntAtMost(10) for this
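With that suggestion applied, the field would read roughly as follows (a sketch; validation.IntAtMost comes from the SDK's helper/validation package, and the attribute name shown is an assumption):

"max_failures_per_hour": {
	Type:        schema.TypeInt,
	Description: "Maximum number of times per hour a driver may be restarted as a result of driver terminating with non-zero code before job is reported failed.",
	Optional:    true,
	ForceNew:    true,
	// The API caps this value at 10; reject anything larger at plan time.
	ValidateFunc: validation.IntAtMost(10),
},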


"placement": {
Type: schema.TypeList,
Optional: true,
@danawillow:

Is this actually optional?

@nickithewatt:

no, changed ...

Type: schema.TypeBool,
Default: false,
Optional: true,
ForceNew: true,
@danawillow:

I wonder whether it makes sense to make this updatable, in case the user doesn't realize the attribute exists (or forgets to set it) and then later realizes they want to force delete the job. What do you think?

@nickithewatt:

fair enough, added
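For illustration, one cheap way to support this (a sketch; function names are assumptions): since force_destroy only changes provider-side delete behavior, the Update function needs no API call at all.

// Sketch: only force_destroy is updatable (everything else stays
// ForceNew), and it is consumed client-side at delete time, so
// Update can simply refresh state.
func resourceDataprocJobUpdate(d *schema.ResourceData, meta interface{}) error {
	return resourceDataprocJobRead(d, meta)
}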

},
}

func flattenJobReference(r *dataproc.JobReference) []map[string]interface{} {
@danawillow:

Let's move all the functions that apply to all dataproc jobs into one area; these seem to have found themselves in the middle of the PySpark section.


jobCompleteTimeoutMins := 3
waitErr := dataprocJobOperationWait(config, region, project, job.Reference.JobId,
"Awaiting Dataproc job completion of failure", jobCompleteTimeoutMins, 1)
@danawillow:

should that be "completion or failure"?

config := testAccProvider.Meta().(*Config)
jobId := s.RootModule().Resources[n].Primary.ID
found, err := config.clientDataproc.Projects.Regions.Jobs.Get(
config.Project, rs.Primary.Attributes["region"], jobId).Do()
@danawillow:

any reason to use config.Project here but getTestProject above? I'm fine with either one, but may as well be consistent

Manages a job resource within a Dataproc cluster within GCE. For more information see
[the official dataproc documentation](https://cloud.google.com/dataproc/).

!> **Note:** This resource does not support 'update' and changing any attributes will cause the
@danawillow:

how does "changing any attributes will cause the resource to be recreated" sound?

In addition to the arguments listed above, the following computed attributes are
exported:

* `reference.cluster_uuid` - A cluster UUID generated by the Cloud Dataproc service when the job is submitted.
@danawillow:

Because of how nested objects work, this is actually reference.0.cluster_uuid (likewise for status)

MaxItems: 1,
Elem: &schema.Resource{
Schema: map[string]*schema.Schema{
"job_id": {
@danawillow:

I'm not going to block on this, but I'd love to see a ValidateFunc here (I think the regex you want is ^[a-zA-Z0-9_-]{1,100}$)
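A sketch of such a ValidateFunc using that regex (the surrounding schema flags are assumptions, and the fragment assumes the file imports fmt and regexp):

"job_id": {
	Type:     schema.TypeString,
	Optional: true,
	Computed: true,
	ForceNew: true,
	// Enforce Dataproc's job ID format at plan time rather than
	// failing at apply time.
	ValidateFunc: func(v interface{}, k string) (ws []string, errors []error) {
		if value := v.(string); !regexp.MustCompile(`^[a-zA-Z0-9_-]{1,100}$`).MatchString(value) {
			errors = append(errors, fmt.Errorf(
				"%q must match ^[a-zA-Z0-9_-]{1,100}$, got %q", k, value))
		}
		return
	},
},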

@nickithewatt (Contributor, Author):

@danawillow changes made; I think I have addressed them all. I did a squash and rebase as well.

@danawillow (Contributor):

Looks good @nickithewatt! I pushed a few small changes to your branch so we didn't have to do another back-and-forth for small things, hope that's all right with you! Merging now.

@danawillow merged commit 68f7d77 into hashicorp:master on Nov 21, 2017.
@nickithewatt (Contributor, Author):

Thanks @danawillow

chrisst pushed a commit to chrisst/terraform-provider-google that referenced this pull request Nov 9, 2018
* Add google_dataproc_job resource

* Correct state ref in docs

* make tests parallel

* cleanup, mostly whitespace related

* docs fmt
luis-silva pushed a commit to luis-silva/terraform-provider-google that referenced this pull request May 21, 2019
/cc @chrisst
@ghost commented Mar 30, 2020:

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost locked and limited the conversation to collaborators on Mar 30, 2020.