Updating google cloud nio to 0.107.0 #6042

Merged
merged 9 commits into from
Sep 11, 2019

Conversation

lbergelson
Member

@lbergelson
Member Author

@SHuang-Broad Let's see if it passes tests...

@lbergelson
Member Author

Hmm. This is failing with 403 Forbidden errors. Seems like something about authentication changed.

@SHuang-Broad
Contributor

You mean the tests?
I'm running the "failed" tool on an actual Dataproc cluster right now. Should have a result in 10 minutes.

@lbergelson
Member Author

Yeah, the cloud tests are all failing.

@SHuang-Broad
Contributor

@lbergelson
Tested working.

#!/bin/bash

set -eu

echo "=============================="
export CRAM_BUCKET="gs://broad-dsde-methods-shuang/tmp/test/"
gsutil ls  -l  -h "${CRAM_BUCKET}"
echo "=============================="

# on Louis's branch (lb_update_nio), containing NIO 0.100
echo "=============================="
export GATK_DIR="/Users/shuang/GATK/gatk"
export CLUSTER_NAME="shuang-nio-100"
cd "${GATK_DIR}" && git pull && git checkout lb_update_nio && \
./gradlew clean installAll && \
cd scripts/sv/ && \
bash create_cluster.sh \
  "${GATK_DIR}" \
  broad-dsde-methods \
  "${CLUSTER_NAME}" \
  3h \
  120m \
  gs://broad-dsde-methods-sv/reference/GRCh38/ \
  "${CRAM_BUCKET}" \
  gs://broad-dsde-methods-shuang/init/default_init.sh
echo "=============================="

# on master (expected to fail)
echo "=============================="
export GATK_DIR="/Users/shuang/GATK/forks/GatkFork"
export CLUSTER_NAME="shuang-nio-81"
cd "${GATK_DIR}" && git checkout master && git fetch upstream && git rebase upstream/master && \
./gradlew clean installAll && \
cd scripts/sv/ && \
bash create_cluster.sh \
  "${GATK_DIR}" \
  broad-dsde-methods \
  "${CLUSTER_NAME}" \
  3h \
  120m \
  gs://broad-dsde-methods-sv/reference/GRCh38/ \
  "${CRAM_BUCKET}" \
  gs://broad-dsde-methods-shuang/init/default_init.sh
echo "=============================="

Contributor

@SHuang-Broad SHuang-Broad left a comment

👍
Tested; runs fine on actual Dataproc clusters.

@lbergelson
Member Author

We're seeing a lot of failures of the form:

  com.google.cloud.storage.StorageException: 806222273987-uilktks3j6i7962rp0v7nusveer58497@developer.gserviceaccount.com does not have serviceusage.services.use access to project 685190392835.
        Caused by:
        shaded.cloud_nio.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
        {
          "code" : 403,
          "errors" : [ {
            "domain" : "global",
            "message" : "806222273987-uilktks3j6i7962rp0v7nusveer58497@developer.gserviceaccount.com does not have serviceusage.services.use access to project 685190392835.",
            "reason" : "forbidden"
          } ],
          "message" : "806222273987-uilktks3j6i7962rp0v7nusveer58497@developer.gserviceaccount.com does not have serviceusage.services.use access to project 685190392835."
        }

It looks like it now requires some new permission for the service accounts but our existing service account doesn't have that permission.
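
A minimal sketch for triaging logs like the one above: it pulls the numeric project ID out of each "does not have ... access to project N" message so that the distinct denied projects can be listed. The function name `denied_project_ids` is hypothetical, and the pattern assumes the exact message wording shown in the stack trace above.

```bash
#!/bin/bash
# Hypothetical helper: reads log text on stdin and prints each distinct
# project number that appears in an "access to project <N>" message.
denied_project_ids() {
  grep -oE 'access to project [0-9]+' \
    | grep -oE '[0-9]+$' \
    | sort -u
}

# Message copied from the StorageException above.
msg='806222273987-uilktks3j6i7962rp0v7nusveer58497@developer.gserviceaccount.com does not have serviceusage.services.use access to project 685190392835.'
printf '%s\n' "$msg" | denied_project_ids
```

Running it against a full test log (for example `denied_project_ids < test.log`, where `test.log` is a placeholder) would show whether more than one project number is involved.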

@lbergelson
Member Author

@SHuang-Broad Thanks, that's good to know that it actually fixes the problem you're having... I think we need to investigate this error a bit more though before we can merge.

@SHuang-Broad
Contributor

Huh? That's strange.
Is the project ID 685190392835 dsde-methods?

@lbergelson
Member Author

No... it's not. It doesn't seem to correspond to any Broad project I can see.

@SHuang-Broad SHuang-Broad mentioned this pull request Jul 17, 2019
@SHuang-Broad
Contributor

@lbergelson I did an experiment in #6046 with NIO 94, same errors.

@SHuang-Broad
Contributor

@lbergelson

I've run a few more tests and put together the following table:

NIO_VER	403_DENIED_PROJECT_ID
99	685190392835
98	685190392835
97	685190392835
96	539774316296
95	685190392835
94	685190392835
93	539774316296
92	685190392835
91	539774316296
90	539774316296
89	PASS
88	PASS
87	PASS
86	PASS
85	PASS

Googling the two mysterious project IDs, I landed on tests from this Fiji project (here and here).
I parsed the test log and found the relevant part (for ID 685190392835):

"gce":{

    "instance":{
        "attributes":{
            "startup-script":"#!/usr/bin/env bash\necho poweroff | at now + 130 minutes\ncat > ~travis/.ssh/authorized_keys <<EOF\nssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDdzVnIEg2ribEvhEvFjR9IFPAkIVtQwZhlgUAHu1BgjBugFRiqg3eaPMOeOuIZBvzwoyotHIVp3XvAfivGyCW4Ke7+2cqlcX1L8kcmoWLm2fdLGlLr/lZnAjQtexMC76uLtR8udqWA0e2sqrSJs4H/blOQmHWPrl/VSG7daoVptzqXihRmXN+/Huo7mTxAjTUEjk4IOBn7sv7G5qLrEPv78AJIZhWHdhUTGLvx+YpzQvX8pE53TMi9W4ovkZTCwhSO3WYyBOY7H1xjeYb9XWTeP563Du1b0JMpQgtFLQUVXio9NzXZE55ovvGDRSLds+VfPsv4G/Whhq76dEZ+wZO3\n\nEOF\n"
        },
        "cpuPlatform":"Intel Haswell",
        "description":"Travis CI python test VM",
        "disks":[{"deviceName":"persistent-disk-0","index":0,"mode":"READ_WRITE","type":"PERSISTENT"}],
        "hostname":"testing-gce-ec8614d2-40a2-4138-801e-d42d811590a2.c.travis-ci-prod-2.internal",
        "id":8221730359445041428,
        "image":"",
        "licenses":[{"id":"1000010"}],
        "machineType":"projects/685190392835/machineTypes/n1-standard-2",
        "maintenanceEvent":"NONE",
        "networkInterfaces":[{"accessConfigs":[{"externalIp":"104.198.203.242","type":"ONE_TO_ONE_NAT"}],"forwardedIps":[],"ip":"10.128.0.163","network":"projects/685190392835/networks/default"}],
        "scheduling":{"automaticRestart":"TRUE","onHostMaintenance":"MIGRATE","preemptible":"FALSE"},
        "serviceAccounts":{
            "[email protected]":{
                "aliases":["default"],
                "email":"[email protected]",
                "scopes":["https://www.googleapis.com/auth/userinfo.email",
                          "https://www.googleapis.com/auth/devstorage.full_control",
                          "https://www.googleapis.com/auth/compute"]
            },
            "default":{
                "aliases":["default"],
                "email":"[email protected]",
                "scopes":["https://www.googleapis.com/auth/userinfo.email",
                          "https://www.googleapis.com/auth/devstorage.full_control",
                          "https://www.googleapis.com/auth/compute"]}
            },
        "tags":["testing"],
        "virtualClock":{"driftToken":"11704388862566216373"},
        "zone":"projects/685190392835/zones/us-central1-b"
    },

    "project":{
        "attributes":{
            "sshKeys":"henrik:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChY0pdGXohYN7KRnQa3VIcDoVBrxZVHkhOFc1SROV2T+gTOunYbOW5C4V1P2MGG6FcKeoQTJzXgPbZurM5l1AfEbKeCde778QyyxbcjpYvKyY5b4qVO79nOKAg1qHIqUl+2txv7X6tPv4Q99T7UBechuc5awnkJZKqP1s1qJ9BYYYAPukZPbhAkjkvPSaJfIi+py2p6L9mXFtrAhYNH1flE9GErAsf2Hq8zQvx4hmTseumv4Fb9rVogcBJOqhmDQmYwTg2rEbdLAjbqY7Sf4kjdOfF7uhwasZgVMjF1z5utnvHd2wC/cjkuDZB4UhetLTeOWDtvgZxF/uVJSTU2AGD google-ssh {\"userName\":\"[email protected]\",\"expireOn\":\"2016-03-04T00:52:57+0000\"}\nhenrik:ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBJ6IOlU4vY6QLWKOX52Opcdx/2zNgJyMq7ntIf8qD+CbMMfUy5C6WJnjn4E2lvYqYaVIotY196cVazh0Jj8E/co= google-ssh {\"userName\":\"[email protected]\",\"expireOn\":\"2016-03-04T00:52:42+0000\"}\n"
        },
        "numericProjectId":685190392835,
        "projectId":"travis-ci-prod-2"
    }
},
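
The fields that tie the numeric ID to a project name can be pulled out of a metadata dump like the one above with plain text matching. This is only a sketch: the function name `project_fields` is hypothetical, and it uses `grep` rather than a JSON parser because the excerpt is a fragment, not a complete JSON document.

```bash
#!/bin/bash
# Hypothetical helper: reads metadata text on stdin and prints the
# numericProjectId and projectId fields as key:value lines.
project_fields() {
  grep -oE '"(numericProjectId|projectId)":("[^"]*"|[0-9]+)' | tr -d '"'
}

# Two lines copied from the "project" section of the excerpt above.
sample='        "numericProjectId":685190392835,
        "projectId":"travis-ci-prod-2"'
printf '%s\n' "$sample" | project_fields
```

For this excerpt the two fields show that 685190392835 is the project number of `travis-ci-prod-2`.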


In the meantime, I've also found this site for ID 539774316296.

Not sure whether this info is useful for you guys (who know much more about this than I do) in debugging the issue.

@codecov

codecov bot commented Jul 25, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@8d88f6e). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             master     #6042   +/-   ##
==========================================
  Coverage          ?   87.013%           
  Complexity        ?     32636           
==========================================
  Files             ?      2011           
  Lines             ?    150967           
  Branches          ?     16134           
==========================================
  Hits              ?    131361           
  Misses            ?     14021           
  Partials          ?      5585

@lbergelson
Member Author

I finally opened an issue with google cloud to look into this... https://github.com/googleapis/google-cloud-java/issues/5884

@droazen droazen self-requested a review August 29, 2019 18:38
@droazen droazen self-assigned this Aug 29, 2019
build.gradle Outdated
@@ -69,7 +69,7 @@ final testNGVersion = '6.11'
 // Using the shaded version to avoid conflicts between its protobuf dependency
 // and that of Hadoop/Spark (either the one we reference explicitly, or the one
 // provided by dataproc).
-final googleCloudNioDependency = 'com.google.cloud:google-cloud-nio:0.81.0-alpha:shaded'
+final googleCloudNioDependency = 'com.google.cloud:google-cloud-nio:0.106.0-alpha:shaded'
Contributor

0.107.0 is now out, so might as well update to that. 0.105.0 included the shading fix we needed to add a BigQuery dependency in master.

Member Author

Oh, that happened in the last day I guess... will update. We're going to need a picard update before this will work though.

Member Author

It seems like we have a nasty bug where we are using the shaded version but also including the non-shaded version's transitive dependencies. Resolving that might help with BigQuery too.

@droazen droazen changed the title Updating google cloud nio to 0.100.0 Updating google cloud nio to 0.106.0 Aug 29, 2019
@droazen droazen changed the title Updating google cloud nio to 0.106.0 Updating google cloud nio to 0.107.0 Aug 30, 2019
@droazen droazen assigned lbergelson and unassigned droazen and SHuang-Broad Aug 30, 2019
@droazen
Contributor

droazen commented Sep 3, 2019

@lbergelson As discussed, it's not essential that we update to 107. I'd be happy with either 105 or 106, since the shading fix was released in 105.

build.gradle Outdated
@@ -59,7 +59,7 @@ repositories {

 final requiredJavaVersion = "8"
 final htsjdkVersion = System.getProperty('htsjdk.version','2.20.3')
-final picardVersion = System.getProperty('picard.version','2.20.5')
+final picardVersion = System.getProperty('picard.version','2.20.6-5-g1b4178f-SNAPSHOT')
Contributor

Update to an actual released version of Picard.

@lbergelson lbergelson merged commit 78bf002 into master Sep 11, 2019
@lbergelson lbergelson deleted the lb_update_nio branch September 11, 2019 20:19
Successfully merging this pull request may close these issues.

Strange behavior in ParallelCopyGCSDirectoryIntoHDFSSpark with NIO ver > 0.66
3 participants