Use CMEK with Dataproc Serverless

By default, Dataproc Serverless encrypts customer content at rest. Dataproc Serverless handles encryption for you without any additional actions on your part. This option is called Google default encryption.

If you want to control your encryption keys, then you can use customer-managed encryption keys (CMEKs) in Cloud KMS with CMEK-integrated services including Dataproc Serverless. Using Cloud KMS keys gives you control over their protection level, location, rotation schedule, usage and access permissions, and cryptographic boundaries. Using Cloud KMS also lets you track key usage, view audit logs, and control key life cycles. Instead of Google owning and managing the symmetric key encryption keys (KEKs) that protect your data, you control and manage these keys in Cloud KMS.

After you set up your resources with CMEKs, the experience of accessing your Dataproc Serverless resources is similar to using Google default encryption. For more information about your encryption options, see Customer-managed encryption keys (CMEK).

Use CMEK

Follow the steps in this section to use CMEK to encrypt data that Dataproc Serverless writes to persistent disk and to the Dataproc staging bucket.

  1. Create a key using the Cloud Key Management Service (Cloud KMS).

  2. Copy the resource name.

    The resource name is constructed as follows:
    projects/PROJECT_ID/locations/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
    

  3. Enable the Compute Engine, Dataproc, and Cloud Storage Service Agent service accounts to use your key:

    1. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Compute Engine Service Agent service account. For instructions, see Protect resources by using Cloud KMS keys > Required Roles. If this service account is not listed on the IAM page in the Google Cloud console, click Include Google-provided role grants to list it.
    2. Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account. You can use the Google Cloud CLI to assign the role:

       gcloud projects add-iam-policy-binding KMS_PROJECT_ID \
       --member serviceAccount:service-PROJECT_NUMBER@dataproc-accounts.iam.gserviceaccount.com \
       --role roles/cloudkms.cryptoKeyEncrypterDecrypter
      

      Replace the following:

      KMS_PROJECT_ID: the ID of your Google Cloud project that runs Cloud KMS. This project can also be the project that runs Dataproc resources.

      PROJECT_NUMBER: the project number (not the project ID) of your Google Cloud project that runs Dataproc resources.

    3. Enable the Cloud KMS API on the project that runs Dataproc Serverless resources.

    4. If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, then add the serviceusage.services.use permission to the custom role attached to the Dataproc Service Agent service account. If the Dataproc Service Agent role is attached to the Dataproc Service Agent service account, you can skip this step.

    5. Follow the Cloud Storage steps to add your key to the bucket that you will use as the staging bucket.

  4. When you submit a batch workload:

    1. Specify your key in the Batch kmsKey parameter.
    2. Specify the name of your Cloud Storage bucket in the Batch stagingBucket parameter.
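
The identifiers used in the steps above can be sketched as a small dry-run script. It only prints values and does not call gcloud or any Google Cloud API. Everything here is illustrative: the default values are placeholders, the function names are hypothetical, and the Compute Engine and Cloud Storage service agent addresses are assumed to follow the usual service-PROJECT_NUMBER naming (only the Dataproc address appears in the command above), so confirm them on the IAM page. The nesting of kmsKey and stagingBucket under environmentConfig.executionConfig is also an assumption about the Batch request body; check it against the Batch API reference before use.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints identifiers and a sample Batch body; calls no APIs.
# All default values are placeholders, not real resources.
set -euo pipefail

PROJECT_ID="${PROJECT_ID:-example-project}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"
REGION="${REGION:-us-central1}"
KEY_RING_NAME="${KEY_RING_NAME:-example-key-ring}"
KEY_NAME="${KEY_NAME:-example-key}"
STAGING_BUCKET="${STAGING_BUCKET:-example-staging-bucket}"
# Hypothetical workload file, used only to make the sample body complete.
MAIN_FILE_URI="${MAIN_FILE_URI:-gs://example-code-bucket/example_job.py}"

# Step 2: build the key resource name from its parts.
kms_key_name() {
  printf 'projects/%s/locations/%s/keyRings/%s/cryptoKeys/%s\n' "$1" "$2" "$3" "$4"
}

# Step 3: list the service agent addresses for a project number.
# Only the Dataproc address is taken from the document; the other two are
# assumed naming conventions and should be verified on the IAM page.
service_agents() {
  printf 'service-%s@compute-system.iam.gserviceaccount.com\n' "$1"
  printf 'service-%s@dataproc-accounts.iam.gserviceaccount.com\n' "$1"
  printf 'service-%s@gs-project-accounts.iam.gserviceaccount.com\n' "$1"
}

# Step 4: print a sample Batch body carrying the key and the staging bucket.
# The field nesting shown here is an assumption, not confirmed by the document.
batch_body() {
  cat <<EOF
{
  "pysparkBatch": { "mainPythonFileUri": "$3" },
  "environmentConfig": {
    "executionConfig": { "kmsKey": "$1", "stagingBucket": "$2" }
  }
}
EOF
}

KMS_KEY="$(kms_key_name "$PROJECT_ID" "$REGION" "$KEY_RING_NAME" "$KEY_NAME")"
echo "Key resource name: $KMS_KEY"
service_agents "$PROJECT_NUMBER"
batch_body "$KMS_KEY" "$STAGING_BUCKET" "$MAIN_FILE_URI"
```

Set PROJECT_ID, PROJECT_NUMBER, REGION, KEY_RING_NAME, KEY_NAME, and STAGING_BUCKET in the environment to print values for your own project. Note that stagingBucket takes the bucket name, not a gs:// URI.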