Use BigLake metastore with the Iceberg REST catalog

The managed Apache Iceberg REST catalog in BigLake metastore creates interoperability between all your query engines by offering a single source of truth for all your Iceberg data. It lets query engines, such as Apache Spark, discover, read metadata from, and manage Iceberg tables in a consistent way.

The Iceberg tables that you use with the Iceberg REST catalog are called BigLake tables for Apache Iceberg (preview). These are Iceberg tables that you create from open source engines and store in Cloud Storage. They can be read by open source engines or BigQuery. Writes are only supported from open source engines. In this document, we refer to these tables as BigLake Iceberg tables.

Before you begin

  1. Verify that billing is enabled for your Google Cloud project.

    Learn how to check if billing is enabled on a project.
  2. Enable the BigLake API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API in the Google Cloud console, or use the gcloud CLI as sketched after this list.

  3. Optional: Ask an administrator to set up credential vending for the first time.
  4. Optional: Understand how BigLake metastore works and why you should use it.
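If you prefer the gcloud CLI, you can enable the BigLake API with the following command. This is a minimal sketch that assumes the gcloud CLI is installed and authenticated:

gcloud services enable biglake.googleapis.com \
    --project=PROJECT_ID

Replace PROJECT_ID with the ID of your Google Cloud project.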

Required roles

To get the permissions that you need to use the Iceberg REST catalog in BigLake metastore, ask your administrator to grant you the following IAM roles:

  • Perform administrative tasks, such as managing catalog user access, storage access, and the catalog's credential mode: BigLake Admin (roles/biglake.admin) on the project
  • Read table data in credential vending mode: BigLake Viewer (roles/biglake.viewer) on the project
  • Write table data in credential vending mode: BigLake Editor (roles/biglake.editor) on the project
  • Read catalog resources and table data in non-credential vending mode: BigLake Viewer (roles/biglake.viewer) on the project and Storage Object Viewer (roles/storage.objectViewer) on the Cloud Storage bucket
  • Manage catalog resources and write table data in non-credential vending mode: BigLake Editor (roles/biglake.editor) on the project and Storage Object User (roles/storage.objectUser) on the Cloud Storage bucket

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.
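For example, a project-level grant of the BigLake Viewer role with the gcloud CLI might look like the following sketch; USER_EMAIL is a placeholder for the principal that needs access:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/biglake.viewer"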

Set up credential vending mode

Credential vending mode is a storage access delegation mechanism that allows BigLake metastore administrators to control permissions directly on BigLake metastore resources, eliminating the need for catalog users to have direct access to Cloud Storage buckets. It lets BigLake administrators give users permissions on specific data files.

A catalog administrator enables credential vending on the Iceberg REST catalog client.

As a catalog user, you can then instruct the Iceberg REST catalog to return downscoped storage credentials by specifying the access delegation, which is part of the Iceberg REST Catalog API specification. For more information, see Configure a query engine with the Iceberg REST catalog.
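For example, a catalog user request that asks for vended credentials when loading a table might look like the following sketch. The namespace and table names are placeholders, and PREFIX is the overrides.prefix value returned by the catalog's config endpoint (see the following procedure):

curl -H "x-goog-user-project: PROJECT_ID" \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "X-Iceberg-Access-Delegation: vended-credentials" \
    "https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/PREFIX/namespaces/NAMESPACE_NAME/tables/TABLE_NAME"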

To initialize the catalog and enable credential vending mode, follow these steps:

  1. Initialize the catalog with the following command:

    curl -H "x-goog-user-project: PROJECT_ID" -H "Accept: application/json" -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" https://biglake.googleapis.com/iceberg/v1/restcatalog/v1/config?warehouse=gs://CLOUD_STORAGE_BUCKET_NAME

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the Iceberg table.

    The output of the curl command is similar to the following. The catalog prefix value can be found in the overrides.prefix field in the response:

    {
      "overrides": {
        "catalog_credential_mode": "CREDENTIAL_MODE_END_USER",
        "prefix": "projects/PROJECT_ID/catalogs/CLOUD_STORAGE_BUCKET_NAME"
      },
      "endpoints": [
        "GET /v1/{prefix}/namespaces",
        "POST /v1/{prefix}/namespaces",
        "GET /v1/{prefix}/namespaces/{namespace}",
        "HEAD /v1/{prefix}/namespaces/{namespace}",
        "DELETE /v1/{prefix}/namespaces/{namespace}",
        "POST /v1/{prefix}/namespaces/{namespace}/properties",
        "GET /v1/{prefix}/namespaces/{namespace}/tables",
        "POST /v1/{prefix}/namespaces/{namespace}/tables",
        "GET /v1/{prefix}/namespaces/{namespace}/tables/{table}",
        "HEAD /v1/{prefix}/namespaces/{namespace}/tables/{table}",
        "POST /v1/{prefix}/namespaces/{namespace}/tables/{table}",
        "DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}"
      ]
    }
    
  2. Enable credential vending mode and extract the service account to give permissions to with the following command:

    curl -X PATCH -H "Content-Type: application/json" -H "x-goog-user-project: PROJECT_ID" -H "Accept: application/json" -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" "https://biglake.googleapis.com/iceberg/v1/restcatalog/extensions/PREFIX?update_mask=credential_mode" -d '{"credential_mode":"CREDENTIAL_MODE_VENDED_CREDENTIALS"}'

    Replace PREFIX with the prefix field from the previous command's output.

    The curl command output contains the service account, similar to the following:

    {
      "name": "projects/PROJECT_ID/catalogs/CLOUD_STORAGE_BUCKET_NAME",
      "credential_mode": "CREDENTIAL_MODE_VENDED_CREDENTIALS",
      "biglake-service-account": "BIGLAKE_SERVICE_ACCOUNT"
    }
    
  3. To ensure that the BigLake service account that you extracted in the previous step has the necessary permissions to use credential vending mode, ask your administrator to grant it the Storage Object User (roles/storage.objectUser) role on the storage bucket.
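    For example, a bucket-level grant with the gcloud CLI might look like the following sketch, where BIGLAKE_SERVICE_ACCOUNT is the service account returned in the previous step:

    gcloud storage buckets add-iam-policy-binding gs://CLOUD_STORAGE_BUCKET_NAME \
        --member="serviceAccount:BIGLAKE_SERVICE_ACCOUNT" \
        --role="roles/storage.objectUser"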

Limitations

The Iceberg REST catalog is subject to the following limitations:

  • Multi-region buckets, dual-region buckets, and buckets with custom region placement aren't supported.
  • When using credential vending mode, you must set the io-impl property to org.apache.iceberg.gcp.gcs.GCSFileIO. The default, org.apache.iceberg.hadoop.HadoopFileIO, isn't supported.

Configure the Iceberg REST catalog

Cluster

To use Spark with the Iceberg REST catalog on Dataproc, first create a cluster with the Iceberg component:

gcloud dataproc clusters create CLUSTER_NAME \
    --enable-component-gateway \
    --project=PROJECT_ID \
    --region=REGION \
    --optional-components=ICEBERG \
    --image-version=DATAPROC_VERSION

Replace the following:

  • CLUSTER_NAME: a name for your cluster.
  • PROJECT_ID: your Google Cloud project ID.
  • REGION: the region for the Dataproc cluster.
  • DATAPROC_VERSION: the Dataproc image version, for example 2.2.

After you create the cluster, configure your Spark session to use the Iceberg REST catalog:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

Replace the following:

  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • APP_NAME: a name for your Spark session.
  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the BigLake Iceberg tables.
  • PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.

This example doesn't use credential vending. To use credential vending, set the X-Iceberg-Access-Delegation header to vended-credentials on Iceberg REST catalog requests by adding the following line to the SparkSession builder:

.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')

Example with credential vending

The following example configures the query engine with credential vending:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

For more information, see the Headers in the RESTCatalog section of the Iceberg documentation.

Dataproc clusters support Google authorization flows for Iceberg in the following releases:

  • Dataproc on Compute Engine 2.2 image versions 2.2.65 and later.
  • Dataproc on Compute Engine 2.3 image versions 2.3.11 and later.
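To run a script that builds this Spark session as a Dataproc job rather than interactively, you can submit it to the cluster with the gcloud CLI. The following is a minimal sketch; PYSPARK_FILE is a placeholder for the gs:// Cloud Storage path to your script:

gcloud dataproc jobs submit pyspark PYSPARK_FILE \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --project=PROJECT_ID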

Serverless

Submit a PySpark batch workload to Google Cloud Serverless for Apache Spark with the following configuration:

gcloud dataproc batches submit pyspark PYSPARK_FILE \
    --project=PROJECT_ID \
    --region=REGION \
    --version=RUNTIME_VERSION \
    --properties="\
    spark.sql.defaultCatalog=CATALOG_NAME,\
    spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog,\
    spark.sql.catalog.CATALOG_NAME.type=rest,\
    spark.sql.catalog.CATALOG_NAME.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
    spark.sql.catalog.CATALOG_NAME.warehouse=gs://CLOUD_STORAGE_BUCKET_NAME,\
    spark.sql.catalog.CATALOG_NAME.header.x-goog-user-project=PROJECT_ID,\
    spark.sql.catalog.CATALOG_NAME.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager,\
    spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,\
    spark.sql.catalog.CATALOG_NAME.rest-metrics-reporting-enabled=false"

Replace the following:

  • PYSPARK_FILE: the gs:// Cloud Storage path to your PySpark application file.
  • PROJECT_ID: your Google Cloud project ID.
  • REGION: the region for the Dataproc batch workload.
  • RUNTIME_VERSION: the Serverless for Apache Spark runtime version, for example 2.2.
  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the BigLake Iceberg tables.

To use credential vending, set the X-Iceberg-Access-Delegation header to vended-credentials on Iceberg REST catalog requests by adding the following property to the --properties flag:

spark.sql.catalog.CATALOG_NAME.header.X-Iceberg-Access-Delegation=vended-credentials

Example with credential vending

The following example configures the query engine with credential vending:

gcloud dataproc batches submit pyspark PYSPARK_FILE \
    --project=PROJECT_ID \
    --region=REGION \
    --version=RUNTIME_VERSION \
    --properties="\
    spark.sql.defaultCatalog=CATALOG_NAME,\
    spark.sql.catalog.CATALOG_NAME=org.apache.iceberg.spark.SparkCatalog,\
    spark.sql.catalog.CATALOG_NAME.type=rest,\
    spark.sql.catalog.CATALOG_NAME.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
    spark.sql.catalog.CATALOG_NAME.warehouse=gs://CLOUD_STORAGE_BUCKET_NAME,\
    spark.sql.catalog.CATALOG_NAME.header.x-goog-user-project=PROJECT_ID,\
    spark.sql.catalog.CATALOG_NAME.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager,\
    spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,\
    spark.sql.catalog.CATALOG_NAME.rest-metrics-reporting-enabled=false,\
    spark.sql.catalog.CATALOG_NAME.header.X-Iceberg-Access-Delegation=vended-credentials"

For more information, see the Headers in the RESTCatalog section of the Iceberg documentation.

Serverless for Apache Spark supports Google authorization flows for Iceberg in the following runtime versions:

  • Serverless for Apache Spark 2.2 runtimes 2.2.60 and later
  • Serverless for Apache Spark 2.3 runtimes 2.3.10 and later

Trino

To use Trino with the Iceberg REST catalog, create a Dataproc cluster with the Trino component and configure catalog properties using the gcloud dataproc clusters create --properties flag. The following example creates a Trino catalog named CATALOG_NAME:

gcloud dataproc clusters create CLUSTER_NAME \
    --enable-component-gateway \
    --region=REGION \
    --image-version=DATAPROC_VERSION \
    --network=NETWORK_ID \
    --optional-components=TRINO \
    --properties="\
trino-catalog:CATALOG_NAME.connector.name=iceberg,\
trino-catalog:CATALOG_NAME.iceberg.catalog.type=rest,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1/restcatalog,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.warehouse=gs://CLOUD_STORAGE_BUCKET_NAME,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.biglake.project-id=PROJECT_ID,\
trino-catalog:CATALOG_NAME.iceberg.rest-catalog.rest.auth.type=org.apache.iceberg.gcp.auth.GoogleAuthManager"

Replace the following:

  • CLUSTER_NAME: a name for your cluster.
  • REGION: the Dataproc cluster region.
  • DATAPROC_VERSION: Dataproc image version, for example 2.2.
  • NETWORK_ID: cluster network ID. For more information, see Dataproc Cluster Network Configuration.
  • CATALOG_NAME: a name for your Trino catalog using the Iceberg REST catalog.
  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the BigLake Iceberg tables.
  • PROJECT_ID: your Google Cloud project ID to use for BigLake metastore.

After cluster creation, use SSH to connect to the main VM instance, and then use the Trino CLI as follows:

trino
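If you connect with the gcloud CLI, the flow might look like the following sketch. The CLUSTER_NAME-m master-node name, the ZONE value, and the --catalog and --schema session flags are assumptions that aren't part of the original command:

# Connect to the cluster's first master node.
gcloud compute ssh CLUSTER_NAME-m --zone=ZONE --project=PROJECT_ID

# On the node, start the Trino CLI with the catalog and schema preselected.
trino --catalog CATALOG_NAME --schema SCHEMA_NAME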

Dataproc Trino supports Google authorization flows for Iceberg in the following releases:

  • Dataproc on Compute Engine 2.2 image versions 2.2.65 and later
  • Dataproc on Compute Engine 2.3 image versions 2.3.11 and later

Dataproc on Compute Engine 3.0 image versions aren't supported.

Iceberg 1.10 or later

Open source Iceberg releases 1.10 and later have built-in support for Google authorization flows through GoogleAuthManager. The following example shows how to configure Apache Spark to use the BigLake metastore Iceberg REST catalog:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

Replace the following:

  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • APP_NAME: a name for your Spark session.
  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the BigLake Iceberg tables.
  • PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.

The preceding example doesn't use credential vending. To use credential vending, set the X-Iceberg-Access-Delegation header to vended-credentials on Iceberg REST catalog requests by adding the following line to the SparkSession builder:

.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')

Example with credential vending

The following example configures the query engine with credential vending:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f'spark.sql.catalog.{catalog_name}.rest.auth.type', 'org.apache.iceberg.gcp.auth.GoogleAuthManager') \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

For more information, see the Headers in the RESTCatalog section of the Iceberg documentation.

Prior Iceberg releases

For open source Iceberg releases earlier than 1.10, you can use standard OAuth authentication by configuring the Spark session as follows:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,org.apache.iceberg:iceberg-gcp-bundle:1.9.1') \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
  .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

Replace the following:

  • CATALOG_NAME: a name for your Iceberg REST catalog.
  • APP_NAME: a name for your Spark session.
  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket that stores the BigLake Iceberg tables.
  • PROJECT_ID: the project that is billed for using the Iceberg REST catalog, which might be different from the project that owns the Cloud Storage bucket. For details about project configuration when using a REST API, see System parameters.
  • TOKEN: your authentication token, which is valid for one hour—for example, a token generated using gcloud auth application-default print-access-token.

The preceding example doesn't use credential vending. To use credential vending, set the X-Iceberg-Access-Delegation header to vended-credentials on Iceberg REST catalog requests by adding the following line to the SparkSession builder:

.config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials')

Example with credential vending

The following example configures the query engine with credential vending:

import pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

catalog_name = "CATALOG_NAME"
spark = SparkSession.builder.appName("APP_NAME") \
  .config('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,org.apache.iceberg:iceberg-gcp-bundle:1.9.1') \
  .config(f'spark.sql.catalog.{catalog_name}', 'org.apache.iceberg.spark.SparkCatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.type', 'rest') \
  .config(f'spark.sql.catalog.{catalog_name}.uri', 'https://biglake.googleapis.com/iceberg/v1/restcatalog') \
  .config(f'spark.sql.catalog.{catalog_name}.warehouse', 'gs://CLOUD_STORAGE_BUCKET_NAME') \
  .config(f'spark.sql.catalog.{catalog_name}.header.x-goog-user-project', 'PROJECT_ID') \
  .config(f"spark.sql.catalog.{catalog_name}.token", "TOKEN") \
  .config(f"spark.sql.catalog.{catalog_name}.oauth2-server-uri", "https://oauth2.googleapis.com/token") \
  .config(f'spark.sql.catalog.{catalog_name}.io-impl', 'org.apache.iceberg.gcp.gcs.GCSFileIO') \
  .config(f'spark.sql.catalog.{catalog_name}.header.X-Iceberg-Access-Delegation','vended-credentials') \
  .config(f'spark.sql.catalog.{catalog_name}.rest-metrics-reporting-enabled', 'false') \
  .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
  .config('spark.sql.defaultCatalog', 'CATALOG_NAME') \
  .getOrCreate()

For more information, see the Headers in the RESTCatalog section of the Iceberg documentation.

Create a namespace

Spark

spark.sql("CREATE NAMESPACE IF NOT EXISTS NAMESPACE_NAME;")

spark.sql("USE NAMESPACE_NAME;")

Replace NAMESPACE_NAME with a name for your namespace.

Trino

CREATE SCHEMA IF NOT EXISTS CATALOG_NAME.SCHEMA_NAME;

USE CATALOG_NAME.SCHEMA_NAME;

Replace the following:

  • CATALOG_NAME: a name for your Trino catalog using the Iceberg REST catalog.
  • SCHEMA_NAME: a name for your schema.

Create a table

Spark

spark.sql("CREATE TABLE TABLE_NAME (id int, data string) USING ICEBERG;")

spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Replace the following:

  • NAMESPACE_NAME: the name of your namespace
  • TABLE_NAME: a name for your table

Trino

CREATE TABLE TABLE_NAME (id int, data varchar);

DESCRIBE TABLE_NAME;

Replace TABLE_NAME with a name for your table.

List tables

Spark

spark.sql("SHOW TABLES").show()

Trino

SHOW TABLES;

Insert data into the table

The following example inserts sample data into the table:

Spark

spark.sql("INSERT INTO TABLE_NAME VALUES (1, \"first row\"), (2, \"second row\"), (3, \"third row\");")

Trino

INSERT INTO TABLE_NAME VALUES (1, 'first row'), (2, 'second row'), (3, 'third row');

Query a table

The following example selects all data from the table:

Spark

spark.sql("SELECT * FROM TABLE_NAME;").show()

Trino

SELECT * FROM TABLE_NAME;

The following example queries the same table from BigQuery:

SELECT * FROM `CLOUD_STORAGE_BUCKET_NAME>NAMESPACE_OR_SCHEMA_NAME.TABLE_NAME`;

Replace the following:

  • CLOUD_STORAGE_BUCKET_NAME: the name of the Cloud Storage bucket for your Iceberg REST catalog. For example, if your URI is gs://iceberg_bucket, use iceberg_bucket.
  • NAMESPACE_OR_SCHEMA_NAME: the table namespace if you're using Spark, or the table schema name if you're using Trino.
  • TABLE_NAME: the name of your table.

Alter a table schema

The following example adds a column to the table:

Spark

spark.sql("ALTER TABLE TABLE_NAME ADD COLUMNS ( desc string);")
spark.sql("DESCRIBE NAMESPACE_NAME.TABLE_NAME").show()

Replace the following:

  • NAMESPACE_NAME: the name of your namespace
  • TABLE_NAME: the name of your table

Trino

ALTER TABLE TABLE_NAME ADD COLUMN desc varchar;
DESCRIBE SCHEMA_NAME.TABLE_NAME;

Replace the following:

  • SCHEMA_NAME: the name of your schema
  • TABLE_NAME: the name of your table

Delete a table

The following example deletes the table from the given namespace:

Spark

spark.sql("DROP TABLE TABLE_NAME;")

Trino

DROP TABLE TABLE_NAME;

Pricing

For pricing details, see BigLake pricing.

What's next