Dataplex Universal Catalog is a unified, intelligent data governance solution that helps you manage, understand, and use the data assets across your organization. By using AI, Dataplex Universal Catalog simplifies working with data that is distributed across various systems, letting you focus on gaining valuable insights.
For example, consider a global retail company that generates large amounts of sales, inventory, and customer data and stores it in Cloud Storage, Spanner, and Pub/Sub. When data is distributed across systems in this way, managing governance, ensuring quality, and maintaining compliance can be complex and time-consuming. Dataplex Universal Catalog simplifies these processes by providing a central data catalog that you can use to discover, profile, and validate organizational data assets, track their lineage, and control access to them.
This document describes Dataplex Universal Catalog core features and highlights key use cases.
Dataplex Universal Catalog features
Dataplex Universal Catalog governs data through the following features:
- Metadata cataloging. Retrieve metadata for Google Cloud resources in BigQuery, Cloud SQL, Spanner, Vertex AI, Pub/Sub, Dataform, and Dataproc Metastore, as well as for third-party resources that you bring into Dataplex Universal Catalog, to build a data catalog instantly.
- Data discovery. Scan Cloud Storage buckets for structured and unstructured data, and extract and catalog its metadata.
- Data insights. Use AI to generate natural language questions about your data that help you uncover patterns, assess data quality, and perform statistical analyses.
- Data profiling. Identify common characteristics of the column data in your BigQuery tables, such as typical data values, data distribution, and null counts. These profiles can inform data classification and quality assurance.
- Data quality. Define and measure the quality of the data in your BigQuery tables by validating it against organizational policies and logging alerts when data doesn't meet your quality criteria.
- Business glossary. Manage business-related terminology and definitions across your organization, and attach terms to table columns to promote a consistent understanding of data usage.
- Data lineage. Track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it.
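To make the profiling and quality features above more concrete, the following Python sketch computes a simple column profile (row count, null count, distinct values, and most frequent values) over in-memory rows, then applies one quality rule that flags a column whose share of nulls is too high. This is only an illustration of the concepts, not the Dataplex Universal Catalog API: `ColumnProfile`, `profile_column`, `check_null_ratio`, and the sample `sales` rows are hypothetical names and data, and where the service logs alerts, this sketch just prints one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass
class ColumnProfile:
    """Summary statistics for one column (hypothetical structure, for illustration)."""
    column: str
    row_count: int
    null_count: int
    distinct_count: int
    top_values: list[tuple[Any, int]]


def profile_column(rows: list[dict[str, Any]], column: str, top_n: int = 3) -> ColumnProfile:
    """Count nulls and distinct values, and find the most frequent non-null values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return ColumnProfile(
        column=column,
        row_count=len(values),
        null_count=len(values) - len(non_null),
        distinct_count=len(counts),
        top_values=counts.most_common(top_n),
    )


def check_null_ratio(profile: ColumnProfile, max_null_ratio: float) -> bool:
    """A single quality rule: pass if the share of nulls is at most max_null_ratio."""
    if profile.row_count == 0:
        return True
    ratio = profile.null_count / profile.row_count
    if ratio > max_null_ratio:
        # Stand-in for an alert; this sketch only prints a message.
        print(f"ALERT: {profile.column} null ratio {ratio:.2f} exceeds {max_null_ratio:.2f}")
        return False
    return True


# Hypothetical sample rows standing in for a BigQuery table.
sales = [
    {"store": "Berlin", "amount": 120.0},
    {"store": "Tokyo", "amount": None},
    {"store": "Berlin", "amount": 75.5},
    {"store": "Austin", "amount": 40.0},
]

store_profile = profile_column(sales, "store")
print(store_profile.top_values)  # [('Berlin', 2), ('Tokyo', 1), ('Austin', 1)]

amount_profile = profile_column(sales, "amount")
check_null_ratio(amount_profile, max_null_ratio=0.1)  # prints an ALERT: 1 of 4 values is null
```

Separating the profile from the rule mirrors how the feature list describes them: a profile describes what the data looks like, and a quality rule compares that description against a policy threshold.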
Dataplex Universal Catalog supports an end-to-end data lifecycle, from distributed discovery to business insights. Governance features are also available through BigQuery.
Use cases
You can use Dataplex Universal Catalog to do the following:
Discover and understand your data. Dataplex Universal Catalog provides visibility into data resources across your organization and lets you find the resources that meet your data consumption needs. It also provides context for each resource, which helps you judge whether the resource suits your data consumers' needs.
Enable data governance and data management. Dataplex Universal Catalog supplies metadata that can inform and power your data governance and data management capabilities.
Create a central data catalog. Dataplex Universal Catalog stores and provides access to metadata that is automatically harvested from your Google Cloud resources. You can integrate your own metadata from non-Google Cloud systems. You can enrich all metadata with additional business and technical metadata annotations.
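The use cases above (a central catalog that combines harvested metadata with your own annotations, and that you can search to discover data) can be sketched with a small in-memory model. This is a conceptual illustration only and does not reflect how Dataplex Universal Catalog stores or searches metadata: `CatalogEntry`, `Catalog`, `annotate`, `search`, and the sample entry names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A hypothetical catalog entry: technical metadata plus user-added annotations."""
    name: str          # for example, a table or topic name
    system: str        # source system, such as "BigQuery" or an external system
    columns: list[str]
    annotations: dict[str, str] = field(default_factory=dict)  # business metadata added later


class Catalog:
    """An in-memory stand-in for a central metadata catalog (illustration only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Store harvested or imported metadata under the entry's name.
        self._entries[entry.name] = entry

    def annotate(self, name: str, key: str, value: str) -> None:
        # Enrich an existing entry with additional business or technical metadata.
        self._entries[name].annotations[key] = value

    def search(self, keyword: str) -> list[str]:
        """Return names of entries whose name, columns, or annotation values contain keyword."""
        kw = keyword.lower()
        hits = []
        for entry in self._entries.values():
            fields = [entry.name, *entry.columns, *entry.annotations.values()]
            if any(kw in text.lower() for text in fields):
                hits.append(entry.name)
        return hits


catalog = Catalog()
# Metadata that might be harvested automatically from a Google Cloud resource.
catalog.register(CatalogEntry("sales.orders", "BigQuery", ["order_id", "store", "amount"]))
# Metadata integrated from a hypothetical non-Google Cloud system.
catalog.register(CatalogEntry("crm.customers", "External CRM", ["customer_id", "email"]))
# Enrichment: an annotation makes the entry discoverable by a business term.
catalog.annotate("crm.customers", "owner", "Customer analytics team")

print(catalog.search("analytics"))  # ['crm.customers']
print(catalog.search("amount"))     # ['sales.orders']
```

Searching annotation values alongside technical fields is what lets the business context added during enrichment improve discovery.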
Get started
If this is your first time working with Dataplex Universal Catalog, consider following a quickstart:
What's next
- Learn about metadata management in Dataplex Universal Catalog.
- Learn how to search for data assets.
- Learn how to manage entries and ingest custom sources.
- Learn how to import metadata into Dataplex Universal Catalog.
- Learn about BigQuery governance.