How to build a data lakehouse with Hudi Streamer

Hudi Streamer is your all-in-one tool for building a data lakehouse. Out of the box, it supports a wide range of data sources. You can connect it to Debezium, which continuously reads change logs from a Postgres table, or you can read incremental changes from another Apache Hudi table to form a chain of data processing pipelines.

Beyond data sources, Hudi Streamer also supports data transformations, manages table services like compaction and clustering, and syncs with multiple data catalogs, such as Apache Hive Metastore, AWS Glue Catalog, Google BigQuery, DataHub, and others through the Apache XTable (Incubating) extension.

Read chapter 8 of "Apache Hudi™: The Definitive Guide" for real-world examples of using Hudi Streamer to build a data lakehouse. It is the first book ever written about Apache Hudi, authored by industry experts Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro, PhD.

👉 Get a free copy of the e-book (8 early-release chapters now available!): https://lnkd.in/e8svK5pB

#ApacheHudi #DataLake #DataEngineering #DataLakehouse
