Hudi Streamer is your all-in-one tool for building a data lakehouse.

Out of the box, it supports a wide range of data sources. You can connect it to Debezium to continuously read change logs from a Postgres table, or read incremental changes from another Apache Hudi table to form a chain of data processing pipelines.

Beyond data sources, Hudi Streamer also supports data transformations, management of table services such as compaction and clustering, and syncing with multiple data catalogs, such as Apache Hive Metastore, AWS Glue Catalog, Google BigQuery, DataHub, and more through the Apache XTable (Incubating) extension.

Read chapter 8 of "Apache Hudi™: The Definitive Guide" for real-world examples of using Hudi Streamer to build a data lakehouse. This is the first book ever written about Apache Hudi, authored by industry experts Shiyan Xu, Prashant Wason, Bhavani Sudha Saktheeswaran, and Rebecca Bilbro, PhD.

👉 Get a free copy of the e-book (8 early-release chapters now available!): https://lnkd.in/e8svK5pB

#ApacheHudi #DataLake #DataEngineering #DataLakehouse