TL;DR

Streambed is a new tool that streams Postgres write-ahead log (WAL) changes directly to Iceberg tables stored on S3, allowing analytical queries without altering production databases. It supports the Postgres wire protocol for easy integration.

Because the analytical copy of the data lives in Iceberg tables on S3, users can offload analytical workloads from their production databases without building ETL pipelines or running Spark, which may improve performance and scalability.

Streambed connects to PostgreSQL as a logical replication subscriber, decoding WAL messages (including inserts, updates, and deletes) and buffering rows per table. It periodically flushes this data as Parquet files to S3, simultaneously updating Iceberg metadata. Updates and deletes are handled via copy-on-write merging within the Parquet data. Additionally, Streambed includes a built-in query server that exposes Iceberg tables over the Postgres wire protocol, enabling users to query these tables with standard Postgres clients such as psql.
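The buffer-then-flush step described above can be sketched in a few lines of Python. This is an illustration only, not Streambed's code (the tool itself is written in Go): the names `Change`, `TableBuffer`, `flush_interval_s`, `max_rows`, and `sink` are all hypothetical, the size-based flush is an assumption added for the sketch, and `sink` stands in for "write a Parquet file to S3 and commit Iceberg metadata".

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    table: str  # table the decoded WAL message belongs to
    op: str     # "insert", "update", or "delete"
    row: dict   # decoded column values


@dataclass
class TableBuffer:
    """Buffers decoded changes per table and hands them to a sink in batches."""
    flush_interval_s: float            # hypothetical setting: time between flushes
    max_rows: int                      # hypothetical setting: per-table size trigger
    sink: Callable[[str, list], None]  # placeholder for the Parquet + Iceberg write
    clock: Callable[[], float] = time.monotonic
    _rows: dict = field(default_factory=lambda: defaultdict(list))
    _last_flush: float = field(default=None)

    def __post_init__(self):
        self._last_flush = self.clock()

    def add(self, change):
        """Buffer one change; flush that table early if it grows too large."""
        self._rows[change.table].append(change)
        if len(self._rows[change.table]) >= self.max_rows:
            self.flush_table(change.table)

    def tick(self):
        """Call regularly; flushes every table once the interval has elapsed."""
        if self.clock() - self._last_flush >= self.flush_interval_s:
            for table in list(self._rows):
                self.flush_table(table)
            self._last_flush = self.clock()

    def flush_table(self, table):
        batch = self._rows.pop(table, [])
        if batch:
            self.sink(table, batch)
```

In this sketch a caller would feed each decoded message to `add` and invoke `tick` from its main loop, so a row becomes visible to queries only after the flush that contains it.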

The system is designed for ease of use and requires only Go 1.22+ and CGO. Deployment involves running the sync daemon and the query server, which can be configured via environment variables. The sync daemon decodes WAL messages, buffers rows, and writes them to S3 in Parquet format, while Iceberg metadata manages table consistency and schema evolution. The tool also supports one-shot resyncs for backfilling data, plus cleanup commands that delete the S3 objects and state for specific tables.

Why It Matters

Streambed gives analytics teams a simpler, near-real-time way to query data stored on S3 without affecting production database performance. Because it speaks the Postgres wire protocol, it fits into existing Postgres tools and workflows and reduces the need for complex ETL pipelines and Spark-based processing. This could speed up data analysis, reduce costs, and improve operational efficiency for organizations managing large-scale data lakes and OLAP workloads.

Background

Traditional approaches to offloading analytical queries rely on ETL pipelines or Spark-based processing, which can introduce latency and complexity. Recent efforts have focused on streaming change data capture (CDC) directly into data lakes, but many solutions lack support for standard Postgres clients or require significant setup. Streambed builds on CDC by providing a lightweight streaming engine that writes directly to Iceberg on S3, with native support for the Postgres wire protocol. Its announcement on Hacker News reflects a growing trend toward simplifying data lake analytics and connecting operational databases to analytical engines in a scalable way.

“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, enabling real-time analytics without changing your application.”

— Viggy28, creator of Streambed

“Supports the Postgres wire protocol so you can connect with psql, making it easy to query the data as if it were a regular Postgres database.”

— Hacker News user comment

What Remains Unclear

It is not yet clear how well Streambed performs at scale, its compatibility with various Postgres versions, or how it handles schema changes over time. Further testing and real-world deployment will determine its robustness and ease of use in production environments.

What’s Next

Next steps include widespread testing in different environments, evaluating performance at scale, and potential feature enhancements such as support for additional data sources or improved schema evolution handling. Monitoring community adoption and feedback will also shape future development.

Key Questions

How does Streambed handle updates and deletes?

Streambed uses copy-on-write merging against existing Parquet data to handle updates and deletes, ensuring data consistency in Iceberg tables.
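As a rough illustration of what copy-on-write merging means, the Python sketch below applies a batch of changes to the rows of an existing file and returns the rows of a replacement file, leaving the original untouched. It is not Streambed's implementation: the function name `merge_copy_on_write`, the `(op, row)` tuple shape, and the single-column key `id` are assumptions made for the example, and it assumes that updates carry the full new row and never change the key.

```python
def merge_copy_on_write(existing_rows, changes, key="id"):
    """Return the rows for a replacement file; existing_rows is not modified.

    existing_rows: list of dicts, standing in for rows read from a Parquet file.
    changes: list of (op, row) tuples in WAL order (assumed shape).
    """
    # Copy every row so the original file's contents stay intact.
    merged = {row[key]: dict(row) for row in existing_rows}
    for op, row in changes:
        if op == "delete":
            merged.pop(row[key], None)
        elif op in ("insert", "update"):
            # Assumes the change carries the complete new row.
            merged[row[key]] = dict(row)
        else:
            raise ValueError(f"unknown operation: {op}")
    return list(merged.values())


existing = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
changes = [
    ("update", {"id": 1, "name": "a2"}),
    ("delete", {"id": 2}),
    ("insert", {"id": 3, "name": "c"}),
]
new_rows = merge_copy_on_write(existing, changes)
# new_rows == [{"id": 1, "name": "a2"}, {"id": 3, "name": "c"}]
```

The trade-off of this approach is that every batch containing an update or delete rewrites the affected data, which keeps reads simple at the cost of extra write work.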

Can I query the streamed data with standard Postgres tools?

Yes, Streambed includes a query server that exposes Iceberg tables over the Postgres wire protocol, allowing use of tools like psql.

What are the system requirements to run Streambed?

It requires Go 1.22+ and CGO, with Docker needed for integration testing. Deployment involves running the sync daemon and query server with configurable environment variables.

Does Streambed support schema changes in Postgres?

Details on schema change support are not fully specified; further testing is needed to confirm how dynamic schema modifications are handled.

Is there any latency involved in the streaming process?

Yes. Streambed buffers WAL changes and flushes them to S3 periodically, so a change becomes queryable only after the flush that contains it. Exact performance metrics are not yet available.

Source: Hacker News
