TL;DR

Streambed is a new tool that streams Postgres write-ahead log (WAL) changes directly to Iceberg tables stored on S3, allowing analytical queries without altering production databases. It supports the Postgres wire protocol for easy integration.

Streambed streams PostgreSQL WAL changes into Iceberg tables on S3 and serves those tables over the Postgres wire protocol. Analytical workloads can move off the production database without a separate ETL pipeline or Spark, which may reduce load on the primary.

Streambed connects to PostgreSQL as a logical replication subscriber, decoding WAL messages (including inserts, updates, and deletes) and buffering rows per table. It periodically flushes this data as Parquet files to S3, simultaneously updating Iceberg metadata. Updates and deletes are handled via copy-on-write merging within the Parquet data. Additionally, Streambed includes a built-in query server that exposes Iceberg tables over the Postgres wire protocol, enabling users to query these tables with standard Postgres clients such as psql.
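The buffering step described above can be sketched in Go. This is a minimal illustration, not Streambed's code: the `Change` and `Buffer` types are hypothetical, and the row-count threshold is an assumption, since the source only says that flushes happen periodically. A timer-driven flush would call `Drain` in the same way.

```go
package main

import "fmt"

// Change is a hypothetical decoded WAL message for one row.
type Change struct {
	Table string // source table name
	Op    string // "insert", "update", or "delete"
	Key   string // primary-key value, as text
}

// Buffer groups decoded changes per table until they are flushed.
type Buffer struct {
	maxRows int
	pending map[string][]Change
}

func NewBuffer(maxRows int) *Buffer {
	return &Buffer{maxRows: maxRows, pending: make(map[string][]Change)}
}

// Add buffers one change. It returns the table's batch when the
// assumed row threshold is reached, and nil otherwise.
func (b *Buffer) Add(c Change) []Change {
	b.pending[c.Table] = append(b.pending[c.Table], c)
	if len(b.pending[c.Table]) < b.maxRows {
		return nil
	}
	return b.Drain(c.Table)
}

// Drain removes and returns everything buffered for one table.
// A periodic timer would call this for each table with pending rows.
func (b *Buffer) Drain(table string) []Change {
	batch := b.pending[table]
	delete(b.pending, table)
	return batch
}

func main() {
	buf := NewBuffer(2)
	stream := []Change{
		{Table: "users", Op: "insert", Key: "1"},
		{Table: "orders", Op: "insert", Key: "10"},
		{Table: "users", Op: "update", Key: "1"},
	}
	for _, c := range stream {
		if batch := buf.Add(c); batch != nil {
			// A real flush would write a Parquet file and commit Iceberg metadata here.
			fmt.Printf("flush %s: %d rows\n", c.Table, len(batch))
		}
	}
	fmt.Printf("still buffered for orders: %d\n", len(buf.Drain("orders")))
}
```

Keeping one slice per table means each flush maps to a single table's Parquet file, which matches the per-table buffering the article describes.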

Streambed has few dependencies: it requires only Go 1.22+ and CGO. Deployment means running two processes, the sync daemon and the query server, both configured through environment variables. The pipeline decodes WAL messages, buffers rows, and writes them to S3 as Parquet, while Iceberg metadata manages table consistency and schema evolution. The tool also supports one-shot resyncs for backfilling data and cleanup commands that delete the S3 objects and state for specific tables.
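Environment-variable configuration of this kind usually looks something like the Go sketch below. The variable names (`EXAMPLE_SOURCE_DSN`, `EXAMPLE_S3_BUCKET`, `EXAMPLE_FLUSH_INTERVAL`), the `Config` fields, and the 30-second default are all placeholders invented for illustration; the source does not list Streambed's actual variables.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config holds settings for a hypothetical sync daemon.
type Config struct {
	SourceDSN     string
	Bucket        string
	FlushInterval time.Duration
}

// LoadConfig reads settings through lookup, which is os.LookupEnv in
// normal use. The EXAMPLE_* names are placeholders, not Streambed's.
func LoadConfig(lookup func(string) (string, bool)) (Config, error) {
	cfg := Config{FlushInterval: 30 * time.Second} // assumed default

	dsn, ok := lookup("EXAMPLE_SOURCE_DSN")
	if !ok || dsn == "" {
		return Config{}, errors.New("EXAMPLE_SOURCE_DSN is required")
	}
	cfg.SourceDSN = dsn

	bucket, ok := lookup("EXAMPLE_S3_BUCKET")
	if !ok || bucket == "" {
		return Config{}, errors.New("EXAMPLE_S3_BUCKET is required")
	}
	cfg.Bucket = bucket

	// The flush interval is optional; the default applies when it is unset.
	if raw, ok := lookup("EXAMPLE_FLUSH_INTERVAL"); ok {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return Config{}, fmt.Errorf("EXAMPLE_FLUSH_INTERVAL: %w", err)
		}
		cfg.FlushInterval = d
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		return
	}
	// The DSN is deliberately not printed, since it may contain credentials.
	fmt.Printf("bucket=%s flush=%s\n", cfg.Bucket, cfg.FlushInterval)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the loader testable without touching the process environment.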

Why It Matters

Streambed gives analytics teams a simpler, near-real-time way to query production data from S3 without adding load to the production database. Because it speaks the Postgres wire protocol, existing Postgres tools and workflows keep working, with no separate ETL pipeline or Spark job to maintain. That could speed up analysis and cut costs for organizations running data lakes and OLAP workloads.

Background

Traditional approaches to offloading analytical queries rely on ETL pipelines or Spark-based processing, which add latency and complexity. Recent efforts stream change data capture (CDC) directly into data lakes, but many of these solutions lack support for standard Postgres clients or require significant setup. Streambed builds on CDC with a lightweight streaming engine that writes directly to Iceberg on S3 and natively supports the Postgres wire protocol. Its announcement on Hacker News reflects a growing trend toward simpler data lake analytics and tighter integration between operational databases and analytical engines.

“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, enabling real-time analytics without changing your application.”

— Viggy28, creator of Streambed

“Supports the Postgres wire protocol so you can connect with psql, making it easy to query the data as if it were a regular Postgres database.”

— Hacker News user comment

What Remains Unclear

It is not yet clear how well Streambed performs at scale, its compatibility with various Postgres versions, or how it handles schema changes over time. Further testing and real-world deployment will determine its robustness and ease of use in production environments.

What’s Next

Next steps include widespread testing in different environments, evaluating performance at scale, and potential feature enhancements such as support for additional data sources or improved schema evolution handling. Monitoring community adoption and feedback will also shape future development.

Key Questions

How does Streambed handle updates and deletes?

Streambed uses copy-on-write merging against existing Parquet data to handle updates and deletes, ensuring data consistency in Iceberg tables.
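As a rough illustration of copy-on-write merging, the Go sketch below applies a batch of changes to an existing row set and returns a new one, leaving the original untouched. The `Row` and `Change` types and the `MergeCopyOnWrite` function are hypothetical, and in-memory slices stand in for Parquet files; how Streambed actually reads and rewrites files is not described in the source.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical table row keyed by ID.
type Row struct {
	ID   int
	Name string
}

// Change is one buffered WAL change to apply.
type Change struct {
	Op  string // "insert", "update", or "delete"
	Row Row
}

// MergeCopyOnWrite builds a new row set from existing plus changes.
// The existing slice (standing in for an old Parquet file) is never modified.
func MergeCopyOnWrite(existing []Row, changes []Change) []Row {
	byID := make(map[int]Row, len(existing))
	for _, r := range existing {
		byID[r.ID] = r
	}
	// Apply changes in order so the latest change per key wins.
	for _, c := range changes {
		switch c.Op {
		case "insert", "update":
			byID[c.Row.ID] = c.Row
		case "delete":
			delete(byID, c.Row.ID)
		}
	}
	merged := make([]Row, 0, len(byID))
	for _, r := range byID {
		merged = append(merged, r)
	}
	// Sort for a deterministic result, since map iteration order is random.
	sort.Slice(merged, func(i, j int) bool { return merged[i].ID < merged[j].ID })
	return merged
}

func main() {
	old := []Row{{1, "ada"}, {2, "bob"}, {3, "cy"}}
	changes := []Change{
		{"update", Row{2, "bobby"}},
		{"delete", Row{ID: 3}},
		{"insert", Row{4, "dee"}},
	}
	fmt.Println(MergeCopyOnWrite(old, changes)) // [{1 ada} {2 bobby} {4 dee}]
	fmt.Println(old)                            // [{1 ada} {2 bob} {3 cy}]
}
```

The trade-off of copy-on-write is that every update or delete rewrites data that did not change, in exchange for readers always seeing complete, immutable files.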

Can I query the streamed data with standard Postgres tools?

Yes, Streambed includes a query server that exposes Iceberg tables over the Postgres wire protocol, allowing use of tools like psql.

What are the system requirements to run Streambed?

It requires Go 1.22+ and CGO, with Docker needed for integration testing. Deployment involves running the sync daemon and query server with configurable environment variables.

Does Streambed support schema changes in Postgres?

Details on schema change support are not fully specified; further testing is needed to confirm how dynamic schema modifications are handled.

Is there any latency involved in the streaming process?

Yes. Streambed buffers changes and flushes them to S3 periodically, so a change becomes queryable only after the next flush. Exact performance metrics are not yet available.

Source: Hacker News
