Hedera ETL

Hedera ETL populates a BigQuery dataset with transactions and records generated by the Hedera Mainnet (or Testnet, if so configured).

  • Extract: A stream of transactions (and records) is ingested from a GCP PubSub topic

  • Transform: Filters for important fields, formats data types, etc.

  • Load: Streaming insert into BigQuery dataset
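The Transform step above can be sketched in Python. This is an illustrative sketch, not the project's actual code (which is a Java/Apache Beam pipeline), and the field names used here are hypothetical placeholders for whatever the real transaction schema defines:

```python
import json

# Hypothetical field whitelist for illustration; the real schema comes
# from the Hedera Mirror Node's JSON output.
IMPORTANT_FIELDS = ("consensusTimestamp", "entityId", "transactionType", "result")

def transform(message: bytes) -> dict:
    """Filter a PubSub message down to important fields and format
    data types for a BigQuery streaming insert."""
    tx = json.loads(message)
    row = {k: tx[k] for k in IMPORTANT_FIELDS if k in tx}
    # Example type formatting: convert a nanosecond-epoch string to
    # epoch seconds, which BigQuery's TIMESTAMP type accepts.
    if "consensusTimestamp" in row:
        row["consensusTimestamp"] = int(row["consensusTimestamp"]) / 1e9
    return row

# Example message as it might arrive from the PubSub topic.
msg = json.dumps({
    "consensusTimestamp": "1613579461000000000",
    "entityId": "0.0.1001",
    "transactionType": "CRYPTOTRANSFER",
    "result": "SUCCESS",
    "memo": "dropped by the field filter",
}).encode()
row = transform(msg)
```

In the real pipeline this logic runs inside a Beam `DoFn` so Dataflow can apply it to the unbounded PubSub stream.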

Overview

  • The PubSub topic contains JSON-serialized Hedera transactions published by the Hedera Mirror Node. More details can be found here.

  • An Apache Beam pipeline pulls transactions from PubSub and inserts them into BigQuery. GCP Dataflow is used as the runner for the pipeline.

  • Deduplication: The above ingestion pipeline gives an at-least-once guarantee for persisting transactions into BigQuery, so the same transaction may occasionally be inserted more than once. Duplicates, if inserted, are removed by a separate deduplication task.
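The deduplication idea can be illustrated with a small Python sketch. In practice the task runs as a query against BigQuery; this sketch only mirrors the core logic, and it assumes (as a simplification) that `consensusTimestamp` uniquely identifies a transaction:

```python
def deduplicate(rows):
    """Keep only the first occurrence of each transaction, keyed by
    consensusTimestamp (assumed unique per transaction here)."""
    seen = set()
    unique = []
    for row in rows:
        key = row["consensusTimestamp"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# At-least-once delivery means the same row can appear twice.
rows = [
    {"consensusTimestamp": 1.0, "entityId": "0.0.1001"},
    {"consensusTimestamp": 2.0, "entityId": "0.0.1002"},
    {"consensusTimestamp": 1.0, "entityId": "0.0.1001"},  # duplicate insert
]
deduped = deduplicate(rows)
```

In BigQuery the same effect is typically achieved with a periodic SQL job that deletes or rewrites rows sharing a key, rather than in application code.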

Check out the readme here to see how you can get started.
