Hedera ETL

Hedera ETL populates a BigQuery dataset with transactions and records generated by the Hedera Mainnet (or Testnet, if so configured).

  • Extract: A stream of transactions (and records) is ingested from a GCP PubSub topic

  • Transform: Filters for important fields, formats data types, etc.

  • Load: Streaming insert into BigQuery dataset
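The Transform step above can be sketched in Python. This is an illustrative sketch, not the project's actual code (which is a Java/Apache Beam pipeline), and the field names used here are hypothetical placeholders for whatever the real transaction schema defines:

```python
import json

# Hypothetical field whitelist for illustration; the real schema comes
# from the Hedera Mirror Node's JSON output.
IMPORTANT_FIELDS = ("consensusTimestamp", "entityId", "transactionType", "result")

def transform(message: bytes) -> dict:
    """Filter a PubSub message down to important fields and format
    data types for a BigQuery streaming insert."""
    tx = json.loads(message)
    row = {k: tx[k] for k in IMPORTANT_FIELDS if k in tx}
    # Example type formatting: convert a nanosecond-epoch string to
    # epoch seconds, which BigQuery's TIMESTAMP type accepts.
    if "consensusTimestamp" in row:
        row["consensusTimestamp"] = int(row["consensusTimestamp"]) / 1e9
    return row

# Example message as it might arrive from the PubSub topic.
msg = json.dumps({
    "consensusTimestamp": "1613579461000000000",
    "entityId": "0.0.1001",
    "transactionType": "CRYPTOTRANSFER",
    "result": "SUCCESS",
    "memo": "dropped by the field filter",
}).encode()
row = transform(msg)
```

In the real pipeline this logic runs inside a Beam `DoFn` so Dataflow can apply it to the unbounded PubSub stream.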

Overview

  • The PubSub topic contains JSON-serialized Hedera transactions published by the Hedera Mirror Node. More details can be found here.

  • An Apache Beam pipeline pulls transactions from PubSub and inserts them into BigQuery. GCP Dataflow is used as the runner for the pipeline.

  • Deduplication: The above ingestion pipeline gives an at-least-once guarantee for persisting transactions into BigQuery, so the same transaction may occasionally be inserted more than once. Duplicates, if inserted, are removed by a separate deduplication task.
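The deduplication idea can be illustrated with a small Python sketch. In practice the task runs as a query against BigQuery; this sketch only mirrors the core logic, and it assumes (as a simplification) that `consensusTimestamp` uniquely identifies a transaction:

```python
def deduplicate(rows):
    """Keep only the first occurrence of each transaction, keyed by
    consensusTimestamp (assumed unique per transaction here)."""
    seen = set()
    unique = []
    for row in rows:
        key = row["consensusTimestamp"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# At-least-once delivery means the same row can appear twice.
rows = [
    {"consensusTimestamp": 1.0, "entityId": "0.0.1001"},
    {"consensusTimestamp": 2.0, "entityId": "0.0.1002"},
    {"consensusTimestamp": 1.0, "entityId": "0.0.1001"},  # duplicate insert
]
deduped = deduplicate(rows)
```

In BigQuery the same effect is typically achieved with a periodic SQL job that deletes or rewrites rows sharing a key, rather than in application code.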

Check out the readme here to see how you can get started.
