Learn how to batch records off-chain, compute a Merkle root, and anchor it on Hedera Consensus Service for cost-effective verification.
The Hedera Consensus Service (HCS) enables decentralized event ordering and immutable timestamping for any application. A best practice for data integrity involves anchoring a ‘digital fingerprint’ of your records on-chain, which provides a verifiable audit trail without exposing sensitive information. Merkle roots are cryptographic summaries that enable the efficient verification of large datasets, allowing you to also prove the existence of individual records within a batch. This tutorial demonstrates how to use these tools to verify data on a public ledger like Hedera in a manner that is both highly secure and cost-effective.
When you ran npm install in Step 1, a postinstall script automatically executed scripts/generate-data-internal.js. This script populated the data/ directory with the sample datasets (batch-10.json and batch-100.json) and their corresponding Merkle proofs. Each record in the generated JSON files looks like this:
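As a purely illustrative example (the field names below are hypothetical, not necessarily the generator script's actual output), a batch record might look like:

```json
{
  "id": "rec-042",
  "timestamp": "2024-01-15T10:30:00Z",
  "payload": { "amount": 150, "currency": "USD" }
}
```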
To ensure the hash is deterministic (always the same for the same data), we “canonicalize” the record before hashing. This means:
Sorting the object keys alphabetically.
Removing all whitespace.
Encoding as UTF-8.
This ensures that { "a": 1, "b": 2 } and { "b": 2, "a": 1 } result in the exact same hash.
Canonicalization is the process of converting data into a standard, unique format. It is essential because different representations of the same logical data (like different key orders or whitespace) would produce different hashes, making verification difficult.
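The idea can be sketched in a few lines. This is a minimal illustration of canonicalization, not necessarily the tutorial's src/canonicalize implementation (which may handle additional cases):

```javascript
// Minimal canonicalization sketch: sort keys recursively, emit compact JSON.
// Primitives are serialized with JSON.stringify; whitespace is never added.
function canonicalize(obj) {
  if (Array.isArray(obj)) {
    return '[' + obj.map(canonicalize).join(',') + ']';
  }
  if (obj !== null && typeof obj === 'object') {
    return '{' + Object.keys(obj).sort()
      .map(k => JSON.stringify(k) + ':' + canonicalize(obj[k]))
      .join(',') + '}';
  }
  return JSON.stringify(obj);
}

const a = canonicalize({ a: 1, b: 2 });
const b = canonicalize({ b: 2, a: 1 });
console.log(a === b); // true — both produce '{"a":1,"b":2}'
```

Because key order and whitespace are normalized away, hashing the canonical string gives the same digest for any representation of the same logical record.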
3. Create a Topic for Batch Anchoring and Verification
Run the setup script to create a new HCS topic:
node scripts/01-create-topic.js
The 01-create-topic.js script initializes the Hedera client and calls the createTopic helper function to create a new topic and return its topic ID.
```javascript
const { createTopic } = require('../src/hedera');

async function main() {
  console.log('--- 1. Create HCS Topic ---');

  if (!process.env.OPERATOR_ID || !process.env.OPERATOR_KEY) {
    console.error('Error: OPERATOR_ID or OPERATOR_KEY missing in .env');
    process.exit(1);
  }

  try {
    const { topicId, transactionId } = await createTopic();
    console.log(`\n✅ Created topic: ${topicId}`);
    console.log(`   Transaction ID: ${transactionId}`);
    console.log(`   HashScan: https://hashscan.io/testnet/transaction/${transactionId}`);
    console.log(`\n👉 Add this to your .env file:\nTOPIC_ID=${topicId}`);
  } catch (err) {
    console.error('Error creating topic:', err.message);
    process.exit(1);
  }
}

main();
```
Expected Output:
```
✅ Created topic: 0.0.98765
   Transaction ID: [email protected]
   HashScan: https://hashscan.io/testnet/transaction/[email protected]

👉 Add this to your .env file:
TOPIC_ID=0.0.98765
```
Before anchoring on-chain, calculate the Merkle root locally for the dataset you want to anchor. This example uses the dataset in data/batch-100.json, selected via the script's --dataset flag:

node scripts/02-compute-root.js --dataset=batch-100

The script performs the following steps:
Load Dataset: Reads the JSON file from the data/ directory.
Canonicalize: Standardizes each record to ensure a deterministic hash.
Hash: Computes the SHA-256 hash of each canonicalized record (the leaves of the tree).
Compute Root: Recursively pairs and hashes leaves using computeRoot until a single root hash remains.
```javascript
const fs = require('fs');
const path = require('path');
const { canonicalize } = require('../src/canonicalize');
const { sha256 } = require('../src/hash');
const { computeRoot } = require('../src/merkle');

const args = process.argv.slice(2);
let datasetName = 'batch-10'; // default
for (let i = 0; i < args.length; i++) {
  if (args[i].startsWith('--dataset=')) {
    datasetName = args[i].split('=')[1];
  } else if (args[i] === '--dataset' && i + 1 < args.length) {
    datasetName = args[i + 1];
    i++; // skip the value
  }
}

async function main() {
  console.log('--- 2. Compute Merkle Root (Local) ---');
  console.log(`Using dataset: ${datasetName}`);

  // 1. Load Dataset
  const filePath = path.join(__dirname, `../data/${datasetName}.json`);
  if (!fs.existsSync(filePath)) {
    console.error(`Error: Dataset not found at ${filePath}`);
    process.exit(1);
  }
  const batch = JSON.parse(fs.readFileSync(filePath));
  console.log(`1) Loaded ${batch.length} records.`);

  // 2. Canonicalize & 3. Hash Leaves
  const leaves = batch.map(record => sha256(canonicalize(record)));
  console.log('2, 3) Canonicalized and computed leaf hashes.');

  // 4. Compute Root
  const rootBuffer = computeRoot(leaves);
  const rootHex = rootBuffer.toString('hex');
  console.log(`4) Computed Merkle Root: ${rootHex}`);

  console.log('\nSuccess! You can now anchor this root on HCS in the next step.');
}

main();
```
Expected Output:
```
--- 2. Compute Merkle Root (Local) ---
Using dataset: batch-100
1) Loaded 100 records.
2, 3) Canonicalized and computed leaf hashes.
4) Computed Merkle Root: 1d59720e...

Success! You can now anchor this root on HCS in the next step.
```
Now that you have the root hash, proceed to anchor it on Hedera. This step recomputes the root for safety and then submits a message to HCS. While you could reuse the root hash from the previous step, recomputing it immediately before submission is a best practice: it ensures the anchor reflects the current state of your local dataset and serves as a final integrity check before committing the hash to the public ledger.
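The submission itself can be sketched with the Hedera JavaScript SDK. The buildAnchorMessage helper and the JSON message shape below are assumptions for illustration, not the tutorial's exact format:

```javascript
// Sketch: build the anchor payload. The field names here are illustrative
// assumptions; the tutorial's actual message format may differ.
function buildAnchorMessage(datasetName, rootHex) {
  return JSON.stringify({
    type: 'merkle-anchor',
    dataset: datasetName,
    root: rootHex,
  });
}

// Sketch: submit the anchor to an existing HCS topic.
async function anchorRoot(client, topicId, datasetName, rootHex) {
  // Lazy require so buildAnchorMessage stays usable without the SDK installed.
  const { TopicMessageSubmitTransaction } = require('@hashgraph/sdk');
  const tx = await new TopicMessageSubmitTransaction()
    .setTopicId(topicId)
    .setMessage(buildAnchorMessage(datasetName, rootHex))
    .execute(client);
  const receipt = await tx.getReceipt(client);
  return { status: receipt.status.toString(), transactionId: tx.transactionId.toString() };
}
```

Running anchorRoot requires a configured Client and the TOPIC_ID created earlier, so the network call is not demonstrated here.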
With the Merkle root hash on the public ledger, anyone can verify the batch integrity. Running scripts/04-verify-batch.js confirms this by completing the following steps:
Recompute Root: Loads data/batch-100.json and recalculates the Merkle root with computeRoot, exactly as in the previous step.
Fetch Message: Queries the Mirror Node REST API for the latest message on the topic using getLatestTopicMessage.
Compare: Decodes the message and verifies that the on-chain root matches the locally computed root.
Hedera operates a free/public mirror node for testing and development. Production applications should use commercial-grade mirror node services provided by third-party vendors.
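The fetch step can be sketched against the public testnet Mirror Node REST API. The getLatestTopicMessage helper here is an illustration, not necessarily the tutorial's implementation; the endpoint and the base64-encoded message field follow the Mirror Node API:

```javascript
// Mirror node message payloads arrive base64-encoded; decode the newest one.
function decodeMirrorMessage(body) {
  if (!body.messages || body.messages.length === 0) return null;
  return Buffer.from(body.messages[0].message, 'base64').toString('utf8');
}

// Sketch: fetch the latest message on a topic from the testnet mirror node.
async function getLatestTopicMessage(topicId) {
  const url = `https://testnet.mirrornode.hedera.com/api/v1/topics/${topicId}/messages?order=desc&limit=1`;
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`Mirror node returned ${res.status}`);
  return decodeMirrorMessage(await res.json());
}
```

Comparing is then a string (or hex) equality check between the decoded root and the locally recomputed one.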
A powerful feature of Merkle trees is that they enable proving one item is in the batch without revealing the other items. For simplicity, this tutorial uses pre-generated proofs in data/proofs-100.json. The script takes the single record’s hash and combines it with “sibling” hashes from the pre-generated proof until it reaches the root. If the calculated root matches the trusted root, the record’s inclusion in the batch is proven. Running scripts/05-verify-single-record.js demonstrates Merkle proofs with the following steps:
Load Proof: Reads the pre-generated Merkle proof for the specific record.
Trusted Root: In a real scenario, this comes from HCS (as in step 6). Here we simulate it with a manifest (data/manifest.json).
Verify: Use the verifyProof function to hash the record with its sibling hashes up the tree. If the final hash matches the trusted root, the record is proven.
If your anchor message exceeds 1 KB (e.g., if you added a lot of metadata), you must use HCS Chunking.
The SDK handles this automatically if you configure it:
```javascript
new TopicMessageSubmitTransaction()
  .setMessage(largeContent)
  .setMaxChunks(20) // Default is 20
  .execute(client);
```
For this tutorial, our anchor message is ~200 bytes, so no chunking was needed.
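As a rough self-check before submitting, you can measure the payload yourself (a sketch assuming the 1 KB, i.e. 1,024-byte, single-message limit mentioned above):

```javascript
// Sketch: would this message need HCS chunking under a 1024-byte limit?
function needsChunking(message, chunkSize = 1024) {
  return Buffer.byteLength(message, 'utf8') > chunkSize;
}

// A typical anchor payload (hypothetical shape) is well under the limit.
const anchor = JSON.stringify({ type: 'merkle-anchor', root: 'ab'.repeat(32) });
console.log(needsChunking(anchor)); // false
```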