⬆️Upgrading

Introduction

This document serves as a guide for implementing an upgrade process for the Hedera Guardian application. It provides detailed, step-by-step instructions for upgrading an open-source Hedera Guardian deployment from the current version to the target version, with expanded information and additional guidance for each section of the upgrade process. Please follow the instructions outlined below.

Actors and Participants

The actors involved in the Guardian upgrading process are:

  • Guardian Development Team

    • Solution development.

    • Documentation provisioning.

  • Guardian Administrator (customer side)

    • Backup execution.

    • Scripting Execution.

    • Configuration customization.

Theory

Requirements

Depending on how large the upgrades are, keeping versions correct can require significant work. Proper tools, documentation, and methodologies should be created to respond to upgrade needs (How will customers upgrade their solution? What solutions need to be put in place? Etc.)

Related requirements:

  1. Find a qualified source to create an enterprise-grade version of Guardian;

  2. Consolidate, package, and normalize the solution architecture to match development best practices, supporting existing Hedera environments (currently defined as a local node, testnet, previewnet, or mainnet) deployed on-premises and on clouds;

  3. Cloud Infrastructure: All Guardian source code and secrets should be deployed via Infrastructure as Code in the cloud. In particular, the repo should contain all the artifacts and documentation for deploying Guardian on Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Data Upgrading Process

Upgrading Guardian functionality may require applying changes to the database schemas. In this case, the upgrade process is split between the Developer and the Customer.

The data upgrading process involves the development team providing the upgrade solution while the Customer executes it. The main problem when upgrading a runtime operational database is migrating all data from the previous version's schema to the new version's schema.

The migration process guides the team in producing artifacts that help correctly define the migration itself, and helps the customer decide whether to upgrade and how to execute the data migration.

In this case the migration we account for is a homogeneous migration: a migration from source databases to target databases where both are managed by the same database management system. When upgrading the system, the schemas for the source and target databases are almost identical, except for changes in some of the fields, collections, and documents. Where data changes, the source data must be transformed during migration.
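For illustration, the following is a minimal sketch of such an in-place transformation using the official MongoDB Node.js driver. The database, collection, and field names (`guardian_db`, `policies`, `version`) are hypothetical placeholders, not the actual Guardian schema.

```ts
import { MongoClient } from 'mongodb';

// Hypothetical example: the source schema stores `version` as a string ("1.2.3"),
// while the target schema expects an object { major, minor, patch }.
async function transformVersions(uri: string): Promise<void> {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const collection = client.db('guardian_db').collection('policies');

    // Only touch documents that still use the old (string) representation.
    for await (const doc of collection.find({ version: { $type: 'string' } })) {
      const [major, minor, patch] = (doc.version as string).split('.').map(Number);
      await collection.updateOne(
        { _id: doc._id },
        { $set: { version: { major, minor, patch } } }
      );
    }
  } finally {
    await client.close();
  }
}
```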

1) Data Migration Profiling:

Without a good understanding of the data model, the organization could run into a critical flaw that halts the system and brings Guardian to a stop due to data corruption and inconsistency. This phase produces the “Data Migration Model” as output. This document outlines all the data that needs to be migrated, the complete mapping between the data source and data destination, and every transformation in terms of (see the sketch after this list):

  • Data type: to cast the source value into the target value based on type transformation rules.

  • Data structure: to describe modification of the structure of a collection in the database model.

  • Data value: to change the format of data without changing the data type.

  • Data enrichment and correlation (adding and merging to one collection).

  • Data reduction and filtering (splitting to several collections).

  • Data views: to allow the maintenance of DAO contracts during Data reduction.
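One possible, machine-readable shape for a single entry of the “Data Migration Model” is sketched below in TypeScript; the field names are only a suggestion of how the mappings and transformations listed above could be recorded.

```ts
// Illustrative structure for a single mapping entry in the Data Migration Model.
type TransformationKind =
  | 'data-type'        // cast the source value into the target value
  | 'data-structure'   // change the structure of a collection
  | 'data-value'       // change the format without changing the type
  | 'enrichment'       // add/merge into one collection
  | 'reduction'        // split into several collections
  | 'view';            // maintain DAO contracts during data reduction

interface MigrationMapping {
  sourceCollection: string;
  sourceField: string;
  targetCollection: string;
  targetField: string;
  transformation: TransformationKind;
  // Functionalities (REST APIs) and message flows that involve this data.
  relatedApis: string[];
  relatedMessageFlows: string[];
  // Where the data is replicated: DB only, blockchain, or multiple services.
  replication: 'db-only' | 'blockchain' | 'multi-service';
  // Subset identifier: changes in the same subset must be applied together.
  subset: string;
}
```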

Furthermore, the document should:

  • Map each data item to the user functionality (REST API) that involves that data.

  • Map each data item to the message data flows that realize the functionality.

  • Specify data replication across the Guardian data sources (DB data only, blockchain data, multi-service).

  • Break the data into subsets to determine all the data changes that have to be applied together.

The document has to specify the following data parameters (a rough estimation sketch follows this list):

  • the expected size of the data,

  • the number of data sources,

  • the number of target systems,

  • a migration time evaluation: reading, writing, and network latency per data size, and the expected time for the expected data size.
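A rough, back-of-the-envelope estimate of migration time can be derived from these parameters, as in the sketch below; the throughput figures used in the example are placeholders to be replaced with measured values.

```ts
interface MigrationEstimateInput {
  dataSizeGB: number;            // expected size of the data to migrate
  readMBps: number;              // measured read throughput from the source
  writeMBps: number;             // measured write throughput to the target
  networkOverheadFactor: number; // e.g. 1.2 for 20% latency/retry overhead
}

// Returns an estimated duration in hours (sequential read + write, plus overhead).
function estimateMigrationHours(input: MigrationEstimateInput): number {
  const sizeMB = input.dataSizeGB * 1024;
  const readSeconds = sizeMB / input.readMBps;
  const writeSeconds = sizeMB / input.writeMBps;
  return ((readSeconds + writeSeconds) * input.networkOverheadFactor) / 3600;
}

// Example: 50 GB, 100 MB/s read, 60 MB/s write, 20% overhead ≈ 0.46 hours.
console.log(estimateMigrationHours({ dataSizeGB: 50, readMBps: 100, writeMBps: 60, networkOverheadFactor: 1.2 }));
```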

2) Design phase: this phase has the “Design Document” as output.

The type of data migration could be either big bang or trickle:

  • In a big bang data migration, the full transfer is completed within a limited window of time. Live systems experience downtime while data goes through ETL (Extract, transform, load) processing and transitions to the new database.

  • Trickle migrations, in contrast, complete the migration process in phases. During implementation, the old system and the new are run in parallel, which eliminates downtime or operational interruptions. Processes running in real-time can keep data migrating continuously.

The document should:

  • Define the requirements and the timeline for the project, allocating time for every testing and validation phase.

  • Define the migration type as described above.

  • Consider security plans for the data: any data that needs to be protected should have protection threaded throughout the plan.

  • Establish data quality and health checks by determining which data integrity problems could arise from the data set.

  • Detail the migration process, taking care of:

    • Target database addressing using the environment description.

    • Persistence of in-transit data: to resume at the point where exceptional events occur (errors, lost connections, long-running processing of large data windows), the system needs to keep internal state on migration progress; this also provides process repeatability (see the sketch after this list).

    • Define how to track the items that are filtered out of the transformation/migration phases, so that the source and target databases can later be compared along with the filtered items.

    • For every batch of data, define the exact plan and rollback strategy.

    • Define the customer test to verify consistency: this check ensures that each data item is migrated only once, that the datasets in the source and target databases are identical, and that the migration is complete.

  • Define roles and responsibilities of the data migration.

  • A Validation phase has to be defined with:

    • Who has the authority to determine whether the migration was successful?

    • After database migration, who will validate data?

    • Which tool will help with data validation? This tool will be the main instrument to verify data consistency, ensuring that each data item is migrated only once, that the datasets in the source and target databases are identical, and that the migration is complete.

  • Define backup and disaster recovery strategies. Create a backup of the MongoDB database: a replica set is a very good solution for availability, but to provide a real backup solution, maintain a dedicated backup copy of the MongoDB data.
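The following is a minimal sketch of how the in-transit migration state and the filtered-out items could be persisted in a dedicated MongoDB collection so that an interrupted run can resume; the collection and field names (`migration_state`, `batchId`, and so on) are illustrative assumptions, not part of Guardian.

```ts
import { Db } from 'mongodb';

// Minimal sketch of persisting migration progress so an interrupted run can resume.
interface MigrationCheckpoint {
  batchId: string;               // identifier of the data subset/batch
  status: 'pending' | 'done' | 'failed' | 'rolled-back';
  lastProcessedId?: unknown;     // _id of the last migrated document
  filteredOutIds: unknown[];     // items excluded from transformation, kept for later comparison
  updatedAt: Date;
}

async function saveCheckpoint(db: Db, checkpoint: MigrationCheckpoint): Promise<void> {
  await db.collection<MigrationCheckpoint>('migration_state').updateOne(
    { batchId: checkpoint.batchId },
    { $set: checkpoint },
    { upsert: true }
  );
}

async function loadCheckpoint(db: Db, batchId: string): Promise<MigrationCheckpoint | null> {
  return db.collection<MigrationCheckpoint>('migration_state').findOne({ batchId });
}
```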

3) Build the Migration Solution

Break the data into subsets and build out the migration of one category at a time, followed by a test. (TOOL – executed by the Developer)
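Since migrate-mongo is the npm tool referenced in the comparison table below, a migration step for one data subset could look like the following sketch. migrate-mongo migration files are plain JavaScript by default, so the TypeScript version assumes a build step, and the collection and field names are hypothetical.

```ts
import { Db, MongoClient } from 'mongodb';

// Sketch of a migrate-mongo migration file for one data subset.
// Each migration has an `up` step and a matching `down` (rollback) step.
export async function up(db: Db, _client: MongoClient): Promise<void> {
  // Example transformation: rename a field in a hypothetical collection.
  await db.collection('example_documents').updateMany(
    { ownerId: { $exists: true } },
    { $rename: { ownerId: 'owner' } }
  );
}

export async function down(db: Db, _client: MongoClient): Promise<void> {
  // Roll back the same subset of documents.
  await db.collection('example_documents').updateMany(
    { owner: { $exists: true } },
    { $rename: { owner: 'ownerId' } }
  );
}
```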

4) Build the consistency validation Test

Build the customer check to compare the source and target databases along with the filtered items.
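A minimal sketch of such a customer-side check is shown below. It assumes both databases are reachable, that the number of intentionally filtered-out items is known, and that documents are copied unchanged (fields altered by a transformation would need the expected transformation applied before hashing); all names are illustrative.

```ts
import { createHash } from 'crypto';
import { Db } from 'mongodb';

// Compare a source and a target collection: counts must match once the
// filtered-out items are taken into account, and each migrated document
// must have an identical content hash in both databases.
// Note: JSON.stringify is used for brevity; a real check should use a
// canonical serialization so field order cannot cause false negatives.
async function verifyCollection(
  source: Db,
  target: Db,
  collectionName: string,
  filteredOutCount: number
): Promise<boolean> {
  const sourceCount = await source.collection(collectionName).countDocuments();
  const targetCount = await target.collection(collectionName).countDocuments();
  if (sourceCount - filteredOutCount !== targetCount) {
    return false; // migration incomplete, or duplicates were introduced
  }

  const targetHashes = new Map<string, string>();
  for await (const doc of target.collection(collectionName).find()) {
    const { _id, ...content } = doc;
    targetHashes.set(String(_id), createHash('sha256').update(JSON.stringify(content)).digest('hex'));
  }

  for await (const doc of source.collection(collectionName).find()) {
    const { _id, ...content } = doc;
    const expected = createHash('sha256').update(JSON.stringify(content)).digest('hex');
    const actual = targetHashes.get(String(_id));
    // Documents that were intentionally filtered out are simply absent from the target.
    if (actual !== undefined && actual !== expected) {
      return false; // data diverged between source and target
    }
  }
  return true;
}
```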

5) Back up

Back up the data before executing the migration. In case something goes wrong during the implementation, you can’t afford to lose data. Make sure there are backup resources and that they’ve been tested before you proceed (MongoDB: replica set).

6) Conduct a Live Test

The testing process isn’t over after testing the code during the build phase. It’s important to test the data migration design with real data to ensure the accuracy of the implementation and the completeness of the application: the consistency test. (TOOL)

7) Execute the plan

Implement what is described in step 2. (TOOL)

Migrate data in batches. Migration can take a long time, so batching up the data will prevent any interruption in service. Once the first batch is successfully migrated and tested, you can move on to the next set and revalidate accordingly.
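A sketch of such a batched run is shown below, under the assumption that documents are read from the source in `_id` order and that the transformation is supplied as a function; checkpointing and per-batch validation would plug in as described above.

```ts
import { Db, Document, ObjectId } from 'mongodb';

// Sketch of a batched migration: documents are processed in fixed-size batches,
// in _id order, so that an interrupted run can resume from the last migrated _id.
async function migrateInBatches(
  source: Db,
  target: Db,
  collectionName: string,
  migrateDocument: (doc: Document) => Document,
  batchSize = 1000
): Promise<void> {
  let lastId: ObjectId | undefined;

  for (;;) {
    const filter = lastId ? { _id: { $gt: lastId } } : {};
    const batch = await source
      .collection(collectionName)
      .find(filter)
      .sort({ _id: 1 })
      .limit(batchSize)
      .toArray();
    if (batch.length === 0) {
      break; // all documents migrated
    }

    // Transform and write the batch; a real run would also record filtered-out items.
    await target.collection(collectionName).insertMany(batch.map(migrateDocument), { ordered: true });

    lastId = batch[batch.length - 1]._id;
    // Persist `lastId` as a checkpoint and validate this batch
    // (see the consistency test sketch above) before moving on.
  }
}
```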

8) Test your migration process

During the first batch of data being migrated, try to analyze all the steps and see if the process is completed successfully or if it needs to be modified before moving on to the next batch.

9) Validation Test

You need to verify that your database migration is complete and consistent. Before moving the data to production, test it with real-life scenarios to validate that all the work done aligns with the overall plan.

10) Audit

Once the implementation has gone live, set up a system to audit the data in order to ensure the accuracy of the migration. (Performance and monitoring)

Migration Consistency

The expectation is that a database migration is consistent. In the context of migration, consistent means the following:

  • Complete. All data that is specified to be migrated is actually migrated. The specified data could be all data in a source database or a subset of the data.

  • Duplicate free. Each piece of data is migrated once, and only once. No duplicate data is introduced into the target database.

  • Ordered. The data changes in the source database are applied to the target database in the same order as the changes occurred in the source database. This aspect is essential to ensure data consistency.

An alternative way to describe migration consistency is that after a migration completes, the data state between the source and the target databases is equivalent. For example, in a homogeneous migration that involves the direct mapping of a relational database, the same tables and rows must exist in the source and the target databases.

Tools Comparison

Self-scripted tools

These solutions are ideal for small-scale projects and quick fixes, and can also be used when a specific destination or source is unsupported by other tools. Self-scripted data migration tools can be developed quickly but require extensive coding knowledge. Self-scripting solutions support almost any destination or source but are not scalable; they are suitable only for small projects. Most of the cloud-based and on-premise tools, by contrast, handle numerous data destinations and sources.

  • Scalability: small, single location.

  • Flexibility: any data.

  • Maintenance: error management and issues during execution must be handled by the team.

Some reasons for building database migration functionality instead of using a database migration system include the following:

  • You need full control over every detail.

  • You want to reuse functionality.

  • You want to reduce costs or simplify your technological footprint.

On-premise tools

On-premise solutions come in handy for static data requirements with no plans to scale. They are data-center-level solutions that offer low latency and complete control over the stack, from the application layer down to the physical layer.

  • Data center migration level.

  • Limited scalability.

  • Secure: gives full control over the process.

Cloud-based tools

Cloud-based data migration tools are used when you need to scale up and down to meet dynamic data requirements (mainly in ETL solutions). These tools follow pay-as-you-go pricing, which eliminates unnecessary spending on unused resources.

  • Cloud based.

  • High scalability.

  • May raise security concerns.

Data Migration Software parameters

Setup: how easily the tool can be set up in your environment.

Monitoring & Management: whether the tool provides features to monitor the ETL process effectively and enables users to produce reports on crucial data sets.

Ease of Use: the learning curve.

Robust Data Transformation: the data transformation features available after the data is loaded into the database (for example, you can just use SQL).

| Tool | Setup | Monitoring & Management | Ease of Use | Robust Data Transformation | Pricing / Open Source |
| --- | --- | --- | --- | --- | --- |
| Custom functionality | Npm/Coding | No | Yes, integrated in the solution | Tested npm tool: migrate-mongo | Free |
| AWS Data Pipeline | Yes | Yes | Yes | Yes | $0.60 to $2.5 per activity |
| Hevo Data | Yes | Yes | Yes (auto schema mapping) | Yes | Free (1 million events) |
| Talend Open Studio | Yes | No | By GUI | Yes | Open Source / Free |
| MongoSyphon | JSON format configuration files | No | No GUI, SQL, scheduling via cron | Early stage tool, SQL | Open Source / Free |
| Meltano | Yes | Airflow | Yes | Yes | Open Source / Free |
| Singer | Python | No | No | Taps and targets (Meltano provided) | Open Source / Free |
| AirByte | Yes | No | Yes | SQL, dbt | Free |

Several other tools, both open source and commercial, are available at various price points.

Services Upgradability

Service Profiling and data migration mapping

To describe services we introduce the “Service Canvas”. A microservice canvas is a concise description of a service, similar to a CRC (Class-Responsibility-Collaboration) card sometimes used in object-oriented design. It is a template that provides a compact description of the service for both developer and stakeholder clarity. It will be compiled by developers and architects, and will be used as input during the delivery of the data migration process.

It has the following sections: Service Name, Managed Data, Dependencies, Service API.

The canvas will be used to describe the development realized in each release, so that it can be introduced incrementally. The Upgrade Canvas is not built as a complete Service Canvas; it must describe only the upgrade of the service/functionalities. In this way it will directly contain the items actually implemented in the release. A complete description of the service could also be provided in a full Service Canvas, but that is out of the scope of the upgrade, much more difficult to produce, and more design oriented than this document.
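As an illustration, an Upgrade Canvas could also be captured in a machine-readable form such as the TypeScript interface below; the field names mirror the canvas sections and the Main Parameters listed below, while the structure itself is only a suggestion.

```ts
// Illustrative, machine-readable form of an Upgrade Canvas.
// Fields mirror the Main Parameters table below; the shape itself is a suggestion.
interface UpgradeCanvas {
  name: string;                                   // name of the service
  description: string;
  typeOfDevelopment: 'Creation' | 'Update' | 'Deletion';
  version: { major: number; minor: number; patch: number };
  capabilities: string[];                         // main service functionality added or changed
  managedData: {
    collectionName: string;
    typeOfDevelopment: 'Creation' | 'Update' | 'Deletion';
  }[];
  dependencies: string[];                         // other services this upgrade relies on
  serviceApi: string[];                           // affected REST endpoints / message topics
}
```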

Main Parameters

| Parameter | Value |
| --- | --- |
| Name | Name of Service |
| Description |  |
| Type of Development | < Creation, Update, Deletion > |
| Version | < Major, Minor, Patch > |
| Capabilities | Main Service Functionality |
| Managed Data | Collection Names:, Type of Development: |