⬆️ Upgrading
Introduction
This document is a guide to implementing an upgrade process for the Hedera Guardian application. It gives detailed step-by-step instructions for upgrading an open-source Hedera Guardian installation from the current version to the target version, with expanded information and additional guidance for each section of the upgrade process. Please follow the instructions outlined below:
Actors and Participants
The actors involved in the Guardian upgrade process are:
Guardian Development Team
Solution development.
Documentation provisioning.
Guardian Administrator (customer side)
Backup execution.
Script execution.
Configuration customization.
Theory
Requirements
Depending on how large the upgrades are, keeping versions correct could take a lot of work. Proper tools, documentation, and methodologies should be created to respond to upgrade needs (How will our customers upgrade their solution? What solutions need to be put in place? Etc.).
Related requirements:
Find a qualified source to create an enterprise-grade version of Guardian;
Consolidate, package, and normalize the solution architecture to match development best practices, supporting existing Hedera environments (currently defined as a local node, testnet, previewnet, or mainnet) deployed on-premises and on clouds;
Cloud Infrastructure: All Guardian source code and secrets should be deployed via Infrastructure as Code in cloud. In particular, the repo should contain all the artifacts and the documentation for the deployment of the Guardian on Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Data Upgrading Process
Upgrading Guardian functionality may require changes to the database schemas. In this case the upgrade process is split between the Developer and the Customer.
In the data upgrading process, the developer team provides the upgrade solution while the Customer executes it. The main problem when upgrading a live operational database is migrating all data from the previous schema version to the new one.
The migration process guides the team in producing artifacts that help to define the migration correctly, and that help the customer decide whether to upgrade and then execute the data migration.
The migration considered here is a homogeneous migration: a migration in which the source and target databases use the same database management system. During a system upgrade, the source and target schemas are almost identical except for changes to some fields, collections and documents. Where data changes, the source data must be transformed during migration.
1) Data Migration Profiling:
Without a good understanding of the data model, the organization could run into a critical flaw that halts the system and stops Guardian because of data corruption and inconsistency. The output of this phase is the “Data Migration Model”. This document outlines all the data that needs to be migrated, the complete mapping between the data source and the data destination, and every transformation in terms of:
Data type: to cast the source value into the target value based on type transformation rules.
Data structure: to describe modification of the structure of a collection in the database model.
Data value: to change the format of data without changing the data type.
Data enrichment and correlation (adding and merging to one collection).
Data reduction and filtering (splitting to several collections).
Data views: to allow the maintenance of DAO contracts during Data reduction.
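As an illustration of the first three transformation kinds listed above, the following minimal sketch applies a type cast, a structural change and a value reformat to a single document. The field names (`amount`, `owner`, `createDate`) and the document itself are hypothetical and are not taken from the Guardian schema.

```python
from datetime import datetime


def transform_document(doc: dict) -> dict:
    """Return a migrated copy of a hypothetical source document."""
    migrated = dict(doc)

    # Data type: cast a numeric value stored as a string into an integer.
    migrated["amount"] = int(doc["amount"])

    # Data structure: nest the flat "owner" field under a sub-document.
    migrated["owner"] = {"did": doc["owner"]}

    # Data value: change the date format (DD/MM/YYYY -> YYYY-MM-DD)
    # while keeping the value a string.
    parsed = datetime.strptime(doc["createDate"], "%d/%m/%Y")
    migrated["createDate"] = parsed.strftime("%Y-%m-%d")

    return migrated


source = {
    "_id": 1,
    "amount": "42",
    "owner": "did:example:123",
    "createDate": "31/01/2023",
}
target = transform_document(source)
```

Writing each rule as a small, pure function of the source document makes it easy to list the rules in the Data Migration Model and to test them one by one.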
Furthermore, the document should:
Map every data item to the user functionality (REST API) that involves it.
Map every data item to the message data flows that realize the functionality.
Specify data replication in the Guardian data sources (only DB Data, Blockchain Data, Multi Service).
Break the data into subsets to determine all the data changes that have to be applied together.
The document has to specify the following data parameters:
The expected size of your data.
The number of data sources.
The number of target systems.
The migration time evaluation: reading time, writing time and network latency per unit of data, and the resulting expected time for the expected data size.
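The migration time evaluation can be approximated with simple arithmetic. The sketch below is a rough model only: it assumes that reading and writing happen sequentially and that each batch costs one network round trip. All throughput and latency figures are made-up inputs that would have to be measured in the real environment.

```python
import math


def estimate_migration_seconds(
    data_size_mb: float,
    read_mb_per_s: float,
    write_mb_per_s: float,
    batch_size_mb: float,
    latency_s_per_batch: float,
) -> float:
    """Rough estimate: sequential read and write, one round trip per batch."""
    batches = math.ceil(data_size_mb / batch_size_mb)
    read_time = data_size_mb / read_mb_per_s
    write_time = data_size_mb / write_mb_per_s
    network_time = batches * latency_s_per_batch
    return read_time + write_time + network_time


# Hypothetical figures: 1000 MB of data moved in 10 MB batches.
estimate = estimate_migration_seconds(
    data_size_mb=1000,
    read_mb_per_s=100,
    write_mb_per_s=50,
    batch_size_mb=10,
    latency_s_per_batch=0.5,
)
```

With these inputs the model gives 10 s of reading, 20 s of writing and 50 s of network latency, which shows how the batch size can dominate the total.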
2) Design phase: this phase has the “Design Document” as output.
The type of data migration could be either big bang or trickle:
In a big bang data migration, the full transfer is completed within a limited window of time. Live systems experience downtime while data goes through ETL (Extract, transform, load) processing and transitions to the new database.
Trickle migrations, in contrast, complete the migration process in phases. During implementation, the old system and the new are run in parallel, which eliminates downtime or operational interruptions. Processes running in real-time can keep data migrating continuously.
The document should contain:
The requirements and the timeline for the project, with time allocated for every testing and validation phase.
The migration type, as described above.
The security plans for the data. Any data that needs to be protected should have protection threaded throughout the plan.
The data quality and health checks, established by determining which data integrity problems could arise from your data set.
The Migration process needs to be detailed, taking care of:
Target database addressing using environment description.
Persistence of in-transit data: to resume at the point where a special event (an error, a lost connection, the processing of a large data window) interrupted the run, the system needs to keep an internal state of the migration progress. This also makes the process repeatable.
Define how to track the items that are filtered out of the transformation/migration phases, so that you can compare the source and target databases along with the filtered items.
For every batch of data, define the exact plan and rollback strategy.
Define the Customer test to verify consistency: this check ensures that each data item is migrated only once, that the datasets in the source and target databases are identical, and that the migration is complete.
Define roles and responsibilities of the data migration.
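The persistence of in-transit data mentioned in the list above could be sketched as a small checkpoint file that records the next item to migrate. This is an illustrative design only: the class name, the JSON file layout and the `fail_at` switch used to simulate a lost connection are all assumptions, not part of Guardian.

```python
import json
import os
import tempfile


class MigrationCheckpoint:
    """Persist the next position to migrate so an interrupted run can resume."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> int:
        # No checkpoint file means the migration has not started yet.
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)["next_index"]

    def save(self, next_index: int) -> None:
        # Write to a temporary file and rename it, so that a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump({"next_index": next_index}, handle)
        os.replace(tmp_path, self.path)


def migrate(items, checkpoint, write, fail_at=None):
    """Migrate items from the checkpointed position; fail_at simulates an error."""
    start = checkpoint.load()
    for index in range(start, len(items)):
        if fail_at is not None and index == fail_at:
            raise ConnectionError("simulated connection loss")
        write(items[index])
        checkpoint.save(index + 1)


workdir = tempfile.mkdtemp()
checkpoint = MigrationCheckpoint(os.path.join(workdir, "checkpoint.json"))
target = []
items = ["a", "b", "c", "d"]

try:
    migrate(items, checkpoint, target.append, fail_at=2)
except ConnectionError:
    pass  # the first run stops after two items

migrate(items, checkpoint, target.append)  # the second run resumes at index 2
```

Because the item is written before the checkpoint is saved, a crash between the two steps would replay one item on resume, so in a real migration the write should be idempotent (for example an upsert keyed on the document identifier).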
A Validation phase has to be defined with:
Who has the authority to determine whether the migration was successful?
After database migration, who will validate data?
Which tool will help in data validation? This tool will be the main instrument for verifying data consistency. The check ensures that each data item is migrated only once, that the datasets in the source and target databases are identical, and that the migration is complete.
Define backup and disaster recovery strategies. Create a backup of the Mongo database: a replica set is a very good solution for availability, but a real backup solution requires a dedicated backup copy of the Mongo data.
3) Build the Migration Solution
Break the data into subsets and build out migration of one category at a time, followed by a test. (TOOL) The Developer
4) Build the consistency validation Test
Build the customer check to compare the source and target databases along with the filtered items.
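A minimal sketch of such a check, assuming that every item carries a unique identifier and that the migration tool records the identifiers it filtered out, might look like this. The function name and the report keys are illustrative.

```python
from collections import Counter


def check_migration(source_ids, target_ids, filtered_ids):
    """Compare source and target identifiers, accounting for filtered items."""
    # Everything in the source that was not deliberately filtered out
    # is expected to appear in the target.
    expected = set(source_ids) - set(filtered_ids)
    counts = Counter(target_ids)
    migrated = set(counts)
    return {
        "missing": sorted(expected - migrated),      # expected but not migrated
        "unexpected": sorted(migrated - expected),   # migrated but not expected
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }


report = check_migration(
    source_ids=[1, 2, 3, 4, 5],
    target_ids=[1, 2, 2, 4],
    filtered_ids=[5],
)
```

An empty report (all three lists empty) means the source, minus the filtered items, matches the target at the identifier level. Comparing field values would need an additional per-document comparison.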
5) Back up the data before executing
In case something goes wrong during the implementation, you can’t afford to lose data. Make sure there are backup resources and that they’ve been tested before you proceed (MongoDB: Replica set).
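One possible way to script a dedicated backup copy is to call the `mongodump` utility from the MongoDB Database Tools. The sketch below only builds the command line and leaves execution to a separate helper. The connection string, database name and backup directory are hypothetical, and whether `mongodump` is the right backup mechanism for a given deployment is an assumption to be confirmed by the administrator.

```python
import subprocess
from datetime import datetime, timezone


def build_backup_command(mongo_uri: str, backup_root: str, now: datetime) -> list:
    """Build a mongodump command that writes a compressed, timestamped dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return [
        "mongodump",
        f"--uri={mongo_uri}",
        f"--out={backup_root}/{stamp}",
        "--gzip",
    ]


def run_backup(command: list) -> None:
    # check=True raises CalledProcessError if mongodump exits with an error,
    # so a failed backup stops the upgrade script.
    subprocess.run(command, check=True)


# Hypothetical connection string and target directory.
command = build_backup_command(
    "mongodb://localhost:27017/guardian_db",
    "/backups",
    datetime(2023, 1, 31, 12, 0, 0, tzinfo=timezone.utc),
)
```

Calling `run_backup(command)` would perform the dump. The backup should then be tested by restoring it into a scratch database before the migration starts.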
6) Conduct a Live Test
The testing process isn’t over after testing the code during the build phase. It’s important to test the data migration design with real data to ensure the accuracy of the implementation and the completeness of the application: consistency test. (TOOL)
7) Execute the plan
Implement what is described in step 2. (TOOL)
Migrate data in batches. Migration can take a long time, so batching up the data will prevent any interruption in service. Once the first batch is successfully migrated and tested, you can move on to the next set and revalidate accordingly.
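The batch-by-batch approach can be sketched as a loop that writes one batch, validates it, and only then moves on. The `write_batch` and `validate_batch` callables are placeholders for whatever the real migration tool provides. Here they operate on an in-memory list purely for illustration.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def migrate_in_batches(items, size, write_batch, validate_batch):
    """Write and validate one batch at a time; stop at the first failure."""
    migrated = 0
    for number, batch in enumerate(batched(items, size), start=1):
        write_batch(batch)
        if not validate_batch(batch):
            raise RuntimeError(f"batch {number} failed validation; stop and roll back")
        migrated += len(batch)
    return migrated


target = []
count = migrate_in_batches(
    items=range(7),
    size=3,
    write_batch=target.extend,
    validate_batch=lambda batch: all(item in target for item in batch),
)
```

Stopping at the first failed batch keeps the amount of data to roll back small, which fits the per-batch rollback strategy defined in the design phase.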
8) Test your migration process
During the migration of the first batch of data, analyze all the steps and check whether the process completed successfully or needs to be modified before moving on to the next batch.
9) Validation Test
You need to verify that your database migration is complete and consistent. Before moving the migrated data to production, test it with real-life scenarios to validate that all the work done aligns with the overall plan.
10) Audit
Once the implementation has gone live, set up a system to audit the data in order to ensure the accuracy of the migration. (Performance and monitoring)
Migration Consistency
The expectation is that a database migration is consistent. In the context of migration, consistent means the following:
Complete. All data that is specified to be migrated is actually migrated. The specified data could be all data in a source database or a subset of the data.
Duplicate free. Each piece of data is migrated once, and only once. No duplicate data is introduced into the target database.
Ordered. The data changes in the source database are applied to the target database in the same order as the changes occurred in the source database. This aspect is essential to ensure data consistency.
An alternative way to describe migration consistency is that after a migration completes, the data state between the source and the target databases is equivalent. For example, in a homogeneous migration that involves the direct mapping of a relational database, the same tables and rows must exist in the source and the target databases.
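The three consistency properties can be expressed as simple checks over the sequence of changes. The sketch below assumes that every change in the source carries a unique identifier and that the migration tool logs the identifiers in the order it applied them. Both are assumptions made for illustration.

```python
def consistency_report(source_changes, applied_changes):
    """Check the complete, duplicate-free and ordered properties."""
    # Complete: every source change was applied to the target.
    complete = set(source_changes) <= set(applied_changes)

    # Duplicate free: no change was applied more than once.
    duplicate_free = len(applied_changes) == len(set(applied_changes))

    # Ordered: applied changes keep the relative order they had in the source.
    position = {change: index for index, change in enumerate(source_changes)}
    positions = [position[c] for c in applied_changes if c in position]
    ordered = positions == sorted(positions)

    return {"complete": complete, "duplicate_free": duplicate_free, "ordered": ordered}


source_log = ["c1", "c2", "c3"]
good = consistency_report(source_log, ["c1", "c2", "c3"])
reordered = consistency_report(source_log, ["c1", "c3", "c2"])
repeated = consistency_report(source_log, ["c1", "c2", "c2"])
```

In this example `reordered` fails only the ordering check, while `repeated` fails both the completeness check (the third change is missing) and the duplicate check.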
Tools Comparison
Self scripted tools
These solutions are ideal for small-scale projects and quick fixes. They can also be used when a specific destination or source is unsupported by other tools. Self-scripted data migration tools can be developed quickly but require extensive coding knowledge. They can support almost any destination or source but are not scalable, which makes them suitable only for small projects. By contrast, most cloud-based and on-premise tools handle numerous data destinations and sources.
Scalability: Small and 1 Location
Flexibility: any data
Maintenance, error management, Issues during execution
Some reasons for building database migration functionality instead of using a database migration system include the following:
You need full control over every detail.
You want to reuse functionality.
You want to reduce costs or simplify your technological footprint.
On-Premise tools
On-Premise solutions come in handy for static data requirements with no plans to scale. They are data center level solutions that offer low latency and complete control over the stack from the application to the physical layers.
Data center migration level.
Limited scalability.
Secure: gives full process control.
Cloud-Based tools
Cloud-based data migration tools are used when you need to scale up and down to meet dynamic data requirements (mainly in ETL solutions). These tools follow a pay-as-you-go pricing model that eliminates unnecessary spending on unused resources.
Based on the cloud.
High scalability.
Has security concerns.
Data Migration Software parameters
Setup: easy set up in your environment.
Monitoring & Management: provides features to monitor the ETL process effectively and lets users produce reports on various crucial data sets.
Ease of Use: learning curve.
Robust Data Transformation: data transformation features available after the data is loaded into the database. You can just use SQL.
| Tool | Setup | Monitoring & Management | Ease of Use | Robust Data Transformation | Pricing / Open Source |
| --- | --- | --- | --- | --- | --- |
| Custom functionality | Npm/Coding | no | Yes, integrated in the solution | Tested Npm tool: migrate-mongo | free |
| AWS Data Pipeline | yes | yes | yes | yes | $0.60 to $2.5 per activity |
| Hevo Data | yes | yes | yes (Autoschema mapping) | yes | FREE (1 million events) |
| Talend Open Studio | yes | no | By GUI | yes | Open Source / Free |
| MongoSyphon | JSON format configuration files | No | no GUI, SQL, scheduling via cron | early stage tool, SQL | Open Source / Free |
| Meltano | yes | Airflow | yes | yes | Open Source / Free |
| Singer | Python | No | No | taps and targets (Meltano provided) | Open Source / Free |
| AirByte | yes | No | yes | SQL, dbt | Free |
Several other tools and pricing both on open source and commercial:
Services Upgradability
Service Profiling and data migration mapping
To describe services we introduce the “Services canvas”. A microservice canvas is a concise description of a service. It’s similar to a CRC (Class-responsibility-collaboration) card that’s sometimes used in object-oriented design. This template gives a synthetic description of the service that is clear to both developers and stakeholders. It will be compiled by developers and architects, and will be used as input during the delivery of the data migration process.
It has the following section: Service Name, Managed Data, Dependencies, Service API.
The canvas will be used to describe the development realized in a given release, so that it can be introduced incrementally. The Upgrade canvas is not built as a complete Service Canvas: it must describe only the upgrade of the service/functionalities. In this way it will contain exactly the items actually implemented in the release. A complete description of the service could also be provided in a SERVICE CANVAS, but that is out of the scope of the upgrading: it is much more difficult to produce and more design oriented than this document.
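For teams that prefer to keep the canvas next to the code, its sections could be captured in a small data structure. The sketch below is one hypothetical representation: the class name, field names and validation rules are assumptions, and only the section names and the allowed values for the type of development and version come from this document.

```python
from dataclasses import dataclass, field
from typing import List

DEVELOPMENT_TYPES = ("Creation", "Update", "Deletion")
VERSION_TYPES = ("Major", "Minor", "Patch")


@dataclass
class UpgradeCanvas:
    """Describes only what a release changes in one service."""

    service_name: str
    type_of_development: str
    version: str
    managed_data: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    service_api: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the options listed in the canvas template.
        if self.type_of_development not in DEVELOPMENT_TYPES:
            raise ValueError(f"unknown type of development: {self.type_of_development}")
        if self.version not in VERSION_TYPES:
            raise ValueError(f"unknown version type: {self.version}")


# Hypothetical service and collection names.
canvas = UpgradeCanvas(
    service_name="example-service",
    type_of_development="Update",
    version="Minor",
    managed_data=["example_collection"],
    service_api=["GET /example"],
)
```

A structured canvas like this can be validated automatically and fed into the data migration mapping, but a plain document following the same sections serves the same purpose.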
Main Parameters
Name
Name of Service
Description
Type of Development
< Creation, Update, Deletion >
Version
< Major, Minor, Patch >
Capabilities
Main Service Functionality
Managed Data
Collection Names:
Type of Development: