Introduction
This document can be used as a tool to implement an upgrade process in the Hedera Guardian application. It provides detailed step-by-step instructions for upgrading an open-source Hedera Guardian application from the current version to the target version. It includes expanded information and additional guidance for each section of the upgrade process. Please follow the instructions outlined below:
Actors and Participants
The actors involved in the Guardian upgrade process are:
Guardian Development Team
Solution development.
Documentation provisioning.
Guardian Administrator (customer side)
Backup execution.
Script execution.
Configuration customization.
Theory
Requirements
Depending on how large the upgrades are, there could be a lot of work keeping versions correct. Proper tools, documentation, and methodologies should be created to respond to upgrade needs (How will our customers upgrade their solution? What solutions need to be put in place? Etc.)
Related requirements:
Find a qualified source to create an enterprise-grade version of Guardian;
Consolidate, package, and normalize the solution architecture to match development best practices, supporting existing Hedera environments (currently defined as a local node, testnet, previewnet, or mainnet) deployed on-premises and on clouds;
Cloud Infrastructure: All Guardian source code and secrets should be deployed via Infrastructure as Code in cloud. In particular, the repo should contain all the artifacts and the documentation for the deployment of the Guardian on Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Data Upgrading Process
Upgrading Guardian functionality may require changes to the database schemas. In this case the upgrade process is split between the developer and the customer.
The developer team provides the upgrade solution, and the customer executes it. The main problem when upgrading a live operational database is migrating all data from the previous schema version to the new one.
The migration process guides the team in producing artifacts that help to define the migration correctly, and that help the customer to decide whether to upgrade and to execute the data migration.
The migration considered here is a homogeneous migration: a migration in which the source and target databases use the same database management system. During an upgrade, the source and target schemas are almost identical except for changes in some fields, collections and documents. Where data changes, the source data must be transformed during migration.
1) Data Migration Profiling:
Without a good understanding of the data model, the organization could run into a critical flaw that halts the system and stops Guardian because of data corruption and inconsistency. This phase has the "Data Migration Model" as output. This document outlines all the data that needs to be migrated, the complete mapping between the Data Source and Data Destination and every transformation in terms of:
Data type: to cast the source value into the target value based on type transformation rules.
Data structure: to describe modification of the structure of a collection in the database model.
Data value: to change the format of data without changing the data type.
Data enrichment and correlation (adding and merging to one collection).
Data reduction and filtering (splitting to several collections).
Data views: to allow the maintenance of DAO contracts during Data reduction.
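As a minimal sketch of the first three transformation kinds listed above, the Python below applies a type cast, a structure change and a value reformat to a single document. The field names (`amount`, `owner_name`, `created`) and the target layout are hypothetical and are not taken from the Guardian schema.

```python
from datetime import datetime


def transform(doc: dict) -> dict:
    """Apply one example of each transformation kind to a source document."""
    return {
        # Data type: cast the string amount to an integer.
        "amount": int(doc["amount"]),
        # Data structure: move a flat field into a nested sub-document.
        "owner": {"name": doc["owner_name"]},
        # Data value: same type (string), different date format.
        "created": datetime.strptime(doc["created"], "%d/%m/%Y").strftime("%Y-%m-%d"),
    }


# Hypothetical source document, for illustration only.
source = {"amount": "42", "owner_name": "alice", "created": "31/01/2023"}
target = transform(source)
```

Writing each rule as a small pure function like this makes it straightforward to test a transformation before it is run against real data.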
Furthermore, the document should:
Map every data to User Functionality (Rest API) that involves that data.
Map every data to messages data flows to realize the functionality.
Specify data replication in the Guardian data sources (only DB Data, Blockchain Data, Multi Service).
Break the data into subsets to determine all the data changes that have to be applied together.
The document has to specify the following data parameters:
The expected size of the data.
The number of data sources.
The number of target systems.
Migration time evaluation per data size: reading, writing and network latency, and the expected time per expected data size.
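One simple way to combine these parameters into a first estimate is sketched below. It assumes a purely additive model in which each batch is read, sent and written in sequence. The function name, the model and all figures are illustrative assumptions, so measured throughput should replace them in a real evaluation.

```python
import math


def estimate_migration_seconds(
    data_size_mb: float,
    read_mb_per_s: float,
    write_mb_per_s: float,
    batch_size_mb: float,
    latency_s_per_batch: float,
) -> float:
    """Estimate total time, assuming batches are read, sent and written in sequence."""
    batches = math.ceil(data_size_mb / batch_size_mb)
    read_time = data_size_mb / read_mb_per_s
    write_time = data_size_mb / write_mb_per_s
    network_time = batches * latency_s_per_batch
    return read_time + write_time + network_time


# Hypothetical figures: 1000 MB, 100 MB/s read, 50 MB/s write,
# 100 MB batches, 0.5 s of network latency per batch.
estimate = estimate_migration_seconds(1000, 100, 50, 100, 0.5)
```

With these figures the estimate is 10 s of reading, 20 s of writing and 5 s of latency.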
2) Design phase: this phase has the "Design Document" as output.
The type of data migration could be either big bang or trickle:
In a big bang data migration, the full transfer is completed within a limited window of time. Live systems experience downtime while data goes through ETL (Extract, transform, load) processing and transitions to the new database.
Trickle migrations, in contrast, complete the migration process in phases. During implementation, the old system and the new are run in parallel, which eliminates downtime or operational interruptions. Processes running in real-time can keep data migrating continuously.
The document should contain:
The requirements and the timeline for the project, with time allocated for every testing and validation phase.
The migration type, as described above.
The Migration process needs to be detailed, taking care of:
Target database addressing using environment description.
Persistence of in-transit data: to resume from the point where a special event occurred (errors, lost connections, large processing windows), the system needs to keep an internal state of the migration progress. This makes the process repeatable.
Define how to track the items that are filtered out from the transformation/migration phases, so that you can then compare the source and target databases along with the filtered items.
For every batch of data, define the exact plan and rollback strategy.
Define the customer test to verify consistency: this check ensures that each data item is migrated only once, that the datasets in the source and target databases are identical, and that the migration is complete.
Define roles and responsibilities of the data migration.
A Validation phase has to be defined with:
Who has the authority to determine whether the migration was successful?
After database migration, who will validate data?
Which tool will help in data validation: this tool will be the main instrument to verify data consistency. This check ensures that each data item is migrated only once, and that the datasets in the source and target databases are identical and that the migration is complete.
Define backup and disaster recovery strategies. Create a backup of the MongoDB database: a replica set is a very good solution for availability, but for a real backup solution, define a dedicated MongoDB backup copy.
3) Build the Migration Solution
Break the data into subsets and build out migration of one category at a time, followed by a test. (TOOL) The Developer
4) Build the consistency validation Test
Build the customer check to compare the source and target databases along with the filtered items.
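A minimal sketch of such a check is shown below. It compares item identifiers only, and it assumes that every item carries a unique identifier and that the filtered items were recorded during migration. The function and argument names are hypothetical.

```python
def check_consistency(source_ids, target_ids, filtered_ids):
    """Compare source and target identifiers, accounting for filtered items.

    Returns the items missing from the target, the items that appear in
    the target unexpectedly, and the identifiers duplicated in the target.
    """
    expected = set(source_ids) - set(filtered_ids)
    seen, duplicates = set(), set()
    for item in target_ids:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return {
        "missing": expected - seen,
        "unexpected": seen - expected,
        "duplicates": duplicates,
    }


# Hypothetical identifiers: "d" was filtered out, "c" was lost, "b" was written twice.
report = check_consistency(
    source_ids=["a", "b", "c", "d"],
    target_ids=["a", "b", "b"],
    filtered_ids=["d"],
)
is_consistent = not any(report.values())
```

A real check would usually also compare the content of each item, for example through a hash of the migrated document.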
5) Back up
Back up the data before executing the migration. If something goes wrong during the implementation, you can't afford to lose data. Make sure there are backup resources and that they've been tested before you proceed (MongoDB: Replica set).
6) Conduct a Live Test
The testing process isn't over after testing the code during the build phase. It's important to test the data migration design with real data to ensure the accuracy of the implementation and completeness of the application: consistency test. (TOOL)
7) Execute the plan
Implement what is described in step 2. (TOOL)
Migrate data in batches. Migration can take a long time, so batching up the data will prevent any interruption in service. Once the first batch is successfully migrated and tested, you can move on to the next set and revalidate accordingly.
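The sketch below illustrates batched migration together with the persisted migration state called for in the design phase. In-memory lists stand in for the source and target databases and a plain dictionary stands in for the stored checkpoint. All names are hypothetical, and a real implementation would persist the checkpoint and make batch writes idempotent.

```python
def migrate_in_batches(source, target, state, batch_size, transform=lambda d: d):
    """Copy `source` into `target` batch by batch, recording progress in `state`.

    `state["offset"]` is the index of the next item to migrate, so a rerun
    after a failure resumes where the previous run stopped.
    """
    while state["offset"] < len(source):
        start = state["offset"]
        batch = [transform(doc) for doc in source[start:start + batch_size]]
        target.extend(batch)
        # Advance the checkpoint only after the whole batch is written.
        state["offset"] = start + len(batch)
    return state


source_docs = [{"id": i} for i in range(5)]
target_docs = []
checkpoint = {"offset": 0}
failures = {"raised": False}


def flaky(doc):
    """Transformation that fails once on item 3 to simulate a lost connection."""
    if doc["id"] == 3 and not failures["raised"]:
        failures["raised"] = True
        raise ConnectionError("simulated connection loss")
    return dict(doc)


try:
    migrate_in_batches(source_docs, target_docs, checkpoint, 2, flaky)
except ConnectionError:
    pass  # the checkpoint still points at the start of the failed batch

# The second run resumes from the checkpoint instead of starting over.
migrate_in_batches(source_docs, target_docs, checkpoint, 2, flaky)
```

Because the checkpoint moves only after a batch is fully written, the failed batch is retried as a whole and no item is migrated twice.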
8) Test your migration process
While the first batch of data is being migrated, analyze all the steps and check whether the process completes successfully or needs to be modified before moving on to the next batch.
9) Validation Test
You need to verify that your database migration is complete and consistent. Before deploying to production, test the migrated data with real-life scenarios to validate that all the work done aligns with the overall plan.
10) Audit
Once the implementation has gone live, set up a system to audit the data in order to ensure the accuracy of the migration. (Performance and monitoring)
Migration Consistency
The expectation is that a database migration is consistent. In the context of migration, consistent means the following:
Complete. All data that is specified to be migrated is actually migrated. The specified data could be all data in a source database or a subset of the data.
Duplicate free. Each piece of data is migrated once, and only once. No duplicate data is introduced into the target database.
Ordered. The data changes in the source database are applied to the target database in the same order as the changes occurred in the source database. This aspect is essential to ensure data consistency.
An alternative way to describe migration consistency is that after a migration completes, the data state between the source and the target databases is equivalent. For example, in a homogeneous migration that involves the direct mapping of a relational database, the same tables and rows must exist in the source and the target databases.
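The three properties can be checked on a log of changes, as in the sketch below. It assumes that each change has a unique identifier and that both the source order and the applied order are available as lists. This is an illustrative assumption and not a description of an existing Guardian tool.

```python
def consistency_report(source_changes, applied_changes):
    """Check that applied changes are complete, duplicate free and ordered.

    Both arguments are lists of change identifiers, in the order the changes
    occurred in the source or were applied to the target.
    """
    position = {change: i for i, change in enumerate(source_changes)}
    complete = set(source_changes) <= set(applied_changes)
    duplicate_free = len(applied_changes) == len(set(applied_changes))
    # Ordered: positions in the source sequence never decrease.
    positions = [position[c] for c in applied_changes if c in position]
    ordered = all(a <= b for a, b in zip(positions, positions[1:]))
    return {"complete": complete, "duplicate_free": duplicate_free, "ordered": ordered}


# Hypothetical change identifiers.
swapped = consistency_report(["c1", "c2", "c3"], ["c1", "c3", "c2"])
repeated = consistency_report(["c1", "c2", "c3"], ["c1", "c2", "c2"])
```

Here `swapped` is complete and duplicate free but not ordered, while `repeated` is ordered but neither complete nor duplicate free.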
Tools Comparison
Self-scripted tools
These solutions are ideal for small-scale projects and quick fixes, and can also be used when a specific destination or source is unsupported by other tools. Self-scripted data migration tools can be developed quickly but require extensive coding knowledge. They offer support for almost any destination or source but are not scalable. By comparison, most cloud-based and on-premise tools handle numerous data destinations and sources.
Scalability: Small and 1 Location
Flexibility: any data
Maintenance, error management, Issues during execution
Some reasons for building database migration functionality instead of using a database migration system include the following:
You need full control over every detail.
You want to reuse functionality.
You want to reduce costs or simplify your technological footprint.
On-Premise tools
On-Premise solutions come in handy for static data requirements with no plans to scale. They are data center level solutions that offer low latency and complete control over the stack from the application to the physical layers.
Data center migration level.
Limited scalability.
Secure: give full process control.
Cloud-Based tools
Cloud-Based Data Migration Tools are used when you need to scale up and down to meet dynamic data requirements (mainly in ETL solutions). These tools follow a pay-as-you-go pricing model that eliminates unnecessary spending on unused resources.
Based on the cloud.
Big Scalability.
Has security concerns.
Data Migration Software parameters
Setup: easy set up in your environment.
Monitoring & Management: provides features to monitor the ETL process effectively and enables users to produce reports on crucial data sets.
Ease of Use: learning curve.
Robust Data Transformation: data transformation feature after the data is loaded into the database. You can just use SQL.
Several other tools and pricing both on open source and commercial:
Services Upgradability
Service Profiling and data migration mapping
To describe services we introduce the "Services canvas". A microservice canvas is a concise description of a service. It's similar to a CRC (Class-responsibility-collaboration) card that's sometimes used in object-oriented design. This template allows a concise description of the service for both developers and stakeholders. It will be compiled by developers and architects, and will be used as input during the delivery of the data migration process.
It has the following sections: Service Name, Managed Data, Dependencies, Service API.
The canvas will be used to describe the development realized in a given release, so that it can be introduced incrementally. The Upgrade canvas is not a complete Service Canvas: it describes only the upgrades to the service and its functionality. In this way it contains exactly the items actually implemented in the release. A complete description of the service could also be provided in a Service Canvas, but that is out of the scope of the upgrade, much more difficult to produce and more design oriented than this document.
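The four canvas sections could be captured in a small machine-readable record, as in the sketch below. The class name and all the example values are hypothetical. The source defines only the section names.

```python
from dataclasses import dataclass, field


@dataclass
class UpgradeCanvas:
    """Upgrade canvas: only what changed for one service in one release."""
    service_name: str
    managed_data: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)
    service_api: list = field(default_factory=list)


canvas = UpgradeCanvas(
    service_name="example-service",           # hypothetical service
    managed_data=["example_collection"],      # hypothetical collection
    dependencies=["other-service >= 2.0.0"],  # hypothetical dependency
    service_api=["GET /example"],             # hypothetical endpoint
)
```

Keeping the canvas as structured data rather than free text would allow it to be validated and used as input to the data migration tooling.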
Service versioning and compatibility
To describe the compatibility between services in more detail it is possible to provide a square compatibility matrix.
To build this matrix it is possible to start with a dependency matrix detailing all the services that depend on one another, in terms of service producers and service consumers. This matrix won't be a complete correlation matrix: the rows list only the upgraded and new services, while the columns show all services in the application.
Starting from this table it will be easier to infer the dependency between the versions of one service and the versions of the services that depend on it.
For example
Service1 2.1.3 release is compatible with Service2 starting from version 1 until version 2.
Service1 2.1.3 is compatible only with version 3.2.x of Service3, that is, only with bug fixes of that version.
Service 2.1.3 is backward compatible with all versions of service6 until 4.x.x
Service 3.2.3 ……
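A compatibility matrix of this kind can be encoded as version ranges and queried programmatically. The sketch below encodes the first two examples above as half-open ranges. Reading "until version 2" as "every 2.x release" is an assumption, and the function names and data layout are hypothetical.

```python
def parse(version: str) -> tuple:
    """Turn '3.2.1' into (3, 2, 1); missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


# (consumer, consumer version) -> {provider: (lowest allowed, first version NOT allowed)}
COMPATIBILITY = {
    ("service1", "2.1.3"): {
        "service2": ("1.0.0", "3.0.0"),  # version 1 until version 2 (assumed to include 2.x)
        "service3": ("3.2.0", "3.3.0"),  # only 3.2.x, that is, bug fixes of 3.2
    },
}


def is_compatible(consumer, consumer_version, provider, provider_version) -> bool:
    """Check a provider version against the half-open range [low, high)."""
    low, high = COMPATIBILITY[(consumer, consumer_version)][provider]
    return parse(low) <= parse(provider_version) < parse(high)


ok = is_compatible("service1", "2.1.3", "service3", "3.2.7")
too_new = is_compatible("service1", "2.1.3", "service3", "3.3.0")
```

Comparing versions as integer tuples avoids the errors of plain string comparison, where "3.10.0" would sort before "3.2.0".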
This solution provides an online reference for the upgrade delta.
Here are two tools to implement the complete matrix analysis for microservices:
Data Model Reference
In case of newly introduced data, the data model section of the canvas will be the JSON document file that describes the collection itself.
In case of a data update, the reference Data Model will be the link to the Data mapping document.
The Data mapping document describes the model for the data migration. The document should outline all the data that needs to be migrated, the complete mapping between the Data Source and Data Destination and every transformation in terms of:
Data type: to cast the source value into the target value based on type transformation rules.
Data structure: to describe the structure modification of a collection in the database model.
Data value: to change the format of data without changing the data type.
Data enrichment and correlation (adding and merging to one collection).
Data reduction and filtering (splitting to several collections).
Data views: to allow the maintenance of DAO contracts during Data reduction.
The canvas itself provides the framework to which the data belongs. Moreover, the document should:
Map every data to User Functionality (Rest API) that involves that data.
Map every data to message data flows to realize the functionality.
Specify data replication in the Guardian data sources (only DB Data, Blockchain Data, Multi Service).
Break the data into subsets to determine all the data changes that have to be applied together.
Here is what the mapping will look like:
The following information is contained in the table:
1) Mapping indicator (Values A: Add, D: Delete, C: Change)
2) Change description (Indicates mapping changes introduced)
3) Key Indicator (Indicates whether the field is a primary key or not)
4) Source Table/Collection Name
5) Source Field Name
6) Source Field Length
7) Source Field Data Type
8) Source Field Description (the description will be used as metadata for the end user)
9) Business Rule to transform data if needed
10) Target Table/Collection Name
11) Target Field Name
12) Target Data Type
13) Target Field Length
14) Description and comments
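A row of this table can drive the migration directly. The sketch below keeps only four of the columns (mapping indicator, source field, target field and business rule) and applies the rows to one document. The class, the function and the example rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldMapping:
    """One row of the mapping table, reduced to the columns used below."""
    indicator: str                   # "A" add, "D" delete, "C" change
    source_field: Optional[str]      # None when a field is added
    target_field: Optional[str]      # None when a field is deleted
    rule: Optional[Callable] = None  # business rule; None keeps the value as is


def apply_mappings(doc: dict, mappings: list) -> dict:
    """Return a copy of `doc` with every mapping row applied in order."""
    result = dict(doc)
    for m in mappings:
        if m.indicator == "D":
            result.pop(m.source_field, None)
        elif m.indicator == "A":
            # For added fields the rule produces the initial value.
            result[m.target_field] = m.rule(None) if m.rule else None
        elif m.indicator == "C":
            value = result.pop(m.source_field)
            result[m.target_field] = m.rule(value) if m.rule else value
        else:
            raise ValueError(f"unknown mapping indicator: {m.indicator}")
    return result


# Hypothetical rows and document, for illustration only.
rows = [
    FieldMapping("C", "age", "age_years", int),
    FieldMapping("D", "tmp", None),
    FieldMapping("A", None, "status", lambda _: "active"),
]
migrated = apply_mappings({"name": "alice", "age": "30", "tmp": 1}, rows)
```

Fields that no row mentions, such as `name` here, are carried over unchanged.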
Methodologies, best practice for microservices upgrading
1) Services should be organized around business domain boundaries:
Architects recommend "separation of concerns": each microservice should have strong internal cohesion and be loosely coupled to the others, and microservices should be grouped according to their problem domain.
Architects need a strong understanding of the relation between impacted use cases and backend data flows, so that every use case modification can be mapped to backend microservice upgrades, and so that they know how data modifications impact the messages exchanged between consumer and producer services and their APIs.
A service here has the sole authority over its data and exposes operations to other services.
2) Keep admin scripts together with the application codebase
Guardian migration consists of a small script that runs as the first step of every first-time installation, performing a one-time load. It is possible to write a small function that reads and saves data in batches into the database and to run these scripts offline.
Guardian deals with Schema breaking changes
Removing or renaming an element;
Changing any of its non-descriptive properties e.g. type or readOnly status.
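These two rules can be checked mechanically by comparing two versions of a schema. The sketch below assumes a schema is represented as a dictionary of field names to properties, which is a simplification and not the Guardian schema format. It cannot tell a rename from a removal, so both are reported together.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List breaking changes between two {field: properties} schema versions."""
    changes = []
    for name, old_props in old_schema.items():
        if name not in new_schema:
            changes.append(f"removed or renamed: {name}")
            continue
        new_props = new_schema[name]
        # Only non-descriptive properties count; descriptions may change freely.
        for prop in ("type", "readOnly"):
            if old_props.get(prop) != new_props.get(prop):
                changes.append(f"changed {prop}: {name}")
    return changes


# Hypothetical schema versions, for illustration only.
old = {
    "id": {"type": "string", "readOnly": True},
    "amount": {"type": "string"},
    "note": {"type": "string", "description": "old text"},
}
new = {
    "amount": {"type": "number"},
    "note": {"type": "string", "description": "new text"},
    "extra": {"type": "string"},
}
found = breaking_changes(old, new)
```

The added field `extra` and the changed description of `note` are not reported, because neither matches a rule above.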
Deprecation Notice:
Issued via the deprecated meta-data annotation;
Release Notes;
VC revocation notice is issued into the corresponding Hedera Topic.
Guardian deals with Policy Breaking changes
Removing or renaming a block, changing any of its non-descriptive properties.
Changing used schema version to a new one with breaking changes. (Changes Impact)
Changing workflow sequence, dependencies or bind block.
Introducing new, or changing existing external data sources.
Guardian deals with Breaking changes in general
Removing an API endpoint, HTTP method or enum value;
Renaming an API endpoint, HTTP method or enum value;
Changing the type of the field;
Changing behavior of an API request.
3) Every microservice should always explicitly declare all of its dependencies.
We should do this using a dependency declaration manifest. For NodeJS we have NPM.
A different possibility could be the use of dependency Management tools:
ORTELIUS: Ortelius is an open source, supply chain evidence catalog for publishing, versioning and sharing microservices and other Components such as DB objects and file objects. Ortelius centralizes everything you need to know about a component-driven architecture including component level ownership, SBOMs, vulnerabilities, dependency relationships, key values, deployment metadata, consuming applications and versions.
ISTIO: a completely different approach, found during the preparation of the present methodology, is to use the Service Mesh pattern for microservices. This choice is also a viable path but requires rethinking the platform architecture. The documentation path proposed here will naturally facilitate the adoption of such a pattern.
4) A microservices app should be tracked in a single code repository and must not share that repository with any other apps.
Versioning:
All microservices should make it clear what version of a different microservice they require and what version they are.
A good way of versioning is through semantic versioning, that is, keeping versions as a set of numbers that make it clear when a breaking change happens (for instance, one number can mean that the API has been modified).
Version Technique
Header versioning: This microservice versioning approach passes version information through the HTTP protocol header "content-version" to specify a particular service.
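As a framework-independent sketch of header versioning, the dispatcher below selects a handler from the "content-version" header. The handlers, the response shapes and the default version are hypothetical. A real service would do this inside its HTTP framework.

```python
def handle_v1(body):
    """Hypothetical handler for version 1 of the API."""
    return {"version": "1", "result": body}


def handle_v2(body):
    """Hypothetical handler for version 2, with a different response shape."""
    return {"version": "2", "data": body}


HANDLERS = {"1": handle_v1, "2": handle_v2}
DEFAULT_VERSION = "1"  # assumption: requests without the header get version 1


def dispatch(headers: dict, body):
    """Route a request to the handler named by its content-version header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    version = normalized.get("content-version", DEFAULT_VERSION)
    handler = HANDLERS.get(version)
    if handler is None:
        raise ValueError(f"unsupported content-version: {version}")
    return handler(body)


response = dispatch({"Content-Version": "2"}, "payload")
```

Rejecting unknown versions explicitly, rather than falling back silently, makes a consumer that requests an unsupported version fail early.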
5) Microservice apps are expected to be disposable and to handle shutdown gracefully.
Application processes can be shut down on purpose or through an unexpected event. An application process should be completely disposable without any unwanted side-effects. Moreover, processes should start quickly.
An important part of managing dependencies has to do with what happens when a service is updated to fit new requirements or solve a design issue. Other microservices may depend on the semantics of the old version or worse: depend on the way data is modeled in the database. As microservices are developed in isolation, this means a team usually cannot wait for another team to make the necessary changes to a dependent service before going live. The way to solve this is through versioning. All microservices should make it clear what version of a different microservice they require and what version they are.
6) Microservice apps are expected to run in an execution environment as stateless processes.
In other words, they cannot store persistent state locally between requests.
Upgrading Guardian
Guardian is a microservices application organized around an API Gateway and the NATS messaging system. This architecture is designed to be cloud-native, so it can benefit from being deployed on cloud.
There are several benefits in deploying microservices architectures on cloud thanks to the Application Managers: