Seamless Migration: Moving from On-Premise to AWS Cloud
By Bhuvaneswari Subramani / Mar 28, 2023
1.0 Background
A global financial institution, our customer, acquired a US-based company whose business spans technology platforms, a full-service travel agency, gift cards, merchandise, and a points-back business. After the acquisition, it became clear that the existing offerings and processes could not scale effectively to the expected larger customer base, and that the existing on-premises application infrastructure would need a major expansion.
To keep pace with the modern, fast-moving business landscape, our customer engaged Intuitive.Cloud to modernize its infrastructure and achieve a 4x-5x increase in scaling capacity. The customer also wanted to retain its existing application platform architecture while adopting AWS's auto scaling capabilities, with control to adjust server capacity as the workload changes. Implementing robust security controls was a top priority, as was optimizing both dynamic and static resources to strike the right balance between performance and cost.
2.0 Customer’s Challenge
In its present state, the datacenter environment was expected to incur significant capital expenditure and to cause delays, because upgrades, maintenance, and deployment of new features relied on manual processes. This initiative aimed to avoid missed deadlines and, ultimately, to optimize costs and improve application performance by using cloud infrastructure.
3.0 Intuitive.Cloud’s Cloud Migration Approach
Intuitive.Cloud’s strategy for cloud migration involves three phases: Discover, Design, and Migrate. This blog walks through those phases and summarizes the value delivered to the customer at the end of the migration.
4.0 Discovery Phase
The discovery phase of a cloud migration aims to gain a deeper understanding of the organization's IT infrastructure, its applications along with their dependencies and integrations, and its business requirements.
The Intuitive.Cloud team strives to deeply understand our customers' needs and desires by adopting a customer-centric approach. This involves investing enough time to build a relationship with the customer and to identify their business ambitions and where their challenges lie. The team also relies on data to substantiate its understanding of the customer's needs. With this knowledge, our team works towards crafting the best solutions to alleviate the customer's pain points. We carefully narrow down the options until we find the simplest well-architected solution that effectively resolves the issue at hand.
The customer’s on-premises environment is a complex multi-tiered architecture with diverse software applications and components, such as:
- Application servers that have APIs located behind an F5 load balancer
- Application servers that communicate with AWS-hosted APIs through a public endpoint to interact with third-party applications
- DFS Replication established between the application servers using port 42424; the application servers talk to the state servers through the same port
- Highly available SQL Server databases that use a combination of Always On and Failover Cluster Instances.
- Ingress traffic that undergoes filtration through the Palo Alto perimeter firewall
- CrowdStrike, a security solution that safeguards against cyber threats
- Exabeam, a log aggregation tool
- Dynatrace, the primary application monitoring solution. Application, state, and DB servers are integrated with the Dynatrace SaaS solution
- A Direct Connect circuit that connects the data center to AWS
5.0 Design Phase
In the design phase, the Intuitive.Cloud team envisions processes that connect business objectives and outcomes to enabling technologies, identifies key measures, and defines a cloud migration strategy along with a transparent operating model that delivers sustained, quantifiable benefits to the customer.
The following architecture was designed with emphasis on security, resiliency, scalability, and observability, along with a multi-account structure.
5.1 Well-Architected Security Design
- Identity and Access Management to grant access to cloud resources at fine-grained levels
- AWS CloudTrail for monitoring all API calls across the different AWS accounts for auditing purposes
- Security Group Chaining for establishing communication between Application Servers, State Servers and Database servers
- VPC Flow Logs delivered to CloudWatch and exported to S3, to be integrated with the network monitoring toolchain (Exabeam)
Security Group Chaining
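The chaining idea can be sketched in a few lines of Python. This is a minimal illustration, not the customer's actual rule set: the group IDs, the tier-to-tier pairs, and ports 443 and 1433 are assumptions made for the example (42424 is the state server port noted in the discovery phase). Each rule names the upstream tier's security group as its source instead of a CIDR range, and the dictionaries follow the `IpPermissions` shape used by the EC2 security group ingress API.

```python
# Hypothetical security group IDs; real IDs would come from the deployed stack.
SG = {
    "alb": "sg-alb",
    "app": "sg-app",
    "state": "sg-state",
    "db": "sg-db",
}

# (destination tier, source tier, TCP port). The pairs are assumptions.
CHAIN = [
    ("app", "alb", 443),      # assumed HTTPS from the load balancer
    ("state", "app", 42424),  # state server port noted during discovery
    ("db", "app", 1433),      # assumed default SQL Server port
]


def ingress_rules(chain, groups):
    """Return {destination group ID: [ingress permission, ...]}.

    Every permission references a source security group rather than a
    CIDR block, which is what links the tiers into a chain.
    """
    rules = {}
    for dest, source, port in chain:
        permission = {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": groups[source]}],
        }
        rules.setdefault(groups[dest], []).append(permission)
    return rules


rules = ingress_rules(CHAIN, SG)
```

With this layout, a database server only accepts connections from members of the application tier's group, so adding or removing application servers through scaling requires no rule changes.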
5.2 Well-Architected Resiliency & Scalability Design
- Application tier: This layer of the architecture is responsible for processing user requests and serving responses. In this case, the application tier is deployed across three availability zones (AZs) for high availability and fault tolerance. Additionally, load distribution is implemented to balance the workload across all three AZs.
- State Server: This component is responsible for maintaining session state across multiple servers in a web farm. In this case, there are three nodes: one active and two stand-by. The active node is responsible for handling all session requests while the standby nodes are ready to take over if the active node fails.
- Database: This component stores and retrieves data for the application. In this architecture, the database is deployed across three nodes: one primary for read and write operations, and two standby nodes for failover, with a SQL Server Always On cluster spread across multiple availability zones for additional resiliency and redundancy.
- Horizontally scaled Web tier using AWS Auto Scaling: This component automatically scales the web tier horizontally based on the workload. Auto Scaling adds or removes web servers to match the demand for the application, ensuring that there are always enough resources to handle user requests.
- Backup using AWS Backup service with native snapshot technology: This component ensures that the data in the application is backed up regularly using AWS Backup service. This service utilizes snapshot technology to create backups of the data, which can be restored in case of data loss or corruption.
5.3 Observability
Observability refers to the practice of monitoring and gaining insights into a system's behavior and performance, often through the use of various tools and techniques. In this implementation, Dynatrace serves as the primary monitoring tool, providing both infrastructure monitoring and application performance monitoring (APM). Additionally, other tools such as Zabbix, Exabeam, and CrowdStrike will be utilized for monitoring and analysis purposes.
5.4 Multi-Account
Using environment-specific AWS accounts is a best practice in cloud computing that involves creating separate AWS accounts for different environments, such as development, testing, staging, and production. This approach provides several benefits, including improved security posture, cost-control, better resource management and regulatory compliance. Intuitive.Cloud recommended separating environments into different AWS accounts, to isolate sensitive data and applications from each other in their cloud environments.
5.5 Networks Design
- Environment-specific AWS accounts to be provisioned with dedicated VPCs, connected to both Cloud Services Routers (CSRs) using virtual private gateway (VGW) tunnels
- AWS Web Application Firewall (WAF) to be used to filter source-specific traffic from the internet to Akamai; Akamai to filter the traffic to the AWS Application Load Balancer (ALB); and the ALB to distribute traffic between servers
- Security groups and NACLs to be configured to allow only explicitly permitted traffic, ensuring proper security controls
- Existing AWS Direct Connect service to be leveraged to connect AWS resources to on-premises servers
- AWS Transit Gateway is set up to simplify network management by enabling customers to connect their VPCs across multiple AWS accounts, and remote networks through a single gateway.
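Connecting dedicated VPCs through a single gateway and to on-premises networks generally calls for address ranges that do not overlap. The following is a minimal planning sketch using Python's standard `ipaddress` module. The `10.0.0.0/12` supernet, the /16 and /20 prefix sizes, and the environment names are assumptions chosen for illustration, not the customer's actual addressing.

```python
import ipaddress

# Assumed environment list and AZ count; adjust to the real account layout.
ENVIRONMENTS = ["development", "testing", "staging", "production"]
AZ_COUNT = 3


def plan_network(supernet, environments, az_count):
    """Give each environment its own VPC range and one subnet per AZ."""
    block = ipaddress.ip_network(supernet)
    vpc_ranges = block.subnets(new_prefix=16)  # one /16 per environment
    plan = {}
    for env, vpc in zip(environments, vpc_ranges):
        # Take the first az_count /20 subnets of the VPC range.
        subnets = list(vpc.subnets(new_prefix=20))[:az_count]
        plan[env] = {"vpc": vpc, "subnets": subnets}
    return plan


plan = plan_network("10.0.0.0/12", ENVIRONMENTS, AZ_COUNT)
for env, entry in plan.items():
    print(env, entry["vpc"], [str(s) for s in entry["subnets"]])
```

Because every VPC range is carved from one supernet, none of the environments overlap with each other, and the supernet itself can be checked once against the on-premises ranges.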
6.0 Migration Phase
6.1 App and State Server Migration
- To achieve the shortest path to migration, the Intuitive.Cloud team exported the web server from on-premises and imported it into AWS using VM Import/Export, then cleaned it up and optimized it so that it was ready to use on AWS.
- Designed and developed Infrastructure as Code using AWS CloudFormation templates to create the static servers (App Server, State Server, and Database Server) and to spin up a controlled number of servers using an Auto Scaling group.
- Automated the server build process by pre-baking a production-class base AMI with the SCCM agent, Duo, Zabbix agent, CrowdStrike agent, and Trend Micro Deep Security agent installed.
- A suite of PowerShell scripts was developed so that each newly launched server joins the domain, is added to the DFS replication group, and has Exabeam, Dynatrace, Trend Micro, and other tools enabled. When an instance is terminated, the script removes its AD entries and DFS replication group entries before the instance is removed from the Auto Scaling group.
- Backup:
  - IIS logs from the web servers are uploaded to S3 every 5 minutes
  - Configured AWS native backup for static servers hosted on EC2 instances, such as the DFS Master, State Servers, and Patching Server, and stored AMIs in S3 with a 30-day retention period
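To make the Infrastructure as Code step more concrete, here is a minimal sketch of what a parameterized CloudFormation template for the scaled web tier could look like, built as a Python dictionary and serialized to JSON. It is not the template used in the engagement. The resource name, parameter names, default sizes, hook name, and timeout are all hypothetical, and the terminating lifecycle hook is one assumed way to give a cleanup script time to run before an instance goes away.

```python
import json


def build_web_tier_template():
    """Return a CloudFormation template (as a dict) for a scaled web tier."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "Environment": {
                "Type": "String",
                "AllowedValues": ["development", "testing", "staging", "production"],
            },
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "LaunchTemplateId": {"Type": "String"},
            "LaunchTemplateVersion": {"Type": "String"},
            # Hypothetical defaults; real limits come from the sizing exercise.
            "MinSize": {"Type": "Number", "Default": 3},
            "MaxSize": {"Type": "Number", "Default": 15},
        },
        "Resources": {
            "WebServerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": {"Ref": "MinSize"},
                    "MaxSize": {"Ref": "MaxSize"},
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "LaunchTemplateId"},
                        "Version": {"Ref": "LaunchTemplateVersion"},
                    },
                    # Assumed hook: pauses termination so cleanup can finish.
                    "LifecycleHookSpecificationList": [
                        {
                            "LifecycleHookName": "cleanup-before-terminate",
                            "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
                            "HeartbeatTimeout": 300,
                            "DefaultResult": "CONTINUE",
                        }
                    ],
                    "Tags": [
                        {
                            "Key": "Environment",
                            "Value": {"Ref": "Environment"},
                            "PropagateAtLaunch": True,
                        }
                    ],
                },
            }
        },
    }


template_json = json.dumps(build_web_tier_template(), indent=2)
```

Keeping the environment, subnets, and size limits as parameters lets the same template serve every environment, with only the parameter values changing between deployments.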
6.2 Database Migration
To migrate the database to AWS Cloud, the following steps were taken:
- First, the necessary AWS infrastructure was established using a CloudFormation Template with one primary and two standby instances distributed across three availability zones.
- Next, the Always On Availability Group was configured, and the on-premises database was backed up and restored onto the AWS instance. The databases were then configured to participate in the availability group, and synchronization was established between them.
- After migration, the migrated database was verified to ensure that it was functioning correctly. Failover testing was also performed between the primary and secondary instances, and any necessary post-migration tasks were completed to ensure the successful completion of the migration process.
- Backup: Regular backups of both data and transaction logs were set up using SQL Server Management Studio on the primary server EC2 instance. The transfer of these backups was then scripted using the AWS CLI and scheduled to run periodically to S3. To ensure the reliability of the backups, they were tested to verify that they could be restored in the event of a disaster.
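The scripted transfer in the backup step can be sketched as follows. This is an illustrative helper, not the script used in the engagement: the bucket name, the key layout, and the function names are hypothetical. It only builds the argument list for an `aws s3 cp` call, so nothing is uploaded when the code runs.

```python
from datetime import date
from pathlib import PurePosixPath

BUCKET = "example-sql-backups"  # hypothetical bucket name


def s3_key(database, backup_file, day):
    """Build a dated object key, e.g. 'orders/2023/03/28/orders_full.bak'."""
    return str(PurePosixPath(database, f"{day:%Y/%m/%d}", backup_file))


def upload_command(local_path, database, backup_file, day, bucket=BUCKET):
    """Return the argument list for one `aws s3 cp` call (not executed here)."""
    key = s3_key(database, backup_file, day)
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


# Example with a hypothetical database name and local backup path.
cmd = upload_command(
    r"D:\Backup\orders_full.bak", "orders", "orders_full.bak", date(2023, 3, 28)
)
print(" ".join(cmd))
```

A scheduled task could pass the returned list to `subprocess.run(cmd, check=True)`, which makes a failed upload raise an error instead of passing silently.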
6.3 Best practices
- CloudFormation templates were parameterized to be reused for setting up development, testing, staging and production environments.
- Developed a consistent tagging strategy to identify the application, environment, owner, cost center etc. and tagged all resources appropriately as part of the environment building. This best practice will help the customer to effectively manage and organize their AWS resources, making it easier to track costs, troubleshoot issues, and optimize their infrastructure.
- Assessed the application infrastructure sizing as per customer’s requirement. This assessment included determining the appropriate configuration for achieving a target of 25 transactions per second while maintaining 40% resource utilization capacity, with a focus on memory and network-intensive workloads for both web and state servers.
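The sizing target above lends itself to a short worked calculation. The sketch below assumes a hypothetical benchmark in which one server saturates at 12 transactions per second; only the 25 TPS target, the 40% utilization ceiling, and the 5x upper end of the scaling goal come from this article. Applying the 5x factor to the baseline fleet is one possible reading of that goal.

```python
import math

TARGET_TPS = 25            # from the sizing assessment
TARGET_UTILIZATION = 0.40  # from the sizing assessment


def servers_needed(target_tps, tps_per_server_at_full_load, target_utilization):
    """Smallest fleet that serves target_tps at or below target_utilization."""
    usable_tps_per_server = tps_per_server_at_full_load * target_utilization
    return math.ceil(target_tps / usable_tps_per_server)


# Hypothetical benchmark: one server saturates at 12 TPS.
baseline = servers_needed(TARGET_TPS, 12, TARGET_UTILIZATION)

# Upper end of the 4x-5x scaling goal, applied to the baseline fleet.
peak = baseline * 5

print(baseline, peak)
```

At 40% utilization each server contributes 4.8 usable TPS, so 25 TPS needs six servers (25 / 4.8 is about 5.2, rounded up), and a 5x peak would need thirty.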
6.4 AWS Services used
- AWS Direct Connect & Gateway
- AWS Transit Gateway
- Amazon Virtual Private Cloud (VPC)
- Amazon Elastic Compute Cloud (EC2)
- Amazon Simple Storage Service (S3)
- AWS Lambda
- Amazon Route 53
- AWS Key Management Service
- Amazon CloudWatch
- AWS CloudTrail
- AWS Secrets Manager
- AWS Systems Manager (Parameter Store and Session Manager)
- Amazon EBS Snapshots
6.5 Benefits
- Server rebuild time improved significantly, from a week-long process to minutes, by automating 20+ manual steps.
- The business was able to achieve 4x-5x infrastructure scaling quickly
- With well-designed IaC, the customer’s team was able to accelerate the roll-out of development, test, staging, and production environments, while shortening software development lifecycles
- The company was able to avoid significant capital expenditures that would have been required to update their on-premises disaster recovery data center hardware. Instead, they decided to reallocate these funds towards operating their infrastructure on AWS.
- Standard operating procedures were defined, and the customer's team was cross-trained on them so that it could independently provision new environments on the fly.
7.0 Future State
The customer's team collaborated with the Intuitive team to perform acceptance testing and prepared to operate on the cloud. An organization that has successfully transitioned to the cloud is differentiated not only by its strategy and execution, but also by its focus on continuous measurement and improvement. The "lift and shift" cloud migration strategy offers speed and lays the groundwork for taking advantage of the true benefits of the cloud. Intuitive recommends that, as organizations mature, they aggressively adopt cloud-native technologies to modernize and maximize the benefits of the cloud.
8.0 Conclusion
The mission-critical application migration from the datacenter environment to AWS was completed successfully within the tight deadline, with no compromise in functionality or automation.
- 100% of the associated capital cost was avoided by replacing it with variable cost.
- 70% of the core application and database migration was automated. With that automation in place, the customer's IT infrastructure staff were trained to set up new environments quickly; these can be built from scratch in just a few hours and destroyed after performance testing of new code. This has enabled them to deploy code and test new functionality in a matter of hours, significantly reducing the time previously spent waiting for environment availability, which could be days or even months.
- Applications and databases were migrated to the latest platforms and enterprise software versions
This agile process has allowed the customer to accelerate their development cycle and improve their overall time-to-market.