The Alkira Cloud Area Networking platform is built in the cloud and for the cloud. It lets enterprises onboard workloads hosted with public cloud providers and connect them with other cloud workloads, on-prem branches, data centers, and globally distributed users. Any enterprise using Alkira will therefore have most of its critical traffic traversing our infrastructure, so the team here at Alkira goes to great lengths to make sure that it is fully secure, highly available, and built to recover on its own in different failure scenarios.
The Alkira Cloud Area Networking platform is based on a network of globally distributed Alkira Cloud Exchange Points (CXPs), deployed inside hyper-scale public cloud infrastructure. Alkira CXPs are interconnected over high bandwidth, low latency infrastructure. Customers can connect users, branch locations and cloud workloads to the geographically closest Alkira CXP, improving overall application performance by shortcutting the last-mile access over less efficient and less predictable Internet transport.
In terms of infrastructure availability, Alkira commits to an uptime SLA (Service Level Agreement), an RTO (Recovery Time Objective), and an RPO (Recovery Point Objective) for its customers. Alkira services also offer features that enhance network availability beyond the committed SLAs. In this blog we cover both aspects of the solution and how customers can leverage them.
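To make an uptime commitment concrete, it can help to convert the percentage into a downtime budget. The sketch below does that arithmetic in Python. The percentages and the 30-day period are illustrative assumptions, not Alkira's committed SLA figures, and the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Illustrative targets only; the real SLA values come from the service agreement.
for target in (99.9, 99.99):
    budget = round(allowed_downtime_minutes(target), 2)
    print(f"{target}% uptime over 30 days allows about {budget} minutes of downtime")
```

Each additional "nine" shrinks the budget by a factor of ten, which is why higher targets usually call for the extra redundancy options discussed later in this post.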
Availability of the Portal (Control and Management Plane)
Three elements make an infrastructure highly available. First, enough redundancy must be built into the architecture so that there is no single point of failure. Second, the application needs proactive monitoring so that you are notified of a problem ahead of time. Lastly, a failover mechanism must switch traffic to backup paths when a failure occurs.
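The three elements above can be illustrated with a minimal Python sketch: a list of redundant nodes, a health check standing in for monitoring, and a selection function that fails over to the next healthy node. All names here (`Node`, `select_active`, the node and zone labels) are hypothetical and chosen for illustration; this is not how the Alkira portal is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Node:
    name: str
    zone: str


def select_active(
    nodes: Sequence[Node], is_healthy: Callable[[Node], bool]
) -> Optional[Node]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


# Redundancy: two nodes serve the same function from different zones.
nodes = [Node("portal-1", "zone-a"), Node("portal-2", "zone-b")]

# Monitoring: a set of failed node names stands in for a real health probe.
down = {"portal-1"}

# Failover: traffic moves to the next healthy node.
active = select_active(nodes, lambda n: n.name not in down)
print(active.name)  # prints "portal-2"
```

If every node fails the health check, `select_active` returns `None`, which is the case that redundancy across zones and regions is meant to make unlikely.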
The Alkira portal is hosted inside public cloud infrastructure, with multiple levels of resiliency and redundancy built in at different layers of the system. Clustering and load balancers provide multiple nodes for any given function so that there is no single point of failure. The nodes required to run applications span different availability zones and regions to withstand the failure of a data center or of a complete region inside the cloud service provider.
In addition to this high level of redundancy, the infrastructure is designed for resilience. Contingency plans are also in place and tested on a regular basis to ensure minimal service impact to our customers in case of an outage within the cloud service provider network.
Data backups are performed for all nodes daily so that, in case of a failure, they can be restored within the committed Recovery Time Objective (RTO) and Recovery Point Objective (RPO). A Business Continuity Plan and a Disaster Recovery Plan are defined, and both are reviewed and tested annually. Testing helps ensure that the documented plans and procedures function as designed, and the plans are updated immediately if any issues are identified during testing. The Business Continuity Plan and Disaster Recovery Plan cover:
- Business impact and criticality analysis
- Procedures for responding to emergencies related to the production environment
- Restoration of lost data
- Continuing security during emergencies
- Emergency access procedures
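The relationship between a backup schedule and the recovery objectives mentioned above can be checked with simple date arithmetic. The following Python sketch assumes a 24-hour RPO and a 4-hour RTO purely for illustration; these are not Alkira's committed values, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much data (measured in time) is lost if we restore the last backup."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


def meets_objectives(
    loss: timedelta, restore_duration: timedelta, rpo: timedelta, rto: timedelta
) -> bool:
    """True when data loss stays within the RPO and the restore within the RTO."""
    return loss <= rpo and restore_duration <= rto


# Assumed objectives, for illustration only.
rpo = timedelta(hours=24)
rto = timedelta(hours=4)

last_backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 23, 30)

loss = data_loss_window(last_backup, failure)
print(loss)  # prints "21:30:00"
print(meets_objectives(loss, timedelta(hours=3), rpo, rto))  # prints "True"
```

With daily backups, the data loss window can approach but not exceed 24 hours, so a daily schedule is consistent with an RPO of 24 hours or more.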
From an architecture perspective, access to the application is completely decoupled from the data plane. If there is a problem with the portal, customer traffic and the customer’s network are not affected; the only thing affected is the ability to make changes or updates to the current network.
Availability of Alkira CXPs (Data Plane)
Alkira Cloud Exchange Points (CXPs) are deployed in different availability zones for redundancy. All connections from users, branches, and workloads are multihomed to a CXP in each availability zone. This way there is no single point of failure: even if an availability zone suffers an outage, the customer’s network is not impacted, because traffic continues to flow seamlessly over the redundant connection to the other availability zone.
Figure: Cloud Availability Zone Redundancy
This protects against an availability zone failure inside the cloud service provider, and it is the default option when configuring any connector. However, it only covers availability zone failures. If an outage affects the entire cloud service provider region, the customer’s network will still be impacted even with connections to multiple availability zones.
Region failures are rare, but they do happen and can impact the availability of applications and the connectivity of users. With Alkira, you can also design your network to withstand a cloud region failure. Alkira offers a CXP failover feature for cross-region redundancy, which can be enabled for a connector while configuring it. Once configured, a backup connection is created with a CXP in another region. By default, the backup connections are disabled; they are enabled when CXP failover is triggered.
Figure: Cloud Region Redundancy
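The difference between zone redundancy and cross-region failover can be modeled in a few lines of Python. This is a conceptual sketch only: the `Connection` and `Connector` classes, their methods, and the region and zone labels are hypothetical and do not represent the Alkira API or its internal failover logic.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass
class Connection:
    region: str
    zone: str
    enabled: bool = True


@dataclass
class Connector:
    primary: List[Connection]
    backup: List[Connection] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Backup connections start out disabled, as described above.
        for conn in self.backup:
            conn.enabled = False

    def usable(
        self,
        failed_regions: FrozenSet[str] = frozenset(),
        failed_zones: FrozenSet[Tuple[str, str]] = frozenset(),
    ) -> List[Connection]:
        """Return enabled connections that are outside every failed region and zone."""
        return [
            c
            for c in self.primary + self.backup
            if c.enabled
            and c.region not in failed_regions
            and (c.region, c.zone) not in failed_zones
        ]

    def trigger_failover(self) -> None:
        """Enable the backup connections (a stand-in for the failover action)."""
        for conn in self.backup:
            conn.enabled = True


connector = Connector(
    primary=[Connection("region-a", "az1"), Connection("region-a", "az2")],
    backup=[Connection("region-b", "az1")],
)

# Zone failure: the other primary connection still carries traffic.
print(len(connector.usable(failed_zones=frozenset({("region-a", "az1")}))))  # prints 1

# Region failure: no usable path until failover enables the backup.
region_down = frozenset({"region-a"})
print(len(connector.usable(failed_regions=region_down)))  # prints 0
connector.trigger_failover()
print([c.region for c in connector.usable(failed_regions=region_down)])  # prints ['region-b']
```

The model shows why multihoming across availability zones alone does not help when the whole region is down, and why the backup connection has to live in a different region.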
Lastly, since the CXPs can be deployed in any hyperscale cloud service provider, Alkira can offer inter-cloud redundancy with inter-cloud failover to its customers. Enterprises can design their network to host CXPs in different cloud service providers so that in case there is an outage in one of the cloud providers, the user and application traffic can still route through the other provider.
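When planning a design, it can be useful to check which failure domains a given set of CXP placements can survive. The Python sketch below classifies a design by whether it spans more than one zone, region, or provider. The `Placement` type, the function name, and the provider, region, and zone labels are all hypothetical and used only for illustration.

```python
from typing import Iterable, NamedTuple, Set


class Placement(NamedTuple):
    provider: str
    region: str
    zone: str


def survivable_failures(placements: Iterable[Placement]) -> Set[str]:
    """Return the failure domains ('zone', 'region', 'provider') whose loss leaves a CXP running."""
    items = list(placements)
    providers = {p.provider for p in items}
    regions = {(p.provider, p.region) for p in items}
    zones = {(p.provider, p.region, p.zone) for p in items}

    result: Set[str] = set()
    if len(zones) > 1:
        result.add("zone")      # losing any one zone leaves another placement
    if len(regions) > 1:
        result.add("region")    # losing any one region leaves another placement
    if len(providers) > 1:
        result.add("provider")  # losing any one provider leaves another placement
    return result


design = [
    Placement("cloud-x", "east", "az1"),
    Placement("cloud-x", "east", "az2"),
    Placement("cloud-y", "west", "az1"),
]
print(sorted(survivable_failures(design)))  # prints ['provider', 'region', 'zone']
```

A design with placements in two zones of a single region would return only `{'zone'}`, which matches the default zone-level redundancy described earlier.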
Conclusion
In this blog I have discussed the options an enterprise can leverage out of the box to protect against different types of failure scenarios. In addition, there are features that can be configured on top of the default options to improve the availability of the network even further. How far to leverage these features depends on the customer’s business requirements, as these options come at a cost. For instance, to achieve inter-region failover you need to run two parallel infrastructures, one in each region, at all times.
So in conclusion, the good news is that Alkira allows you to design your network with maximum redundancy to withstand the different kinds of failure described above, but you should evaluate these options closely against your business requirements.
Learn more about how to build cloud and multi-cloud networks with Alkira at https://www.alkira.com/cloud-networking/
Take a tour of Alkira Cloud Area Networking solution at https://www.alkira.com/virtual-tour/
Request your own personalized demo at https://www.alkira.com/demo