DNS LB for Isolated Stacks
If you are using DNS LB, data can no longer be confined to a specific region for primary operations. Because DNS LB is in use, a request can land in any region.
This model calls for global storage or a global database, so that the data can be accessed from any region. This can be implemented with asynchronous or synchronous cross-region replication.
The intention of DNS LB is to provide high availability. An end-to-end health check should be in place for each service stack within a region. It is up to the SRE to combine the end-to-end health of the application and provide it to the DNS LB.
For a multi-regional topology, DNS LB is configured based on health checks, combined with LB and a routing policy.
We can discuss two approaches.
- Each region has dedicated backup VIPs in another region. If the health check fails in the primary region because its VIP has a problem, the backup VIP is used. If there is more than one backup VIP, a DNS routing policy (round robin, weighted round robin, or geo mapping) can be applied over the pool of backup VIPs.
- Keep the backup VIPs in DNS and assign weights to guide traffic across them. Health checks can be attached to the routing policy, so unhealthy IPs are ejected from the pool and weighted round robin (WRR) recalculates the weights and sends traffic to the healthy VIPs. Example: Region AA has 50%, and Regions BB and CC have 25% each. If Region AA becomes unhealthy, WRR can recalculate the weights to 50% each for BB and CC.
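The WRR recalculation in the second approach can be sketched as a proportional rescaling of the surviving weights. This is a minimal illustration, not how any particular DNS provider implements it; the function name `rebalance` and the region labels are assumptions made for the example.

```python
def rebalance(weights, healthy):
    """Eject unhealthy regions and rescale the remaining weights to sum to 100.

    weights: region name -> configured weight (hypothetical values)
    healthy: region name -> result of the latest health check
    """
    # A region with no health report is treated as unhealthy.
    alive = {r: w for r, w in weights.items() if healthy.get(r, False)}
    total = sum(alive.values())
    if total == 0:
        return {}  # no healthy region left to route to
    return {r: 100.0 * w / total for r, w in alive.items()}


weights = {"AA": 50, "BB": 25, "CC": 25}
health = {"AA": False, "BB": True, "CC": True}
print(rebalance(weights, health))  # {'BB': 50.0, 'CC': 50.0}
```

Because BB and CC start with equal weights, they split the traffic evenly once AA is ejected.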
The problem with this approach is that if a single stack (for example LB1, FE, BE, LB2, MySQL) becomes unhealthy, the entire region is marked red and traffic has to be routed to a healthy region.
Monitoring has to be fairly mature: it needs to be built on an understanding of the health checks of every layer in each region, and it has to feed that information to the DNS LB.
Propagate layer failure up the stack to LB1: If there is a failure in the region, all FEs will eventually learn that they are unhealthy, and the same has to be communicated to LB1. With no FE left in the region to send traffic to, the DNS LB health check against LB1 fails and the DNS LB stops using that region. Every service in the region has to implement this logic.
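A minimal sketch of the propagation idea, assuming each layer reports itself healthy only if it is up and at least one of its backends is healthy. In a real deployment each service would expose this through its own health endpoint; here the whole stack is modelled in one process, and the `Node` class and `build_region` helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    self_ok: bool = True                      # the service's own liveness
    backends: list = field(default_factory=list)

    def healthy(self) -> bool:
        if not self.self_ok:
            return False
        # A leaf (no backends) depends only on itself; any other layer
        # needs at least one healthy backend to be able to serve traffic.
        return not self.backends or any(b.healthy() for b in self.backends)


def build_region():
    """Build one stack: LB1 -> FE1, FE2 -> BE1, BE2 -> LB2 -> MYSQL."""
    mysql = Node("MYSQL")
    lb2 = Node("LB2", backends=[mysql])
    bes = [Node(f"BE{i}", backends=[lb2]) for i in (1, 2)]
    fes = [Node(f"FE{i}", backends=bes) for i in (1, 2)]
    lb1 = Node("LB1", backends=fes)
    return lb1, bes, mysql


lb1, bes, mysql = build_region()
print(lb1.healthy())     # True: every layer is up
bes[0].self_ok = False
print(lb1.healthy())     # True: BE2 can still serve the FEs
mysql.self_ok = False
print(lb1.healthy())     # False: the failure has propagated up to LB1
```

The DNS LB only needs to probe LB1; a failure deep in the stack surfaces there once no FE can serve traffic.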
Collect health of all layers and report to LB1: The stack of services within a region is declaratively defined, and a region is considered healthy only when all of its services are healthy. The status is gathered by an independent health observer service for the application. The observer collects the health status across all services in the stack and sends the combined health status to Load Balancer 1 (LB1). If the collected status is unhealthy, the DNS LB health check fails and the region is taken out of the pool.
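The collection approach can be sketched as an observer that checks a declared list of services and reduces their statuses to a single answer for LB1. This is an illustration under assumptions: the `STACK` list, the function names, and the 200/503 status convention are hypothetical, and the per-service statuses are passed in rather than probed over the network.

```python
# Declarative definition of the regional stack (hypothetical service names).
STACK = ["LB1", "FE", "BE", "LB2", "MYSQL"]


def combined_health(stack, statuses):
    """Return (region_ok, failing_services) for the declared stack.

    statuses maps service name -> bool. A service with no report is
    counted as unhealthy, so a silent service cannot hide a failure.
    """
    failing = [s for s in stack if not statuses.get(s, False)]
    return len(failing) == 0, failing


def health_response(stack, statuses):
    """Status code and body the observer would hand to LB1's health check."""
    ok, failing = combined_health(stack, statuses)
    if ok:
        return 200, "healthy"
    return 503, "unhealthy: " + ", ".join(failing)


all_up = {s: True for s in STACK}
print(health_response(STACK, all_up))                       # (200, 'healthy')
print(health_response(STACK, {**all_up, "MYSQL": False}))   # (503, 'unhealthy: MYSQL')
```

Unlike propagation, the individual services need no extra logic here; only the observer knows the shape of the stack.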
If multiple services are deployed behind example.com on different paths, such as /video, /shopping, and /text, and the VIP is shared across all of those paths, the DNS LB would need to understand the health of each service. If example.com/video is unhealthy in a region, the collected health check fails for all of example.com, telling LB1 not to use this region.
Merits
- Any region can serve a user request.
- A DDoS attack can be mitigated across multiple regions.
- If any region fails, traffic can be routed to a working region.
- SREs have manual control over per-region traffic distribution through the DNS LB.
- Separate VIPs behind DNS can easily point to completely different deployments, a different cloud, or an on-premises DC.
Demerits
- Health checks, using either collection or propagation, have to be put in place across the whole service stack and in each zone.
- DNS TTL delays failover to a healthy region, and the time to fail over is not deterministic.
- DNS LB is based on DNS requests, which do not represent the volume of actual traffic. It cannot determine how much regional capacity will be needed to serve the traffic represented by those DNS requests.
- Application capacity can be stranded within a region, for example with apps that are not autoscaled and apps that have diurnal traffic.
- That stranded capacity is not available to other regions. DNS LB may not have the means to take region capacity into account when routing traffic.
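To make the DNS TTL point concrete, the sketch below estimates how long clients may keep hitting a failed region. It assumes the DNS LB ejects a VIP after a fixed number of consecutive failed checks and that every resolver and client honors the TTL; all parameter names and values are hypothetical. Resolvers or clients that cache beyond the TTL exceed this figure, which is why the real failover time is not deterministic.

```python
def failover_upper_bound(check_interval_s, failures_to_eject, ttl_s):
    """Rough upper bound, in seconds, on time spent sending traffic to a failed region.

    Detection takes at most check_interval_s * failures_to_eject seconds,
    and a record cached just before ejection lives for up to ttl_s more.
    Only valid if every cache honors the TTL (an assumption).
    """
    detection = check_interval_s * failures_to_eject
    return detection + ttl_s


# Hypothetical settings: probe every 10 s, eject after 3 failures, 60 s TTL.
print(failover_upper_bound(10, 3, 60))  # 90
```

Lowering the TTL shortens this window at the cost of more DNS queries.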