Building a fast CDN with anycast (BGP based)

What is anycast?

Anycast is not a protocol, nor is it a variant of multicast or broadcast. It is simply the name given to a unicast IPv4 or IPv6 address that is announced from several routers inside an AS or WAN.

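How does it work?

With anycast, the internal routing protocols such as OSPF, EIGRP or iBGP handle these different announcements like any other routes and select the best path using their own selection algorithm, e.g. Dijkstra's shortest-path-first algorithm for OSPF. Each router therefore forwards traffic to the "nearest" announcement in routing terms (the lowest-cost path), which is usually, but not necessarily, the geographically closest one.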
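To make the selection step concrete: once a router has run SPF and knows the total cost of the path to each place where the anycast prefix is announced, choosing where to send traffic comes down to picking the cheapest announcement. The Bash sketch below models only that last step; the site names and costs are made up for illustration, and the SPF computation itself is not shown.

```shell
#!/bin/bash
# Hypothetical model of the final routing decision for one anycast prefix.
# Input: one line per announcement, "<site> <path_cost>", as seen from a
# single router after SPF has computed the costs (values are made up).
nearest_instance() {
  # Sort numerically on the cost column and keep the cheapest site name.
  sort -k2,2n | head -n 1 | cut -d ' ' -f 1
}

# Example: three nodes announcing the same /32.
printf '%s\n' "paris 30" "singapore 250" "newyork 110" | nearest_instance
# prints: paris
```

A real router would install equal-cost announcements side by side (ECMP) instead of picking one; the sketch ignores ties. Since every router computes its own costs, clients in different regions end up at different nodes.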

What does it offer?

Anycast provides a “geographically” distributed network that enables fast content delivery. Web services, as we know, are aware of and dependent on the Domain Name System. The Web is global and has no borders: content should be accessible and usable from anywhere, but physics is still a reality on planet Earth. Of course, we discovered traffic black holes some years ago, but we still face latency issues when distributing content from one side of the world to the other. Combined with DNS, anycast lets every client reach a nearby copy of the applications and content we want to deliver, which makes them much more responsive. Anycast is widely used by Content Delivery Networks (CDNs) such as Cloudflare or Akamai.

A possible use case implementation

The simplest and most common use case is the Domain Name System. Since virtually all web-based services depend on DNS, every client has to perform one or several DNS requests. Too bad if your DNS server is on the other side of the world: 500 milliseconds to download 4 bytes of data (an IPv4 address)… Some users would have closed the browser window with CTRL+F4 before the DNS answer came back.
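Physics puts a floor under that delay. Light in optical fiber travels at roughly 200,000 km/s, i.e. about 200 km per millisecond, so a round trip can never be faster than twice the fiber distance divided by that speed. The Bash helper below is a back-of-the-envelope sketch of that bound only; real paths are longer than the great-circle distance, add queuing and processing delay, and a cold DNS lookup may need several round trips.

```shell
#!/bin/bash
# Lower bound on round-trip time (ms) over a given fiber distance (km),
# assuming light travels ~200 km per millisecond in fiber.
# Integer arithmetic, so the result is rounded down.
min_rtt_ms() {
  local distance_km="$1"
  echo $(( 2 * distance_km / 200 ))
}

# Half of the Earth's circumference is roughly 20,000 km.
min_rtt_ms 20000   # prints: 200
```

Even over a perfectly straight cable to the other side of the planet, each round trip costs about 200 ms; an anycast node 300 km away brings the bound down to 3 ms.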

There is an easy answer to that issue: anycast. Let's assume we have two routers running iBGP together, as they are in the same AS. OSPF is used as the internal routing protocol. Router 1 (RT1) is the edge router; Router 2 (RT2) is a backbone router that might be connected to IXPs and/or tier-1 ISPs. We also need a server acting as the DNS server, connected to the edge router RT1. An anycast address is typically written as a /32 in IPv4 or a /128 in IPv6, for the simple reason that it lives on the node as a loopback interface address.
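On the server, the Quagga configuration in the Setup section lets zebra put the anycast address on the loopback. If you prefer to manage the addresses outside Quagga, the iproute2 equivalent would be roughly the following (run as root; the addresses are the lab values used throughout this post, and the commands do not survive a reboot):

```shell
# Anycast service addresses as host addresses on the loopback (lab values).
ip address add 10.0.0.1/32 dev lo label lo:0
ip -6 address add 2001:200:1:f:10::1/128 dev lo
# Check that both addresses are present.
ip address show dev lo
```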

Note that all these steps make the most sense with similar anycast-configured web servers distributed all around the world. These servers would serve the static part of websites, such as pictures, videos and static pages/code. The dynamic parts, such as database requests or uploads, would be handled (in a simple scenario) by a centralized server. HTTP requests would be directed to the right servers through DNS subdomains, with hostnames such as static.mydomain.net and dynamic.mydomain.net set in the web application's variables.
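A minimal sketch of what this could look like in the mydomain.net zone, assuming BIND-style zone file syntax. The name server is reachable on the lab's anycast address; static.mydomain.net points to a second, hypothetical anycast address shared by all static web nodes, and dynamic.mydomain.net to a single central server whose addresses are hypothetical ones from the documentation ranges. The SOA and other records are omitted.

```
; Excerpt of a hypothetical mydomain.net zone (SOA and other records omitted)
@        IN NS    ns1.mydomain.net.
ns1      IN A     10.0.0.1              ; anycast DNS address (lab value)
ns1      IN AAAA  2001:200:1:f:10::1    ; anycast DNS address (lab value)
static   IN A     10.0.0.2              ; hypothetical anycast address of the static web nodes
static   IN AAAA  2001:200:1:f:10::2    ; hypothetical
dynamic  IN A     192.0.2.10            ; hypothetical central server
dynamic  IN AAAA  2001:db8::10          ; hypothetical
```

Because every static node announces the same address, one record per name is enough: routing, not DNS, decides which node answers.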

Consider the following network setup:
[Figure: network setup]

What we want to accomplish is the following: advertising the loopback address of the DNS server to our gateway, which is also our edge router. This could be done directly with OSPF, but I chose iBGP for several reasons. First, with BGP, the network team can ensure that the server cannot send anything other than the desired prefixes. I agree with Ivan Pepelnjak that server admins should not control any edge of the routing unicorn-rainbow black star thing (read his excellent post here: http://blog.ipspace.net/2016/03/sysadmins-shouldnt-be-involved-with.html). At least when something goes wrong, they will not have to be involved, which is a far better argument than telling them they do not know what they are doing. Another advantage of iBGP over OSPF is that it relies on a session that has to be established, a kind of agreement defined on both sides through route-maps, prefix filters or whatever you need, which is exactly what we want for an anycast server.

It would be different if we were extending our backbone, where trust is physical: a device plugged into another's port thereby becomes part of the same backbone and the same service layer, based on mutual trust.

Now, let's get to the technical part. Note that this is not a complete A-to-Z solution: code and commands change over time, and this is not a setup anyone should build without thinking it through.

Linux system configuration

We need two routers and at least one Debian server running Quagga, with IP forwarding enabled:

sysctl -w net.ipv4.ip_forward=1

sysctl -w net.ipv6.conf.all.forwarding=1

Disabling router advertisements and router solicitations is good practice in a server LAN:

sysctl -w net.ipv6.conf.default.accept_ra=0

sysctl -w net.ipv6.conf.default.router_solicitations=0
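The sysctl -w commands only last until the next reboot. One way to make these settings persistent on Debian is a drop-in file under /etc/sysctl.d; the file name below is an arbitrary choice.

```
# /etc/sysctl.d/99-anycast.conf (file name is arbitrary)
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.default.router_solicitations = 0
```

It can be loaded without rebooting with sysctl -p /etc/sysctl.d/99-anycast.conf.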

Setup

Your Quagga configuration should look like this:

hostname dns-node-1
!
password zebra
enable password zebra
!
interface eth0
 description dns-node-1
 ip address 172.16.0.2/24
 ipv6 address 2001:200:1:1::2/64
 ipv6 nd suppress-ra
!
! Anycast service addresses live on the loopback
interface lo
 description dns-ip-1
 ip address 10.0.0.1/32 label lo:0
 ipv6 address 2001:200:1:f:10::1/128
!
! RT1 peers with this server's eth0 addresses (172.16.0.2 and 2001:200:1:1::2),
! so the sessions are sourced from eth0, not from the loopback
router bgp 65500
 bgp router-id 172.16.0.2
 bgp log-neighbor-changes
 network 10.0.0.1/32
 neighbor 172.16.0.1 remote-as 65500
 neighbor 172.16.0.1 next-hop-self
 neighbor 2001:200:1:1::1 remote-as 65500
 no neighbor 2001:200:1:1::1 activate
!
 address-family ipv6
 network 2001:200:1:f:10::1/128
 neighbor 2001:200:1:1::1 activate
 exit-address-family
!
ip forwarding
!
line vty
!
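Once bgpd is running, vtysh can be used to check that the sessions with RT1 are established and that the anycast prefixes are actually being sent. These are Quagga show commands; the exact syntax and output may differ in other versions or in FRR.

```shell
# BGP session state and number of prefixes exchanged with RT1 (IPv4).
vtysh -c "show ip bgp summary"
# Prefixes this server advertises to RT1 over the IPv4 session.
vtysh -c "show ip bgp neighbors 172.16.0.1 advertised-routes"
# Session state for the IPv6 session.
vtysh -c "show ipv6 bgp summary"
```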

Now take a look at the RT1 configuration (note that the lab router was a Brocade CER):

!
no spanning-tree
!
vlan 1 name DEFAULT-VLAN 
!
ipv6 prefix-list anycast-ipv6 seq 5 permit 2001:200:1:f:10::1/128
!
ip prefix-list anycast-ip seq 5 permit 10.0.0.1/32 
!
ip router-id 192.168.254.1
ipv6 nd global-suppress-ra
hostname rt1
!
router ospf
 area 0 
 redistribute bgp route-map bgp-to-ospf 
!
ipv6 router ospf
 area 0
 redistribute bgp route-map bgp-to-ospf-6
!
interface loopback 1
 port-name iBGP loopback
 ip ospf area 0
 ip ospf passive
 ip address 192.168.254.1/32
 ipv6 address 2001:200::1/128
 ipv6 ospf area 0
 ipv6 ospf passive
!                                                                
interface ethernet 1/1
 enable
 ip ospf area 0
 ip address 192.168.0.1/31
 ipv6 address 2001:200:f:f::1/126
 ipv6 ospf area 0
 ipv6 nd suppress-ra
!
interface ethernet 1/2
 enable
 ip address 172.16.0.1/24
 ipv6 address 2001:200:1:1::1/64
 ipv6 nd suppress-ra
!
router bgp
 local-as 65500
 neighbor IBGP peer-group
 neighbor IBGP next-hop-self
 neighbor IBGP update-source loopback 1
 neighbor IBGP6 peer-group
 neighbor IBGP6 next-hop-self
 neighbor IBGP6 update-source loopback 1                          
 neighbor ANY peer-group
 neighbor ANY next-hop-self
 neighbor 172.16.0.2 remote-as 65500
 neighbor 172.16.0.2 peer-group ANY
 neighbor 192.168.254.2 remote-as 65500
 neighbor 192.168.254.2 peer-group IBGP
 neighbor 2001:200::2 remote-as 65500
 neighbor 2001:200::2 peer-group IBGP6
 neighbor 2001:200:1:1::2 remote-as 65500
 neighbor 2001:200:1:1::2 peer-group ANY
!
 address-family ipv4 unicast
 redistribute ospf
 no neighbor 2001:200::2 activate                                 
 no neighbor 2001:200:1:1::2 activate 
 bgp-redistribute-internal
 exit-address-family
!
 address-family ipv6 unicast
 redistribute ospf
 neighbor 2001:200::2 activate 
 neighbor 2001:200:1:1::2 activate 
 bgp-redistribute-internal
 exit-address-family
! 
 address-family ipv6 multicast                                    
 exit-address-family
! 
 address-family vpnv4 unicast
 exit-address-family
!
route-map bgp-to-ospf permit 10
 match ip address prefix-list anycast-ip
!
route-map bgp-to-ospf-6 permit 10
 match ipv6 address prefix-list anycast-ipv6
!
end
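In the RT1 configuration above, the anycast prefix lists only control what is redistributed into OSPF. To actually enforce the idea that the server may announce nothing but its anycast prefixes, the same prefix lists could also be applied inbound on the ANY peer group. The lines below are a sketch that assumes the "neighbor ... prefix-list ... in" syntax of the Brocade NetIron/CER BGP CLI; check them against the command reference of your software release.

```
router bgp
 neighbor ANY prefix-list anycast-ip in
!
 address-family ipv6 unicast
 neighbor ANY prefix-list anycast-ipv6 in
 exit-address-family
```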

Finally, here is the RT2 configuration, which is that of a fairly simple backbone router:

!
no spanning-tree
!
vlan 1 name DEFAULT-VLAN
!
hostname rt2
!
router ospf
 area 0
!
ipv6 router ospf
 area 0
!
interface loopback 1
 ip ospf area 0
 ip ospf passive
 ip address 192.168.254.2/32
 ipv6 address 2001:200::2/128
 ipv6 ospf area 0
 ipv6 ospf passive
!
interface ethernet 1/1
 enable
 ip ospf area 0
 ip address 192.168.0.0/31
 ipv6 address 2001:200:f:f::2/126
 ipv6 ospf area 0
!
router bgp
 local-as 65500
 neighbor IBGP peer-group
 neighbor IBGP remote-as 65500
 neighbor IBGP next-hop-self
 neighbor IBGP update-source loopback 1
 neighbor IBGP6 peer-group
 neighbor IBGP6 remote-as 65500
 neighbor IBGP6 next-hop-self
 neighbor IBGP6 update-source loopback 1
 neighbor 192.168.254.1 remote-as 65500
 neighbor 192.168.254.1 peer-group IBGP
 neighbor 2001:200::1 remote-as 65500
 neighbor 2001:200::1 peer-group IBGP6
!
 address-family ipv4 unicast
 no neighbor 2001:200::1 activate
 exit-address-family
!
 address-family ipv4 multicast
 exit-address-family
!
 address-family ipv6 unicast
 neighbor 2001:200::1 activate
 exit-address-family
!
 address-family ipv6 multicast
 exit-address-family
!
 address-family vpnv4 unicast
 exit-address-family
!
end

Service-aware script

To shut down the BGP session when the DNS service hangs, so that the anycast route is withdrawn and clients are routed to another node, we can use this script:

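#!/bin/bash
# Query BIND on the anycast address, i.e. what clients actually use; a healthy
# server answers "localhost." with 127.0.0.1. Short timeout so a hung daemon fails fast.
DNSUP=$(dig @10.0.0.1 localhost. A +short +time=2 +tries=1)
if [ "$DNSUP" != "127.0.0.1" ]; then
    echo "Stopping anycast server because the DNS service is not working..."
    # Stopping Quagga closes the BGP session, so RT1 withdraws the anycast route.
    /etc/init.d/quagga stop
    /etc/init.d/bind9 stop
else
    echo "Everything's OK."
fi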


Make the script executable (chmod +x) and add it to /etc/crontab:

*/2 *    * * *  root /root/.isDNSAlive.sh > /dev/null 2>&1

This checks every two minutes that everything is working as expected.
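As written, the script withdraws the route on the first failed check, so a single lost UDP packet is enough to take the node out. One possible refinement, not part of the original setup, is to require several consecutive failures first. The Bash function below sketches this; the function name, the state file and the threshold are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: count consecutive failed checks in a state file
# and print "withdraw" once the count reaches the threshold, else "keep".
# Usage: should_withdraw <ok|fail> <state_file> <threshold>
should_withdraw() {
  local result="$1" state_file="$2" threshold="$3"
  local failures
  # Previous count; a missing or empty file counts as zero failures.
  failures=$(cat "$state_file" 2>/dev/null)
  failures=${failures:-0}
  if [ "$result" = "ok" ]; then
    failures=0                      # any success resets the counter
  else
    failures=$((failures + 1))
  fi
  echo "$failures" > "$state_file"  # persist the count for the next cron run
  if [ "$failures" -ge "$threshold" ]; then
    echo "withdraw"
  else
    echo "keep"
  fi
}

# Example use in the cron script (state file path is hypothetical):
#   [ "$DNSUP" = "127.0.0.1" ] && r=ok || r=fail
#   if [ "$(should_withdraw "$r" /var/lib/anycast-dns.failures 3)" = "withdraw" ]; then
#       /etc/init.d/quagga stop
#   fi
```

With a threshold of 3 and a two-minute cron interval, a dead service is withdrawn between four and six minutes after it fails: that is the price paid for ignoring isolated glitches.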

Routing optimisation

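Tweaking the BGP timers lets a failed peer be detected faster, so its routes are withdrawn and the change propagates sooner. In Quagga, the keepalive and hold time are set in seconds under router bgp:

timers bgp <keepalive> <holdtime>

We can also set how often a failed connection to a neighbor is retried:

neighbor x.x.x.x timers connect <seconds>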
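For example, on the Quagga server the timers could be lowered as follows. The values are illustrative, not a recommendation (Quagga defaults to a 60-second keepalive and a 180-second hold time). BGP peers use the smaller of the two hold times proposed in their OPEN messages, so lowering it on the server is enough for that session.

```
router bgp 65500
 ! keepalive 3 s, hold time 9 s (illustrative values)
 timers bgp 3 9
 ! retry a failed connection to RT1 every 5 s instead of the default interval
 neighbor 172.16.0.1 timers connect 5
```

When the health-check script stops Quagga, the TCP session is closed and RT1 typically drops the route right away; short hold timers mostly matter when the server crashes or loses connectivity without a clean shutdown.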


Post Scriptum

Don’t take this as a perfect tutorial: some statements may not be entirely correct, and I may have forgotten to mention parts of the setup.
