Blitzed.org

DNS balancing

Poor Man's Geographic Load Balancing

What is geographic load balancing?

It's a general name for a technique whereby requests for a given service are spread across multiple widely dispersed locations (e.g. on different continents), usually on the idea that it is better to send users to a server that is "close" to them.

Throughout this page the example service that we will be talking about is IRC, but it can apply also to HTTP and many other Internet protocols.

Geographic load balancing and IRC

Most IRC networks today have multiple servers. There are many reasons for doing this and only a few of those reasons are technical, but one good technical reason for having multiple servers is that it is better for users in North America to be using North American IRC servers, users in Europe to be using European servers, etc. It's better to keep traffic from clients local, and link regions with stable hub links.

This used to be very important in the early days of IRC, 10 or more years ago, because the Internet actually wasn't that stable back then. Splits were common across regional boundaries (e.g. when links across the Atlantic would fail) but at least if users were connected to local servers then they could carry on communicating with local people. This was a redundancy issue.

Today this is not so important as a redundancy feature. Many users are now connected via broadband, links between regions are more plentiful, and the reliability of the Internet in the early 21st century is in fact quite high: the typical DSL user in Europe can use an IRC server on the West coast of the United States without noticing reliability problems. So global load balancing today is mainly used for performance reasons.

So why worry about this? Well, it is still desirable to have users connect to servers local to them; it is just not so much of a requirement. Today a minor problem we face is that the majority of our users connect to irc.blitzed.org, our main DNS pool, no matter where they are from. This leads to the user being randomly directed to one of our servers, which could literally be on the other side of the world from them. The user will be unlikely to complain since the delay is not noticeable in human terms, but this wastes resources for the user, for their ISP, and for us. The world would be better if (in general) users kept to servers that are close to them.

We should be clear that this isn't a major issue. The first concern always has to be having a working network, having some servers anywhere for users to connect to, but there is no reason why we should not try to improve the spread of the general mass of users that just connect to irc.blitzed.org. Here are some reasons why:

  • IRC networks with multiple servers experience splits.
    This is a fact of life. It could only be avoided by rewriting the ircd, and even then that just turns a split into a period of suboptimal performance. That's the same problem, just in a lesser way.
    We have many channels that are geographically biased. We have channels of mainly Danish people, channels of mainly Dutch people, channels of mainly US people. When a split happens, because most of these people are spread over every server we have, they cannot help but notice the split since now one or more people they were talking to are no longer present.
    If we could keep users on servers that were geographically close, then they wouldn't notice so much if servers from other regions have problems.
  • Modern internet problems still most commonly take place in the links between regions.
    By definition these links are expensive to provision, so there are fewer of them. Because there are fewer of them, any outage is more noticeable and that's why we more commonly see problems between regions.
    Again if users were grouped by region then the most common class of Internet problems would affect our users less.
  • We try to route our servers based on user count, but most of our servers are in Europe.
    This section used to say:
      "Simply because we have vastly more users in Europe, we have more European servers. Because the main pool hands off to servers with a fairly even distribution, most of our users end up on European servers. We still need servers in the US (and perhaps also in Asia-Pacific, Australia, Africa, ...) just to serve the local users in those places, but these servers have a high routing cost because they end up holding so few users.
      Maybe the US region would end up looking like less of a nuisance if it was holding a number of users more in line with the actual number of US users on the network (a very sizable minority)."
    Since actually implementing this, it seems we have a bunch of US users we weren't aware of. They are in fact the majority, and the US region now really does look like less of a nuisance; it looks like something we should focus more on. This is exactly the kind of thing we were trying to find out.
  • Coolness factor.
    Quite simply, we know of no other IRC network that balances its users based on anything but weighted DNS round robin. (DALnet may have done this once; we're not quite sure.)

How we currently balance

DNS round robin with equal weights (one entry for each IRC server). In the event of prolonged downtime, the affected server is withdrawn from the pool.

Options for more intelligent balancing

There are a few places we can do the balancing:

  1. At the IP level.
    We can do BGP tricks which involve announcing server IPs to multiple places, and using load balancing hardware.
    This is not feasible. It requires masses of infrastructure, our own IP allocations, AS numbers and hardware. This is how real companies do this, and if we had the money it is how we would do this, but the point of this page is that we don't have the money.
  2. At our DNS servers.
    The several DNS servers authoritative for irc.blitzed.org answer requests from our users when they try to connect to our network. These servers can see the IP of the nameserver that the user is using. At the moment they just hand back an IP address randomly chosen from the pool of available servers, but we could instead make a custom DNS backend that hands back IPs based on other data. The simplest thing to do would be to give back only IPs that are known to be "close" to the IP of the user's own nameserver.
  3. At the IRC server level.
    The IRC servers know the topology of the network and they do see the real IP of the user, so they could somehow redirect users to better servers.
    Unfortunately there is no support within the IRC client protocol to do this. There is a numeric to tell the user to use a different server, but it involves complete disconnection from IRC and then reconnection to another server. In almost all clients it is also only advisory; it does not actually cause anything to happen automatically. Doesn't seem too useful.

The DNS solution seems to be the best. It has the advantage that it is fairly cheap to do (we just need some custom DNS servers), and it may be applicable to other services too. mark reckons we should write a backend for PowerDNS. Here's one he made earlier to demonstrate how:

$ dig +norecurse @213.193.225.137 irc.nedworks.org a

; <<>> DiG 9.2.2 <<>> +norecurse @213.193.225.137 irc.nedworks.org a
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39121
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;irc.nedworks.org.              IN      A

;; ANSWER SECTION:
irc.nedworks.org.       5       IN      CNAME   eu.blitzed.org.

;; Query time: 49 msec
;; SERVER: 213.193.225.137#53(213.193.225.137)
;; WHEN: Tue Feb  3 22:16:05 2004
;; MSG SIZE  rcvd: 59

See the PowerDNS backend documentation for more info.

Getting started

If we have decided on the DNS method, then I (grifferz) have some suggestions:

  • We delegate a subdomain of blitzed.org to this project.
    Let's say irc-geo.blitzed.org. We'll delegate that to a bunch of PowerDNS servers that we will set up -- I'm guessing we can run one on nubian, and maybe Mark can provide one or two more.
  • We only try to categorise users into our major regions.
    Currently we only have servers in two major regions: Europe and North America. (It is possible we may add Australia at a later date; historically we have had several servers from there.) So it is probably pointless trying to narrow it down any further than that.
    Theoretically we could probably link the DNS backend into the status of our IRC network in order to only ever hand out IPs of servers that are currently linked and working, but that can be for a later version of this system. At first I suggest that we just pass the user a CNAME to the correct region, which is pretty much what Mark's code does already.
    We already maintain DNS pools for the major regions and countries (eu, us, uk, ca, etc.) and we try to ensure that those pools contain only the IPs of linked and working servers by means of a set of scripts that talk to thales. Our scripts advise us when a server has been delinked for over an hour, and then someone from the systems team goes and runs a depool script. Similarly the scripts also warn us about depooled servers that have been linked for over an hour, so that we can repool them. The system is not perfect but it does work and is better than what most networks have. In reality as long as the pool contains more than one IP it is okay if one of them happens to be dead at that moment.

So the things to focus on are getting the servers up and getting code that can work out the region more reliably.

How to work out where the user is coming from

dg suggests GeoIP. They have a database and a C library for accessing it which is able to give country and continent details for IP addresses. Unfortunately their database is not free; a copy from March 2003 is available but for regular updates you need to pay. Also they have only a C library, but PDNS backends need to be in C++.

mark suggests countries.nerd.dk, a DNSBL that maps IP addresses to their ISO-3166 country codes. From there we just have to map country code to blitzed server region. Unfortunately the bind-compatible zonefile for this is very large and when loaded into PDNS uses 275-300MB of RAM, which is rather a lot for what it is. We could probably strip out half of the content of the zone, but it's still a lot.

mark has now written some code to efficiently hold a list of IP/prefixlen records. This can parse the RBLDNSD-format version of countries.nerd.dk and store it all in only a few megabytes of RAM. Now we can just take regular copies of this zone instead of using the out-of-date GeoIP database.
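As an illustration only (this is not mark's code, which is not shown on this page), here is a minimal Python sketch of one compact way to hold IP/prefixlen records and look up a country code. It assumes the prefixes do not overlap, and the class name `PrefixTable` and its methods are hypothetical. Parsing the RBLDNSD file itself is left out.

```python
import bisect
import ipaddress


class PrefixTable:
    """Sorted, non-overlapping IPv4 prefixes, each mapped to a country code."""

    def __init__(self, records):
        # records: iterable of ("a.b.c.d/len", "cc") pairs
        rows = []
        for cidr, cc in records:
            net = ipaddress.ip_network(cidr, strict=True)
            rows.append((int(net.network_address),
                         int(net.broadcast_address), cc))
        rows.sort()
        # Three parallel lists keep per-record overhead low.
        self._starts = [r[0] for r in rows]
        self._ends = [r[1] for r in rows]
        self._codes = [r[2] for r in rows]

    def lookup(self, ip):
        """Return the country code covering ip, or None if nothing matches."""
        n = int(ipaddress.ip_address(ip))
        # Last prefix whose start is <= n; then check n is inside it.
        i = bisect.bisect_right(self._starts, n) - 1
        if i >= 0 and n <= self._ends[i]:
            return self._codes[i]
        return None
```

A lookup is a binary search, so it stays fast even with a very large number of prefixes. If the real data contains overlapping prefixes, a trie with longest-prefix matching would be needed instead.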

How to tell the users where to go

mark is now working on a PowerDNS backend that will answer for a configurable resource record (e.g. irc-geo.blitzed.org) and will return a CNAME based on a file of country -> CNAME mappings. An example might be:

fr eu

to denote that IPs from France will be sent a CNAME for eu.blitzed.org.
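To make the map file idea concrete, here is a small Python sketch of reading such a file and choosing a CNAME. Only the "fr eu" line comes from this page; the comment syntax, the fallback region and the function names are assumptions made for the example.

```python
def parse_country_map(text):
    """Parse lines of the form '<country> <region>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        # Assumption: '#' starts a comment; blank lines are ignored.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        country, region = line.split()
        mapping[country.lower()] = region
    return mapping


def cname_for(country, mapping, zone="blitzed.org", default="eu"):
    """Return the CNAME target for a country code.

    Unknown or missing countries fall back to `default`, which is a
    hypothetical policy chosen for this sketch.
    """
    region = mapping.get((country or "").lower(), default)
    return f"{region}.{zone}."
```

With a map containing "fr eu", `cname_for("fr", mapping)` gives `eu.blitzed.org.`, matching the example above.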

Once the backend is done and we have found at least 3 nameservers we can start getting users to test it out. Once we get something that seems to work fairly reliably then we can switch that for the real irc.blitzed.org, and use irc-geo.blitzed.org for the next version of our backend.

For the future

It seems to me that we need to write an API for telling our backend what servers we have, that way the backend does not need to do remote database queries against thales or whatever to work that out. This API would be used to (de)pool servers. Each backend can then return an answer that contains the top 3 A records for currently linked servers that are "close" to the user's nameserver. With that solution, we also don't need to maintain the region pools anymore; they can all be directed at the DNS backends as well.
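A rough Python sketch of what such a (de)pool API could look like inside a backend follows. The class and method names are hypothetical, and since we have no measure of "close" yet, it simply returns up to three pooled servers from the requested region in random order.

```python
import random


class ServerPool:
    """Tracks which IRC servers may be handed out, per region."""

    def __init__(self):
        self._servers = {}    # server name -> (region, ip)
        self._pooled = set()  # names currently eligible for answers

    def add(self, name, region, ip):
        """Register a server; it is not handed out until pooled."""
        self._servers[name] = (region, ip)

    def pool(self, name):
        if name not in self._servers:
            raise KeyError(name)
        self._pooled.add(name)

    def depool(self, name):
        self._pooled.discard(name)

    def answers(self, region, limit=3):
        """Return up to `limit` A record IPs for pooled servers in region."""
        ips = [ip for name, (r, ip) in self._servers.items()
               if r == region and name in self._pooled]
        random.shuffle(ips)
        return ips[:limit]
```

The shuffle keeps the round-robin behaviour we have today within a region; a later version could sort by measured latency instead.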

The concept of a "close" IP needs to be thought about in greater detail as well. That works fine on the level of continents such as "Europe" or "North America", but doesn't work too well on a smaller scale. When you try to pick "close" servers that are within North America (or Europe, or UK, or ..) then you find that geographic distance no longer bears much relationship to latency. Packets don't travel as the crow flies. (OK well apparently they do since crows follow roads and so do cables! But you know what I mean)

Here's one idea for a possible poor man's solution:

  • User from 24.150.91.34 requests A records for irc.blitzed.org, their nameserver 24.226.1.93 talks to one of our DNS backends.
  • Backends have no data cached for 24.226.1.0/24 (our own cache, not a DNS cache).
  • Our IP->location code tells us that this user is from Canada.
  • We have two servers in North America, the DNS backend sends back A records for 64.49.208.132 and 203.56.139.100. These IPs correspond to the addresses of carrot.tx.us.blitzed.org and lik-m-aid.ca.us.blitzed.org.
  • Before exiting, the DNS backend contacts daemons on those two servers and tells them to measure latency to 24.226.1.93.
  • Daemons on carrot.tx.us.blitzed.org and lik-m-aid.ca.us.blitzed.org contact all our DNS backends and tell them the latency results they just calculated for 24.226.1.93. Let's say that carrot.tx.us.blitzed.org measures RTT of 45ms, lik-m-aid.ca.us.blitzed.org measures RTT of 60ms.
  • Backends create a cache entry for 24.226.1.0/24 (or maybe 24.226.0.0/16?) stating that IPs should be returned in the following order:
    1. 64.49.208.132
    2. 203.56.139.100
  • The cache entry is kept for a few weeks or so without any further tests being done. The result is that we will always know the best servers for our regular users without having to test every single time, and without the time required for testing holding up the response from the DNS backends. Only the first ever query they do will result in them getting a less optimal, randomly ordered list for their region.
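The caching step in the list above can be sketched in Python roughly as follows. This is only an illustration: the class name, the three-week lifetime (standing in for "a few weeks or so") and the choice of a /24 key are assumptions, and the daemons that would report the measurements are not shown.

```python
import ipaddress
import time

CACHE_TTL = 21 * 24 * 3600  # assumed value for "a few weeks"


class LatencyCache:
    """Remembers measured RTTs per resolver prefix and orders servers by them."""

    def __init__(self, prefixlen=24, ttl=CACHE_TTL, clock=time.time):
        self.prefixlen = prefixlen
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # network -> {"stamp": float, "rtts": {ip: ms}}

    def _key(self, resolver_ip):
        # 24.226.1.93 with prefixlen 24 becomes 24.226.1.0/24
        return ipaddress.ip_network(f"{resolver_ip}/{self.prefixlen}",
                                    strict=False)

    def report(self, resolver_ip, server_ip, rtt_ms):
        """Record a latency result sent in by a daemon on an IRC server."""
        entry = self._entries.setdefault(
            self._key(resolver_ip), {"stamp": self.clock(), "rtts": {}})
        entry["rtts"][server_ip] = rtt_ms

    def order(self, resolver_ip, candidates):
        """Return candidates sorted by cached RTT; unmeasured servers go last."""
        key = self._key(resolver_ip)
        entry = self._entries.get(key)
        if entry is None:
            return list(candidates)
        if self.clock() - entry["stamp"] > self.ttl:
            del self._entries[key]  # expired: fall back to the given order
            return list(candidates)
        rtts = entry["rtts"]
        return sorted(candidates,
                      key=lambda ip: (ip not in rtts, rtts.get(ip, 0.0)))
```

With the figures from the example (45ms for 64.49.208.132 and 60ms for 203.56.139.100), any resolver in 24.226.1.0/24 would get 64.49.208.132 first.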

The main (possibly critical) problem for this scheme is that I can't think of a way to measure latency to an arbitrary host that runs an arbitrary OS and may or may not have a firewall, when I don't have root on all the boxes I am trying to do it from. Use of ICMP is completely out of the question since we can never guarantee that all our servers will be able to use ping (using ICMP requires root access on most Unix platforms). Timing TCP and UDP will run into problems with firewalls.

Perhaps we could time a TCP connect to port 113 (auth, or ident)? A lot of IRC users (I won't say "most") understand that ident is something that might be useful for IRC, a lot of them have it configured. Worst case is that they have a DROP firewall, in which case we'd get no results and they would continue being sent to a random server in their region; hardly the end of the world. Also IRC server hosts connecting to auth services will not raise any eyebrows with IDS and firewalls.
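For what it's worth, timing a TCP connect needs no root access. A minimal Python sketch, with a hypothetical function name and an arbitrary 3 second timeout, might look like this. A refused connection is still counted, on the assumption that the RST comes from the target host itself and so also costs one round trip.

```python
import socket
import time


def tcp_connect_rtt(host, port=113, timeout=3.0):
    """Time a TCP connect to host:port.

    Returns elapsed seconds if the connection completes or is refused,
    or None on timeout or any other error (e.g. a DROP firewall).
    Pass an IP address rather than a name so that DNS lookup time is
    not included in the measurement.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except ConnectionRefusedError:
        # Port closed but host reachable: the timing is still usable.
        pass
    except OSError:
        return None
    return time.monotonic() - start
```

A `None` result corresponds to the worst case described above: we learn nothing and keep sending that user to a random server in their region.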

On the other hand, I think that even the first stage of this project will make a dramatic difference to the spread of our users.

Any other ideas?

Note, this is not what was actually implemented! Some of these ideas might eventually go into IsoDNS though. -- mark

A few weeks later...

So we went ahead and implemented this as a PowerDNS backend and have had it running on our main irc pool for a few weeks now. If you care about how the backend works in detail then there are some docs for that in CVS as well, but here's a quick rundown:

We delegated geo.blitzed.org to three PowerDNS servers that we set up to run our backend. The backend looks up the IP address of the user's resolver in a data structure obtained from zz.countries.nerd.dk. This gives the ISO country code, which is then fed through a map file to determine what CNAME to respond with. Every RR within geo.blitzed.org can potentially have its own map file, but at the moment we're just using irc.geo.blitzed.org which is what irc.blitzed.org is CNAMEd to:

[andy@fullers andy]$ dig irc.blitzed.org

; <<>> DiG 9.2.3 <<>> irc.blitzed.org
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65132
;; flags: qr rd ra; QUERY: 1, ANSWER: 7, AUTHORITY: 5, ADDITIONAL: 1

;; QUESTION SECTION:
;irc.blitzed.org.               IN      A

;; ANSWER SECTION:
irc.blitzed.org.        3487    IN      CNAME   irc.geo.blitzed.org.
irc.geo.blitzed.org.    3487    IN      CNAME   eu.iso.blitzed.org.
eu.iso.blitzed.org.     1       IN      A       213.193.225.252
eu.iso.blitzed.org.     1       IN      A       62.80.124.155
eu.iso.blitzed.org.     1       IN      A       80.196.158.72
eu.iso.blitzed.org.     1       IN      A       195.22.74.199
eu.iso.blitzed.org.     1       IN      A       195.92.253.3

;; AUTHORITY SECTION:
blitzed.org.            3071    IN      NS      sou.nameserver.net.
blitzed.org.            3071    IN      NS      bos.nameserver.net.
blitzed.org.            3071    IN      NS      iad.nameserver.net.
blitzed.org.            3071    IN      NS      phl.nameserver.net.
blitzed.org.            3071    IN      NS      sjc.nameserver.net.

;; ADDITIONAL SECTION:
sou.nameserver.net.     43071   IN      A       194.196.163.7

;; Query time: 178 msec
;; SERVER: 192.168.0.5#53(192.168.0.5)
;; WHEN: Sat Feb 28 05:50:50 2004
;; MSG SIZE  rcvd: 276
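For anyone who wants to experiment with the same resolver IP -> country -> CNAME flow without writing C++, PowerDNS also has a pipe backend that exchanges tab-separated lines with a helper process. The Python sketch below is not our backend; it only shows how one question line could be answered, assuming pipe ABI version 1 (where the question line carries the resolver's address as its last field). `country_of` and `region_of` are hypothetical stand-ins for the country lookup and the map file, and the "eu" fallback is an assumption.

```python
def answer_query(line, country_of, region_of,
                 name="irc.geo.blitzed.org", ttl=5):
    """Answer one pipe-backend question line with a regional CNAME.

    line:       "Q\\tqname\\tqclass\\tqtype\\tid\\tremote-ip"
    country_of: callable mapping an IP string to a country code (or None)
    region_of:  dict mapping country code to region, e.g. {"fr": "eu"}
    Returns the list of lines to write back to PowerDNS.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6 or fields[0] != "Q":
        return ["FAIL"]
    _, qname, qclass, qtype, qid, remote_ip = fields[:6]
    out = []
    if qname.lower() == name and qtype in ("ANY", "CNAME"):
        # Assumption: countries missing from the map go to "eu".
        region = region_of.get(country_of(remote_ip), "eu")
        out.append("\t".join(
            ["DATA", qname, qclass, "CNAME", str(ttl), qid,
             f"{region}.iso.blitzed.org"]))
    out.append("END")
    return out
```

A real helper would also need to answer the initial `HELO` handshake and loop over standard input; that plumbing is left out here.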

It works pretty well, apart from a couple of strangely allocated IP ranges (mostly from APNIC). We're only trying to send users to a server in the same region (usually continent), but still it has made a dramatic difference to the distribution of users:

 lik-m-aid.ca.us.blitzed.org              301  27%
 |-penguin.uk.eu.blitzed.org              158  14%
 | |-toblerone.hub.eu.blitzed.org           0   0%
 | | `-cgiirc3.ch.eu.blitzed.org            7   0%
 | |-milk.de.eu.blitzed.org                91   8%
 | |-chocolate.be.eu.blitzed.org           60   5%
 | | `-soylent-green.be.eu.blitzed.org      2   0%
 | `-dope.se.eu.blitzed.org               100   9%
 `-carrot.tx.us.blitzed.org               313  28%
   `-porkchop.dk.eu.blitzed.org            58   5%

Here's one from roughly the same time back before we started using this backend:

 penguin.uk.eu.blitzed.org                234  20%
 |-toblerone.hub.eu.blitzed.org             1   0%
 | |-chocolate.be.eu.blitzed.org          140  12%
 | | `-soylent-green.be.eu.blitzed.org      3   0%
 | `-cgiirc3.ch.eu.blitzed.org             32   2%
 |-milk.de.eu.blitzed.org                 145  12%
 |-carrot.tx.us.blitzed.org               161  14%
 | |-porkchop.dk.eu.blitzed.org           133  11%
 | `-lik-m-aid.ca.us.blitzed.org          137  12%
 `-dope.se.eu.blitzed.org                 131  11%

Note the real difference in the user counts for the US servers lik-m-aid and carrot. That's the effect of the US users being kept on the US servers.
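To put a number on that difference, the two listings above can be totalled with a few lines of Python. The counts are copied from the listings; treating every server with ".us." in its name as a US server is the only assumption.

```python
BEFORE = {
    "penguin.uk.eu.blitzed.org": 234,
    "toblerone.hub.eu.blitzed.org": 1,
    "chocolate.be.eu.blitzed.org": 140,
    "soylent-green.be.eu.blitzed.org": 3,
    "cgiirc3.ch.eu.blitzed.org": 32,
    "milk.de.eu.blitzed.org": 145,
    "carrot.tx.us.blitzed.org": 161,
    "porkchop.dk.eu.blitzed.org": 133,
    "lik-m-aid.ca.us.blitzed.org": 137,
    "dope.se.eu.blitzed.org": 131,
}
AFTER = {
    "lik-m-aid.ca.us.blitzed.org": 301,
    "penguin.uk.eu.blitzed.org": 158,
    "toblerone.hub.eu.blitzed.org": 0,
    "cgiirc3.ch.eu.blitzed.org": 7,
    "milk.de.eu.blitzed.org": 91,
    "chocolate.be.eu.blitzed.org": 60,
    "soylent-green.be.eu.blitzed.org": 2,
    "dope.se.eu.blitzed.org": 100,
    "carrot.tx.us.blitzed.org": 313,
    "porkchop.dk.eu.blitzed.org": 58,
}


def us_share(counts):
    """Return (users on US servers, total users, fraction on US servers)."""
    total = sum(counts.values())
    us = sum(n for name, n in counts.items() if ".us." in name)
    return us, total, us / total


for label, counts in (("before", BEFORE), ("after", AFTER)):
    us, total, share = us_share(counts)
    print(f"{label}: {us}/{total} users on US servers ({share:.1%})")
# before: 298/1117 users on US servers (26.7%)
# after: 614/1090 users on US servers (56.3%)
```

So the two US servers went from holding roughly a quarter of the network to holding over half of it.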

We still can't really think of a good way to send users to the absolute best server for them. It is hard to define what is "best", and if "best" is defined as "lowest RTT to user" then we still have no good way to measure it.

Our current backend only returns CNAMEs to things in iso.blitzed.org, leaving our usual DNS setup to return a list of A records for that region. mark and dg are now working on a second backend (Blitzed-specific) that will handle iso.blitzed.org and only ever hand back A records for currently-linked servers.

Each nameserver will maintain TCP connections to each client server, removing the client server from all pools if that connection should break for any reason. It could be a little messy since we have around 9 client servers at the moment; with three nameservers, that will be a total of 27 "bots" on the network just to keep that working, but we can't think of a better way.

If this works out then we might be able to work out some other technique to send users to "better" servers, as this will provide the essential ability to only select from servers that are actually working.

Stats

ns0

ns1

It would be nice if we had some sensible way to check where users are actually from. It's easy to say "check all their IP addresses against the zz.countries.nerd.dk zone then!" but it's not quite that simple; the breakdown of countries is naturally very time-sensitive so a snapshot from any given time isn't particularly useful.

Checking the PowerDNS logs to see who is doing the queries doesn't seem to be that useful either, as for some reason there are vast amounts of queries from Taiwan. I am not sure why there would be vast amounts of queries specifically on irc.blitzed.org from Taiwan when we can't see (m)any Taiwanese users actually on the network, and until that is explained that method can't be trusted. Here's what can be gotten from the logs of #sys:

[andy@fullers andy]$ echo -e "Queries\t\tISO\tRegion\n-------\t\t---\t------"; \
perl -ne 'print sprintf("\t%3d\t(%s)\n", $2, $1) if (/CNAME (.*).iso.blitzed.org.*\((\d+)\)$/);' \
/home/andy/irclogs/Blitzed/#sys.log | sort -n | uniq -c | sort -rn | head -10
Queries         ISO     Region
-------         ---     ------
  68136         840     (na)
  26149         158     (as)
  16859         528     (eu)
  14583         276     (eu)
  14069         826     (eu)
  13683         616     (eu)
   8528         124     (na)
   6109         376     (as)
   4955          56     (eu)
   4613          36     (oc)

Taiwan is ISO code 158.
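The same tally can be done in Python if the shell pipeline is awkward to extend. The sketch below reuses the pattern from the perl one-liner (with the dots escaped) and makes no other assumption about the log format; the function name is hypothetical.

```python
import re
from collections import Counter

# Same match as the perl one-liner: region before ".iso.blitzed.org",
# ISO numeric country code in parentheses at the end of the line.
PATTERN = re.compile(r"CNAME (.*)\.iso\.blitzed\.org.*\((\d+)\)$")


def tally(lines):
    """Count log lines per (ISO numeric code, region) pair."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line.rstrip("\n"))
        if m:
            counts[(int(m.group(2)), m.group(1))] += 1
    return counts
```

Calling `tally(open(path, errors="replace")).most_common(10)` gives the equivalent of the `sort | uniq -c | sort -rn | head -10` stage.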
