Blitzed.org

No more netsplits

By the way, it has been pointed out that this is a very bad name for this page, since you can never avoid netsplits. I only chose this name as a joke/marketing gimmick and if it were ever actually developed it shouldn't be suggested that it prevents all netsplits!

A not very long time ago, in an IRC channel not so far away...

<Eyecon> grifferz: I know how to attract more channels to blitzed
<grifferz> oh?
<Eyecon> have an ircd which silently deals with netsplits
<Eyecon> that would be a big advantage over other networks
<Eyecon> I think it's worth the effort
<grifferz> it would be a lot of effort  we've discussed it before
<grifferz> you'd need to implement multiple links, turning the tree into a graph
<Eyecon> I know it's a lot
<Eyecon> but it's the biggest benefit
<Eyecon> that I can see
<Eyecon> it would really give blitzed an advantage over other networks
<Eyecon> well I really think it should be seriously considered
<Eyecon> if you want to grow
<grifferz> have to be clever to do it only with server/server protocol
<mark> it really is a lot of work
<grifferz> it's a vast amount of work, near as much as writing ircd from scratch imho
<Eyecon> I think it can be done with server/server
<Eyecon> I don't think it is *that* much
<Eyecon> but it's a lot
<grifferz> Eyecon: but have you any ideas how you will do it?
<Eyecon> nope
<Eyecon> :) 
<Eyecon> but I have some experience from work that will help
<mark> perhaps you could write a wrapper for existing ircd server/server
<mark> but that would be difficult as well
<Eyecon> interesting idea
<grifferz> currently all ircd does is, it gets a msg and sends it off
    down every server connection it has.  also it pings every connected fd
    (client or server) and if they don;t reply within the configured time
    period it breaks their connection
<grifferz> so it is very simple
<mark> just provide every ircd with one virtual hub server
<strtok> you can't do it, it would take too much bandwidth
<Eyecon> it's true that bandwidth usage will go up
<Eyecon> is that a large concern?
<strtok> it will not be transparent for the user
<grifferz> hmm, I'm not sure I agree erik, about the bandwidth usage
<strtok> because you still need to sync
<mark> i think it's possible
<strtok> grifferz: only with the sync
<grifferz> why would the network state be any bigger than it is now?
<grifferz> users may still see weird things like their own msgs coming
    back at them
<grifferz> which is similar to what you see in any network like the internet
<grifferz> (that can ever happen on normnal irc sometimes, I have seen my
    own quits and weird things like that)
<strtok> nno i'm saying, i think you will lose data
<strtok> and lose sync
<strtok> when the failover happens
<mark> not if you ack everything
<strtok> unless you ack everything
<strtok> yeah
<strtok> but
<strtok> :)
<strtok> how do you know if data got ot the other side okay
<strtok> but they just never acked it
<strtok> so it's synced but you don't think it's synced
<mark> that's not a problem
<mark> send it again
<grifferz> well then it is considered lost and must be resent when it fails over
<strtok> this is on failover
<grifferz> so there will be the occasional double message or users seeing
    themselves say things
<mark> look at tcp
<strtok> you make it sound easier than it is
<mark> it has the same problems, and solutions for them
<strtok> i know what tcp is i work with it everyday :P
<strtok> i don't think it will work easily for irc
<mark> everybody works with tcp everyday :P
<mark> (yeah, i know ;)
<strtok> :P
<mark> nobody said it would be trivial
<mark> but i don't think it's impossible
<grifferz> really if one was doing this properly then, all server connections
    would be udp
<grifferz> since it is just reinventing tcp at a higher level
<strtok> yeah
<grifferz> which really would need a rewrite of the ircd I think
<strtok> it has to all be designed from scratch
<strtok> how it syncs, everything
<mark> ok
<mark> when do we start? ;-)
<strtok> every server keeps track of the last SEQ they got from server ABC
<strtok> it's icky
<strtok> :P
<grifferz> dg can write it
<grifferz> in perl
<strtok> i don't think it's worth it!
<strtok> haha
<grifferz> if I had the sapre time.. it would be worth it for blitzed,
    because it would be a big selling point
<strtok> you need a whitepaper first though
<mark> the hard part is the client protocol
<mark> you need to stay compatible
<strtok> that's not that hard
<strtok> clients don't care how the servers talk to eachother
<mark> and most of them have become quite flexible in how the server talks
    to it as well
<mark> s/it/them/
<strtok> i think it would slow down irc
<strtok> because msgs still need to be sent in order
<strtok> so you'd be doing layer4 stuff yourself in the ircd
<mark> existing transport protocols don't suffice
<grifferz> you'd think there would be a library or something to do it wouldn't you
<strtok> maybe a seperate communication thread
<strtok> handles getting the data in the right order
<strtok> and then passes data to the ircd when it knows it's good
<mark> like my wrapper :)
<strtok> the problem is
<strtok> writing the network part would be cool
<strtok> but
<strtok> once you're done with that you don't want to write tha other 10 years
    worth of crap
<strtok> :P
<mark> indeed
<mark> a separate wrapper program might be possible
<mark> the problem is, it doesn't know anything about the ircd/network state
<strtok> it doesn't need to, it just knows about the links
<mark> perhaps
<strtok> and getting messages in the right order
<strtok> you want it to be abstract
<mark> yes
<strtok> so that you've created a graph based transport layer
<mark> that way you won't have to do all the other stuff
<mark> just link every ircd to exactly one wrapper
<strtok> and duplicate messages won't happen because every server keeps track
    of the order of packets arriving from each other server
<mark> yup
<mark> but i'm not sure whether it's possible
<strtok> it's not really a wrapper, it's more of the ircd's network layer
<grifferz> well it's like a bridge
<grifferz> between two incompatible protocols
<strtok> yeah
<mark> there might be some inherent desync property in the existing ircd
    protocols
<mark> and...
<mark> you have to provide every ircd with a tree network
<mark> the latter should be possible though
<mark> you can provide every ircd with a very flat tree
<strtok> it needs be written from scratch
<strtok> getting it to work with existing ircd is crazy
<grifferz> Eyecon: http://achurch.org/irc3/irc3-20020208.txt
<Eyecon> grifferz: somewhat interesting
<Eyecon> I'll have a read
<Eyecon> do we know if anyone has ever implemented it?
<grifferz> no one ver has, I would have heard of it
<grifferz> I heard of people who started though, and ended up making a whole
    new IM network instead
<grifferz> like silc, jabber, gale etc.
<Eyecon> my method though is very different
<Eyecon> and probably contraversial
<Eyecon> but should be easier to add to the current ircds
<Eyecon> see, my method is more of a half-way house
<Eyecon> when connections die, connections are re-made, without users seeing it
<Eyecon> rather than having multiple connections
<strtok> you're not thinking hard enough eyecon
<strtok> :P
<Eyecon> why not?
<strtok> you don't understand what that requires
<Eyecon> in what way?
<strtok> after you left yesterday we had an hour discussion of why it's so complicated
<grifferz> the whole state of the network has changed between a connection
    dying and being re-established
<strtok> yes
<strtok> you have to know what messages were synced and what weren't
<Eyecon> yes, I will cope with that
<Eyecon> look, I don't want to go into it here
<strtok> you have such an ego, go learn how IRC works then come talk about it properly
<strtok> :P
<Eyecon> because I'll have to go through it again
<grifferz> you are going to have to basically diff your state with the new state
<strtok> we know it's possible, just saying it's hard to do
<Eyecon> I'm part way through documenting it
<strtok> i bet you are
<grifferz> and have it cope with you being linked at a different place in the tree
<Eyecon> I agree it's not easy
<strtok> ever heard of desync? even in the current implementation it happens
<Eyecon> what is desync?
<strtok> see what i mean
<Eyecon> erik: I don't have an ego aobut this
<grifferz> if you're not having multiple links then is your idea really worth
    implementing?  it seems all it will achieve is hiding a netsplit
<Eyecon> I know nothing about irc
<strtok> then learn before writing anything
<Eyecon> but I really think it would be a big improvement to solve this problem
<Eyecon> I don't for a minute think I can solve it
<strtok> we agree
<Eyecon> but I hope I can create some ideas that you guys might be able to solve it
<Eyecon> I have some ideas that might help
<strtok> grifferz, mark and i were going over some crazy ideas, like a seperate
    transport layer for the irc
<strtok> based over udp
<Eyecon> grifferz: I think hiding a netsplit is very valuable
<grifferz> why?
<grifferz> my clietn already compresses it down to 1 line
<Eyecon> I don't think you have to solve the entire problem to give users
    90% of what they desire
<strtok> hiding netsplit is unimportant
<strtok> what would be useful is if the split never happened in the first place
<Eyecon> why?
<strtok> and the channel stays synced
<Eyecon> the channel has to stay synced, I agree with that
<Eyecon> but the line never go down?
<grifferz> I'm afraid I have to agree with strtok on that score.. without multiple
    links all you're doing is cosmetically altering the process, which isn't really
    worth it
<Eyecon> I'm not sure that's important
<strtok> you need multiple links
<Eyecon> why do you think that?
<strtok> to keep it flowing
<Eyecon> from a users point of view, what does it matter?
<grifferz> because without multiple links you have nothing different from how
    it is now
<Eyecon> creating a new link is very fast
<Eyecon> not true
<Eyecon> right now, links disapear and you lose traffice
<Eyecon> there is a perception by users that there has been a split
<Eyecon> that is the worst thing
<strtok> not true, links don't always happen right away
<Eyecon> from a user point of view
<strtok> or sometimes they take a while to sync
<grifferz> the only difference between what you propose (as far as I can see so far)
    and how it is now, is that instead of throwing away entire network state and
    then loading it again (now), you will merge the new into the old whenever a new
    link is established
<Eyecon> well how long does it take to create a new link?
<grifferz> so all that is different, is that users won't see a bunch of quits/joins
<strtok> yeah
<strtok> but they still won';t be able to communicate
<grifferz> but my client already squashes those down to 1 line
<strtok> so what's the point
<grifferz> so why would I care?
<strtok> it only hides the fact
<grifferz> I can exchange months of coding time for losing 1 line of info and
    gaining 20 more in oper notices to let me know what happened
<Eyecon> hmm
<strtok> if you';re going to do it, you need multiple links
<Eyecon> maybe I've looked at this from the wrong point of view then
<strtok> and even then you still might suffer the problem of losing all links
<Eyecon> but I'm still not conviced
<strtok> basically it would be a failover link
<grifferz> well what do you see as the benefit?
<Eyecon> when a netsplit occurs, how long does it usually last?
<grifferz> around 30 secs once the server actually knows it split
<strtok> it would last seconds, or it could last an hour
<grifferz> this is in ideal circumstances
<grifferz> and it might take between 90 and 400 secs to notice it is split
<Eyecon> but I remember in the days when we used to use efnet, they used to last
    like 30mins
<grifferz> sure but that was years ago, the whole internet got better in between
<grifferz> what's the average length of netsplit you've ever seen on here?
<strtok> efnet servers take a min or two to sync
<grifferz> there's two kinds really
<Eyecon> not long, but I thought maybe it was longer on larger networkds
<grifferz> there is the one where the route between a leaf and its hub goes bad
    somewhere out in the internet.  it lags for a few minutes and then pings out,
    autoconnects to a new hub, and all is over within minutes
<grifferz> the other is where the breakage is close to the leaf server.  then
    the leaf is completely off the net, it pings out and loses all its client,s
    may not come back ever, the clients go to another server
<grifferz> your plan works only for the first case, but the first case is not
    a big problem
<Eyecon> agreed
<grifferz> nothing can help the second case except for clients with multiple links
    which would be mad
<Eyecon> ok, so back to the drawing board :(
<grifferz> you could make the first case better by giving the leaf two links,
    one to its main hub and another to the second hub that it auto connected to
    as I explained
<strtok> Eyecon: multiple links over udp with crazy made up transport protocol
    for tracking message order
<grifferz> then instead of splitting and then reocnnecting it would never split
<Eyecon> strtok: sorry if I came across as arrogant... I just wanted to try and
    help find a solution to something I thought was a problem that ought to be
    solved...
<strtok> the only problem is, the problem is a lot deeper than you realize
<strtok> and you need to learn the current restrictions of irc links
<strtok> keeping the network synced is hard
<strtok> udp is a good candidate
<strtok> basically you would write an abstract transport layer over udp for a
    graph based irc network

(Those were edited logs, not 100% representative of the conversation that actually happened. But don't go digging through your logs to complain unless you're willing to edit it yourself if you think something important got missed.)

Thoughts

What is the problem we're trying to solve?

Imagine the following conventional IRC network where each letter is a server:

  A---------C---------D
  |                   |
  |                   |
  |                   |
  B                   E

There is only a single route between any two servers, and this route will never change while the network remains joined. Packet loss on the internet can break this route, leading to a server pinging out from its hub. If a problem exists between C and D, the network will split into two halves, one containing A, B and C and the other containing D and E:

  A---------C         D
  |                   |
  |                   |
  |                   |
  B                   E

Users in either half will see all the users from the other half quit, and communication with them will be impossible until the network reconfigures itself, for example if A connects to D:

  D---------E
  |
  |
  |
  A---------C
  |
  |
  |
  B

This netsplit is annoying and it is the goal of every well-designed IRC network to minimise these events. Traditional strategies involve having multiple high-quality hub servers so that leaves stay connected for a long time and when split always have somewhere else to reconnect to. Even then though, splits still happen, because the underlying internet is designed to route around failure, not make failure impossible.
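The split in the diagrams above can be reproduced with a small model. This is only an illustrative sketch: the `LINKS` set and the `reachable` and `split` helpers are hypothetical names made up for this page, not anything an ircd provides. It shows that removing any one link from a tree leaves two halves that cannot reach each other.

```python
from collections import deque

# Links of the example network; each pair is one server-to-server connection.
LINKS = {("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")}


def reachable(start, links):
    """Return the set of servers reachable from `start` over `links`."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        server = queue.popleft()
        for peer in neighbours.get(server, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


def split(links, broken):
    """Return the two halves left when the link `broken` is removed from a tree."""
    remaining = set(links) - {broken}
    return reachable(broken[0], remaining), reachable(broken[1], remaining)


side_c, side_d = split(LINKS, ("C", "D"))
print(sorted(side_c))  # ['A', 'B', 'C']
print(sorted(side_d))  # ['D', 'E']
```

Adding the link ("A", "D") to the remaining set makes all five servers reachable from one another again, which is the reconfigured network in the last diagram.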

So, Eyecon suggests we should get rid of netsplits. Easier said than done.

The first idea is to keep the familiar tree-style network, and just have the servers reconnect without showing a mass of quits. Basically the servers would be doing the equivalent of a diff of their last known network state with the new state they get when reconnecting. Initially this seems to be sufficient - it stops users being annoyed by seeing their friends all quit and be lost in a netsplit. But some further thought will highlight a few problems with this.
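Before turning to those problems, here is a rough sketch of what "a diff of network state" could mean. It assumes, purely for illustration, that a snapshot is a mapping from channel name to a set of nicknames; a real ircd tracks far more (modes, topics, servers), and `diff_state` is a hypothetical helper, not an existing function.

```python
def diff_state(old, new):
    """Compare two snapshots mapping channel name -> set of nicknames.

    Returns (joined, left): lists of (channel, nick) pairs present only in
    the new snapshot and only in the old snapshot respectively.
    """
    joined, left = [], []
    for channel in sorted(set(old) | set(new)):
        before = old.get(channel, set())
        after = new.get(channel, set())
        joined.extend((channel, nick) for nick in sorted(after - before))
        left.extend((channel, nick) for nick in sorted(before - after))
    return joined, left


last_known = {"#blitzed": {"grifferz", "strtok", "mark"}}
after_relink = {"#blitzed": {"grifferz", "mark", "Eyecon"}}

joined, left = diff_state(last_known, after_relink)
print(joined)  # [('#blitzed', 'Eyecon')]
print(left)    # [('#blitzed', 'strtok')]
```

With something like this, only the users who really changed would be announced after a relink, instead of everyone on the far side quitting and rejoining.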

Firstly, the last known state may itself be incorrect, since there is no way to tell at what point communication stopped. The only reason the ircd currently announces a split is that it has sent some PINGs and not heard anything back. By then an indeterminate number of messages have already been lost. This problem is not insurmountable, however.

More serious is the question of exactly how long the server should "pretend" to users that everything is working. Netsplits can in the worst case last indefinitely. At some point the users need to know that their messages are being thrown away. The current netsplit behaviour accomplishes this by quitting the users that are on the other side of the netsplit.

One answer to this could be to have the servers acknowledge messages sent between them. In the situation presented above, where the link between C and D is lost, C would keep records of all messages that D has not acknowledged. When the network rejoins (in the example, when A and D link), C would send on all the messages that did not get delivered. Provided the reconnection happened relatively quickly there would be no need to inform users of what had happened, since all they would experience is a momentary delay of messages. If the servers were unable to relink within a reasonable time then a conventional netsplit could occur.
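A minimal sketch of that acknowledgement idea follows, combined with the "keep track of the last SEQ" suggestion from the log. The `Sender` and `Receiver` classes and their methods are hypothetical names invented for illustration; they assume one sequence counter per pair of servers and say nothing about how the messages travel.

```python
class Sender:
    """Keeps every message until the peer acknowledges its sequence number."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = {}  # seq -> message text

    def send(self, text):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = text
        return seq, text

    def ack(self, seq):
        """Peer confirms it has everything up to and including `seq`."""
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]

    def resend_all(self):
        """Messages to replay, in order, once a new route exists."""
        return sorted(self.unacked.items())


class Receiver:
    """Delivers each sequence number once and in order."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def receive(self, seq, text):
        if seq == self.last_seq + 1:
            self.last_seq = seq
            self.delivered.append(text)
        # Older numbers are replays and newer ones arrived out of order;
        # either way the reply tells the sender where to resume from.
        return self.last_seq


c, d = Sender(), Receiver()
c.ack(d.receive(*c.send("hello")))   # delivered and acknowledged
d.receive(*c.send("still there?"))   # delivered, but the ack is lost
c.send("anyone?")                    # never arrives: the link is dead

# A route to D exists again, so C replays whatever D never acknowledged.
for seq, text in c.resend_all():
    c.ack(d.receive(seq, text))

print(d.delivered)  # ['hello', 'still there?', 'anyone?']
print(c.unacked)    # {}
```

The second message covers the case strtok raised: it reached D but the acknowledgement did not reach C. C resends it anyway, and D drops the replay because it has already seen that sequence number.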

Question: Assuming servers A, C and D all had some new hypothetical capability "NOSPLIT" which indicates that they keep track of who has seen what messages, is it possible that D could break its link with C and relink to A without the rest of the servers on the network (which are all behaving conventionally) getting confused? What I am getting at is, can this possibly be implemented as a new capability that only alters the format of communications between servers that advertise the capability?

Answer: We don't think so. One reason why not should be obvious from the following diagram:

A---B---D---X---E---F---H
    |               |
    |               |
    C               G

Here assume A to H are "NOSPLIT" servers, with X being a conventional ircd. X does not understand sequence numbers and such, and would have told servers D and E this when it linked to them, so they will not be relaying any of the info about which servers have seen which messages. If the link between B and D were to break then B may relink to E and resend all messages that it believes to be lost in transit to D. Since no acknowledgement information ever crossed X, B cannot know which of those messages the servers beyond X have already seen.
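To make the diagram concrete, the following sketch checks whether every server on the route between two servers advertises the capability. The `LINKS` list, the `NOSPLIT` set and both helper functions are hypothetical, written only to illustrate the point; "NOSPLIT" itself is the imaginary capability from the question above.

```python
LINKS = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "X"),
         ("X", "E"), ("E", "F"), ("F", "G"), ("F", "H")]
NOSPLIT = set("ABCDEFGH")  # X is the conventional ircd


def path(src, dst, links):
    """Return the unique route from src to dst in a tree-shaped network."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    stack = [[src]]
    while stack:
        route = stack.pop()
        if route[-1] == dst:
            return route
        for peer in neighbours.get(route[-1], []):
            if peer not in route:
                stack.append(route + [peer])
    return None


def ack_state_survives(src, dst, links, capable):
    """True if every server on the route can relay 'who has seen what'."""
    route = path(src, dst, links)
    return route is not None and all(server in capable for server in route)


print(path("B", "E", LINKS))                         # ['B', 'D', 'X', 'E']
print(ack_state_survives("B", "E", LINKS, NOSPLIT))  # False
print(ack_state_survives("A", "D", LINKS, NOSPLIT))  # True
```

A single conventional server on the route is enough to lose the bookkeeping, which is why B has no better option than resending everything.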

Maybe this could be worked around. Maybe it doesn't matter if all the messages are resent. All servers will still need a way to update their idea of what the network looks like (for /links), but that's probably not a big deal. All of this together means that perhaps it would just be easier to break backwards compatibility between servers and start with a clean slate for a server/server protocol.

Some notes about the pros and cons of single links (tree) vs multiple links (cycle) should be added. See RFC:1324 section 5.4.2 "Trees and cycles".

Advantages of multiple links

  • ircd will always know it has a backup route that is alive, so as soon as it suspects one link is dead it can move everything over to the other without having to establish the connection.
  • ircd can use all links at once, by finding some method of working out which link will get the message to its destination in the shortest amount of time. (anyone got any ideas on that one?)
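As one possible answer to the question in the second point, and not something proposed in the discussion above, a server could run an ordinary shortest-path search over measured link delays and send each message out of the link that starts the quickest route. The sketch below assumes each server somehow knows a `latency` map for the whole network; how those delays would be measured and shared is left open, and `best_first_hop` is a hypothetical name.

```python
import heapq


def best_first_hop(source, target, latency):
    """Return (first_hop, total_ms) for the quickest known route, or None.

    `latency` maps server -> {neighbour: measured delay in ms}; links that are
    believed dead are simply left out of the map.
    """
    queue = [(0, source, None)]  # (delay so far, server, first hop used)
    done = set()
    while queue:
        delay, server, first_hop = heapq.heappop(queue)
        if server in done:
            continue
        done.add(server)
        if server == target:
            return first_hop, delay
        for peer, cost in latency.get(server, {}).items():
            if peer not in done:
                heapq.heappush(queue, (delay + cost, peer, first_hop or peer))
    return None


# The example network with one extra link (A to D), which turns it into a graph.
latency = {
    "A": {"B": 10, "C": 20, "D": 80},
    "B": {"A": 10},
    "C": {"A": 20, "D": 30},
    "D": {"C": 30, "A": 80, "E": 15},
    "E": {"D": 15},
}
print(best_first_hop("A", "E", latency))  # ('C', 65)

# If the C to D link is suspected dead, drop it and the backup route takes over.
without_cd = {s: {p: ms for p, ms in peers.items() if {s, p} != {"C", "D"}}
              for s, peers in latency.items()}
print(best_first_hop("A", "E", without_cd))  # ('D', 95)
```

The second call also illustrates the first point: because the A to D link already exists, moving traffic onto it needs no new connection.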

Disadvantages of multiple links

  • It's far more complicated. By contrast, there is some chance that a singly linked version of this protocol could keep the config style of normal ircd and most of the normal link procedure, with only the actual server/server protocol being different.

Links

Includes use of sequence numbers and a "hop list" to implement a graph (i.e., cyclic) network without infinite loops.

  • RFC:1459

The (extremely dated) RFC covering most client/server interaction.

The ircd we'd like to be using in future, and so should be the basis of any serious new development.
