MySQL replication
From Blitzed
This page is a step-by-step guide to configuring MySQL replication of Blitzed stuff from lamancha.blitzed.org. At the moment this really only applies to our services database.
These instructions come straight from the How to set up replication section of the MySQL manual, condensed for Blitzed use.
Contents |
[edit] Why you might want to set up replication
- Backup purposes for our various databases.
- If you have software that requires access to our data.
[edit] What you need
[edit] A fairly constant connection to the Internet
MySQL replication can cope with outages and downtime and should just start back up again with no problem as long as you haven't suffered any problems like damaged filesystems, but unless your machine is online for at least a few hours a day then it will never catch up and is of dubious use even as backups.
Bandwidth usage is not high at the moment, as only writes are replicated. Any broadband user probably would not notice it except during large net joins.
[edit] A clue about keeping your system secure
This will be a full copy of our services databases, which could be used to try to crack user passwords or read memos. Our passwords are hashed with a large salt so this would take time, but users tend to pick useless passwords. For this reason we need to trust you and your ability to keep your machines secure. The user that we use for replication has no access to make any changes to any database.
[edit] An up-to-date copy of MySQL
We currently use MySQL 4.1.x. You need to be using at least a recent 4.1.x release for now. 5.x is able to function as a slave of a 4.1.x master, but 4.0.x is not.
You can only be a slave of one master at a time so if you're intending to run a MySQL to slave Blitzed things then it will probably have to be dedicated to that. It can run on a different port if you only have one IP and already run MySQL.
[edit] Blitzed user-defined function installed
Our services currently requires a user-defined function to be loaded from the shared library blitzed_udf.so. You will not be able to replicate our services without loading this function as some of the updates will fail.
Instructions for installing this library can be found in the source code. The library will be compiled as part of a normal compile of services and can later be moved to a suitable location.
[edit] Setting up replication
[edit] Getting the data
At the moment, lamancha is only configured to replicate liveservices and opm. You are probably only interested in liveservices so the first thing you need to do is get a copy of that data.
You can't just copy the files or dump them out of the database as you need to know at what point in the binary log the master is up to, so that replication can be started later on.
There is a LOAD DATA FROM MASTER MySQL command, but we have found it to be unreliable. It is also slow and read locks the whole server while it is going. The following method is not quite as simple, but is quick and safe. If you still want to try LOAD DATA FROM MASTER, see More about LOAD DATA FROM MASTER, below.
You will find it easier to open two shells onto lamancha. On the left is your normal user logging into MySQL. On the right is an su to root. What we are going to do is:
- Log into MySQL as root
- Lock all databases to read-only
- Tar up the files we are interested in
- Look at the position of the master's binary log
- Unlock
If you paste the commands as shown here the process will take only a few seconds and will not be noticeable.
$ mysql -u root -p Password: Your MySQL connection id is 264895 to server version: 4.1.7-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> flush tables with read lock; Query OK, 0 rows affected (0.62 sec) mysql> show master status; +----------------+----------+------------------+------------------+ | File | Position | Binlog_do_db | Binlog_ignore_db | +----------------+----------+------------------+------------------+ | lamancha-bin.052 | 183263 | opm,liveservices | | +----------------+----------+------------------+------------------+ 1 row in set (0.00 sec) mysql> unlock tables; Query OK, 0 rows affected (0.00 sec) |
$ sudo su - # cd /data/mysql/liveservices # tar jcvf /tmp/svs.tar.bz2 . ./ ./admin.frm ./admin.MYI ./admin.MYD ./akick.frm ./akick.MYI ./akick.MYD ./akill.frm ./akill.MYI ./akill.MYD ./auth.frm ./auth.MYI ./auth.MYD ./autojoin.frm ./autojoin.MYI ./autojoin.MYD ./chanaccess.frm ./chanaccess.MYI ./chanaccess.MYD ./chanlevel.frm ./chanlevel.MYI ./chanlevel.MYD ./channel.MYI ./chansuspend.frm ./chansuspend.MYI ./chansuspend.MYD ./exception.frm ./exception.MYI ./exception.MYD ./memo.frm ./memo.MYI ./memo.MYD ./memoinfo.frm ./memoinfo.MYI ./memoinfo.MYD ./news.frm ./news.MYI ./news.MYD ./nick.MYI ./nickaccess.frm ./nickaccess.MYI ./nickaccess.MYD ./nicksuspend.frm ./nicksuspend.MYI ./nicksuspend.MYD ./oper.frm ./oper.MYI ./oper.MYD ./private_tmp_access.frm ./private_tmp_access.MYI ./private_tmp_access.MYD ./private_tmp_akick.frm ./private_tmp_akick.MYI ./private_tmp_akick.MYD ./private_tmp_autojoin.frm ./private_tmp_autojoin.MYI ./private_tmp_autojoin.MYD ./private_tmp_chanaccess.frm ./private_tmp_chanaccess.MYI ./private_tmp_chanaccess.MYD ./private_tmp_memo.frm ./private_tmp_memo.MYI ./private_tmp_memo.MYD ./private_tmp_memo2.frm ./private_tmp_memo2.MYI ./private_tmp_memo2.MYD ./private_tmp_nickaccess2.frm ./private_tmp_nickaccess2.MYI ./private_tmp_nickaccess2.MYD ./private_tmp_quarantine.frm ./private_tmp_quarantine.MYI ./private_tmp_quarantine.MYD ./quarantine.frm ./quarantine.MYI ./session.frm ./quarantine.MYD ./session.MYI ./session.MYD ./stat.frm ./stat.MYI ./stat.MYD ./suspend.frm ./suspend.MYI ./suspend.MYD ./channel.MYD ./nick.MYD ./nick.frm ./channel.frm |
At this point you now have a file svs.tar.bz2 that contains all the files you are interested in, plus the name of the binary log and the position within it from SHOW MASTER STATUS. Keep a note of that info, copy the archive to your machine and then delete it from lamancha.
[edit] my.cnf
Edit your my.cnf so that it has the following:
[mysqld] server-id=slave_id
For slave_id you need to pick a number between 1 and 232-1. It must be unique amongst all servers used by Blitzed as slave or master, so this will require some organisation. Already-used identifiers:
- lug.org.uk
- laudanum.strugglers.net
- ???
- ???
- nubian.blitzed.org
- sirius.otherwize.co.uk
- caffreys.strugglers.net
- lamancha.blitzed.org
- curacao.strugglers.net
- ruminant.blitzed.eu.org
- susa.otherwize.co.uk
- babylon.otherwize.co.uk
Make sure to record your ID here so that other people don't try to use it.
If you are only interested in replicating liveservices and not opm, you will also want the following:
replicate-do-db=liveservices
And these directives are also recommended, to compress traffic and give you extra info about what is going on:
slave_compressed_protocol=1 log-warnings
[edit] Loading the data
Change to your MySQL data directory and unpack the archive that was created in the step above. If you copied the commands from above then you will have to create the liveservices directory for the files to go in too. Make sure all files have the right permissions!
Restart your MySQL and now you will have a copy of the data.
[edit] Configuring replication
Log into your MySQL as root. If this is the very first time you are setting up replication, issue the following command:
mysql> change master to master_host='lamancha.blitzed.org', -> master_user='user_name', -> master_password='replication_password', -> master_log_file='lamancha-bin.052', -> master_log_pos=183263;
Ask other members of the Systems Team for the replication username and password.
If you are just trying to restart a replication that is already configured but for some reason got out of sync, all you need to do is:
mysql> change master to master_log_file='lamancha-bin.052', master_log_pos=183263;
Provided the above command returns without error, you should now be able to type:
mysql> slave start;
Watch your MySQL error log for problems.
[edit] Checking that replication is still working
mysql> show slave status\G
*************************** 1. row ***************************
Master_Host: lamancha.blitzed.org
Master_User: replication_user
Master_Port: 3306
Connect_retry: 60
Master_Log_File: lamancha-bin.052
Read_Master_Log_Pos: 13434360
Relay_Log_File: laudanum-relay-bin.002
Relay_Log_Pos: 548862698
Relay_Master_Log_File: lamancha-bin.052
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_do_db: liveservices
Replicate_ignore_db:
Last_errno: 0
Last_error:
Skip_counter: 0
Exec_master_log_pos: 13434360
Relay_log_space: 548862698
1 row in set (0.30 sec)
You need to check that Slave_IO_Running and Slave_SQL_Running are both Yes. If IO is not running it indicates a problem in the replication protocol itself. SQL not running usually indicates that an error was encountered executing the SQL on your slave.
[edit] More about LOAD DATA FROM MASTER
As mentioned earlier, LOAD DATA FROM MASTER is a more convenient way to kickstart replication, but it has some disadvantages:
- Slower - this command will cause the relevant databases and tables to be populated as if you had done a mysqldump. This method is slower than just copying the files and tranferring them over the network.
- Read locks the server - the entire server is locked such that any queries that modify anything will block until after the command is finished transferring data. For us this has some unpleasant consequences. If you try to kick off a replication of opm this way, you will cause our services and thales to ping out.
- Unreliable - personal experience suggests that the implementation of this command is not perfect. Sometimes it just refuses to work for unknown reasons.
That said, it is a lot more convenient for us humans, and it means you can get away without restarting your slave, so you might want to try it anyway.
For LOAD DATA FROM MASTER, the replication account on the master needs some extra privileges:
- RELOAD
- SUPER
- SELECT on every database/table you want to load
If you forget the SELECT step then LOAD DATA FROM MASTER will appear to work, but will return to prompt much too quickly and no data will actually be transferred. On lamancha SELECT is already granted for the liveservices table (trying to do a LOAD DATA FROM MASTER on opm is not advisable).
If you are going to try this, please check with other systems team members first that the master is likely to be able to complete the operation fast enough -- if the master is extremely loaded then it is possible to cause services and thales to ping out even just for liveservices.
As long as you have your master host, user and password set, you should now be able to issue the command. Once you get a prompt back, try to start the slave. A show slave status\G should now give the same output as above.