
Migrating Dovecot 1.2 Maildir to Dovecot 2.0 dbox

I am in the process of migrating to a new mail server, and therefore need to move users as painlessly as possible. The details of the setup are another story for another day – promise.

This guide is targeted at Debian systems, but the concepts apply to other systems as well.

Dovecot 2.0 comes with a nice tool called dsync, which eases migration a great deal. Unfortunately, my current mail server runs Dovecot 1.2 and therefore does not have the tool.

What to do, then?

Basically I have thought up three options for migrating.

  1. Using dsync on both sides
  2. Using rsync, then dsync
  3. Using dsync over sshfs

This post will serve as documentation for my experiments with mailbox migration.

If you are in a hurry, you can skip to the conclusion.

Using dsync on both sides

Since I run Dovecot 1.2 and thus do not have dsync available, I will need to pull down the sources and compile them myself. (I do not want to use dpkgs, as they may interfere with the existing installation.)

I got as far as getting the source compiled, but have not investigated further. Some paths were wrong – I cowardly quit.
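
For reference, a minimal sketch of how the sources could be built into a separate prefix so they cannot touch the running 1.2 installation (the version number and prefix are my assumptions):

# build Dovecot 2.0 into its own prefix (version/paths assumed)
wget http://dovecot.org/releases/2.0/dovecot-2.0.13.tar.gz
tar xzf dovecot-2.0.13.tar.gz && cd dovecot-2.0.13
./configure --prefix=/opt/dovecot2
make && make install
# dsync then ends up in /opt/dovecot2/bin/dsync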

Later experiments with the two other approaches have shown that this, most likely, will not prove successful.

Using rsync then dsync

The next solution was a two-step migration. First, I used rsync to copy my Maildir mailboxes to the new server.

rsync -poazuHK -e ssh \
    root@oldmailserver.tld:/var/spool/postfix/virtual/ \
    /var/vmail.migrate/

You can log in as root here, as -o (preserve owner) maps usernames to the matching UIDs on the target system. Clever :-)

Then, run dsync for each user in order to import the new emails.

dsync -R -u myaddress@mydomain.tld backup \
maildir:/var/vmail.migrate/mydomain.tld/myaddress/Maildir/

Mirroring does not really make sense here, as we already have a local copy of the mailbox.
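
dsync has to be run once per user, so a small loop saves some typing – a sketch, assuming every directory under /var/vmail.migrate/ follows the domain/user layout shown above:

# run dsync for every mailbox found in the migrated tree (layout assumed)
for dir in /var/vmail.migrate/*/*/; do
    user=$(basename "$dir")
    domain=$(basename "$(dirname "$dir")")
    dsync -R -u "$user@$domain" backup "maildir:${dir}Maildir/"
done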

This approach is by far the fastest and easiest.

Using dsync over sshfs

Notice: This only works with backup and not mirror.

Why? Dovecot 2's transaction log format is incompatible with Dovecot 1's; after a mirror operation, the old server will time out with a message about an unknown record type (0x8000).

# apt-get install sshfs
sshfs -o uid=`id -u vmail` -o allow_other \
    vmail@oldmailserver:/var/spool/postfix/virtual/ \
    /var/vmail.oldhost/

Remember -o allow_other, or dsync will fail because the vmail user will not have access to the mount point.
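
Before running dsync, it is worth a quick sanity check that the vmail user can actually traverse the mount:

# should list the virtual domains without a permission error
sudo -u vmail ls /var/vmail.oldhost/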

Then, run dsync for each user in order to import the new emails.

dsync -R -u myaddress@mydomain.tld backup \
maildir:/var/vmail.oldhost/mydomain.tld/myaddress/Maildir/

Ownership is of the essence here. Do not use root, as this user will take ownership of Dovecot metadata files, causing your source mail server to coredump or just stall.
vmail is not the best option either – but I was lazy. You should take advantage of the fact that the vmail folders are (usually) group vmail. Putting a migration user in this group and chmodding accordingly is probably preferable, security-wise.
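
A sketch of that idea – the account name is my invention, and the chmod only adds group read/traverse bits:

# on the old server: dedicated migration account in the vmail group (name assumed)
adduser --system --ingroup vmail --no-create-home migrate
# let the group read and traverse the mail tree
chmod -R g+rX /var/spool/postfix/virtual/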

This approach works well when refined (e.g. using the right UID on both sides), but is pretty slow – about 100 kB/s sync. That is not really acceptable for 1 GB+ mailboxes. But as always, your mileage may vary.

Your remote Dovecot will keep on running as if nothing has happened – if you get the permissions correct. Unfortunately there are problems with the Dovecot transaction log, leaving the mailbox index inconsistent and producing something like this:

Error: Corrupted transaction log file /var/vmail/domain.tld/username/dbox/mailboxes/INBOX/dbox-Mails/dovecot.index.log seq 4: indexid changed 1313910265 -> 1313868319 (sync_offset=0)
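
Dovecot 2.0's doveadm has a repair command that may help here – I have not tested it against this exact corruption, so treat it as a pointer rather than a guaranteed fix:

# ask Dovecot to rebuild the index for the affected mailbox (run on the new server)
doveadm force-resync -u myaddress@mydomain.tld INBOX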

Conclusion

My previous attempts have led me to one conclusion: I need to move the mailbox once.

I chose the rsync+dsync approach and then did the following (a rough script sketch follows the list):

  1. Migrated all users to the new server
  2. Updated DNS
  3. rsync’ed first time
  4. Stopped the Dovecot and Postfix service on the old server
  5. rsync’ed second time
  6. dsync’ed the mailboxes
  7. Turned virtual_mailbox_maps and virtual_mailbox_domains into relay_recipient_maps and relay_domains, respectively
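
Condensed into a script, the cutover looked roughly like this – a sketch only, with hostnames and paths matching the examples above; the Postfix change is done with postconf:

# stop mail services on the old server, then do the final sync
ssh root@oldmailserver.tld '/etc/init.d/dovecot stop && /etc/init.d/postfix stop'
rsync -poazuHK -e ssh \
    root@oldmailserver.tld:/var/spool/postfix/virtual/ /var/vmail.migrate/
for dir in /var/vmail.migrate/*/*/; do
    user=$(basename "$dir"); domain=$(basename "$(dirname "$dir")")
    dsync -R -u "$user@$domain" backup "maildir:${dir}Maildir/"
done
# afterwards, on the old server: relay instead of delivering locally
ssh root@oldmailserver.tld "postconf -e 'relay_domains = mydomain.tld'"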

If you decrease the TTL for your domain in the run-up to the move, you can minimize downtime. If you maintain a local DNS – even better.
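
For example, in a BIND-style zone file (illustrative values only):

; drop the TTL to five minutes well before the move
$TTL 300
mail    IN  A    192.0.2.10    ; the new mail server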

This is not the fancy minimal down-time approach I had hoped for, but it has been sufficient for my needs. Feel free to contribute feedback.

Troubleshooting

I got a:

dsync(root): Fatal: Mail locations must use the same virtual mailbox
hierarchy separator (specify separator for the default namespace)

Some googling revealed that I needed to set up a namespace separator. The technical explanation for this is left to the more Dovecot-savvy.

In short, add the following to /etc/dovecot/conf.d/10-mail.conf (or uncomment the relevant ones).

namespace {
  separator = /
  inbox = yes
}
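
After editing, restart Dovecot and verify that the namespace took effect (doveconf ships with Dovecot 2.0):

/etc/init.d/dovecot restart
doveconf -n | grep -A 3 '^namespace'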

And now it works. Migration is just a matter of setting up a cron job, lowering the TTL on the domain, and moving in a day or two.

I also got some:

Error: Can't rename mailbox INBOX to
INBOX_ff3e01082bcf4e4e352c00002b747e8a:
Renaming INBOX isn't supported.

This happened using the rsync->dsync approach, and I haven't been able to solve it yet. Maybe shutting down the Dovecot service on the remote side would help, as race conditions are likely to occur.

FreeNAS ITX setup

As a result of a complete NAS breakdown one of my customers decided to get a new server that had a bit more power than the old one.

I saw this as quite an interesting challenge and got started.

Due to the fact that the rack cabinet that was put up was only ~68 cm deep, I had to find a rack chassis that would fit these constraints.
It turns out that Travla makes some very nice chassis with 8 front-access hot-swap drive bays for the RAID.

Components:

At first, I tried the Jetway NC9C-550-LF mainboard with the 4xSATA daughterboard. Unfortunately, the latter was unsupported, which defeated the whole point of using this board (8xSATA in all). The LAN interface was not supported out of the box either.

The installation went smoothly, and a software RAID5 was created using the five disks. The creation was a real pain and took forever.
Initial benchmarks went well, but at deployment a significant slowdown was detected: ~250 Mbit/s LAN throughput when transferring large files, and as low as 50 Mbit/s with small files. This was quite unacceptable on a gigabit LAN.

After swapping the switch and then the NIC, I turned, as a last resort, to what could not possibly be the bottleneck – the server itself!

nas:~# dd if=/dev/zero of=/mnt/storage/zerofile.000 bs=1m count=10000
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 271.362496 secs (38641154 bytes/sec)
nas:~# dd of=/dev/null if=/mnt/storage/zerofile.000 bs=1m
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 96.963503 secs (108141308 bytes/sec)

40/100 MB/s is not very impressive for sequential write/read – especially not on a RAID5!
Guess the bottleneck was the server itself.

After a bit of reading and research, I came across a story quite similar to mine – using the exact same disks in a software RAID5. The problem was misalignment of the partitions due to the change in standard disk sector size (from 512-byte to 4K sectors) since – well, I don't know when; I usually don't follow hardware evolution that closely.
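
On FreeBSD-based FreeNAS you can inspect the partition offsets with gpart; on a 4K-sector drive, a start offset that is not divisible by 8 (512-byte) sectors means the partition is misaligned (the device name is an assumption):

# start offsets in the output should be multiples of 8
nas:~# gpart show da0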

Next thing, I persuaded the customer to back up the data, so that I could re-create the RAID – only this time as a RAID-Z.
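
Under the hood, the re-creation amounts to something like this (FreeNAS drives it from the GUI; the pool name and device names are assumptions):

# create a raidz pool across the five data disks
nas:~# zpool create storage raidz da0 da1 da2 da3 da4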

dd if=/dev/zero of=/storage/zerofile.000 bs=1m count=10000 && \
    dd of=/dev/null if=/storage/zerofile.000 bs=1m && \
    rm /storage/zerofile.000
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 98.727775 secs (106208815 bytes/sec)
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 46.398998 secs (225991087 bytes/sec)

This is a nice improvement! The customer is also satisfied with the speed increase, but then again – who wouldn’t be?

Finally, a photo of the setup.

20110208-154924.jpg

This is a sight that I just had to document. It is a collection of external disks, and the document on top is the index. The index was created by mounting each disk and taking a screenshot of the Finder window. A very nice ad-hoc solution, if you ask me.

20110208-154905.jpg