Speeding Ahead with ZFS and VirtualBox

In total, I have about 20 virtual hosts I take care of across a few workstations. On one system alone, I keep five running constantly, doing builds and network monitoring. At work, my Ubuntu workstation has two Windows VMs I use regularly. My home workstation has about eight: a mix of Fedoras and Windows. A lot of my Windows use is pretty brief: doing test installs, checking web page compatibility, and using TeamViewer. Sometimes a VM goes bonkers and you have to roll back to a previous version of it, and sometimes VirtualBox’s snapshots are useful for that. On my home workstation, I have some hosts with about 18 snapshots, and they are hard to scroll through…they scroll to the right across the window… How insightful. Chopping a few snapshots out of the middle of that pile is madness. Whatever Copy on Write (COW) de-duplication they end up doing takes “fuhevvuh.” It’s faster to compress all the snapshots into a new VM with a single snapshot.

What makes more sense is to use proper filesystem snapshotting. And I’m not talking about LVM. LVM has or had a limit of 256 snapshots per pool. I want hundreds at a minimum: every 15 minutes, every hour, every day, every week…at last count I had more than 950 snapshots on my VM host machine.

Start with some reasonable hardware first: I recommend getting a Xeon E3 system with ECC memory, two SSDs for boot and cache, and four 2TB or bigger hard drives for VM storage. Configure your Ubuntu install on your SSDs in a RAID1 mirror, and leave two empty partitions on them for later, one of 50MB and another of 1GB. I typically have a partition setup like so:

md0 /boot
md2 /
md3 /home
md5 blank, 50MB, for ZIL
md6 blank, 1GB, for L2ARC

Don’t even bother to install your large drives until after Ubuntu is set up. I won’t bore you with other Ubuntu details, except to recommend starting with Ubuntu Server and adding the MATE desktop later. That gives you the flexible partitioning and mdadm setup you’ll want to get the formatting right.

Now, Ubuntu’s all set up and you’ve got your tiny fonts set up for your terminal, right? Let’s continue “lower-down.” Get those four large drives installed. Power up. Crack open a sudo -s and we’ll get those serial numbers:

cd /dev/disk/by-id
ls | grep -Ev -- '-part|wwn'
ata-KINGSTON_SMS200S330G_50026B725306B50A
ata-KINGSTON_SMS200S330G_50026B725306B50F
ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC58082X
ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC59200B
ata-WDC_WD1003FBYZ-010FB0_WD-WCAW30CEDD0J
ata-WDC_WD1003FBYZ-010FB0_WD-WCAW31XKCYPE
ata-WDC_WD1003FBYZ-010FB0_WD-WCAW32XKKP4A
ata-WDC_WD1003FBYZ-010FB0_WD-WCAW32XKKP78

Never been here before? Good. Most of the time you don’t need to be, but this is super useful if you ever need to get the serial numbers of your drives without powering down your machines. We’re here because ZFS is going to work more predictably with device IDs like serial numbers than with /dev/sdX handles that might change between reboots. What? Yes, drive names depend on semi-random drive power-up and reporting times to the OS…leaving udev to guess names or assign them on a first-come, first-served basis.
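If you end up doing that lookup often, it can be wrapped in a small function. This is only a sketch: the name list_disk_ids and its optional directory argument are my own inventions (the argument is there so you can try it on a scratch directory), not anything ZFS or udev provides.

```shell
# list_disk_ids [DIR]: print whole-disk IDs, skipping partition entries
# (-partN) and the wwn-* aliases. DIR defaults to /dev/disk/by-id.
# The function name and the DIR parameter are hypothetical helpers.
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    ls "$dir" | grep -Ev -- '-part|wwn'
}
```

Run it as list_disk_ids with no argument on the VM host and paste the result into your editor.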

Don’t sweat. You’ve got X set up so you can copy-paste these things anyhow. Control-shift-C and Control-shift-V, remember?

Let’s install our ZFS. Start with getting dkms installed. Then get ubuntu-zfs installed. It will pull in a bunch of spl installs (that’s the Solaris Porting Layer) and re-generate your ramdisks to get those .ko objects going.

sudo apt-get install dkms
sudo apt-get install ubuntu-zfs
sudo update-grub2
sudo shutdown -r now

Reboot and make sure that you have the ZFS module loading on boot.

lsmod | grep zfs

[Image: ZFS module loading on boot]

The actual ZFS command to establish your pool is going to get long. Open up an editor (pluma will be fine) and get your drive serial numbers ready to put into your version of this command:

zpool create -f -o ashift=12 tank \
    mirror ata-WDC_WD1003FBYZ-010FB0_WD-WCAW30CEDD0J \
           ata-WDC_WD1003FBYZ-010FB0_WD-WCAW31XKCYPE \
    mirror ata-WDC_WD1003FBYZ-010FB0_WD-WCAW32XKKP4A \
           ata-WDC_WD1003FBYZ-010FB0_WD-WCAW32XKKP78

[Image: zpool create tank]

That creates a pool named tank made of two striped mirrors, the ZFS equivalent of RAID10.
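Pasting serial numbers into a long command is easy to get wrong, so here is a minimal sketch of a helper that only prints the command for you to review. The name mirror_pairs_cmd is made up for this example; it pairs the disks in the order you give them and uses the same -f and ashift=12 options as above.

```shell
# mirror_pairs_cmd POOL DISK...: print (do not run) a zpool create command
# that stripes across two-way mirrors, pairing disks in the order given.
# Hypothetical helper; it refuses an odd or empty disk list.
mirror_pairs_cmd() {
    pool="$1"
    shift
    if [ $# -lt 2 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "need an even number of disks" >&2
        return 1
    fi
    cmd="zpool create -f -o ashift=12 $pool"
    while [ $# -gt 0 ]; do
        cmd="$cmd mirror $1 $2"
        shift 2
    done
    echo "$cmd"
}
```

Printing instead of running is deliberate: zpool create with -f will happily overwrite whatever is on those drives, so you want to read the command before it happens.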

Next let’s use some of those blank partitions we left to give the ZIL its own device. That’s the ZFS intent log; on a separate SSD log device it soaks up synchronous writes before they reach the hard drives.

[Image: ZIL]

We’ll make this safe and mirror them:

zpool add tank log mirror \
    ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC58082X-part2 \
    ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC59200B-part2

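Now let’s make a Level-2 block cache. Cache devices can’t be mirrored, so both partitions are simply added side by side:

# Replace N with the number of the 1GB partition you left for the L2ARC.
zpool add tank cache \
    ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC58082X-partN \
    ata-Samsung_SSD_850_PRO_128GB_S1SMNSAFC59200B-partN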

This level-2 cache is also known as an L2ARC (the ARC being the adaptive replacement cache). ZFS caches blocks in RAM by default as part of the zfs driver. The L2ARC extends that cache to your SSD, which catches blocks as they age out of RAM.
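The two device types are added with different keywords, and it is easy to mix them up. As a sketch, these two hypothetical helpers (slog_cmd and l2arc_cmd are names I made up) print the command for each case so the difference is visible side by side: log devices can be mirrored, cache devices are just listed.

```shell
# slog_cmd POOL DEV_A DEV_B: print the command that adds a mirrored
# intent-log device to POOL. Hypothetical helper; prints, does not run.
slog_cmd() {
    echo "zpool add $1 log mirror $2 $3"
}

# l2arc_cmd POOL DEV...: print the command that adds cache devices.
# There is no "mirror" keyword here; cache devices are listed plainly.
l2arc_cmd() {
    pool="$1"
    shift
    echo "zpool add $pool cache $*"
}
```

Losing a cache device costs you nothing but speed, which is why there is no need to mirror it; the log device holds writes that have not reached the pool yet, so that one is worth mirroring.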

Almost done! Let’s check our progress using sudo zpool status -v. Yours should look similar to this:

[Image: zpool status output]
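If you would rather have a yes-or-no answer for a script than read the full status output, something like this sketch works. The function name pool_is_online is hypothetical; it relies on zpool list with -H (no headers) and -o health, which prints a single word such as ONLINE or DEGRADED.

```shell
# pool_is_online POOL: succeed only if the pool reports health ONLINE.
# Hypothetical helper built on "zpool list -H -o health".
pool_is_online() {
    [ "$(zpool list -H -o health "$1" 2>/dev/null)" = "ONLINE" ]
}
```

Something like pool_is_online tank || echo "tank needs attention" can then go into a cron job or a login script.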

Rest your brain for a minute. [Gets up, pours coffee, throws cat outside.]

The VirtualBox part is not so mysterious; it takes advantage of a technique known as link farming. Link farming is what you do when stuff doesn’t all live on the same partition: you use symbolic links to make it look as if it does. And putting things in separate partitions is essentially what I’m going to show you here. As with LVM, a volume holds a file system, like one that lives on a partition, but in ZFS the free space belongs to the whole pool and you don’t need to decide on partition sizes. You create a number of ZFS volumes, and each one is only as big as its disk usage.

We want to create a volume for each VM. When we snapshot a VM, we can roll that snapshot back without changing the filesystem of any other VM. Handy? Oh yeah. Some small details first:

zfs set compression=lz4 tank
zfs create tank/VMs
zfs set sync=always tank/VMs
zfs create tank/temp
chown mylogin:mygroup /tank/temp
zfs set sharenfs=on tank/temp
zfs set sharesmb=on tank/temp

You can map /tank/temp as a directory shared with your virtual hosts. There can be some complexity when doing NFS and Samba shares with ZFS. Enable the smb and nfs services first and then the share commands will work.

Do you already have a running VirtualBox guest on another machine that you want to import? Great. I suggest exporting it to an OVA appliance file first. These tips will make starting and stopping your VMs from the command line much easier:

  • Enable remote desktop access and assign an easy-to-remember port. Definitely not the default; use something like 4500 or 8000. For each VM you add, increment that number, so you can have RDP windows open to all your VM desktops if needed.
  • Rename your virtual machines to include this port number. This allows you to use virtualbox command-line with confidence: vboxmanage startvm --type=headless <portnum-name>
  • Create some useful aliases for starting your vms:

[Image: alias listvms]

  • Name your sub-volumes differently for Windows vs. Linux guests. You cannot move your Windows guests after they get license keys. But Linux doesn’t care.
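The aliases themselves were a screenshot, so here is a rough sketch of what such helpers could look like in your ~/.bashrc. Every function name below is my own guess, not the original set; the vboxmanage subcommands and flags (list vms, list runningvms, startvm --type headless, controlvm acpipowerbutton, modifyvm --vrde and --vrdeport) are standard ones.

```shell
# Hypothetical helpers for ~/.bashrc; the names are assumptions.

# List all registered guests, and only the ones currently running.
listvms()    { vboxmanage list vms; }
runningvms() { vboxmanage list runningvms; }

# startvm NAME: boot a guest with no window, e.g. startvm 9101-fedora-19
startvm() { vboxmanage startvm "$1" --type headless; }

# stopvm NAME: ask the guest to shut down cleanly via the ACPI power button.
stopvm() { vboxmanage controlvm "$1" acpipowerbutton; }

# vmport NAME: the digits before the first dash are the RDP port,
# following the <portnum>-<name> naming convention from the tips above.
vmport() { echo "${1%%-*}"; }

# enable_rdp NAME: turn on remote desktop on the port encoded in the name.
enable_rdp() { vboxmanage modifyvm "$1" --vrde on --vrdeport "$(vmport "$1")"; }
```

Because the port lives in the name, enable_rdp needs no second argument, and you can read the port straight off the output of listvms.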

When you import the machine, give it a path like /tank/VMs/fedora-19. After the import, make sure the VM is powered down. We’re going to switch some things around.

vboxmanage modifyvm fedora-19 --name 9101-fedora-19
sudo zfs create tank/VMs/l_9101-fedora-19
sudo zfs set sync=always tank/VMs/l_9101-fedora-19
sudo chown mylogin:mygroup /tank/VMs/l_9101-fedora-19
cd /tank/VMs
rsync -a --remove-source-files 9101-fedora-19/ l_9101-fedora-19/
find 9101-fedora-19 -type d -empty -delete
ln -s l_9101-fedora-19 9101-fedora-19

Whew! Lots of commands, but also a lot of benefits:

  • Our vm guest can be snapshotted and not interfere with other vm snapshot sets.
  • We have a reminder what the port is for RDP.
  • We have a reminder that this is a non-windows vm and we can move it safely.
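To make the first benefit concrete, here is a minimal sketch of per-guest snapshot helpers. The function names are hypothetical, and they assume the layout used here, with each guest in its own volume under tank/VMs. Power the guest down before rolling back.

```shell
# Hypothetical per-guest snapshot helpers; assumes one volume per VM
# under tank/VMs, as set up above.

# vm_snapshot VOLUME LABEL: e.g. vm_snapshot l_9101-fedora-19 before-upgrade
vm_snapshot() { zfs snapshot "tank/VMs/$1@$2"; }

# vm_snapshots VOLUME: list that guest's snapshots by name, nothing else.
vm_snapshots() { zfs list -H -t snapshot -o name -r "tank/VMs/$1"; }

# vm_rollback VOLUME LABEL: return the guest to that snapshot.
# Note: -r destroys any snapshots taken after LABEL.
vm_rollback() { zfs rollback -r "tank/VMs/$1@$2"; }
```

Only the named volume is touched, so rolling back one guest leaves every other guest, and its snapshots, exactly as they were.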

If you created that function I pictured above, you can fire things up with:

startvm 9101-fedora-19

Here’s a picture of how I have my directory set up on ares:

[Image: /tank/VMs directory listing]

This has worked well for me for most of the year. I can see some spots where I messed up, even. Can you spot them?

Feels like a lot of work. But really, I find this so much less maintenance down the road compared to using LVM. The best part is now: snapshots! We can install zfs-auto-snapshot and our snapshotting will begin automatically.

sudo apt-get install zfs-auto-snapshot
sudo ls /etc/cron.hourly
sudo ls /etc/cron.d

See, the zfs-auto-snapshotter is going to start working within 15 minutes. If you need to tune your snapshots, you tune them per ZFS volume with ZFS volume properties.
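As a sketch of that tuning: zfs-auto-snapshot reads user properties in the com.sun:auto-snapshot namespace, and its schedules are labelled frequent, hourly, daily, weekly and monthly. The two wrapper names below are hypothetical conveniences around plain zfs set.

```shell
# Hypothetical wrappers around the com.sun:auto-snapshot properties.

# autosnap_off VOLUME: exclude a volume from all automatic snapshots,
# e.g. autosnap_off tank/temp for scratch space.
autosnap_off() { zfs set com.sun:auto-snapshot=false "$1"; }

# autosnap_label VOLUME LABEL true|false: switch a single schedule on or
# off, e.g. autosnap_label tank/VMs/l_9101-fedora-19 frequent false
autosnap_label() { zfs set "com.sun:auto-snapshot:$2=$3" "$1"; }
```

Child volumes inherit these properties, so setting one on tank/VMs covers every guest underneath unless a guest overrides it.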

Jed Reynolds
Jed Reynolds has been known to void warranties and super glue his fingers together. When he is not doing photography or fixing his bike, he can be found being a grey beard programmer analyst for Candela Technologies. Start stalking him at https://about.me/jed_reynolds.
  • carlos90its

    Hi Jed, thank you for this article, it is really good. Just a little fix needed: the instruction for the creation of the L2ARC is the same as the one for the ZIL in your article.

    Thanks again for sharing your experience 🙂