Rachel's Yard

| A New Continuation

My rack with Hurricane Electric has changed a few times since I last deployed it. First it was OpenNebula (which runs KVM), then OpenStack (which was a disaster), and now it runs Joyent SmartDataCenter (which is now Triton). However, the headache of only being able to use local disks, with NO SUPPORT FOR LIVE MIGRATION, or EVEN MIGRATION FOR THAT MATTER, is very frustrating. Plus, not being able to resize a KVM instance on SDC is a real pain in the rear for me.

But fear not, Intel's newest release is here to save the day...

Intel® Xeon® Processor D-1540 (12M Cache, 2.00 GHz)

EIGHT CORES, UP TO 128GB OF RAM, and BUILT-IN 10GbE. HOLY. Sorry, I have to scream, but that's just too much to handle...

OK, BS time over, now for the meat and potatoes of the new rack:

Nebula Birdview

As you can see, the new deployment is running 10GbE for storage and admin network.


The idea is that each compute node boots via PXE.

  1. pfSense provides an IP (this subnet also provides network connectivity), and the PXE firmware on the NIC loads "undionly.ipxe", which chain-boots "http://boot.nebula.fmt01.sdapi.net/$MAC".
  2. Depending on the MAC, it serves either a SmartOS PXE payload or an Ubuntu PXE payload, since "Silverish", the storage server, runs SmartOS to provide NFS over ZFS.
  3. "Gosling" boots from USB and has a mirrored pair of Intel 730s for storage (a chicken-and-egg problem, if you were wondering). It then boots the Ubuntu 15.04 kernel and mounts the read-only NFS export on "Silverish".
  4. And here's the magic: AuFS. Each node overlays a ramdisk on top of the read-only NFS mount, so the compute nodes are technically "stateless".
  5. Though on boot, each compute node runs "http://boot.nebula.fmt01.sdapi.net/scripts/$MAC" to set up its interfaces, hostname, and a train of other things.
  6. Really neat, right? Set up the machine, run the cables, add the MAC to "Nebula_boot", and bam, you've got yourself a compute node ready to run VMs.
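Step 1's chain-load can be sketched as a minimal iPXE script (the hostname comes from above; the exact script served to the NICs may differ):

```
#!ipxe
# bring up the NIC via DHCP from pfSense, then chain to the
# per-MAC boot endpoint described above
dhcp
chain http://boot.nebula.fmt01.sdapi.net/${net0/mac}
```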
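And step 4's overlay, roughly, as it might look in an initramfs script — the paths, export name, and ramdisk size here are illustrative, not my actual boot scripts:

```
# mount the shared root read-only over NFS ("Silverish" exports it)
mount -t nfs -o ro silverish:/export/compute-root /ro

# a tmpfs ramdisk absorbs all writes
mount -t tmpfs -o size=2g tmpfs /rw

# AuFS unions them: writes land in the ramdisk, reads fall through
# to the NFS branch, so the node itself stays stateless
mount -t aufs -o br=/rw=rw:/ro=ro none /newroot
```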

(What about logs, you say? An ELK stack is running for centralized logging, though I'm still working on a centralized Node.js rsyslog server.)

Oh, and since Python and PHP are so yesteryear, "Nebula_boot" is written in Node.js.

"Sunstone", which we are all familiar with, runs the OpenNebula daemon and OpenNebula Sunstone. The MySQL server runs on "Gosling", and memcached runs on "Gosling" as well to provide the authentication cache for OpenNebula Sunstone. Sadly, I have yet to find a way to run it stateless (though it should be possible, since the SQL server is on a separate host).
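For reference, pointing the OpenNebula daemon at a remote database is just the DB section of oned.conf — the credentials and database name below are placeholders, not my actual values:

```
DB = [ BACKEND = "mysql",
       SERVER  = "gosling",
       PORT    = 0,
       USER    = "oneadmin",
       PASSWD  = "secret",
       DB_NAME = "opennebula" ]
```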

VLAN 163 is bridged with the WAN interface, and pfSense runs as a transparent firewall. Outbound port 25 is strictly blocked. Yep.

I encountered some caveats as well:

  • Default image templates from the OpenNebula Marketplace are QEMU qcow2 v2 (compat=0.10), which gives an "image magic is incorrect" error when you restore from a checkpoint. A quick "qemu-img convert" to compat=1.1 does the trick.
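A sketch of that conversion (filenames are placeholders):

```
# rewrite the image in the qcow2 v3 (compat=1.1) on-disk format
qemu-img convert -O qcow2 -o compat=1.1 image-v2.qcow2 image-v3.qcow2
```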

  • By default, kvm_intel is loaded with "nested=1"; I don't know whether that is what crashes libvirtd (a "general protection" segfault) when saving or restoring a checkpoint. Running with "nested=0" and "emulate_invalid_guest_state=0" on kvm_intel and "ignore_msrs=1" on kvm seems to solve the problem. I have yet to do a scientific test on this, but I will when I get my hands on the custom-built 40TB Xeon-D SmartOS backup server. Stay tuned.
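Those module options can be made persistent across reboots with a modprobe.d drop-in (the file name is arbitrary):

```
# /etc/modprobe.d/kvm-nebula.conf
options kvm_intel nested=0 emulate_invalid_guest_state=0
options kvm ignore_msrs=1
```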

  • Live migration always hangs the VMs. Symptoms: the network is pingable, but SSH does not respond and VNC is stuck. Online searches suggest NTP problems; however, all servers are synced to the same NTP server within the network. I have yet to discover the cause of (or solution to) this madness.

Now here comes the server porn...

Intel SSD RAID Eight Intel 730 480GB of madness

Messy cables Cable management is a pain if you have front I/O...

Cables out of my way Not too bad when you move all the cables out of the picture...
