A New Continuation
My rack with Hurricane Electric has changed a few times since I last wrote about it. First it was OpenNebula (which runs KVM), then OpenStack (which was a disaster), and now it runs Joyent SmartDataCenter (now called Triton). However, being limited to local disks with NO SUPPORT FOR LIVE MIGRATION, or EVEN MIGRATION FOR THAT MATTER, is very frustrating. Plus, not being able to resize a KVM instance on SDC is a real pain in the rear for me.
But fear not, Intel's newest release is here to save the day...
EIGHT CORES, UP TO 128GB OF RAM, and BUILT-IN 10GbE. HOLY. Sorry, I have to scream, but that's just too much to handle...
OK, BS time over, now for the meat and potatoes of the new rack:
As you can see, the new deployment runs 10GbE for the storage and admin networks.
The idea is that each compute node boots via PXE.
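For the curious, a PXE setup like this usually boils down to a small dnsmasq (or similar DHCP/TFTP) config. This is only a minimal sketch; the IP range, boot filename, and TFTP root here are assumptions, not my actual config:

```
# /etc/dnsmasq.d/pxe.conf -- hypothetical sketch, values are placeholders
dhcp-range=10.163.0.100,10.163.0.200,12h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp
```

Each compute node then grabs `pxelinux.0` over TFTP at power-on and chains into its kernel/initrd from there.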
(What about logs, you ask? I'm working on writing a centralized Node.js rsyslog server.)
An ELK stack is running to do centralized logging.
Oh, and since Python and PHP are so yesteryear, Node.js is used to write "Nebula_boot".
"Sunstone", which we are all familiar with, runs the OpenNebula daemon and OpenNebula Sunstone. The MySQL server runs on "Gosling". memcached runs on "Gosling" as well, providing the authentication cache for OpenNebula Sunstone. Sadly, I have yet to find a way to run it stateless (though it should be possible, since the SQL server is on a separate host).
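Pointing Sunstone's session cache at a remote memcached is done in `sunstone-server.conf`. A sketch of the relevant keys; the hostname is an assumption (stand-in for "Gosling"), and the port/namespace are the usual defaults:

```
# /etc/one/sunstone-server.conf (excerpt) -- host is a placeholder
:sessions: memcache
:memcache_host: gosling.internal
:memcache_port: 11211
:memcache_namespace: opennebula.sunstone
```

With sessions in memcached instead of local memory, the Sunstone process itself holds less state, which is what makes a fully stateless setup at least plausible.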
VLAN 163 is bridged with the WAN interface, and pfSense runs as a transparent firewall. Outbound port 25 is strictly prohibited. Yep.
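pfSense manages its rules through the GUI, but since it's pf underneath, the outbound-25 block is equivalent to a one-liner like this (the interface name is an assumption for a transparent bridge setup):

```
# raw pf equivalent of the rule -- bridge0 is a placeholder interface
block out quick on bridge0 proto tcp from any to any port 25
```

`quick` makes the rule final, so no later pass rule can let SMTP sneak out.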
I encountered some caveats as well:
The default image templates from the OpenNebula Marketplace are qcow2 v2 (compat=0.10), which gives an "image magic is incorrect" error when you restore from a checkpoint. A quick "qemu-img convert" to compat=1.1 does the trick.
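The conversion itself is a one-liner. Sketched below with a scratch image standing in for the Marketplace download (the file names are placeholders):

```shell
# Create a qcow2 v2 image as a stand-in for the Marketplace template
qemu-img create -f qcow2 -o compat=0.10 old.qcow2 1G

qemu-img info old.qcow2 | grep compat    # should show compat: 0.10

# Re-write it as qcow2 v3, which restores from checkpoints cleanly
qemu-img convert -f qcow2 -O qcow2 -o compat=1.1 old.qcow2 new.qcow2

qemu-img info new.qcow2 | grep compat    # should show compat: 1.1
```

Note that `convert` writes a new file rather than converting in place, so swap the new image into the datastore afterwards.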
By default, kvm_intel is modprobed with "nested=1", and I don't know whether that is what crashes libvirtd (a "general protection" segfault) when saving or restoring a checkpoint. Running with "nested=0" and "emulate_invalid_guest_state=0" on kvm_intel and "ignore_msrs=1" on kvm seems to solve the problem. I have yet to do a scientific test on this, but I will when I get my hands on the custom-built 40TB Xeon-D SmartOS backup server. Stay tuned.
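To make those module options stick across reboots, drop them into modprobe.d (the file name is arbitrary; the options are exactly the ones above):

```
# /etc/modprobe.d/kvm.conf -- options that made checkpoint save/restore behave
options kvm ignore_msrs=1
options kvm_intel nested=0 emulate_invalid_guest_state=0
```

After that, either reboot or unload/reload kvm_intel and kvm for the new options to take effect.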
Live migration always hangs the VMs. Symptoms: the network is pingable, but SSH is not responding and VNC is stuck. Online searches suggest NTP problems, but all servers are synced to the same NTP server within the network. I have yet to discover the cause of (or solution to) this madness.
Now here comes the server porn...
Eight Intel 730 480GB drives of madness
Cable management is a pain if you have front IO...
Not too bad when you move all the cables out of the picture..