I have a bunch of one-off scripts and modules I wrote for a VI migration I performed last week. The task was to migrate a moderate VI farm (15 hosts, 100+ VMs) from AMD-based servers to Intel in a ridiculous 20-minute window. I played with the idea of setting CPU masks in the vmx, but ultimately that itself would require a reboot. In the end I wrote a series of small scripts that handled the whole thing for me. That has become my MO as of late… get a task, write a couple of quick functions, then a small script that uses those functions. I have to get approval from my client to post the scripts (even though there’s nothing sensitive in them, they were written on their network), so in the meantime I thought I would share how I navigate the VI API and create new functionality.
Configuring a PXE server to present the files and information needed for kickstarting your ESX hosts isn’t too difficult a task. It does require some basic Unix/Linux knowledge, but aside from that it’s not too bad. I use a CentOS virtual machine with just 256 MB of RAM (you’ll need at least 512 for a GUI, but one isn’t necessary) to act as the PXE server for my ESX hosts. This same virtual machine also serves as a management point: it has access to the management LAN, and with the perl toolkit and rCLI installed I can automate much of the work I need to accomplish on the hosts.
I happen to segregate the different types of traffic on the ESX hosts onto different VLANs. This means management (COS/PXE), VMotion, IP storage, and virtual machine traffic (usually several VLANs by itself) are all separate. It is important that the server (or virtual machine) you are using has at least one interface on the same VLAN/network as the ESX management network. That interface will also need a static IP address.
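For illustration, on a RHEL/CentOS guest a static address is typically set in the interface’s config file; the device name, addresses, and netmask below are placeholders, not values from the post:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0
# Static address on the ESX management VLAN (all values are examples)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.5
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
ONBOOT=yes
```

Restart networking (`service network restart`) after editing for the change to take effect.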
It is also important that DHCP is able to function on this network while the host is in a totally unconfigured state. This means that if you are trunking to your ESX hosts, you must have the native VLAN set to the same as your management VLAN, and port channeling (802.3ad / LACP) cannot be enabled during the PXE process.
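To make the DHCP side concrete, a minimal `dhcpd.conf` for PXE booting might look like the following sketch; the subnet, addresses, and range are placeholders, and `pxelinux.0` assumes the syslinux bootloader sits in your TFTP root:

```
# /etc/dhcpd.conf -- minimal PXE example (addresses are placeholders)
ddns-update-style none;
subnet 192.168.10.0 netmask 255.255.255.0 {
    range 192.168.10.100 192.168.10.150;
    option routers 192.168.10.1;
    next-server 192.168.10.5;    # the PXE/TFTP server's static IP
    filename "pxelinux.0";       # bootloader served from the TFTP root
}
```

`next-server` and `filename` are what point the booting host at the TFTP server and bootloader; everything else is ordinary DHCP scope configuration.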
I suppose that’s how you could describe the amount of posting that we’ve been doing around here the last couple of weeks. You’ll have to forgive us… work’s been crazy, to put it mildly… but things should calm down very soon.
Since today’s a “lazy Sunday”, I decided to go back and update the perl script that changes vSwitch security policies to use a better, more intelligent method to update the spec for the switch(es). This shortened the script by about 20 lines and simplified the code significantly.
Hopefully soon I’ll be posting some information about doing kickstarts, including setting up a PXE server using CentOS/RHEL, kickstarting using NFS, and kickstarting using a custom ESX ISO. I’m also working on how to configure a host (networking, storage, NTP, users, vMotion, adding to vCenter, etc.) using rCLI/SDK scripts.
I dislike having to SSH into each host I am responsible for, and I detest having to enable SSH on ESXi (there should be NO reason for me to have to enable it). Because it’s difficult to script applying the NFS snapshot fix to a lot of hosts using the SSH method (and impossible if you don’t enable it on ESXi), I fooled around with the vifs.pl command that is provided with the rCLI.
I discovered that I can pull certain configuration files for the host using the command, modify them, then replace the configuration file…all without having to SSH to the host! vm-help.com has an excellent list of files available using this method.
All of the commands I use in the script below are available once the rCLI is installed (the rCLI also installs the perl toolkit, so all those “sample” scripts are available to us as well).
My Windows scripting skills are non-existent, so I don’t know how to write a wrapper around the rCLI commands like I can with bash, but these same commands will work if you are using the rCLI installed on Windows.
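As a sketch of the pull/modify/push cycle with `vifs.pl` (the host name, credential variables, and the `snmp.xml` path below are placeholders I chose for illustration; substitute a file from the vm-help.com list):

```shell
#!/bin/bash
# Sketch: pull a config file from a host with vifs.pl, edit the local
# copy, then push it back -- no SSH needed. Host name, credential
# variables, and the file path are placeholders, not from the post.
HOST=${HOST:-esx01.example.com}    # target ESX/ESXi host (placeholder)
ESX_USER=${ESX_USER:-root}
REMOTE="/host/snmp.xml"            # pick a file from the vm-help.com list
LOCAL=$(basename "$REMOTE")        # local working copy (snmp.xml)

# Only contact a host when a password was actually supplied.
if [ -n "$ESX_PASS" ]; then
    # fetch the remote file down to the local working copy
    vifs.pl --server "$HOST" --username "$ESX_USER" \
            --password "$ESX_PASS" --get "$REMOTE" "$LOCAL"

    # ...modify "$LOCAL" here with sed or your editor of choice...

    # replace the remote file with the edited copy
    vifs.pl --server "$HOST" --username "$ESX_USER" \
            --password "$ESX_PASS" --put "$LOCAL" "$REMOTE"
fi
```

Wrap the two `vifs.pl` calls in a `for HOST in $(cat hosts.txt)` loop and the same edit can be applied across an entire farm without ever enabling SSH.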
A while back I posted about how to change the amount of RAM assigned to the COS using the SDK; however, at that time I didn’t know of a good way to do it from the command line on the box itself. After some digging around and testing (and consequently breaking things), I’ve discovered how to change the setting.
Turns out, someone else already knew about this (including Dominic, a.k.a. vmprofessional… I swear I’ve read that kickstart file a thousand times before and never noticed the code for this)… apparently my google-fu wasn’t working for me when I was trying this before.
Remember, valid values are from 272 to 800 MB.
# change the memSize value in ESX's config file
sed -i 's/memSize = "[0-9][0-9][0-9]"/memSize = "512"/' /etc/vmware/esx.conf
# regenerate the grub config files
esxcfg-boot -g
# recreate the initrd file with the new settings
esxcfg-boot -b
echo "Host must be rebooted for new settings to take effect!"
I had the need to change the configuration of my ESX hosts so that the virtual switches had a single active and a single standby adapter assigned to them. The reason for the need is rather irritating (the IBM I/O modules we have in these particular BladeCenters are not really designed to handle a high amount of traffic), and it was causing some issues during vMotions.
This script allows me to set the vmnics assigned to a vSwitch to the desired active/standby configuration, and additionally allows me to set a port group’s vmnic active/standby policy. In my setup I use two vSwitches: one for primary COS, vMotion, and IP storage, and a second for the virtual machines and secondary COS. Each vSwitch has two NICs assigned (remember, they’re BladeCenters… limited network connectivity). In order to keep vMotion from taking all the bandwidth away from storage, I wanted to separate their traffic onto different NICs but still provide redundancy.
The way that I accomplish this is by making the default for the vSwitch have, for example, vmnic0 active and vmnic2 standby. I then adjust the vMotion port group so that it has the opposite (vmnic2 active and vmnic0 standby). Redundancy is still present in the event of a NIC failure, but under normal circumstances, the traffic is separate.
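Using the example NICs above, the effective policy works out like this (an illustrative summary, with the vMotion port group overriding the vSwitch default):

```
vSwitch default order:   active = vmnic0   standby = vmnic2
vMotion port group:      active = vmnic2   standby = vmnic0   (override)
```

Either NIC failing collapses everything onto the survivor, but while both are up, vMotion and storage/COS traffic never share a link.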
Well, this morning I had the distinct privilege of losing two disks… YIKES! Thanks to RAID-DP my aggregate is still online, but if I lose another disk I’m done. As this is not a comfortable position to be in, I want to know how long the rebuild will take; enter PowerShell!
Using my PoshOnTap module I wrote a quick script to monitor the status of a rebuild. Hopefully you’ll never need it!