Industrial strength vCenter database maintenance (for SQL Express too!)

Over time, vCenter does a huge number of reads and writes to its database. Most of these come from the performance stats, and depending on the statistics level you have enabled, the number of writes can be very substantial. After a while, even with a small-to-medium infrastructure, the database will need some love to keep it working like it should. VMware has some recommendations about maintaining your database, specifically here if you are using SQL Server. However, their recommendations only do a little to help if you have a very badly fragmented database.

You see, dbcc indexdefrag only defragments the index: it reorganizes the pages in the index so that they are near one another, but if you have a lot of partially filled pages, it still won’t do a whole lot of good. Don’t misunderstand me, it does help: an index scan will perform significantly better since it won’t require as much I/O work. But there are usually very few pages freed, which means there could still be a large number of (potentially unnecessary) pages in the index.
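For reference, this is roughly what the defrag command looks like. The table and index names here are illustrative (I’m using one of the vCenter history tables as an example); substitute your own database, table, and index names:

```sql
-- Reorganize a single index; this is an online operation, so the table
-- stays available while it runs. Names below are examples only.
DBCC INDEXDEFRAG ('VCDB', 'VPX_HIST_STAT1', 'PK_VPX_HIST_STAT1');
```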

Here is the example output that VMware provides in the KB article linked above:

Notice the “Avg. Page Density (full)” item: each page is only 36% full, which means there is a huge amount of white space in each page. Under the right circumstances that can be OK, but in this case we probably don’t want it. Generally speaking, if there is a high number of writes to the table (and, consequently, the index), then you want a lower page density (a.k.a. fill factor); a higher number of reads means a higher fill factor is appropriate.

vCenter, by my estimation, does quite a few writes to the database, but it also does a fair amount of reads (every time you view the performance tab, every time those graphs refresh, it’s a series of reads). Additionally, the roll-up stats act as a bit of an equalizer, as they do a significant amount of deletions in the tables.

What all this means, to me anyway, is that a medium-high fill factor (70-80%) seems to fit the bill. So, 36% suddenly seems quite low, and that 36% is only marginally improved with the index defrag.

Well, there happens to be a command available that rebuilds indexes: alter index rebuild. This deletes the current indexes (and all associated pages on disk), after which the database engine reads each row in the table and rebuilds the index, making it contiguous on disk. Best of all, we can specify whatever fill factor we think is appropriate.
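As a sketch, a rebuild with an explicit fill factor looks like this (again, the table name is just an example; point it at one of your own tables):

```sql
-- Rebuild every index on the table, packing pages to 70% full.
-- The table is locked for the duration of the rebuild.
ALTER INDEX ALL ON VPX_HIST_STAT1
REBUILD WITH (FILLFACTOR = 70);
```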

Fortunately, some kind soul over on MSDN posted a very nice script that will check the status of all indexes on all tables in a particular database. To use the script, open SQL Server Management Studio (SSMS), connect to your SQL Server instance and open a new query window. Enter a use statement for the database:
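Something like this (VCDB is a common default name for the vCenter database; substitute whatever yours is called):

```sql
USE VCDB;  -- replace with your vCenter database name
GO
```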

Then copy the SQL in the post made by N8WEI into the SSMS query window below the above USE statement. There are a few parameters that you can set: the fill factor, the index defrag threshold, the index rebuild threshold, and the “report only” value. Fill factor is just that; I set it to 70%. The index defrag threshold (“reorg_frag_thresh”) can stay relatively low (10% or less), and the rebuild threshold (“rebuild_frag_thresh”) can stay at 30% (unless you want to force a rebuild on all indexes, in which case set reorg_frag_thresh to 1 and rebuild_frag_thresh to 2). The last option (“report_only”) determines whether the script will actually take any action. Leave it set to 1 for the moment.
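For illustration, the values I use look roughly like this. The threshold and report variable names are the ones the script exposes; the fill factor variable name here is my own placeholder, so match it to whatever the script actually calls it:

```sql
-- My settings for the MSDN index-maintenance script (names approximate).
DECLARE @fill_factor INT;          -- placeholder name; match the script's variable
DECLARE @reorg_frag_thresh INT;
DECLARE @rebuild_frag_thresh INT;
DECLARE @report_only INT;

SET @fill_factor         = 70;     -- target page density
SET @reorg_frag_thresh   = 10;     -- defrag indexes fragmented above 10%
SET @rebuild_frag_thresh = 30;     -- rebuild indexes fragmented above 30%
SET @report_only         = 1;      -- 1 = report only, take no action
```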

By leaving report_only set to 1, the script will only check the indexes; it will not perform any action on them. So when you press the “Execute” button (or F5), it will scan through the indexes in the database and check their fragmentation level. The result window will list the indexes and the action that would be taken.

Before you proceed, be aware of a couple of things. Doing an index defrag is transparent to the users (e.g. vCenter). If the database is very busy, it may seem a little slow while the defrag is occurring, but the database will still be usable. When an index rebuild is performed, however, the table is locked so that no action can be taken during the operation. This means that for the duration of the rebuild the table will be unavailable for read/write. This probably isn’t a big deal…I run the operation about once a month on all tables and it takes less than 60 seconds to complete. Each table is only locked for a few seconds at most, which is probably OK. vCenter updates each host’s performance stats every 20 seconds, and it shouldn’t fail the insert operation if it has to wait a second or two. If you are concerned about it, simply shut down the vCenter service on the server and it will stop issuing queries.

With that said, change report_only to 0 (zero), and hit the button. It will do its thing (the first time may take a minute or two depending on your stats level), and when it’s done your indexes should be fresh and new.

If you want a warm and tingly feeling, be sure to run the VMware-recommended dbcc showcontig on the history tables before and after executing the index rebuild(s). You should see a scan density near 99.9%+ and a page density within a few hundredths of a percent of the value you specified.
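The before/after check is a one-liner per table (the table name below is an example of one of the vCenter history tables; run it against whichever tables you rebuilt):

```sql
-- Report fragmentation statistics, including Scan Density and
-- Avg. Page Density, for one table's indexes.
DBCC SHOWCONTIG ('VPX_HIST_STAT1');
```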

The end result is that performance is significantly improved, even for operations that don’t involve the performance stats (though those are especially noticeable). I run the above script, with settings to rebuild indexes, once a month. I also run the script with settings to defrag the indexes once a day, which helps prevent page splits and, consequently, leads to less fragmentation. (Hint: Use the SQL Server Agent to schedule the operations.)

Oh, and this works with SQL Express, though you may have to turn on connecting via TCP before you can use SSMS to execute queries (and there’s no SQL Server Agent, so you’ll have to use scheduled tasks).

And, if you are using SQL Express, you can also use SSMS Express (even if you are using the full SQL Server, you can use SSMS Express). SQL Server Management Studio 2008 Express is a very nice (and very functional) application, and it’s freely available from Microsoft (::watches for sky to begin falling::).

vCenter 2.5 and 2048bit RSA keys

So, it seems that prior to vCenter 4.0 it is not possible to use an RSA key length longer than 1024 bits for SSL.

I came to this discovery because for the last few days I’ve been replacing the default SSL certs for each of the ESX (3.5u4) hosts with certs that are signed by our local CA (and therefore trusted automatically). Our security office requires a key length of 2048 bits. Nothing less.

Since I wasn’t really paying attention to the documentation that VMware has provided (here, here, here, here and here), I missed where they actually told me it wouldn’t work.

So, when I replaced the cert and tried reconnecting my hosts I got an error.

Not being one to be deterred by “a database schema limitation”, I decided to fix the problem. The KB article here tells us what the error is: “string too large for database”, and there are also some references to the password’s inability to be decrypted.

In my test environment I decided to try modifying the database schema so that the field can hold the encrypted password. The default schema (for vCenter 2.5, anyway) has the VPX_HOST.PASSWORD field as an NVARCHAR(255). Using a 1024-bit RSA key, the encrypted passwords are 174 characters long. The same password encrypted with a 2048-bit RSA key is just over 340 characters. So, we see what the problem is.
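The lengths line up with what you’d expect if the ciphertext is base64-encoded: RSA output is the size of the modulus, and base64 turns every 3 bytes into 4 characters. A quick back-of-the-envelope check (this is my own arithmetic sketch, not anything from VMware; the small difference at 1024 bits is presumably padding or encoding detail):

```python
import math

def rsa_ciphertext_b64_len(bits: int) -> int:
    """Approximate base64-encoded length of an RSA ciphertext."""
    raw = bits // 8                  # RSA ciphertext = modulus size in bytes
    return 4 * math.ceil(raw / 3)    # base64: 3 bytes -> 4 characters

print(rsa_ciphertext_b64_len(1024))  # 172 -- close to the 174 observed
print(rsa_ciphertext_b64_len(2048))  # 344 -- well past NVARCHAR(255)
```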

Before I go any further, backup your database. This page isn’t going anywhere, go ahead and do it now.

So, a simple SQL command:
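Something along these lines widens the column; 512 is my choice here, picked simply because it comfortably holds the ~344-character value:

```sql
-- Widen the password column so a 2048-bit encrypted password fits.
ALTER TABLE VPX_HOST ALTER COLUMN [PASSWORD] NVARCHAR(512);
```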

You may or may not get a warning in SSMS (depending on your version, I believe) about operations that cause a table to be dropped. The reason you may see this warning (and even if you don’t see it, the operation works this way) is that in order to change the column’s data type, the table is dropped and rebuilt. If you are doing this on a very large table it can take some time to complete, during which the table is unavailable. At most, I would assume the hosts table will have a couple hundred rows (the number of rows equals the number of ESX hosts in vCenter). Even with a few thousand rows, this operation will only take a few seconds to complete.

If you want to be safe, turn off the vCenter service prior to the operation, but I don’t think it’s necessary.

Of course, doing this will probably cause VMware to laugh at you if you open a support issue about the database being broken. Additionally, should you update vCenter 2.5 (say from update 4 to update 5), the change could be undone.

Once you’ve made the change, go about the normal way of replacing vCenter’s certs with your own, and when you reconnect the hosts you shouldn’t get any “string too long” errors.

I haven’t tried replacing the certs for my vCenter 4.0 instance with 2048 bit, so I don’t know if this is still needed, but from reading the documentation it seems that it shouldn’t be.
