Monitoring for orphaned snapshots left by SMVI

NetApp’s SnapManager for Virtual Infrastructure (SMVI) is a great product, but it’s messy. If it encounters any error, it seemingly forgets to delete the virtual machine snapshots it created in Virtual Infrastructure before dying.

To keep orphans from piling up (I’ve seen as many as 20 on a single virtual machine), I created a quick Nagios check that simply alerts when it sees them.

The script is very elementary. It uses a regex to find any snapshots that match the default SMVI naming convention and increments a counter for each one it finds. If the counter is greater than zero, the script returns an error to Nagios, which causes an alert to be sent.
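The counting logic described above fits in a few lines. The sketch below is in Python rather than the original Perl, and it leaves out the vSphere connection entirely: it simply takes a list of snapshot names. The `smvi_` prefix in the pattern is an assumption about the default naming convention, not something taken from the original script, and `check_snapshot_names` and `main` are hypothetical names. It uses the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL).

```python
import re

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# ASSUMPTION: SMVI snapshots are named "smvi_" followed by a UUID-like string.
# Adjust this pattern to the naming your SMVI version actually uses.
SMVI_PATTERN = re.compile(r"^smvi_[0-9a-f-]+$", re.IGNORECASE)


def check_snapshot_names(names):
    """Count names matching the SMVI pattern; return (exit_code, message)."""
    count = 0
    for name in names:
        if SMVI_PATTERN.match(name):
            count += 1
    if count > 0:
        return CRITICAL, f"CRITICAL - {count} SMVI snapshot(s) found"
    return OK, "OK - no SMVI snapshots found"


def main(argv):
    """Hypothetical entry point: print the Nagios status line, return the code."""
    code, message = check_snapshot_names(argv)
    print(message)
    return code
```

A wrapper would call `sys.exit(main(sys.argv[1:]))` so that Nagios sees the exit code. For example, `main(["smvi_1234abcd-5678"])` prints `CRITICAL - 1 SMVI snapshot(s) found` and returns 2, while a manually named snapshot such as `before patching` is ignored.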

I’ve set the check to execute once an hour in my environment, as I don’t need finer granularity than that: letting an orphaned SMVI snapshot accumulate an hour’s worth of changes is acceptable to me.

5 thoughts on “Monitoring for orphaned snapshots left by SMVI”

  1. Hi,

    I’m testing this script, but I always receive this error:

    Undefined subroutine &VirtualMachineSnapshotInfo::snapshotInfo called at ./check_smvi_snapshots.pl line 51

    I have installed VMware-vSphere-Perl-SDK-4.1.0-254719.i386

    Do you have any idea what the problem is?

    Thanks in advance

  2. Hi again,

    I was able to find out why the script failed:

    I changed line 51 from:
    foreach my $childSnapshot (@{$vm->snapshot->snapshotinfo->rootsnapshotlist}) {

    to:

    foreach my $childSnapshot (@{$vm->snapshot->rootSnapshotList}) {

    Now it works fine…
    Johann

  3. I’m a bit confused. How do legitimate SMVI snapshots not generate false positives when the snapshot count is greater than zero?
    What exactly in the snapshot list indicates that a snapshot was orphaned by SMVI?

    • @Nick,

      SMVI runs once a day in my environment. Let’s say it runs at 0200. At 0800, then, any SMVI snapshot that is still present is there erroneously.

      If you run SMVI more frequently than that, you will have to add some logic to check for the age of the snapshot and compare that to an acceptable length of time for it to exist.

      Hope that helps,

      Andrew

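Johann’s fix iterates over `rootSnapshotList`, which in the vSphere API holds only a virtual machine’s top-level snapshots; snapshots taken on top of those are nested in each node’s `childSnapshotList`. If SMVI snapshots can end up below other snapshots, the whole tree has to be walked to count them all. A minimal Python sketch, assuming a hypothetical `SnapshotNode` class that mimics only the `name` and `childSnapshotList` fields of vSphere’s `VirtualMachineSnapshotTree` (the real object has more fields, and the original Perl script may or may not walk the tree this way):

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SnapshotNode:
    # Hypothetical stand-in for vSphere's VirtualMachineSnapshotTree;
    # only the two fields this sketch needs are modeled.
    name: str
    childSnapshotList: List["SnapshotNode"] = field(default_factory=list)


def walk_snapshots(root_list: List[SnapshotNode]) -> Iterator[SnapshotNode]:
    """Yield every snapshot in the tree, depth first, starting from the roots."""
    # Reverse before pushing so nodes come off the stack in list order.
    stack = list(reversed(root_list))
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.childSnapshotList))
```

Feeding `[node.name for node in walk_snapshots(vm_roots)]` into the name check then covers nested snapshots too. An explicit stack keeps the walk iterative; a recursive function would work just as well for realistic snapshot depths.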
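Andrew’s suggestion for environments that run SMVI more often than once a day can be sketched as an age check. Each `VirtualMachineSnapshotTree` node in the vSphere API carries a `createTime`, so a check could compare that timestamp against an allowed lifetime. In this Python sketch the two-hour `MAX_AGE` is an assumed value (it should be longer than your longest SMVI job), and `is_orphaned` and `count_orphans` are hypothetical names:

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: the longest time an SMVI snapshot may legitimately exist
# while a backup job is still running. Tune this to your environment.
MAX_AGE = timedelta(hours=2)


def is_orphaned(create_time: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True if a snapshot created at create_time has outlived max_age."""
    return now - create_time > max_age


def count_orphans(create_times, now: datetime, max_age: timedelta = MAX_AGE) -> int:
    """Count SMVI snapshots older than max_age (all datetimes timezone-aware)."""
    return sum(1 for t in create_times if is_orphaned(t, now, max_age))


# Example timestamps for the scenario in the reply above.
CHECK_TIME = datetime(2011, 1, 1, 8, 0, tzinfo=timezone.utc)
```

With SMVI running at 0200 and the check at 0800, a leftover snapshot is six hours old and gets counted, while one created 30 minutes earlier by a job that is still running does not.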
