cDOT Performance Monitoring Using PowerShell

Performance monitoring is a complex topic, but it is vital to the successful implementation and maintenance of any system. In the past I’ve written several posts about using Perl to gather performance statistics from a 7-Mode system (running ONTAP 7.3.x, which is quite old at this point), so I thought it was a good time for an update.

I originally documented some of this information in a response on the NetApp Community site. This post expands on that a bit and documents it externally.

The NetApp PowerShell Toolkit has three cmdlets which we can use to determine what objects, counters, and instances are available, and a fourth cmdlet to actually collect the data.

Finding the Right Performance Object

Performance reporting in the clustered Data ONTAP API is broken out by two things: Object and Counter. In order to monitor something, for example aggregate performance, we need to find the object which pertains to that “something”. We do this using the Get-NcPerfObject cmdlet.

Throughout the rest of this post I will be using the example of aggregate monitoring, specifically how many reads and writes are being done against an aggregate.
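A minimal sketch of listing the performance objects. It assumes the NetApp PowerShell Toolkit is installed and that a session to the cluster is already open (for example via Connect-NcController); the Name, PrivilegeLevel, and Description properties are the ones referenced elsewhere in this post.

```powershell
# Assumes an existing connection to the cluster, e.g. via Connect-NcController.

# Retrieve every performance object the cluster exposes
$perfObjects = Get-NcPerfObject

# Show the name, privilege level, and description of each object
$perfObjects | Select-Object Name, PrivilegeLevel, Description

# Count how many objects came back
($perfObjects | Measure-Object).Count
```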

For my cDOT 8.3 cluster this returned 358 items, which is a lot of different categories of monitoring! We can narrow the list by filtering on PrivilegeLevel. The most commonly monitored things are at either the admin or advanced privilege level, whereas diag is used for very detailed, infrequently needed counters. To view only the non-diag objects, we change the command slightly.
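One way to express that filter, as a sketch. It assumes the PrivilegeLevel property holds the plain strings admin, advanced, and diag.

```powershell
# Keep only the objects that are not at the diag privilege level
Get-NcPerfObject |
    Where-Object { $_.PrivilegeLevel -ne "diag" } |
    Select-Object Name, PrivilegeLevel, Description
```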

This results in just 113 objects returned, a much shorter list to consider. This privilege level also indicates how much permission on the cluster the user collecting the information will need. A user with diag privileges is going to have considerably more permission on the cluster than one with only admin or advanced.

Finding the Counters

The objects give us a categorical view of what’s available. To find out what counters are collected for each one, we use the Get-NcPerfCounter cmdlet. Using the aggregate object as an example, we see the following:
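A sketch of that query for the aggregate object. The Properties and Desc property names are assumptions about what the Toolkit returns; pipe the result to Get-Member to confirm them on your version.

```powershell
# List the non-diag counters for the "aggregate" performance object
Get-NcPerfCounter -Name aggregate |
    Where-Object { $_.PrivilegeLevel -ne "diag" } |
    Format-Table -AutoSize Name, Properties, Desc
```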

Notice that, once again, I removed the counters which are at the diag level. You may want to look at them, but for the most part they are things that only infrequently need to be monitored because they are very low level details.

I included the properties field because it’s very important…it tells us how to read the counter. From the API documentation:

  • raw: single counter value is used
  • delta: change in counter value between two samples is used
  • rate: delta divided by the time in seconds between samples is used
  • average: delta divided by the delta of a base counter is used
  • percent: 100*average is used
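To make those definitions concrete, here is a small, self-contained sketch that applies each rule to two samples of a counter. Convert-PerfSample and its parameters are hypothetical names used for illustration; they are not part of the Toolkit. Returning 0 when the base counter did not change is a choice made for this sketch.

```powershell
function Convert-PerfSample {
    param(
        [Parameter(Mandatory)]
        [ValidateSet("raw", "delta", "rate", "average", "percent")]
        [string] $Property,

        [Parameter(Mandatory)] [double] $First,    # counter value at the first sample
        [Parameter(Mandatory)] [double] $Second,   # counter value at the second sample

        [double] $ElapsedSeconds = 0,              # seconds between samples (rate only)
        [double] $BaseFirst = 0,                   # base counter at the first sample
        [double] $BaseSecond = 0                   # base counter at the second sample
    )

    $delta     = $Second - $First
    $baseDelta = $BaseSecond - $BaseFirst

    switch ($Property) {
        "raw" {
            # A single counter value is used; take the latest sample
            return $Second
        }
        "delta" {
            return $delta
        }
        "rate" {
            if ($ElapsedSeconds -le 0) { throw "ElapsedSeconds must be greater than zero" }
            return $delta / $ElapsedSeconds
        }
        "average" {
            if ($baseDelta -eq 0) { return 0 }
            return $delta / $baseDelta
        }
        "percent" {
            if ($baseDelta -eq 0) { return 0 }
            return 100 * $delta / $baseDelta
        }
    }
}
```

For example, `Convert-PerfSample -Property rate -First 1000 -Second 1500 -ElapsedSeconds 5` returns 100, and `Convert-PerfSample -Property average -First 2000 -Second 8000 -BaseFirst 10 -BaseSecond 40` returns 200.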

Looking at the descriptions, it appears that we want to look at the user_reads, user_writes, and total_transfers counters to determine how much activity is happening on our aggregate. Each of these is a rate counter, which means we need to measure it once, wait some known amount of time (e.g. 5 seconds), then measure again and divide the difference between the two values by the number of seconds.

Instances of the Object

Now that we know the objects and counters, and we’ve determined what we want to monitor, we need to find the instances. To do that we use the Get-NcPerfInstance cmdlet.
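A sketch of listing the aggregate instances. The filter assumes that root aggregates have "root" somewhere in their name, which is a naming convention rather than a guarantee; adjust the pattern to match your environment.

```powershell
# List aggregate instances, skipping any whose name contains "root"
Get-NcPerfInstance -Name aggregate |
    Where-Object { $_.Name -notlike "*root*" }
```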

I excluded root aggregates from this listing using the Where-Object snippet because I’m not interested in those at this time.

Reporting Performance

We now have everything needed to monitor performance: the object, the counters, and the instance. We use the Get-NcPerfData cmdlet to query for information.

Here is what it looks like in action:
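A minimal sketch of a single query. The aggregate name n1_aggr1 is hypothetical, and the Counters collection of Name/Value pairs on the returned object is an assumption about the Toolkit's output shape.

```powershell
# Hypothetical aggregate name; substitute one returned by Get-NcPerfInstance
$aggregate = "n1_aggr1"

# Ask for the three counters of interest on that one instance
$sample = Get-NcPerfData -Name aggregate -Instance $aggregate `
    -Counter user_reads, user_writes, total_transfers

# Display the current counter values
$sample.Counters | Format-Table -AutoSize Name, Value
```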

Remember that these are rate counters. To determine the values, we sample twice, a known interval apart, and divide the difference by the elapsed seconds.
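The two-sample calculation can be sketched as follows. The aggregate name is hypothetical, the Counters/Name/Value shape is assumed, and the sleep interval stands in for the true elapsed time, so the result is approximate.

```powershell
$aggregate = "n1_aggr1"    # hypothetical aggregate name
$counters  = "user_reads", "user_writes", "total_transfers"
$interval  = 5             # seconds between the two samples

# Take two samples a known number of seconds apart
$first = Get-NcPerfData -Name aggregate -Instance $aggregate -Counter $counters
Start-Sleep -Seconds $interval
$second = Get-NcPerfData -Name aggregate -Instance $aggregate -Counter $counters

foreach ($counter in $counters) {
    # Pull the matching counter out of each sample; cast in case values arrive as strings
    $v1 = [double]($first.Counters  | Where-Object { $_.Name -eq $counter }).Value
    $v2 = [double]($second.Counters | Where-Object { $_.Name -eq $counter }).Value

    # Rate counter: difference divided by the seconds between samples
    [PSCustomObject]@{
        Counter   = $counter
        PerSecond = [math]::Round(($v2 - $v1) / $interval, 2)
    }
}
```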

The output is a per-second average over the time between polls (5 seconds in this instance).

We can modify this slightly to get a per-second report for an aggregate:
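One possible shape for such a loop, as a sketch under the same assumptions (hypothetical aggregate name, assumed Counters/Name/Value shape). It divides by the measured elapsed time rather than the nominal one second, and runs until interrupted with Ctrl+C.

```powershell
$aggregate = "n1_aggr1"    # hypothetical aggregate name
$counters  = "user_reads", "user_writes", "total_transfers"

$previous     = Get-NcPerfData -Name aggregate -Instance $aggregate -Counter $counters
$previousTime = Get-Date

while ($true) {
    Start-Sleep -Seconds 1

    $current     = Get-NcPerfData -Name aggregate -Instance $aggregate -Counter $counters
    $currentTime = Get-Date
    $elapsed     = ($currentTime - $previousTime).TotalSeconds

    # Build one output row per poll: a timestamp plus one column per counter
    $row = [ordered]@{ Time = $currentTime.ToString("HH:mm:ss") }
    foreach ($counter in $counters) {
        $old = [double]($previous.Counters | Where-Object { $_.Name -eq $counter }).Value
        $new = [double]($current.Counters  | Where-Object { $_.Name -eq $counter }).Value
        $row[$counter] = [math]::Round(($new - $old) / $elapsed, 2)
    }
    [PSCustomObject]$row

    # The current sample becomes the baseline for the next pass
    $previous     = $current
    $previousTime = $currentTime
}
```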

This gives us an easy-to-read, per-second output of the number of reads, writes, and total transfers for our aggregate.

Performance Monitoring is Fun!

This has been just a short introduction to performance monitoring of a cDOT system using the PowerShell Toolkit. There is a huge number of things that can be monitored, and you can choose to display the information however you like…maybe a real-time report of performance for troubleshooting, intermittent collection to go into a summary report, or collection at regular intervals to feed into a trend analysis tool.

Please reach out to me using the comments below or the NetApp Community site with any questions about how to collect performance information from your systems.

5 thoughts on “cDOT Performance Monitoring Using PowerShell”

  1. Hey Andrew, great writeup! I’m “Magyk” on the NetApp support forum, the guy who posed the original question. I was talking to the guys at the local NetApp office and they said they knew you and that you were a good guy. I had a question:

    For the -name parameter you’re using “aggregates” as an example. What other options are available for that parameter, or better yet, is there a way to get a list of options?

    Thanks.

  2. Hi James,

    Thanks for reading! I hope that this response, and the one in the communities, has been helpful.

    The “-Name” parameter comes from the performance object. Use “Get-NcPerfObject” to view a list…there are 358 returned from my cDOT 8.3 system, so it’s quite a few to sort through. To make it a bit easier, show the description property:

    Get-NcPerfObject | Select Name,PrivilegeLevel,Description

    You can also view them from the ClusterShell:

    set -privilege advanced -confirmations off
    statistics catalog object show

    Remember that the user you are connected to the cluster with must have permissions to the object, and just like ClusterShell there are three privilege levels: admin, advanced, and diag.

    Andrew

  3. Thanks for the script. I have a few doubts; can you clarify?

    Is {read,write,total}_data given in bytes? To get the actual latency, should it be divided by the number of ops? Is latency given in microseconds?

    Thanks

    • Hello Kannan,

      Yes, the data values are given in bytes and latency is in microseconds. Occasionally capacity counters will be in blocks; you can see the units using the Get-NcPerfCounter cmdlet.

      Latency will have a base counter, identified using the Get-NcPerfCounter cmdlet as well.

      Hope that helps!

      Andrew
