Cacti: Monitor protocol statistics for NetApp volumes

Update 2011-07-10: Due to a template export error in Cacti, the import was failing for a lot of people. I apologize for taking so long to fix the templates; they should work now. Thank you to everyone who pointed out the errors and the fix in the comments.


I have made no secret that I use two applications daily to monitor my infrastructure: Nagios and Cacti. I have created a fair number of scripts (and hope to publish more soon) to help Nagios monitor the different parts of the infrastructure, but I haven’t published many of my Cacti scripts.

One of the most useful is the config that I use to monitor the different protocol stats for volumes. I created an indexed query so that a single script and its accompanying XML file can monitor all the volumes, and I can select which graphs to create for each volume. The polling script is loosely based on the multi-protocol realtime volume statistics script that I created some time ago.
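
For readers new to indexed script queries, here is a toy sketch of the three actions Cacti asks such a script to perform: list the indexes, list one field for every index, and fetch one field for one index. It is not the real polling script. The function name, volume names, numbers, and field name are all invented for illustration, and the `!` delimiter is only an assumption (the actual delimiter is whatever the XML file declares).

```shell
# Toy stand-in for a Cacti script query backend. Volume names and
# numbers are invented; a real script would ask the filer instead.
toy_volume_query() {
    action=$1
    case "$action" in
        index)
            # One index value (here, a volume name) per line.
            printf '%s\n' vol0 vol_home
            ;;
        query)
            # For the requested field ($2): index, delimiter, value.
            printf '%s\n' 'vol0!120' 'vol_home!340'
            ;;
        get)
            # A single value for one field ($2) on one index ($3).
            case "$3" in
                vol0) echo 120 ;;
                vol_home) echo 340 ;;
                *) return 1 ;;
            esac
            ;;
        *)
            echo "usage: toy_volume_query index | query FIELD | get FIELD INDEX" >&2
            return 2
            ;;
    esac
}
```

Calling `toy_volume_query index` lists the volumes, which is what lets Cacti offer a per-volume choice of graphs, while `toy_volume_query get nfs_read_ops vol_home` returns the single number the poller would store for that volume.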

Download the updated template and script(s) here.

Some examples…

Total Operations, Latency
[Graphs: total_ops, total_lat]

CIFS Operations, Latency
[Graphs: cifs_ops, cifs_lat]

NFS Operations, Latency
[Graphs: nfs_ops, nfs_lat]

iSCSI Operations, Latency
[Graphs: iscsi_ops, iscsi_lat]

The templates also include graphs for FCP and SAN operations, but I have none of those on my filer, so I have no graphs to show you.

These are especially useful for volumes that have multiple types of access happening. For example, one of our systems, which provides home directories to some users, has both NFS and CIFS access enabled. It is extremely helpful to see latency for each of the protocols, as it can help diagnose certain errors. For example, our NIS domain had an error at one point that was causing authentication/authorization to be extremely slow. By monitoring the NFS latency, we helped narrow down the problem.

Because it is an indexed script query, you can choose which volumes get each type of graph. This makes it easy to pick the volumes for which you want to see, for instance, NFS latency, using the standard list of objects that Cacti provides.

The setup is fairly simple: you’ll need the XML file that describes the inputs and outputs that Cacti exchanges with the script, the Perl script itself, and the graph templates, which you import into Cacti. After placing the Perl script in your $path_cacti/scripts directory, edit it and make sure that the NetApp SDK files are available. I usually put them in Perl’s main library path, but if you keep them somewhere else, such as the directory with your script(s), just make sure that a use lib "/path/to/NetApp/sdk" line is at the top of the script.
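
A quick way to confirm that Perl can find the SDK before involving Cacti is to try loading the module from the shell. The sketch below assumes the SDK's Perl module is named NaServer (the name that appears in the script's own warning output) and uses a hypothetical helper name, sdk_loadable. The -I switch has the same effect as the use lib line.

```shell
# sdk_loadable DIR
# Succeeds if Perl can load the SDK's NaServer module with DIR added
# to its library search path (the same effect as `use lib "DIR"`).
# The module name NaServer is an assumption based on the script's output.
sdk_loadable() {
    perl -I"$1" -MNaServer -e 1 2>/dev/null
}
```

For example, `sdk_loadable /path/to/NetApp/sdk && echo "SDK found"` prints the message only when the module loads. If it stays silent, check the path and the file permissions on the SDK directory.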

After placing the na-cacti-volume-stats.xml file in the $path_cacti/resource/script_queries directory, the only other modification you should need to make is to put the username and password that will be used to connect to the NetApp(s) in the XML file. I don’t particularly like this: for one, it’s a security risk, and two, it’s very static. You are welcome to modify the Perl script so that it handles authentication a different way, but due to how Cacti behaves and the information that it passes (or doesn’t pass, as the case may be), the only way I have found to provide credentials is outside of the Cacti interface.

Anyway, that irritation aside, make sure that the permissions on the file are tightened so that only the Cacti user can access it, which helps to mitigate the security risk. The second part of mitigation is to ensure that the user which connects to and polls the NetApp has limited access. You should not (SHOULD NOT!) be using root, or any other user in the Administrators group, to connect. The user you connect with doesn’t need to modify anything, so it shouldn’t have those role-based capabilities enabled.
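
As a minimal sketch of the file-permission step, the helper below restricts a file to its owner. The function name lock_down is made up for this example, and the account name and path in the usage note are assumptions that will differ between installs.

```shell
# lock_down FILE [OWNER]
# Makes FILE readable and writable by its owner only (mode 600).
# If OWNER is given, ownership is changed first, which needs root.
lock_down() {
    file=$1
    owner=${2:-}
    if [ -n "$owner" ]; then
        chown "$owner" "$file" || return 1
    fi
    chmod 600 "$file"
}
```

Run as root, something like `lock_down "$path_cacti/resource/script_queries/na-cacti-volume-stats.xml" cacti` would hand the file to a user named cacti and remove access for everyone else. If your web server runs Cacti's interface under a different account, that account may also need read access, so adjust the ownership or mode accordingly.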

That should get you started. If you have any issues with the script or templates, please let me know in the comments.

31 thoughts on “Cacti: Monitor protocol statistics for NetApp volumes”

    • Manuel,

      You can monitor LUN statistics using a script very similar to this. I will work on modifying the script and making some templates. Hopefully I can get it posted sometime soon!

      Thanks for reading,

      Andrew

  1. Andrew, I had a problem when I imported the Cacti template: I had a lot of unmet dependencies. I went through and tried to add the other templates that were referenced, yet the same happened when I re-imported. Is there something I am missing?

    • Greg,

      To be honest, I’m not sure what would cause the unmet dependencies. The templates were exported from a 0.8.7f instance, so if you are using 0.8.7g, that could perhaps explain some of it.

      I am at VMworld this week. When I return, I will re-export the templates from the newest version of Cacti. If you have a list of the errors, that may prove useful as well.

      Thanks for reading,

      Andrew

  2. Andrew, I’m running 8.7e. I went through and matched up all of the dependencies in the data queries, exported the template and reimported it and it looks like it is good to go.

    Thanks
    Greg

  3. I am using Cacti 0.8.7g. Is there a fix for the unmet dependencies when importing the template? Also, I’m not getting any data from the data queries; is this normal since the queries are run through background perl scripts?

    Thanks,

    Chris

  4. I am also using Cacti 0.8.7g and having the same unmet dependencies issue. The script runs fine from the command line, but the graph for IOPs doesn’t populate.
    Any thoughts?

    Brian

  5. The unmet dependencies are due to a templating bug in an earlier version of Cacti. The author of the templates needs to run a command in the cli directory called ‘repair_templates.php’. Then, re-export them to correct this problem.

    TheWitness

  6. Thanks for the templates – exactly what I was looking for. I also ran into issue with dependencies due to the template bug referenced in comment 14. Got the backend netapp api working, just need the repaired templates.

  7. Hi When I run the perl script I get the error

    root@squid3:/var/www/cacti/scripts# ./na-cacti-vol-latency.pl
    Name “NaServer::S” used only once: possible typo at ΓΌ%_Γ€
    line 467.
    Missing or incorrect arguments!

    na-cacti-vol-latency.pl --hostname|-H
    --username|-u
    --password|-p
    --volume|-v
    --action|-a index|query|get
    --protocol|-P all|nfs|cifs|san|fcp|iscsi
    --operation|-o read|write|other
    --type|-t ops|latency

    Password is optional, if it is not supplied at the command line the script
    will prompt for it.

    root@squid3:/var/www/cacti/scripts#

    Please help me

    • Kurt,

      Please let me know what version of the OnTAP SDK you are using. I experienced some errors similar to this with some versions of the SDK.

      Thank you,

      Andrew

  8. Hi,
    thanks for sharing the template,
    i am trying to get this done.. but still not able to do it 🙁

    can someone guide me….

    thanks
    aman

  9. Hi buddy,

    thanx for creating these graphs, makes our lives a lot easier…. thanx again.

    i am new to cacti… i have downloaded your xml and scripts.
    i have imported the xml for netapp into cacti…. the graph shows but doesn’t get any readings showing in it. i think it is something with the perl script that i have to add somewhere on the cacti host… please help….

    thank you much 🙂

  10. Andrew,

    I was wondering if you could help me out, I’m guessing it’s something simple. I followed your directions. Everything went fine until I went to add graphs for the netapp through the cacti interface. I get this:

    https://picasaweb.google.com/lh/photo/JAq6A3RG-oF-xA-xyj1jhdMTjNZETYmyPJy0liipFm0?feat=directlink

    As you can see, it won’t let me add a graph because the data query is returning 0 rows. If I run the data query from the command line as the cacti user, it works and I get data back from the netapp.

    Any help would be really appreciated.

    Thanks,
    Dan

    • Hi Dan,

      Check the permissions of the script and the perl modules. Sometimes it’s helpful to su to the cacti user and execute the query (which you can see in the XML file) to see what errors are being generated.
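
A sketch of that check, meant to be run after switching to the Cacti account. The helper name check_poller_files and the account name in the comment are assumptions. It only tests that the current user can read the files you pass in.

```shell
# check_poller_files FILE...
# Run this while logged in as the Cacti account, for example after
# `su -s /bin/sh cacti` (the account name "cacti" is an assumption).
# Prints each file the current user cannot read and returns non-zero
# if any were found.
check_poller_files() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "not readable by $(id -un): $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Pass it the polling script, the XML file, and the SDK's Perl modules. If everything is readable, run the script's index action by hand as the same user (the script's usage text lists -H, -u and -a index among its options) and watch for errors.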

      Thanks for reading,

      Andrew

  11. Do you know if the scripts work with ONTAP 8.0 using the NetApp Manageability (NM) SDK 4.1?
    The scripts are working great on a Netapp using same SDK and running ontap 7.3.6 but are not working on Netapps running 8.0x
    Any ideas?
    Index returns 0

  12. Hi,
    Thanks for these scripts, they are much appreciated and appear to be working OK for us however I have a couple of basic questions;

    1) What is the latency actually measured in (is it ms) ?

    2) What is considered a reasonable figure for a production instance (i.e. is 100 good/bad/average)?

    Thanks

  13. I’ve got everything set up as instructed, and the graphs are being produced, however the values are all -nan or 0.

    A return of running the perl script manually with the correct options passed results in values other than 0.

    Where should I look?

    • I actually think I found it here:

      08/13/2014 03:35:16 PM – SPINE: Poller[0] Host[5] ERROR: Empty result [192.12.12.37]: ‘perl /var/www/localhost/htdocs/cacti/scripts/na-cacti-vol-latency.pl -H *.*.*.* -u \”REDACTED\” -p \”REDACTED\” -a get -o write -P all -t ops proxha’
      08/13/2014 03:35:16 PM – SPINE: Poller[0] Host[5] TH[1] DS[43] SCRIPT: perl /var/www/localhost/htdocs/cacti/scripts/na-cacti-vol-latency.pl -H *.*.*.* -u \”REDACTED\” -p \REDACTED\” -a get -o write -P all -t ops proxha, output: U

      It’s not passing the -v before it gives the command the volume.

  14. Hi ,

    I didnt understand how to setup this.
    I got the following file from NetApp site : netapp-manageability-sdk-5.7
    Now I will download these xml templates.(Cacti_Volume_Stats_V2)

    Can someone direct me on how to proceed.
