Channel: THE SAN GUY

Automating Storage Processor % Utilization alerts with EMC Performance Manager


I was tasked with finding a way to get email alerts whenever our SP utilization crosses a certain threshold.  Since none of the monitoring tools that we own can do that right now, I had to build something with custom scripts.  This is my 2nd post on the same subject; I removed yesterday’s post because it didn’t work as I intended.  This time I used EMC’s Performance Manager rather than pulling data from the SP with the Navisphere CLI.

First, I’m running all of my bash scripts on a Windows server using cygwin.  They should run fine on any Linux box as well.  Because I don’t have a native sendmail configuration set up on the Windows server, I’m using the control station on the Celerra to compare the utilization numbers in the text files against the threshold and email out an alert.  The Celerra control station automatically pulls the files via FTP from the Windows server every 30 minutes and sends an email alert if the numbers cross the threshold.  A description of each script and the schedule is below.

Windows Server:

Export.cmd:

This first Windows batch script runs an export (with pmcli) from EMC Performance Manager that dumps all the performance stats for the current day.

for /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set date=%%c%%a%%b)

C:\ECC\Client.610\PerformanceManager\pmcli.exe -export -out c:\cygwin\home\scripts\sputil\0999_interval.csv -type interval -class clariion -date %date% -id APM00400500999
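If the export is ever launched from a cygwin or Linux shell instead of a batch file, the same date string can be built without parsing the output of date /t. This is only a sketch: it assumes the batch line above produces year, month and day with no separators (which is what it yields with a US-style date format), it relies on GNU date for the -d option, and the function name pm_export_date is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print a date as YYYYMMDD.
# With no argument it uses today's date; with one argument it hands that
# argument to GNU date -d (for example: pm_export_date 2012-12-14).
pm_export_date() {
    if [ $# -eq 0 ]; then
        date +%Y%m%d
    else
        date -d "$1" +%Y%m%d
    fi
}

# Print today's date in the export format.
pm_export_date
```

Unlike the batch version, this form does not depend on the regional date settings of the server.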

Data.cmd:

This cygwin/bash script manipulates the file exported above and ultimately creates two text files (one for SPA and one for SPB), each holding a single numerical value: the most recent SP utilization.  There are a few extra steps at the beginning of the script that are irrelevant to the SP utilization; they’re there for other purposes.

# This will pull only the timestamp line from the top
grep -m 1 "/" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/timestamp.csv
# This will pull out only the "% Utilization" lines.
grep -i "^% Utilization" /home/scripts/sputil/0999_interval.csv >> /home/scripts/sputil/stats.csv
# This will pull out the disk/LUN title info for the first column
grep -i "Data Collected for DiskStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/diskstats.csv
grep -i "Data Collected for LUNStats -" /home/scripts/sputil/0999_interval.csv > /home/scripts/sputil/lunstats.csv
# This will create a column with the disk/LUN number
cat /home/scripts/sputil/diskstats.csv /home/scripts/sputil/lunstats.csv > /home/scripts/sputil/data.csv
# This combines the disk/LUN column with the data column
paste /home/scripts/sputil/data.csv /home/scripts/sputil/stats.csv > /home/scripts/sputil/combined.csv
cp /home/scripts/sputil/combined.csv /home/scripts/sputil/utilstats.csv
 
#  This removes all the temporary files
rm /home/scripts/sputil/timestamp.csv
rm /home/scripts/sputil/stats.csv
rm /home/scripts/sputil/diskstats.csv
rm /home/scripts/sputil/lunstats.csv
rm /home/scripts/sputil/data.csv
rm /home/scripts/sputil/combined.csv
 
# This next line strips the file of all but the last two rows, which are SP Utilization.
# With an empty field separator every character is its own field, so $1 is the first character of the row.
# Rows that start with "D" (the disk/LUN rows) are dropped; everything else is kept.
awk -v FS="" -v OFS="" '$1 != "D"' < /home/scripts/sputil/utilstats.csv > /home/scripts/sputil/sputil.csv
 
# This pulls the next-to-last comma-separated field of each row, which holds the most recent value.
awk -F, '{print $(NF-1)}' < /home/scripts/sputil/sputil.csv > /home/scripts/sputil/sp_util.csv
 
#pull 1st line (SPA) into separate file
sed -n 1,1p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spAutil.txt
#pull 2nd line (SPB) into separate file
sed -n 2,2p < /home/scripts/sputil/sp_util.csv > /home/scripts/sputil/spButil.txt
#The spAutil.txt/spButil.txt files now contain only a single numerical value, which would be the most recent %utilization from the Control Center/Performance Manager dump file.
 
#Copy files to web server root directory
cp /home/scripts/sputil/*.txt /cygdrive/c/inetpub/wwwroot
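The filtering and extraction steps above can also be condensed into a single pipeline. Treat this as a sketch under stated assumptions: the function name sp_latest and the sample rows are invented, and it assumes (going by the $(NF-1) in the script) that each exported row ends with a trailing comma, so the last real value sits in the next-to-last field. It swaps in grep -v '^D' for the empty-FS awk filter, because splitting on an empty field separator is not guaranteed by every awk.

```shell
#!/bin/bash
# sp_latest FILE
# Print the most recent value from every row that does not start with "D".
# Assumes comma-separated rows that end with a trailing comma (hypothetical format).
sp_latest() {
    grep -v '^D' "$1" | awk -F, '{print $(NF-1)}'
}

# Made-up sample resembling the combined file: a disk row, a LUN row,
# then the two SP rows, which start with the tab that paste inserts.
sample=$(mktemp)
{
    printf 'Data Collected for DiskStats - disk0\t%% Utilization,10.1,12.5,\n'
    printf 'Data Collected for LUNStats - lun5\t%% Utilization,3.2,4.8,\n'
    printf '\t%% Utilization,21.4,37.9,\n'
    printf '\t%% Utilization,18.0,42.6,\n'
} > "$sample"

sp_latest "$sample" | sed -n 1p    # first remaining row (SPA in the script above)
sp_latest "$sample" | sed -n 2p    # second remaining row (SPB)
rm -f "$sample"
```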
 

Celerra Control Station:

CelerraArray:/home/nasadmin/sputil/ftpsp.sh

The script below connects to the Windows server and grabs the current SP utilization text files via FTP every 30 minutes (via a cron job).

#!/bin/bash
cd /home/nasadmin/sputil
ftp windows_server.domain.net <<SCRIPT
get spAutil.txt
get spButil.txt
quit
SCRIPT
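The script above has no login step, so it presumably depends on auto-login, for example a ~/.netrc entry on the control station. As a hedged alternative, the sketch below builds the FTP command list with an explicit user line that can be piped to ftp -n, which turns auto-login off. The function name ftp_commands, the account name and the password are all placeholders.

```shell
#!/bin/bash
# Hypothetical helper: print the FTP commands needed to log in and fetch files.
# Usage: ftp_commands USER PASSWORD FILE...
ftp_commands() {
    local user=$1 pass=$2 f
    shift 2
    echo "user $user $pass"
    for f in "$@"; do
        echo "get $f"
    done
    echo "quit"
}

# Show the generated command list (placeholder credentials).
ftp_commands ftpuser 'secret' spAutil.txt spButil.txt

# To run it against a real server, pipe the list into ftp -n:
# cd /home/nasadmin/sputil
# ftp_commands ftpuser 'secret' spAutil.txt spButil.txt | ftp -n windows_server.domain.net
```

Keeping the password in a script has obvious drawbacks, so a ~/.netrc file with tight permissions may still be the better choice.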
 

CelerraArray:/home/nasadmin/sputil/spcheck.sh:

This script checks whether the SP utilization is over our threshold. If it is, it sends an email alert that includes the % utilization number in the subject line. To change the threshold, edit the THRESHOLD=<XX> line in the script.  The line containing printf "%2.0f" rounds the floating point value to an integer, because bash’s numeric comparisons only work on integers.

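#!/bin/bash

# Read the most recent SPB utilization (a floating point value)
SPB=`cat /home/nasadmin/sputil/spButil.txt`
# Round it to an integer, since [ ] comparisons only work on integers
printf "%2.0f" $SPB > /home/nasadmin/sputil/spButil2.txt
SPB=`cat /home/nasadmin/sputil/spButil2.txt`
echo $SPB
THRESHOLD=50
if [ $SPB -eq 0 ] && [ $THRESHOLD -eq 0 ]
then
        echo "Both are zero"
elif [ $SPB -eq $THRESHOLD ]
then
        echo "Both values are equal"
elif [ $SPB -gt $THRESHOLD ]
then
        echo "SPB is greater than the threshold.  Sending alert"
        # uuencode needs the input file and the name to give the attachment;
        # the full path is used because cron does not start in this directory
        uuencode /home/nasadmin/sputil/spButil.txt spButil.txt | mail -s "<array_name> SPB Utilization Alert: $SPB % above threshold of $THRESHOLD %" notify@domain.com
else
        echo "$SPB is less than $THRESHOLD"
fi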
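Since the script above only checks SPB, a small function can cover both SPs with one comparison. This is a sketch rather than a drop-in replacement: the function name over_threshold is made up, it rounds with printf the same way the script does, and it only echoes a message where the real script would call mail.

```shell
#!/bin/bash
# over_threshold VALUE THRESHOLD
# Rounds VALUE (which may be floating point) to an integer and succeeds
# (exit status 0) only when the rounded value is greater than THRESHOLD.
# Returns 2 when VALUE is not a number.
over_threshold() {
    local rounded
    rounded=$(printf "%.0f" "$1") || return 2
    [ "$rounded" -gt "$2" ]
}

THRESHOLD=50
for sp in A B; do
    file=/home/nasadmin/sputil/sp${sp}util.txt
    [ -r "$file" ] || continue      # skip this SP if its file is missing
    value=$(cat "$file")
    if over_threshold "$value" "$THRESHOLD"; then
        echo "SP$sp is at $value%, above the threshold of $THRESHOLD%"
    fi
done
```

Skipping a missing file keeps a failed FTP pull from producing a confusing comparison error.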

CelerraArray Crontab schedule:

The FTP script is scheduled with cron to pull the SP utilization files.  Run “crontab -e” to edit the scheduler.  I’ve got the alert script set to run at the top of the hour and at half past the hour, and the updated SP files from the web server are FTP’d in a few minutes prior.

[nasadmin@CelerraArray sputil]$ crontab -l
58,28 * * * * /home/nasadmin/sputil/ftpsp.sh
0,30 * * * * /home/nasadmin/sputil/spcheck.sh
 

Overall Scheduling:

Windows Server:

Performance Manager Dump runs 15 minutes past the hour (exports data)
Data script runs at 20 minutes past the hour (processes data to get SP Utilization)

Celerra Server:

FTP script pulls new SP utilization text files at 28 minutes past the hour
Alert script runs at 30 minutes past the hour

The cycle then repeats at minute 45, minute 50, minute 58, and minute 0.

 

12/14/12 Update:

There is an alternate method to gather data for creating alerts if you don’t have EMC’s Control Center. I don’t have scripts written that use this command, however. The Navisphere CLI command to get busy/idle ticks for the Storage processors is naviseccli -h <SP_IPaddress> getcontrol -cbt.

The output looks like this:

Controller busy ticks: 1639432
Controller idle ticks: 1773844

The statistics in this output are averaged across all the cores of the SP’s processors and accumulate from the last reset. Getting the actual point-in-time SP CPU utilization from them requires a calculation: poll twice, compute the delta for each counter by subtracting the earlier value from the later one, and apply this formula to the deltas:

Utilization = Busy Ticks / (Busy Ticks + Idle Ticks)
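
The delta calculation could be scripted roughly as follows. Treat it as a sketch: the function names ticks and sp_utilization are invented, the parsing assumes the two-line output format shown above, and the second set of counter values in the example is made up.

```shell
#!/bin/bash
# ticks LABEL
# Reads getcontrol -cbt style output on stdin and prints the counter
# for LABEL ("busy" or "idle").
ticks() {
    awk -F': *' -v label="$1" '$1 == ("Controller " label " ticks") {print $2}'
}

# sp_utilization BUSY1 IDLE1 BUSY2 IDLE2
# Prints the percent utilization between two polls, to one decimal place,
# or "n/a" if the counters went backwards or did not move.
sp_utilization() {
    awk -v b1="$1" -v i1="$2" -v b2="$3" -v i2="$4" 'BEGIN {
        busy = b2 - b1; idle = i2 - i1
        if (busy < 0 || idle < 0 || busy + idle == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * busy / (busy + idle)
    }'
}

# Example: first poll from the output above, second poll invented.
sp_utilization 1639432 1773844 1640032 1775244

# With live polls (the SP address is a placeholder; uncomment to use):
# first=$(naviseccli -h <SP_IPaddress> getcontrol -cbt); sleep 60
# second=$(naviseccli -h <SP_IPaddress> getcontrol -cbt)
# sp_utilization "$(echo "$first" | ticks busy)" "$(echo "$first" | ticks idle)" \
#                "$(echo "$second" | ticks busy)" "$(echo "$second" | ticks idle)"
```

The result is a percentage, so it could be written to a text file and checked against the same threshold as the Performance Manager numbers.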


