HPE Storage Users Group

A Storage Administrator Community




 Post subject: System Reporter Sampler Service Watchdog
Posted: Tue Apr 27, 2010 3:05 pm
Site Admin

Joined: Tue Aug 18, 2009 10:35 pm
Posts: 1328
Location: Dallas, Texas
Tags: ORA-12541, TNS:no listener, Exiting startloop, 3PAR System Reporter Sampler

We run System Reporter 2.6 on a Windows VM that connects to a physical host running Oracle.
From time to time the Oracle system is taken offline, and when that happens the "Sampler" service for 3PAR System Reporter quits. It stays stopped until someone notices it isn't working anymore and manually restarts the service after the remote database is back online.

I tried to use the built-in Recovery tab in the Windows service properties to restart the service automatically when it fails... however, the keyword there is "fail". When the 3PAR Sampler service stops because of database connection issues, it terminates itself cleanly. That does not count as a service FAILure, so Windows leaves it alone. For Windows to restart the service, the service has to exit with an error code other than 0. Sure enough, when the DB goes offline and the Sampler service quits, it returns 0. *sigh*
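If you want to confirm the exit code yourself after the Sampler stops, the built-in sc command can show it. This is a sketch, assuming the display name "3PAR System Reporter sampler". Because sc query expects the service's key name rather than its display name, "3PARSampler" below is a hypothetical placeholder for whatever sc getkeyname actually prints on your server.

```batch
REM Step 1: look up the service key name (sc query does not accept the display name)
sc getkeyname "3PAR System Reporter sampler"

REM Step 2: "3PARSampler" is a hypothetical placeholder; substitute the key name printed above.
REM WIN32_EXIT_CODE and SERVICE_EXIT_CODE show the codes the service reported when it stopped.
sc query 3PARSampler
```

If both fields read 0 after a database outage, that is the clean stop described above, which the recovery actions ignore.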

As a workaround, I wrote the following quick script and run it from Task Scheduler every 30 minutes on the System Reporter server. This way, the Sampler service restarts automatically and reconnects to the database as needed; if the database is still down, the service simply stops again and the next scheduled run retries. PsService is a free download from Microsoft, part of the Sysinternals PsTools suite. Change e:\3PAR_watchdog.log to whatever log location you like.

Copy and paste this into a text file named 3PAR_watchdog.bat:
Code:
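REM If the sampler service is not RUNNING, start it and append a timestamp to the log
psservice query "3PAR System Reporter sampler" | find "RUNNING"
REM find sets errorlevel 1 when "RUNNING" is absent, i.e. the service is not running
if errorlevel 1 (net start "3PAR System Reporter sampler" & echo %date% %time% >>e:\3PAR_watchdog.log)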


Be advised, there is one gotcha: the first time each Windows user runs psservice, it pops up a GUI asking you to accept the license agreement (EULA). So you can't run this as Local System. At minimum, you will need a local service account with permission to start, stop, and query services. Log in interactively as the account you will use to run the script from Task Scheduler, run psservice once, and accept the EULA. After that, the script will run properly in the background.
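Once the EULA has been accepted, the scheduled task itself can be created from an elevated command prompt with the built-in schtasks command. This is a minimal sketch, assuming the script is saved as e:\3PAR_watchdog.bat; the task name and the local account "svc_3par" are hypothetical placeholders for your own choices.

```batch
REM Run the watchdog every 30 minutes as the account that accepted the PsService EULA.
REM "3PAR Sampler Watchdog" and "svc_3par" are hypothetical names; /rp * prompts for the password.
schtasks /create /tn "3PAR Sampler Watchdog" /tr "e:\3PAR_watchdog.bat" /sc minute /mo 30 /ru "%COMPUTERNAME%\svc_3par" /rp *
```

The same task can of course be set up through the Task Scheduler GUI; the command line is just easier to repeat on another System Reporter server.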

I've suggested to 3PAR, via my SE, that they change the Sampler process to return non-zero exit codes on errors. But hindsight being 20/20, I think it would be primo if the Sampler service cached metrics locally when it can't post them to the DB, then uploaded them once the DB comes back online, whether that's an hour, a day, or more later. Adding an option to alert the admin via email when anything is wrong would be a bonus too.

_________________
Richard Siemers
The views and opinions expressed are my own and do not necessarily reflect those of my employer.

