Monitoring Processes on a Windows Cluster with Operations Manager

Hello once again!

I was working with a customer recently attempting to monitor the various inner workings of a sprawling Enterprise Application when I came across this requirement.

The application was ‘semi-cluster aware’.  There was an ‘Application Manager’ service which was cluster aware; however, this service was able to stop and start processes on the active cluster node.  The issue was that these processes were the ‘nuts-n-bolts’ of the application, so they required monitoring.

Now according to TechNet, this is wholly possible, with a few prerequisites.  What you need to do is add the Virtual Server (the cluster Virtual Computer Object) to a group, and then from within the Process Monitor Template you can target the monitor at this group.  I did this; however, the results were not as I had hoped.

As you can see, the processes are listed, but left as unmonitored.

[Screenshot: 05-07-2013 11-50-32]

I had a look at the event log on the nodes that make up the cluster and found the following event being logged:

[Screenshot: 05-07-2013 11-56-43]

So, it appears the reason the processes are not being monitored is that the scripts used to perform the monitoring are non-remotable.  Time for a little background on how monitoring of clusters actually works…

Under usual circumstances, if we want to monitor a server with Operations Manager we install an agent on that server.  This is known as agent-based monitoring.  If for some reason we cannot do this, we can still monitor the server using agentless monitoring.

Now, in a clustered scenario we would install the agent on both cluster nodes, which allows us to see the health of the nodes themselves.  However, a cluster also has a Virtual Computer Object.  Since this object is virtual we cannot install an agent on it, so the active node in the cluster monitors the Virtual Computer Object agentlessly (if that is even a word!).

Now we can see why the events are being logged.  The active node is trying to monitor the processes running on the Virtual Computer Object (which it is doing agentlessly) and cannot, since the monitor/rule is non-remotable.

I decided to export the Management Pack containing the configuration and take a look to see if there is anything that could be done here.  Here is a snippet of the XML:

<UnitMonitor ID="ProcessMonitoring_5947ddc22bc347feafd540bddbe8c835.ProcessInstanceCountMonitor" Accessibility="Public" Enabled="true" Target="ProcessMonitoring_5947ddc22bc347feafd540bddbe8c835MonitoredProcess" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="false" Priority="Normal" TypeID="MicrosoftSystemCenterProcessMonitoringLibrary!Microsoft.SystemCenter.Process.ProcessInstanceCountMonitorType" ConfirmDelivery="false">
<AlertSettings AlertMessage="ProcessMonitoring_5947ddc22bc347feafd540bddbe8c835.ProcessMonitoring.ProcessInstanceCountOutsideRange.AlertMessage">
<AlertParameter1>$Data[Default='0']/Context/DataItem/Item0Context/DataItem/ProcessInformations/ProcessInformation[./ProcessName ='tscmtoris.exe']/ActiveInstanceCount$</AlertParameter1>
<OperationalState ID="OK" MonitorTypeStateID="InsideRangeState" HealthState="Success" />
<OperationalState ID="Error" MonitorTypeStateID="OutsideRangeState" HealthState="Error" />

As you can see here, we have a unit monitor for one of our processes, and the Remotable property is set to false.  I decided to change this to true and see what the effect was.  I then imported the modified Management Pack…

[Screenshot: 16-07-2013 11-20-55]

The processes are now monitored!
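
If you have a lot of process monitors to change, hand-editing the exported XML gets tedious.  Below is a minimal sketch of how the same edit could be scripted in Python.  It is not something I used on the customer site: the function name set_remotable, the id_prefix parameter and the trimmed-down SAMPLE document are all made up for illustration, and it assumes the exported Management Pack parses as plain XML with UnitMonitor elements carrying a Remotable attribute, as in the snippet above.  Work on a copy of the export, not the original.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def set_remotable(mp_xml: str, id_prefix: str = "") -> Tuple[str, int]:
    """Set Remotable="true" on UnitMonitor elements whose ID starts with id_prefix.

    Returns the modified XML text and the number of monitors that were changed.
    (Hypothetical helper, not part of any Operations Manager tooling.)
    """
    root = ET.fromstring(mp_xml)
    changed = 0
    for monitor in root.iter("UnitMonitor"):
        # Skip monitors that do not belong to the template we care about.
        if not monitor.get("ID", "").startswith(id_prefix):
            continue
        # Only touch monitors that are explicitly non-remotable.
        if monitor.get("Remotable") == "false":
            monitor.set("Remotable", "true")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical, heavily trimmed stand-in for an exported Management Pack.
SAMPLE = """<ManagementPack>
  <Monitoring>
    <Monitors>
      <UnitMonitor ID="ProcessMonitoring_abc.ProcessInstanceCountMonitor" Remotable="false" />
      <UnitMonitor ID="Other.Monitor" Remotable="false" />
    </Monitors>
  </Monitoring>
</ManagementPack>"""

updated, count = set_remotable(SAMPLE, id_prefix="ProcessMonitoring_")
print(f"{count} monitor(s) changed")
```

The id_prefix filter is there so that only the monitors created by the process monitoring template are touched and everything else in the Management Pack is left alone.  Note that ElementTree may drop XML comments and unused namespace declarations when it writes the document back out, so compare the result against the original export before importing it.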
