What's responsible for the crashing servers?
I have a Unisys ES7000 Windows 2000 data center server directly attached by four pathways to an EMC DMX disk array. The server boots from the same EMC array.
After restarting a number of Control-M processes, the server's disk transfers per second and disk queue length rose to 1121 and 23 respectively. As disk activity rose, the applications suffered timeouts and eventually the server crashed.
EMC does not state any limit for transfers to and from a DMX array. The array is set to a stripe set of two on all metas, and the disk drives being accessed via Veritas Volume Manager are set up as volume sets.
Is there any possible reason related to the EMC disk array as to why this server crashed under these conditions? All other counters and logs show no issues.
My research found that EMC has in the past seen a similar problem with CRON, a scheduling product comparable to Control-M. In that case, CRON was caught in a scheduling loop and exceeded its thresholds, causing the system to hang. I do not believe this is the problem here.
Veritas stated that a problem can occur intermittently if 3.0 of Veritas Volume Replicator is installed without first removing 2.7 Veritas Volume Replicator.
Microsoft has several recommendations around system crashes. The resolution on each was to go to the latest service level pack.
Unisys has some documentation on PowerPath and these problems, which were corrected as part of PowerPath V2.0.4.
Based on the results of this research, I do not feel that the problem is related to the EMC disk array. Disk activity probably arose in support of all the processes being started.
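As a rough sanity check on the reported counters, Little's law (average queue length = throughput x average response time) gives the average time each transfer spent queued and in service. The sketch below is only illustrative. It assumes the reported queue length of 23 is an average that includes requests in service, and that I/O is spread evenly across the four pathways, neither of which the question confirms. The function names are hypothetical.

```python
def avg_response_time_ms(queue_length: float, transfers_per_sec: float) -> float:
    """Average time per transfer in milliseconds, from Little's law (W = L / lambda)."""
    if transfers_per_sec <= 0:
        raise ValueError("transfers_per_sec must be positive")
    return queue_length / transfers_per_sec * 1000.0


def queue_per_path(queue_length: float, paths: int) -> float:
    """Queue depth per pathway, ASSUMING load is balanced evenly across paths."""
    if paths <= 0:
        raise ValueError("paths must be positive")
    return queue_length / paths


# Figures reported in the question: 1121 transfers/sec, queue length 23, four pathways.
latency_ms = avg_response_time_ms(23, 1121)
per_path = queue_per_path(23, 4)

print(f"Average response time: {latency_ms:.1f} ms")
print(f"Queue depth per path (if balanced): {per_path:.2f}")
```

With the reported figures this works out to roughly 20.5 ms per transfer and a queue of about 5.75 per pathway. Whether that is abnormal depends on the array's usual response time for this workload, so it is best compared against a baseline captured during normal operation.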
This was first published in March 2004