One of my customers is using SharePoint 2007 with two WFE servers and Windows 2003 Network Load Balancing (NLB). They asked me to take a look because when the WFE1 server is shut down, SharePoint shows the message:
When I tried to browse the site on WFE2 directly from Internet Information Services, the related application pool was automatically disabled. The Windows event viewer showed:
Before this error, a few other errors occurred, such as:
A process serving application pool …… terminated unexpectedly. The process id was … the process exit code was 0xfffffffffff
The first thing I did was enable the Default IIS website, because I could not tell whether this was a SharePoint-related error. Even the default IIS page gave the Service Unavailable error, and the default application pool was terminated, so the problem was in IIS itself rather than in SharePoint.
After searching the internet for the error for a while, I found this article from Microsoft: http://support.microsoft.com/Default.aspx?kbid=2009746.
I first looked through all the KB updates installed on the server, but the one mentioned in the article above was not among them. The error message was very similar, though, so I decided to reinstall Windows 2003 SP2, and it fixed the problem!
Still, when shutting down WFE1, WFE2 did not serve the SharePoint site. The application log in the event viewer displayed a well-known error:
Adding the mentioned user to the Local Activation security permissions of the IIS WAMREG DCOM object fixed this issue. Now when WFE1 is shut down, the SharePoint sites are still displayed through WFE2.
The customer also asked why users are not load balanced when using Windows NLB. For SharePoint, NLB is configured with Affinity set to Single, because SharePoint cannot handle one client's session being shared across multiple servers. With Single affinity, all requests from one client IP address go to the same server. You can test this by stopping only the IIS service on the server you are connected to, for example WFE2. After refreshing IE, the SharePoint site no longer works. You would expect to be redirected to the other server, WFE1, but this does not happen. Why?
This is because NLB only provides failover when the server itself fails, not when an application or service on it fails. It does not poll a machine to check that a particular service is available, and it does not drop the box from the NLB cluster if that service does not respond. Requests are therefore still sent to the stopped web site or application pool as usual, unless a monitoring application is configured to remove servers from the cluster under certain circumstances. NLB provides the services such monitoring applications need to remove servers from the cluster remotely. For example, a monitor could be set up to check that a certain web site or application pool responds and to remove the server from the cluster if it does not.
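To make the idea of such a monitor concrete, here is a minimal Python sketch. It is not what I installed at the customer and not an official Microsoft tool, just an illustration under a few assumptions. The class and function names (NlbMonitor, site_is_healthy, run_nlb, run_forever) are my own. I assume the script runs on the WFE itself, that the probe URL points at the server's own address rather than the cluster address, and that the local NLB control program can be called as "nlb.exe drainstop" and "nlb.exe start"; check the command name and arguments on your own servers before relying on this.

```python
import subprocess
import time
import urllib.error
import urllib.request


def site_is_healthy(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 401 or 403 still proves IIS and the application pool answer.
        # A 503 (Service Unavailable) does not.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


def run_nlb(action):
    """Run the local NLB control program (assumed command name: nlb.exe)."""
    subprocess.run(["nlb.exe", action], check=True)


class NlbMonitor:
    """Takes the host out of the cluster after repeated failed probes."""

    def __init__(self, probe, remove_host, add_host, failures_before_removal=3):
        self.probe = probe
        self.remove_host = remove_host
        self.add_host = add_host
        self.threshold = failures_before_removal
        self.failures = 0
        self.in_cluster = True

    def check_once(self):
        """Probe once, act if needed, and return whether the host is in the cluster."""
        if self.probe():
            self.failures = 0
            if not self.in_cluster:
                self.add_host()
                self.in_cluster = True
        else:
            self.failures += 1
            if self.in_cluster and self.failures >= self.threshold:
                self.remove_host()
                self.in_cluster = False
        return self.in_cluster


def run_forever(url, interval_seconds=15):
    """Poll the given URL until the process is stopped."""
    monitor = NlbMonitor(
        probe=lambda: site_is_healthy(url),
        remove_host=lambda: run_nlb("drainstop"),
        add_host=lambda: run_nlb("start"),
    )
    while True:
        monitor.check_once()
        time.sleep(interval_seconds)
```

The monitor waits for several failed probes in a row before removing the host, so a single slow response (for example during an application pool recycle) does not take the server out of the cluster. Once the site answers again, the host is started and rejoins the cluster.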