Willard
Guest
|
Posted:
Wed Nov 16, 2005 5:58 pm Post subject:
E2K3 Cluster - MSExchangeSA hangs on failover |
|
|
I have a two-node (active/passive) cluster. When I move the Exchange resources
from Node1 to Node2, it takes longer than expected (~5 min), and I think it is
related to the MSExchangeSA service not stopping completely. When I look at
Node1 after the resources have failed over to Node2, the status of the
MSExchangeSA service is 'Stopping'. I've waited around 4 hours for this to
change, but it doesn't.
The failover of the resources to Node2 works. When I try to fail the
resources back to Node1, it fails unless I first kill the mad.exe process
manually from Task Manager on Node1 (which changes the service status from
'Stopping' to blank). I see the same behavior when failing the resources
over in either direction.
I am running Windows Server 2003 SP1 with all current critical updates and
Exchange 2003 SP2 on DELL hardware (6850) connected to a DELL CX300.
Any help is greatly appreciated.
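For anyone who wants to check for this state without opening the Services applet, here is a minimal sketch that reads the state out of `sc query` output. The 'Stopping' status shown in the applet corresponds to STOP_PENDING in `sc query`. The sample text is hand-written for illustration, not captured from these servers, and the function names are hypothetical.

```python
import re
from typing import Optional

# Hand-written sample of `sc query MSExchangeSA` output (illustrative only,
# not captured from the cluster nodes described in this thread).
SAMPLE_OUTPUT = """
SERVICE_NAME: MSExchangeSA
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 3  STOP_PENDING
        WIN32_EXIT_CODE    : 0  (0x0)
"""


def service_state(sc_output: str) -> Optional[str]:
    """Return the state name from `sc query` output, or None if absent."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def is_stuck_stopping(sc_output: str) -> bool:
    """True when the service reports STOP_PENDING ('Stopping' in the applet)."""
    return service_state(sc_output) == "STOP_PENDING"


if __name__ == "__main__":
    print(service_state(SAMPLE_OUTPUT), is_stuck_stopping(SAMPLE_OUTPUT))
```

On a live node the text could come from running `sc query MSExchangeSA` (for example through Python's `subprocess.run` with `capture_output=True, text=True`) and passing its stdout to `service_state`. This only detects the hung state; it does not explain or fix it.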
|
|
Willard
Guest
|
Posted:
Thu Nov 17, 2005 1:59 am Post subject:
RE: E2K3 Cluster - MSExchangeSA hangs on failover |
|
|
Sasa,
Here is the sequence of events in the Event Log when I initiate a failover
of the Exchange group:
1) At a time of 0:00, 3 to 4 messages about resources coming offline
(MSSearch, MSTransport)
2) At a time of 3:00, 9 consecutive error messages (with event IDs 1004,
1020, 1024 and 1012) indicating 'Resource failed to be taken offline because
of a timeout' where Resource = MTA, IS, SMTP and Virtual Instance
3) At a time of 3:01, a message indicating the Exchange SA is coming offline
4) At a time of 3:02, multiple messages indicating the SA is coming offline
5) At a time of 3:07, messages indicating the SA is coming online on the
other node
This behavior is very consistent and predictable. Between step 1 and step 2,
the following resources have a status of 'Offline Pending' in the Cluster MMC:
IS, SMTP, MTA and SA. During this same time, the status of the MTA in the
Services applet is 'Stopping'.
Once steps 3 and 4 are complete, the status of the SA in the Services
applet remains 'Stopping' even though it has successfully failed over
to the other node. I must use Task Manager to kill mad.exe on the original
node before failback will work.
I have simplified the environment by removing the additional storage groups
I created (per an article related to one of the event ids listed above),
which did not make a difference.
Because of the delay between steps 1 and 2, the failover takes more than
4 minutes. Should I expect something better than this?
I hope somebody can make sense of this. Thanks in advance.
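To make the timing concrete, here is a small sketch that turns the five steps above into gaps in seconds. It assumes the times are minutes:seconds offsets from the start of the failover, which the post does not state explicitly; the step labels and function names are hypothetical.

```python
# Offsets copied from the numbered steps above.
# Assumption: "3:00" means 3 minutes 0 seconds after the failover starts.
STEPS = [
    ("resources start coming offline", "0:00"),
    ("offline timeout errors (MTA, IS, SMTP, Virtual Instance)", "3:00"),
    ("SA coming offline", "3:01"),
    ("SA coming offline (multiple messages)", "3:02"),
    ("SA coming online on the other node", "3:07"),
]


def to_seconds(stamp: str) -> int:
    """Convert an 'm:ss' offset into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def gaps(steps):
    """Seconds elapsed between each pair of consecutive steps."""
    secs = [to_seconds(stamp) for _, stamp in steps]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]


if __name__ == "__main__":
    for (label, _), gap in zip(STEPS[1:], gaps(STEPS)):
        print(f"+{gap:>3}s  {label}")
```

Under that reading, almost all of the elapsed time sits between steps 1 and 2, and everything after the timeout errors completes within a few seconds. The three-minute gap is the same length as a 180-second resource pending timeout, which I believe is the cluster default, so that setting may be worth checking; this is an assumption, not something confirmed in the thread.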
"Sasa Milovanovic" wrote:
| Quote: | Hi Willard,
Any logs in Event Viewer? Can you manually take the mentioned resource offline?
--
Regards,
Sasa Milovanovic
MCSE:Messaging
sasa.milovanovic(at)exchangemaster.net
www.eugeurope.org |
|
|
Sasa Milovanovic
Guest
|
Posted:
Thu Nov 17, 2005 1:59 am Post subject:
RE: E2K3 Cluster - MSExchangeSA hangs on failover |
|
|
Hi Willard,
Any logs in Event Viewer? Can you manually take the mentioned resource offline?
--
Regards,
Sasa Milovanovic
MCSE:Messaging
sasa.milovanovic(at)exchangemaster.net
www.eugeurope.org
|
|