| Author |
Message |
Steven
Guest
|
Posted:
Thu Nov 25, 2004 1:16 am Post subject:
Cluster Failover causing loss of mail |
|
|
Setup:
Win2003 Active\Passive Cluster
Exchange 2003
Hitachi SAN for storage
When I fail over from one node to the other, the SMTP resource fails while
going offline, but recovers when the cluster brings it online on the
other node. When the other cluster node comes online, the server spits
out a load (hundreds) of DSNs like the one below:
Your message did not reach some or all of the intended recipients.
Subject: subject
Sent: 11/20/2004 3:21 AM
The following recipient(s) could not be reached:
Smith, John on 11/23/2004 9:41 PM
Could not deliver the message in the time limit specified.
Please retry or contact your administrator.
<servername.domain.com #4.4.7>
After talking to some people, I found that the original messages were never
delivered (2nd time I have lost data in 7 years...arrgh). Looking at
the Sent time, that was the last time a failover occurred, so on that
date and time a good number of messages got sucked into a black hole,
and when the cluster was failed over again (3 days later), they got
spit back out as DSNs. I am trying to figure out why these messages
are being sucked into a black hole during a failover. I suspect that the
failure of the SMTP resource during failover is part of the problem, but
the cluster logs are not giving me anything really useful. Has
anyone seen this before, or does anyone have any suggestions?
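To make the timing concrete, here is a small sketch of the arithmetic behind the "3 days later" observation, using the two timestamps from the sample DSN above. The format string is an assumption about how the dates are written (US month/day/year, 12-hour clock), and the helper name is made up for illustration; nothing here comes from Exchange itself.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout: month/day/year with a 12-hour clock.
DSN_TIME_FORMAT = "%m/%d/%Y %I:%M %p"


def time_in_limbo(sent: str, bounced: str) -> timedelta:
    """Return how long a message sat between being sent and being bounced."""
    sent_at = datetime.strptime(sent, DSN_TIME_FORMAT)
    bounced_at = datetime.strptime(bounced, DSN_TIME_FORMAT)
    return bounced_at - sent_at


# Timestamps copied from the sample DSN in this post.
gap = time_in_limbo("11/20/2004 3:21 AM", "11/23/2004 9:41 PM")
print(gap)  # 3 days, 18:20:00
```

Running the same calculation over every DSN would show whether all of the bounced messages cluster around the two failover times.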
Thanks,
Steve
|
|
|
 |
Rodney R. Fournier [MVP]
Guest
|
Posted:
Thu Nov 25, 2004 2:13 am Post subject:
Re: Cluster Failover causing loss of mail |
|
|
I have not experienced this. Where are your SMTP logs? On the SAN, right, but
which LUN? Do the logs share the disk with anything, like the Exchange
databases?
http://www.gaparks.com/ndr.htm says that a 4.4.7 means:
"This message usually indicates an issue on the receiving server. Verify the
recipient addresses, as well as whether the receiving server is configured
to receive messages correctly. Resending the message places it in the queue.
If the receiving server is available, message delivery succeeds."
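For reference, the three digits of a code like 4.4.7 can be read separately. The sketch below is only an illustration: the lookup tables follow the naming in RFC 3463 (enhanced mail system status codes) but are deliberately partial, and the function name and regular expression are assumptions made for this example, not anything Exchange exposes.

```python
import re

# Partial tables based on RFC 3463; only a few entries are listed.
STATUS_CLASS = {
    "2": "success",
    "4": "persistent transient failure",
    "5": "permanent failure",
}
STATUS_SUBJECT = {
    "1": "addressing",
    "2": "mailbox",
    "3": "mail system",
    "4": "network and routing",
    "5": "mail delivery protocol",
}
STATUS_DETAIL = {
    ("4", "7"): "delivery time expired",
}


def describe_status(dsn_line: str):
    """Pull '#x.y.z' out of a DSN line and describe each part."""
    match = re.search(r"#([245])\.(\d{1,3})\.(\d{1,3})", dsn_line)
    if match is None:
        return None
    cls, subject, detail = match.groups()
    return (
        STATUS_CLASS.get(cls, "unknown class"),
        STATUS_SUBJECT.get(subject, "unknown subject"),
        STATUS_DETAIL.get((subject, detail), "unlisted detail"),
    )


print(describe_status("<servername.domain.com #4.4.7>"))
```

Read this way, 4.4.7 is a transient network and routing failure whose detail is that the delivery time expired, which fits the "Could not deliver the message in the time limit specified" text in the DSN.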
Cheers,
Rod
MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering
http://msmvps.com/clustering - Blog
|
|
|
 |
Guest
|
Posted:
Tue Nov 30, 2004 9:19 pm Post subject:
Cluster Failover causing loss of mail |
|
|
I also experienced this. We have a 4-node cluster (3
active, 1 passive). One node failed over, and when we failed it
back about 2 weeks later, users received mail that had been
sent 2 weeks earlier. I could not find any trace of where
these messages sat during that time.
|
|
|
|
 |
Bob Christian
Guest
|
Posted:
Wed Dec 01, 2004 2:23 am Post subject:
Re: Cluster Failover causing loss of mail |
|
|
This sounds almost as if the SMTP Queue, PickUp, and BadMail directories have
not been moved off the local system. I am not sure whether this has to be
done in Windows 2003/Exchange 2003, but it has to be done with Legato
AAM and Veritas VCS.
One of the best pieces of information about these directories and
how to move them is located at:
http://www.msexchange.org/tutorials/SMTP_Virtual_Server_Uncovered.html
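As a rough illustration of the check being suggested, the sketch below flags any SMTP directory whose drive letter is not one of the shared cluster disks. The paths and the `Q:` drive letter are made-up examples, and `dirs_on_local_disk` is a hypothetical helper; the real locations have to be read from the SMTP virtual server configuration on the cluster.

```python
import ntpath


def dirs_on_local_disk(smtp_dirs, shared_drives):
    """Return the SMTP directories that do not live on a shared cluster drive."""
    shared = {drive.upper() for drive in shared_drives}
    return {
        name: path
        for name, path in smtp_dirs.items()
        # ntpath parses Windows-style paths on any platform.
        if ntpath.splitdrive(path)[0].upper() not in shared
    }


# Hypothetical layout: two directories moved to the SAN, one left behind.
smtp_dirs = {
    "Queue": r"Q:\Exchsrvr\Mailroot\vsi 1\Queue",
    "PickUp": r"Q:\Exchsrvr\Mailroot\vsi 1\PickUp",
    "BadMail": r"C:\Program Files\Exchsrvr\Mailroot\vsi 1\BadMail",
}

print(dirs_on_local_disk(smtp_dirs, shared_drives=["Q:"]))
```

Anything this returns would stay behind on the old node after a failover, which is the kind of stranded mail described in this thread.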
Bob
|
|
|
 |
Steven
Guest
|
Posted:
Wed Dec 01, 2004 11:33 pm Post subject:
Re: Cluster Failover causing loss of mail |
|
|
A good suggestion. I double-checked, and all the SMTP directories are on
a shared cluster resource disk (on the SAN). I then checked whether
the logical disk that contains the queue data might not be a dependency of the
System Attendant. It is... still scratching my head. Opened a case
with MS to see if we can figure it out. So far nothing. Thanks for the
suggestions though.
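For anyone who wants to repeat the dependency check, here is a minimal sketch of the idea: walk the dependency chain from the SMTP resource and confirm that the disk holding the queue shows up somewhere in it. The resource names and the dependency map are hypothetical placeholders, not a dump from this cluster; the real map would have to be transcribed from Cluster Administrator.

```python
def all_dependencies(resource, depends_on):
    """Collect every resource that `resource` depends on, directly or indirectly."""
    seen = set()
    stack = [resource]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Hypothetical dependency map for one Exchange virtual server.
depends_on = {
    "SMTP Virtual Server Instance": ["Exchange System Attendant"],
    "Exchange System Attendant": ["Network Name", "Disk Q:"],
    "Network Name": ["IP Address"],
}

deps = all_dependencies("SMTP Virtual Server Instance", depends_on)
print("Disk Q:" in deps)  # True
```

If the queue disk were missing from that set, the SMTP resource could be brought online before the disk is available on the new node.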
-Steve
|
|
|
|
 |
Bob Christian
Guest
|
Posted:
Thu Dec 02, 2004 6:56 am Post subject:
Re: Cluster Failover causing loss of mail |
|
|
It could also be the MTA location.
Please do let us know once you and MSFT figure it out.
Thanks,
Bob
|
|
|