Exchange CCR Mailbox Server Backup using Third Party Backup Tools such as Backup Exec, Arcserve, Data Protector etc
Possibly one of the most significant of these issues is backing up your CCR cluster with third-party backup tools such as Symantec Backup Exec, CA ARCserve, HP Data Protector, etc.
As some of you may be aware, the recommended backup model for Exchange 2007 Continuous Replication servers is to take a VSS backup of the replica (offline) storage groups & databases. The logic behind this is pretty simple: backing up the replica has no adverse effect on your production Exchange performance, so it can be done at any time without degrading performance for the users.
This all makes sense so far, but it brings up the next question - what about log file truncation on the production server? Well, your backup software is supposed to truncate the logs it has successfully backed up, on both the live SG & the replica.
That takes care of the theory, so now let's look at how this is working (or rather isn't) in a production environment:
Scenario 1: Exchange Server 2007 CCR Cluster with CA ARCserve 11.5 SP3
We struggled for months to get this backup working smoothly; multiple support calls to CA led to calls to HP & MSFT to try to resolve the issues.
The Symptoms:
- VSS Errors when taking backup caused backup to fail
- Backup from Node 2 Succeeds but Fails from Node 1
- Small Storage Group Backups Succeed but Large SG Backups Fail
- SAN Based Backup takes 20+ Hours to complete a 100 GB Data Backup (supposed to be 45 mins or so)
After much R&D from CA & their Microsoft Support representatives (almost 3 months), we managed to clear off ALL but the last symptom using various registry fixes, INI & DLL edits & replacing the HBAs on the servers. It was a nightmare - most of the time CA Support had no idea what they were doing, but nonetheless they stuck with us & solved the issues one by one, so big props to them.
It became apparent that we were making no headway on the last symptom though: SAN backup was running at 50 MB per minute! It's supposed to run at 2,500 MB per minute. Now backup was "working", I mean it never failed, it just took 20+ hours to complete.
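To put those two rates in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes 1 GB = 1024 MB and a constant transfer rate, and the function name is purely my own illustration, not anything from the backup products.

```python
def backup_minutes(size_gb: float, rate_mb_per_min: float) -> float:
    """Estimate backup duration in minutes, assuming a constant rate
    and 1 GB = 1024 MB (both are simplifying assumptions)."""
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1024 / rate_mb_per_min


# The 100 GB backup from Scenario 1 at the observed and the expected rate.
slow = backup_minutes(100, 50)
fast = backup_minutes(100, 2500)
print(f"At 50 MB/min:    {slow / 60:.1f} hours")
print(f"At 2,500 MB/min: {fast:.0f} minutes")
```

That works out to roughly 34 hours versus about 41 minutes, which is in line with the "20+ hours" we were seeing & the "45 mins or so" we were promised.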
Anyway, we lived with this for a while - we had CA on the blower, HP too, & MSFT just in case, once again trying to resolve this problem.
In the meantime, I moved on to another client who was having Exchange CCR problems of a different nature.
Scenario 2: Exchange 2007 CCR Cluster with Symantec Veritas Backup Exec
For this client, backup seemed to be working fine (succeeding), even if at a snail's pace (50 MB per min), on the RTM version of Exchange 2007. The major problems came after we updated his servers to Exchange Server 2007 SP1. I suspected this to be the result of a change implemented in SP1 "for security reasons" which (through a registry setting) disables the ability to take Online Streaming Backup of Exchange databases. (Puzzling or what?)
Long story short, this client's backup turned to shambles & it wasn't the fault of the above-mentioned registry key. His Backup Exec started to display erratic backup behaviour:
- Sometimes Backup Completes, Sometimes it Fails
- Sometimes it's just one SG that fails the job, Sometimes it's ALL
- Backup Speed is way below Normal for Network Backup (50 MB/min as opposed to 300-600 MB/min)
The above symptoms applied no matter which CCR node was the active & which the replica.
I checked out the backup setup & made some changes to the user accounts & backup accounts: ensuring that the Backup Exec server & agent services ALL used a common account to log on, ensuring the backup service account had the necessary Exchange & server-level permissions to complete the backup, etc.
All pretty standard stuff, but since it wasn't all correctly configured to start with, I thought perhaps it would make a difference. Well, NO! No difference at all.
The Common Factors:
In both of the above scenarios, testing was done using the native Windows NTBackup to take SG & IS level backups of the Exchange Mailbox Servers to disk, and this appeared to work perfectly fine. It was reliable, speedy & presented no real challenges aside from not being SAN capable or able to handle the tape libraries / autoloaders.
This led me to believe it was something to do with CA's & Symantec's products, something in the way they called the VSS writers or passed data from the agents to the backup server.
One thing could not be denied: both clients had bought Ferraris that either wouldn't drive faster than 50 MB/min or would break down halfway to their destination - this was unacceptable.
The Solution:
After many months of having resources allocated to these clients trying to solve the issue, I made a decision & asked the responsible people to try something: tell the backup agent to back up from the active server, not the replica server, test the performance, then move the cluster to the replica node & run the same test again.
Something pretty simple, but for some reason it had not yet been tried. The results were astounding:
- Backup Completed Successfully without any errors BOTH times
- Backup Speeds were back to Normal on BOTH Nodes
The issues were solved: change the backup job to take an active node backup on your schedule & things will be fine. This was a common solution for both clients. Since the same workaround fixed two different third-party products, this leads me to believe that the problem is with MSFT, which means the next step is to raise a support call & have them troubleshoot it & provide me with a solution, so that I can return my clients' backup procedures to the "Recommended" backup procedure.
If anyone out there has had similar issues & been able to solve them without using this workaround, I would really like to hear about it. Post a comment & I will get back to you!
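The test described above is really just a small matrix: back up from one copy, fail over, & repeat. Below is a minimal Python sketch of how such runs could be recorded & compared. Everything in it (the `BackupRun` structure, the function names, the 300 MB/min threshold & the sample figures) is a hypothetical illustration, not output from Backup Exec or ARCserve & not measurements from either client.

```python
from dataclasses import dataclass


@dataclass
class BackupRun:
    active_node: str    # node that owned the clustered mailbox server during the run
    source: str         # which copy was backed up: "active" or "replica"
    succeeded: bool
    mb_per_min: float


def is_healthy(run: BackupRun, min_rate: float) -> bool:
    """A run counts as healthy if it finished and met the minimum rate."""
    return run.succeeded and run.mb_per_min >= min_rate


def sources_healthy_everywhere(runs, min_rate):
    """Return the backup sources that were healthy in every recorded run."""
    results = {}
    for run in runs:
        results.setdefault(run.source, []).append(is_healthy(run, min_rate))
    return sorted(src for src, oks in results.items() if all(oks))


# Placeholder figures for illustration only.
runs = [
    BackupRun("node1", "replica", True, 50),
    BackupRun("node1", "active", True, 400),
    BackupRun("node2", "active", True, 400),
]
print(sources_healthy_everywhere(runs, min_rate=300))
```

With these placeholder numbers only the "active" source passes on both nodes, which is the pattern both clients showed.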



