There is a “well known” bug in VMware Site Recovery Manager (SRM) 5.8 that puts your DR plan at risk. It only affects you if your vCenter Servers are connected in Linked Mode. Let me put it this way: when you have a site disaster, you will not meet your RTO.
Luckily for us, we caught this bug during our latest DR test.
If you have two vCenter Servers in Linked Mode and would like to confirm this bug, shut down the vCenter Server at your production site, log in to the vCenter Server at the DR site, and try to run a recovery. You will see this:
Additionally, you will see the following error in the SRM log:
2014-11-29T10:01:05.750-05:00 [03060 error 'HttpConnectionPool-000000'] [ConnectComplete] Connect failed to fqdn-prodvcenter:80>; cnx: (null), error: class Vmacore::Http::HttpException(HTTP error response: Service Unavailable)
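If you want to check your own SRM logs for this symptom after a DR test, a small script can pick out the failed vCenter connections. The Python sketch below is only an illustration: the line layout it expects is an assumption modeled on the single error line quoted above, and the names `find_connect_failures` and `ConnectFailure` are hypothetical, not part of any VMware tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout, modeled on the one SRM log line quoted in this post:
# <timestamp> [<thread> <level> '<component>'] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"\[(?P<thread>\S+)\s+(?P<level>\w+)\s+'(?P<component>[^']+)'\]\s+"
    r"(?P<message>.*)$"
)
# Captures the host:port that SRM could not reach.
CONNECT_FAILED = re.compile(r"Connect failed to (?P<target>[^;>\s]+)")


class ConnectFailure(NamedTuple):
    timestamp: str
    target: str
    message: str


def find_connect_failures(lines: Iterable[str]) -> List[ConnectFailure]:
    """Return error-level lines reporting a failed connection (hypothetical helper)."""
    failures = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("level") != "error":
            continue  # skip lines in another format or at another level
        c = CONNECT_FAILED.search(m.group("message"))
        if c:
            failures.append(
                ConnectFailure(m.group("timestamp"), c.group("target"), m.group("message"))
            )
    return failures


if __name__ == "__main__":
    sample = (
        "2014-11-29T10:01:05.750-05:00 [03060 error 'HttpConnectionPool-000000'] "
        "[ConnectComplete] Connect failed to fqdn-prodvcenter:80>; cnx: (null), "
        "error: class Vmacore::Http::HttpException(HTTP error response: Service Unavailable)"
    )
    for f in find_connect_failures([sample]):
        print(f.timestamp, f.target)
```

Run against the sample line from this post, it reports the production vCenter (`fqdn-prodvcenter:80`) as the unreachable target. Real SRM logs may contain other line formats; those are simply skipped rather than parsed.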
A VMware engineer confirmed this bug and said there is currently no fix. The workaround is to remove Linked Mode between the vCenter Servers:
1. On the recovery site vCenter Server, go to Start -> All Programs -> vCenter Server Linked Mode Configuration.
2. Click Next, select Modify Linked Mode configuration and click Next.
3. Ensure that the checkbox Isolate this vCenter Server instance from Linked Mode group is selected and click Next.
4. Click Continue to isolate the vCenter Server.
5. When the wizard has completed, check that the Site Recovery Manager service is still running and start it if necessary.
It seems VMware is under a lot of pressure from Microsoft to shorten the release cycles of its products. I can’t believe the QA team missed such a huge bug.
Update: VMware has published a KB article about this issue.