HPE Storage Users Group

A Storage Administrator Community




 Post subject: Re: VMware Fault Tolerance
Posted: Fri Dec 05, 2014 3:10 pm

Joined: Wed Dec 03, 2014 10:39 am
Posts: 5
Schmoog wrote:
For true zero RTO, you need a distributed application a la web server farm, microsoft exchange DAG, etc.

I have to agree with this.
I have done a ton of research into this and have implemented some of the solutions (the latest 6-12 months ago): Zerto, Double-Take, VMware Metro Cluster (stretched VMware cluster), and MS clusters within VMware, to name a few. None of them provided a zero RTO.

If you find a product that truly provides a zero RTO, please let all of us know.

The best I was able to do with a dumb application was ~10-20 seconds (from outage detection to app start-up). Most of that time was the Windows 2008 VM and the application starting up. That was with a VMware stretched cluster, 0.6 ms latency between Brocade directors, and 40 Gb/s of bandwidth between them (20 Gb/s FC, 20 Gb/s Ethernet), over dual fiber paths of 50 and 57 miles, with stretched layer 2 networking. And the cost... ouch: $1M/yr for the dark fiber lease, $500k in network gear, 500+ man-hours, etc. Total cost was ~$5M. Both the FC and Ethernet networks plugged into a Cisco 15454 DWDM. Storage was dual HP XP 24k arrays front-ended by an FC device that kept the storage 100% synchronized (160 TB).
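As a rough sanity check on those distances, here is a minimal Python sketch of the propagation delay that the fiber alone adds. The refractive index of 1.468 is an assumed typical value for single-mode fiber, not a figure from this deployment, and the function name is made up for illustration. It ignores DWDM, switch, and storage latency, so real numbers will be higher.

```python
# Rough estimate of light propagation delay over a fiber path.
# Assumption: refractive index ~1.468 (typical single-mode fiber), so light
# travels at roughly two thirds of its speed in a vacuum.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.468  # assumed, not measured on this link
KM_PER_MILE = 1.609344


def one_way_delay_ms(path_miles, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Return the one-way propagation delay in milliseconds for a fiber path."""
    distance_km = path_miles * KM_PER_MILE
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_km_s * 1000.0


if __name__ == "__main__":
    # The two path lengths mentioned in the post above.
    for miles in (50, 57):
        one_way = one_way_delay_ms(miles)
        print(f"{miles} miles: one-way {one_way:.2f} ms, "
              f"round trip {2 * one_way:.2f} ms")
```

Every synchronous write has to cross the link and be acknowledged, so it pays at least the round-trip figure. That is a floor set by distance, and no amount of bandwidth removes it.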

With a smart application, you can put it behind a load balancer and have a VM at each site... but how would it keep its database in sync? Which side does the DB live on? Can it stay up all the time? How fast does it fail over to a secondary system?

Multi-CPU FT is coming, but again, if the OS or app gets corrupted on one copy, it is corrupted on both. It only protects against hard VMware host outages.

I remember hearing about an application that can mirror the entire system, even active memory, between servers. I can't remember what app it was... maybe Oracle.

A co-worker just pointed out that Tandem and possibly mainframes can provide a zero RTO.
He also pointed out that your hardware, OS, and all layers of the application ALL need to support it. And even then, if you are in the middle of a transaction and something happens, there is still a chance that it gets interrupted or lost.

In my opinion, you should go back to whoever requested this and reset their expectations.

Also, there might be some cloud options available, but I don't know if any of the providers will guarantee 100% uptime with zero RTO.

Good luck.

Scott


 Post subject: Re: VMware Fault Tolerance
Posted: Thu Dec 11, 2014 4:49 pm

Joined: Thu Dec 04, 2014 4:06 am
Posts: 48
ScottB

Thanks so much for the detailed response... very interesting read.

This week I've been really, really busy trying to reset the expectation to something realistic (let the machine reboot), but it has been impossible... The guy before me unfortunately promised the world, sold the dream and then buggered off! Damn it! Not the first time I've been in this situation! I've managed to get some leeway though: I get to run a sandpit environment first, instead of going straight to production (jeez, can you believe this guy's design!?), and use it as a proving ground. My plan is then to reset the expectation with concrete evidence. Even after I explained the current constraints of FT, no one wants to listen!

So, ScottB... can I please (please) delve into your experience a bit more? I'm left a little confused, theoretically, by the 10-20 seconds of downtime you mention. Forgive me if I'm being naive, but if I had a vMSC with FT running over the top (using PP/RC), doesn't FT guarantee a seamless, consistent vLockstep copy of the VM on the other side and therefore no downtime whatsoever? I get that the VM performance itself could well be horrendous, given we are replicating every CPU cycle across the link (and back), but this would just affect the end application performance, right? The VM itself, no matter how poorly it's serving the end application to users, should still come up seamlessly and be available without downtime, no? That is, in its exact state at the point of site failure: active memory pages, processes, threads, the whole lot... no reboot at all?

Hit me back when you can... the time is appreciated much.

