Wednesday, December 21, 2005

Salesforce.com's credibility suffering from service outages

This week, Salesforce.com, a favorite example of software on-demand, suffered an outage of roughly three to six hours, knocking out service for possibly thousands of customers. According to the vendor:
On Tuesday December 20th, some users experienced intermittent access (between approximately 9:30 am and 12:41 pm ET & 2:00 pm and 4:45 pm ET) on one of the company's four global nodes. The root cause of the intermittent access was an error in the database cluster. [Salesforce.com] addressed the issue with the database vendor. By Tuesday afternoon EST, the system was running normally for all users.
What concerns me, though, is not this single outage. It's that this is just the worst incident in what is apparently a less-than-rare occurrence for Salesforce.com customers. According to CNET:
Salesforce touts an "uptime" rate of greater than 99 percent. Outages are "a rare occasion," according to [Salesforce spokesperson Bruce] Francis. He said Salesforce's systems are as reliable or more reliable than other comparable systems, including the type that companies run on their own servers.

Yet several Salesforce customers that contacted CNET about Tuesday's glitch said outages happen more frequently than they had expected. About once a month, Mission Research experiences Salesforce outages that typically last an hour or so, [Charlie] Crystle, [CEO of customer Mission Research] said. Another customer, an East Coast consulting firm, has been struck by outages about a half a dozen times over the past year, according to the firm's vice president, who requested anonymity. Frustration levels are rising.

"I'm really, really angry about this because (Salesforce is) out there marketing themselves as something they're just not living up to," Crystle said.
It doesn't need to be this way. A large part of what Google and Yahoo provide is really software on-demand--little applications. When was the last time you went to Google or Yahoo and found service unavailable for more than a few seconds?
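For perspective on the uptime figures CNET quotes above, here is a rough back-of-the-envelope sketch in Python. It assumes a 365-day year and round-the-clock service, and it takes the Mission Research report at face value as twelve one-hour outages a year. None of these assumptions come from Salesforce or CNET.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 24x7 service, non-leap year


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year permitted by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent(downtime_hours: float) -> float:
    """Uptime percentage implied by a given number of downtime hours per year."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


if __name__ == "__main__":
    print(f"99% uptime allows {allowed_downtime_hours(99):.1f} hours down per year")
    print(f"12 one-hour outages imply {uptime_percent(12):.2f}% uptime")
```

On these assumptions, "greater than 99 percent" leaves room for more than 87 hours of downtime a year, so an hour-long outage every month (roughly 99.86 percent uptime) still falls within the claim.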

Readers of the Spectator know that I'm actually a proponent of the trend toward software on-demand. I like its promise to simplify system implementation and maintenance, especially for small and mid-size businesses, relieving the customer of having to worry about things like backups, recovery, disaster planning, and service level maintenance.

But the trend toward software on-demand is going to be set back several years if on-demand vendors can't maintain the service levels they promise and that customers expect.

Are the problems of Salesforce.com typical of other software on-demand vendors, or is Salesforce.com an anomaly? If you have insights, post a comment to this post or email me.

Update, Dec. 22. There's further discussion going on in the comments section for this post.

Related posts
Software on demand: attacking the cost structure of business systems
Salesforce.com offers development sandbox
Salesforce.com set to strike out with AppExchange?
Salesforce.com looks to hook Siebel staff
Salesforce.com struggling at Cisco


Anonymous said...

The problem is that if service providers are going to need the infrastructure of Google and Yahoo to be able to provide reliable service, then there are not going to be very many capable players.
The other distinction I see is that the data stored by Google and Yahoo is not mission critical to its users. And in most cases the service is provided for free, so it is not like anyone can get too upset!
If you look into Google's infrastructure, they customized the OS, wrote their own file system, built their own hardware, all to get the scalability and reliability that is needed.
I think creating a reliable hosted service for mission-critical data is going to be a seriously difficult and expensive challenge for anyone who steps up to the plate.

Frank Scavo said...

Darrel, I would suggest that the requirement to maintain reliability of a hosted service is not as great a challenge as you suggest.

The fact that Google and Yahoo can do it for applications that are not mission critical to users argues precisely my point. It is a free service, yet the providers are able to maintain high reliability. Why shouldn't Salesforce.com be able to do the same for users that are paying a subscription fee?

Yes, Google's infrastructure is customized for high reliability--but that is mainly so Google can use inexpensive hardware (Intel boxes) and operating systems (Linux) to scale cost-effectively. The reliability requirement does NOT require such innovation. Individual corporations maintain similar reliability for mission-critical applications all the time. Heck, we did it 20-30 years ago with mainframes.

Salesforce.com appears to have the right architecture for reliability, with clusters on four nodes. Ideally, when there is a failure at one node, the workload should roll over to the other three nodes. We know that Salesforce.com is running Oracle as its primary database platform, and Oracle provides such failover capabilities, of course. So why Salesforce.com is not utilizing such capabilities is a mystery to me.

But back to the expense issue. I would suggest that a better use of resources would be to implement failover capabilities as I have outlined above, rather than taking out full-page ads in the Wall Street Journal.
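The rollover idea outlined above can be sketched in a few lines of Python. This is purely illustrative: the node names, the health flags, and the `route` function are hypothetical, and the sketch says nothing about how Salesforce.com or Oracle actually implement failover.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str             # hypothetical node label
    healthy: bool = True  # hypothetical health flag


def route(customer_id: int, nodes: list[Node]) -> Node:
    """Send a customer to its home node, or roll over to the next healthy one."""
    home = customer_id % len(nodes)
    for offset in range(len(nodes)):
        candidate = nodes[(home + offset) % len(nodes)]
        if candidate.healthy:
            return candidate
    raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    nodes = [Node(f"node-{i}") for i in range(1, 5)]  # four nodes, as in the post
    nodes[1].healthy = False                          # simulate one node failing
    for customer_id in range(4):
        print(customer_id, "->", route(customer_id, nodes).name)
```

In this simplified version, every customer of the failed node lands on the next node in line. A real deployment would presumably spread that load across all three survivors and would need each customer's data replicated to wherever the work rolls over.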

Anonymous said...

Google's choice to use inexpensive hardware was one about price versus performance, not about expensive and reliable versus cheap and unreliable. Lamborghinis are extremely powerful but not particularly reliable. Google's customized infrastructure effectively allows you to glue a bunch of Honda Civics together to beat the Lamborghini in a race. That's not an easy feat.
This gets us back to the multi-tenant architecture. That's where I think Salesforce are having their problems. You talked the other day about Microsoft using a separate server for each sale of Microsoft CRM. It's an expensive model, but one where you will not have outages across all of your customers as Salesforce did.
I think as the virtual server market matures, products like VMware's GSX Server and Microsoft's Virtual Server will allow multiple isolated server instances to run on commodity hardware. Then the multi-tenant architecture starts to be more feasible. The customer gets their own server instance, and the hosting provider can choose where to host that instance based on the required resources.