Hosting: We Need More Transparency, Education


Published February 19, 2013

7 Comments

Hosting uptime has come to the forefront again, as more than a few services that specifically power WordPress sites have experienced some downtime in the last week.

As Jason Cohen from WP Engine rightly states, 100% uptime is “perfection unattainable”:

Nobody wants downtime, and customers who are paying for a hosting solution have a right to ask and be informed about a company’s history of downtime.

In particular, people want to know who has the least uptime, or if there is a company that has achieved 100% uptime. The reality is that 100% uptime, while the goal that every company sets its sights on, is a perfection unattainable.

Specifically, ServerBeach San Antonio, Page.ly, Zippykid, and WordPress VIP all had a few blips. Although all of them publicly acknowledged their problems and explained how they plan to remedy them, only one of those four services, WordPress VIP, has a real-time uptime dashboard.

Automattic maps out performance on its status site, much as many other hosting services do, such as Amazon with its AWS offering.

The first question is simple: as more and more players jump into the managed WordPress hosting scene, should there be greater accountability and transparency around their uptime?

The second question is more technical in nature, and Brian Krogsgard raised it succinctly in response to Cohen’s post about uptime:

I think the fatigue a lot of people feel (in general, not WP Engine specific) is that the reality of downtime doesn’t often mesh with the sales pitch.

Hosts sell uptime, and inevitably have downtime. Setting the expectation of “always up” and then failing is where disappointment for a customer sets in. How is a regular customer supposed to know it’s impossible? And at the same time, how can a host compete by saying, “We’ve got downtime, just like everybody else!” without losing business? Especially when the standard is to toss in all those 9s for % uptime.

Perhaps a little question mark to explain what 99.9% vs 99.99% uptime means (like hours down per month/year) would benefit, along with a link to an educational article like this, so that customers can be confident that their provider is a leader in the industry, but still fallible.

Without education, 99.9% and especially 99.99% look like 100% to the average person. So even hitting a goal of 99.99% is a failure during the 0.01% in the customer’s mind.

I know it’s a catch-22, but maybe a nudge in the right direction, by education on the sales page, can help change expectations and result in happier customers.
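The explanation Krogsgard asks for comes down to simple arithmetic: allowed downtime equals the length of the period multiplied by (100% minus the uptime percentage). Here is a minimal Python sketch of that calculation, assuming a 365-day year and a 30-day month; the function names are illustrative, not taken from any host’s tooling.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours (leap years ignored)
HOURS_PER_MONTH = 30 * 24   # 720 hours (assumes a 30-day month)


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime still consistent with the given uptime percentage."""
    return period_hours * (100.0 - uptime_percent) / 100.0


def describe(uptime_percent: float) -> str:
    """Plain-English summary of what an uptime figure permits."""
    per_year = allowed_downtime_hours(uptime_percent, HOURS_PER_YEAR)
    per_month = allowed_downtime_hours(uptime_percent, HOURS_PER_MONTH)
    return (f"{uptime_percent}% uptime allows about {per_year:.2f} hours/year "
            f"or {per_month * 60:.1f} minutes/month of downtime")


# Print a small table for the usual "number of 9s".
for pct in (99.0, 99.9, 99.99, 99.999):
    print(describe(pct))
```

By this arithmetic, 99.9% still permits about 8.76 hours of downtime a year (roughly 43 minutes a month), and 99.99% permits about 53 minutes a year (roughly 4.3 minutes a month).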

And this couldn’t be more true. It gets even dicier (or at least more confusing) when you see documents that reinforce a customer’s expectation of 100% uptime, such as ServerBeach’s SLA:

100% Network Uptime

PEER 1 guarantees that the PEER 1 network will be available 100% of the time, excluding Maintenance, as defined below. Customer is eligible for a credit for Network Downtime for any breach of this guarantee, which can be verified by PEER 1’s technical support team. “Network Downtime” is defined as an inability to transmit and receive data caused by failure of network equipment managed and owned by PEER 1, excluding Maintenance, but including managed switches, routers, and cabling.

Or even Pagely’s:

100% Network Uptime

Host guarantees 100% network uptime for our public Internet network, excluding scheduled maintenance. In the event that our network does not experience 100% network uptime in a given month (excluding scheduled maintenance), Host will credit 20% of your monthly service fees for each 3 hours of network downtime experienced up to 100% (for all Service Credits in a given month) of the monthly service fees for those Services affected. Notwithstanding the foregoing, you recognize that the Internet is comprised of thousands upon thousands of autonomous systems that are beyond the control of Host. This SLA and the 100% Network Uptime Service Commitment cover the provision of access by Host to the global internet “cloud”. Routing anomalies, asymmetries, inconsistencies and failures of the Internet outside of the control of Host can and will occur and such instances shall not be considered any failure of the 100% Network Uptime Service Commitment.

100% Infrastructure Uptime

Host guarantees that the critical infrastructure systems will be available 100% of the time in a given month, excluding scheduled maintenance. In the event that critical infrastructure systems do not experience 100% availability in a given month (excluding scheduled maintenance), Host will credit 20% of your monthly fee for each 3 hours of downtime up to 100% (for all Service Credits in a given month) of your monthly fee for those Services affected. Critical infrastructure systems include all power and HVAC infrastructure, including UPSs, PDUs and cabling. Critical infrastructure systems do not include any software or services running on server, nor do they include any server hardware.
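Both of Pagely’s sections describe the same remedy: a credit of 20% of the monthly fee for each 3 hours of downtime, capped at 100%. The following is a minimal Python sketch of that rule. It is a reading of the quoted text, not Pagely’s actual billing logic. Two assumptions are made: only complete 3-hour blocks earn credit (the SLA does not say how partial blocks are handled), and the caller passes in unscheduled downtime only, since scheduled maintenance is excluded. The function name is hypothetical.

```python
def sla_credit(monthly_fee: float, downtime_hours: float,
               percent_per_block: float = 20.0,
               block_hours: float = 3.0) -> float:
    """Credit owed under a '20% per 3 hours, up to 100%' style SLA.

    Assumes `downtime_hours` already excludes scheduled maintenance and
    that only complete blocks of `block_hours` earn credit.
    """
    full_blocks = int(downtime_hours // block_hours)
    credit_percent = min(full_blocks * percent_per_block, 100.0)  # capped at 100%
    return monthly_fee * credit_percent / 100.0
```

Under that reading, a 7-hour outage on a hypothetical $100/month plan would earn a $40 credit, and it would take 15 hours of unscheduled downtime in one month to reach the cap. The “100%” describes what triggers the credit, not what the customer is guaranteed to experience.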

Whether or not you can read through the fine print and work out where the host’s explicit responsibilities start (and end), and how it qualifies coverage and reimbursement, is one thing. The rest of the less technical world will simply see a “100% uptime guarantee,” which we know isn’t “true.”

Not to mention, this type of information can be difficult to even find in the first place. Heck, even WP Engine’s SLA is tough to find and is only available as a PDF via a public Dropbox link:

3.2 Service Availability Level Goals. WPEngine shall use reasonable efforts to achieve the target Service Availability Goal of 100% except during scheduled Service Maintenance.

Despite the odd placement (and a reference link that comes via support), WP Engine at least states fairly clearly that it will use reasonable efforts to achieve the 100% target. That doesn’t mean it will get there, but, hey, it will “try.”

I think WP Engine speaks on behalf of all providers – they are going to do their best, right? For their customers and for themselves, naturally.

But going back to Krogsgard’s comment, education is still lacking for most people. What does this mean in plain English? What does it mean in terms of compensation and reimbursement for downtime? What does it mean when:

I can’t connect to my blog but you tell me it’s up and that it’s my internet?

It’s these kinds of things that cause confusion and ultimately bite many providers in the rear. Sure, my local Comcast or Time Warner connection might fail a few hops away while my managed hosting service is both technically and practically “up,” but it still feels like it’s down. And if I don’t know any better, I’m going to blame you!
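Customers can partly answer the “is it me or is it you?” question themselves by checking the site and an unrelated reference site over the same connection. If the reference loads but the blog doesn’t, the host (or the route to it) is the likelier problem. If neither loads, the customer’s own connection is the likelier culprit. Below is a minimal Python sketch using only the standard library. It illustrates the idea and is not any host’s actual tooling. `REFERENCE_URL` and the function names are hypothetical.

```python
import urllib.request

# Hypothetical reference site: any large, independently hosted site would do.
REFERENCE_URL = "https://www.example.com/"


def reachable(url: str, timeout: float = 5.0) -> bool:
    """True if `url` returns a final HTTP response below 400."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def diagnose(site_ok: bool, reference_ok: bool) -> str:
    """Classify a failure from two checks made over the same connection."""
    if site_ok:
        return "site is up from here"
    if reference_ok:
        return "site looks down, but your connection works: likely the host or the route to it"
    return "nothing is reachable: likely your own connection or an upstream hop"


def check(site_url: str) -> str:
    """Run both checks (network access required) and return a diagnosis."""
    return diagnose(reachable(site_url), reachable(REFERENCE_URL))
```

Even this cannot separate a host outage from a routing problem between one customer and the host, which is exactly the gray area the SLAs above carve out.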

This is a call for more transparency, greater accountability, and even better education for everyone involved. We shouldn’t have to rely on third-party pages, like ServerBear, to gauge our experience or find uptime results.

What are your thoughts? Is this just another regurgitation of a long-standing conversation between customers and providers, and not worth any more blog posts?

This post has no intention of “calling out” any particular player or citing bad form either – in fact, I believe most of the companies here are doing it right. It’s just a matter of providing that additional level of education that may prove to be helpful for more people.

There are 7 comments

February 19, 2013

strebel

Hosting uptime is an important metric for sure. It does not tell the whole story, but is part of a larger dialog between customers and providers. Sharing the right information with prospective and current customers is key to building and maintaining the trust we have established.

We have been working on some public facing reports that cover uptime, as well as other interesting stats around usage and security that we glean from our vantage point managing thousands of WordPress instances.
- February 19, 2013
  
  John Saddington
  
  Sweet deal. Are you using any third-party service or product, like Automattic, to provide some of those metrics?
  
  We’re obviously taking good notes from companies like yours and the others to see what might work best when/if we move forward with our implementation.
  
  Appreciate the insight!
February 19, 2013

Ryan Hellyer

Someone should write an app that logs the uptime of the major hosts’ home pages. Their homepage is not an ideal way to test it, but it’s still gotta be a decent metric. If someone can’t keep their own site online, then there’s no way they’re gonna keep yours online.

Or perhaps such a thing already exists?
- February 19, 2013
  
  Ryan Sullivan
  
  I think ServerBear pretty much does that: http://serverbear.com/
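Hellyer’s idea is easy to prototype. Below is a minimal Python sketch. It does not describe how ServerBear or any other service actually works. It polls a list of homepage URLs at a fixed interval using only the standard library, records each result, and reports the share of successful checks. The URLs, the 60-second default interval, the rule that any final response below HTTP 400 counts as “up,” and all names are assumptions made for illustration.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class Check:
    url: str
    timestamp: float  # Unix time when the check finished
    up: bool


def probe(url: str, timeout: float = 10.0) -> Check:
    """Fetch `url` once; any final response below HTTP 400 counts as 'up'."""
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            up = resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        up = False
    return Check(url=url, timestamp=time.time(), up=up)


def monitor(urls: list[str], interval_s: float = 60.0,
            rounds: int = 5) -> dict[str, list[Check]]:
    """Probe every URL once per round, sleeping between rounds."""
    log: dict[str, list[Check]] = {url: [] for url in urls}
    for i in range(rounds):
        for url in urls:
            log[url].append(probe(url))
        if i < rounds - 1:
            time.sleep(interval_s)
    return log


def uptime_percent(checks: list[Check]) -> float:
    """Share of successful checks, as a percentage."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(c.up for c in checks) / len(checks)
```

For example, `monitor(["https://example.com/"], interval_s=60, rounds=60)` would sample one site once a minute for about an hour. Sampling like this can miss outages shorter than the polling interval. As Hellyer notes, a host’s homepage is also only a proxy for the servers its customers actually run on.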
February 19, 2013

Jonathan Tittle

All providers strive for 100% up-time; none of us want to experience downtime any more than our clients do, though downtime, in some way, shape or form, is inevitable. It’ll happen at some point in time whether due to human error, hardware failure, or network instability.

With that being said…

The issue isn’t with the client/customer; it’s with the providers and the legal jargon they ask their clients to read over and accept. When a client can’t make heads or tails of what you’re trying to tell them, how can you expect them to really understand? If all they see is 100%, 100%, 100% when “skimming” (and yes, that’s about all most clients do, they skim) your TOS/AUP/RUP/[insert 10x documents], can you really point the finger at them for being upset that their site isn’t available 100% of the time?

As providers, we need to educate, but we also need to make the point we’re trying to get across clear. If you *guarantee* 99.99% up-time, or even 100%, then define it and break it down. Just make it easier for clients to understand. Clients shouldn’t need a law degree to decipher what you’re trying to tell them, and unfortunately, with many providers, that’s pretty much what’s required. Be straightforward and use plain English, not legal terms, unless they’re 100% absolutely required to get the point across (and in most cases, they’re not).

Everyone wants to avoid a legal battle, so they keep piling more into the TOS, all while forgetting that a TOS is pretty much useless if clients can’t understand a single thing in it. They shouldn’t have to ask you to translate it for them. Heaven forbid some providers actually have to go to court over a misunderstanding and then be asked to explain exactly what their TOS means, in plain English, without a lawyer. How many providers do you think could honestly do that? I’m banking on very, very few.
February 19, 2013

Ryan Sullivan

I definitely think being transparent about uptime is a huge attraction for someone like me who manages a whole bunch of WordPress sites.

That’s one of the main reasons I’ve been SUPER impressed with Site5. They’re the only shared host I know of that not only displays realtime uptime statistics for every server in their network, they also let you CHOOSE the server where your account is setup based on those reports.

That’s pretty stellar IMO!

It would be really cool to see some of the managed WordPress hosts have that level of transparency.
February 19, 2013

Vid Luther

All great points, John, and from the rest of the commenters. We’ve been trying to get something up and running to educate people about our uptime and customer happiness for a while now. The problem for us is finding the time to deliver a great product, do the marketing, and also build these ancillary tools.

I can’t speak for others, but I’ve been pretty open about the fact that we’re making a major architectural change. So putting up a status page while things are in so much flux would just make you think we’re designing Christmas trees on web browsers.

Overall, as a hosting company we should all have pages that show this level of detail, but it takes a while to make these pages, especially during a time of transition.

As an aside, keep in mind that most of the major companies didn’t have a status page with this level of detail until 3-4 years after they started. A nice-looking status page can only be built once you have your incident response procedures ironed out and know what to do when x breaks.

Either way, we’ll have a status page with lots more detail, up before July.
