Step 1 – Admit You Have a Problem

When you can’t update your profile photo, send a Tweet, or even sign on to Twitter, it’s frustrating. We know that, and we’ve had too many of these issues recently.

Step 2 – Have a Plan

We are working on long-term solutions to make Twitter a more reliable and stable platform. It’s our number one priority. The bulk of our engineering efforts are currently focused on this issue, and we have moved resources from other projects to focus on it.

These are just a couple of the things mentioned in yesterday’s Twitter Blog entry, aptly named Reliability, by @mgrooves.

We have already heard that Twitter saw record-setting traffic and tweets per minute during the World Cup. Understandably, that is the kind of load that sends servers huddling into corners and shaking in their 1’s and 0’s.

So what is the solution? Well, @jeanpaul posted today on the Twitter Engineering Blog (Twitter & Performance: An update) and explained some of the steps they are taking to stabilize things. The post specifically addressed a significant database issue that recently caused serious problems for users across the board:

On Monday, our users database, where we store millions of user records, got hung up running a long-running query; as a result, most of the table became locked. The locked users table manifested itself in many ways: users were unable to sign-up, sign in, update their profile or background images, and responses from the API were malformed, rendering the response unusable to many of the API clients. In the end, this affected most of the Twitter ecosystem: our mobile, desktop, and web-based clients, the Twitter support and help system, and

You can read about their attempts to restart things on the site. Basically, it took 12 hours to get that massive database up and running again, and of course that equals an outage and Twitter Whale feeding. That in turn equals frustrated Twitterers.

Is it fair of us to ask or expect Twitter to be 99.99 percent reliable? Is it an unrealistic request when you look at the scope of the data that is moving across the Twitter network?

Maybe. Maybe not.
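To put that 99.99 percent figure in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. The function names are my own, and treating the 12-hour database recovery as 12 hours of complete downtime is a simplifying assumption on my part, not something Twitter has stated.

```python
# Illustrative arithmetic only: what an availability target allows per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_after_outage(outage_hours: float) -> float:
    """Best-case yearly availability (percent) if one outage is the only downtime."""
    outage_minutes = outage_hours * 60
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


# Assumption: the whole 12 hours counts as downtime, which is a simplification.
print(f"99.99% allows about {downtime_budget_minutes(99.99):.1f} minutes of downtime per year")
print(f"One 12-hour outage caps the year at about {availability_after_outage(12):.2f}%")
```

In other words, a 99.99 percent target leaves less than an hour of downtime for the entire year, so under that assumption a single incident like this one would use up the budget many times over.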

You see, this seems to be the nature of our expectations of technical services: instant satisfaction and instant access. Once we come to rely on a service or product, we expect it to be there, even if there is no obligation on the provider’s part to ensure 100 percent availability. Can you show me where in the Twitter Service Agreement it said that when you signed up? Nope, me neither.

Twitter has become an invaluable tool for hundreds of thousands of people. The issues they are facing are the products of their success, and who would want the alternative of not being successful? Yeah, me neither.

I do have to give them credit for taking the bull by the horns and working on a plan to get to the other side of these challenges, part of which is moving into their own data center later this year.

That will hopefully allow them to provide a service that is as dependable as what we expect from them.