Final Report on Wednesday night power event.
The power event we experienced at our Bothell facility on Wednesday night was, in layman's terms, a "brownout". This explains why some servers were affected and others were not.
The source of the event was our UPS. This is obviously a concern, as one of the two primary functions of a UPS is to prevent spikes or drops in voltage (the other being to act as a buffer between power sources). Our UPS has been well maintained, with twice-yearly preventive maintenance since we acquired it in late 2000, so this cannot be a maintenance issue. We replaced its batteries following a very similar event on August 25th, 2004, when a lightning strike near our facility caused a surge that damaged them. We have been using the vendor of this UPS for ongoing maintenance and support since digital.forest opened for business in 1994. They have been quite responsive to our needs when dealing with events like this. However, we would prefer to avoid such events altogether.
We applied lessons learned over the past 10.5 years and have selected a superior UPS system for our new Seattle facility. The entire power system of the new facility is far more robust than the one in Bothell. The inbound mains and backup power systems are much more "industrial" in nature, the power distribution systems are a significant improvement, and the UPS is about 3 times the size and capacity of our Bothell unit. The Seattle UPS also has built-in redundancy in the form of multiple, independent battery cabinets that can be run in series or parallel depending upon need. The campus we are moving to was purpose-built for datacenter facilities, with significant power infrastructure. Our new neighbors in Seattle are all industrial datacenter operators, either corporate or telecommunications providers. They include Microsoft, Washington Mutual Bank, AboveNet, Savvis, OnFiber, Qwest, ELI, AT&T, and Level3, among others. This is a significant change from the "office park" surroundings in Bothell.
The UPS in Bothell will be decommissioned at the end of March when we vacate the old facility, and it will not be reused.
Finally, a question we have heard from many clients is "Why didn't the generator kick in?" The answer is fairly simple. The generator serves as a backup power system should power from the public grid fail. The UPS is further down the power path. Electricity flows in this manner: Public Grid and/or Generator -> Transfer Switch -> UPS -> Servers. That is, of course, simplified; there are transformers, bypasses, breakers, etc. mixed in, but put simply, the UPS is "downstream" from the generator. Because the fault originated in the UPS itself, the public grid never failed and the generator had no reason to start. We have built a redundant power system, but a component failure is always a possibility no matter how it is done. In this case we had a voltage drop that lasted less than 1/10th of a second, but it was enough to cause problems for about 45% of the servers in our datacenter.
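For the technically inclined, the power path described above can be sketched as a toy model. This is only an illustration, not a description of our actual equipment: the function names, the 120 V nominal figure, and the 25% sag are all assumptions chosen to make the point that a fault inside the UPS never gives the transfer switch a reason to start the generator.

```python
NOMINAL_VOLTS = 120.0  # assumed nominal voltage, for illustration only


def transfer_switch(grid_volts, generator_volts):
    """Pass grid power through; fall back to the generator only if the grid is down.

    Returns (output_volts, generator_started).
    """
    if grid_volts > 0:
        return grid_volts, False  # grid is healthy, generator stays off
    return generator_volts, True  # grid failed, generator takes over


def ups(input_volts, sag_fraction=0.0):
    """Model the UPS stage; sag_fraction > 0 stands in for a fault inside the UPS."""
    return input_volts * (1.0 - sag_fraction)


def power_at_servers(grid_volts, ups_sag_fraction=0.0):
    """Follow the simplified path: Grid/Generator -> Transfer Switch -> UPS -> Servers."""
    volts, generator_started = transfer_switch(grid_volts, NOMINAL_VOLTS)
    return ups(volts, ups_sag_fraction), generator_started


# Grid outage with a healthy UPS: the generator starts and the servers are fine.
print(power_at_servers(grid_volts=0.0))

# Healthy grid with a (hypothetical) 25% sag inside the UPS: the servers see
# low voltage, yet the generator never starts because the grid never failed.
print(power_at_servers(grid_volts=NOMINAL_VOLTS, ups_sag_fraction=0.25))
```

In the second case the voltage reaching the servers drops while the generator flag stays off, which is the situation we experienced on Wednesday night.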
We would like to thank you for your patience and understanding, and we appreciate your ongoing business and support. We continually strive to improve our services and facilities, and our commitment to our new datacenter reflects this.
posted by Chuck G. at 01:26 PM on Friday, February 11, 2005
Categories: Miscellaneous