On Thursday, April 2nd, between 23:00 hrs PST and 23:59 hrs PST, we will be performing maintenance on one of our distribution switches. This maintenance will affect only servers in row 6 of the data center. These servers will experience a 30-second outage when the maintenance is performed.
digital.forest remains committed to providing our customers with the highest level of service, the greatest degree of protection, and the most transparent communications. If you have any questions or concerns about the above maintenance, please contact your account manager. Our account management staff is available Monday through Friday from 08:00 hrs PST until 17:00 hrs PST at 877-720-0483 Option 2.
posted by Kyle at 10:18 AM on Monday, March 30, 2009 Categories:Network
Above: The brand new batteries going into UPS1 today.
As posted previously, we are performing some scheduled maintenance on our UPS systems today. Two tasks are scheduled: Performing a battery swap on UPS1, and replacing capacitors on UPS3.
The first task is the battery replacement. UPS batteries should be replaced at regular intervals to ensure continued safe operation and our UPS systems are no exception to that rule. UPS1's batteries were installed in 2005 and are due for retirement.
We will be running the facility on generator power throughout the maintenance interval.
Above: Facilities Manager Kevin Teker performs a preflight check on the Generator.
We will post updates through the day. Up to the minute info can be found on Twitter @digitalforest.
Update 8:25 AM PDT: As of 8:05 AM PDT we are running on generator power.
Update 8:30 AM PDT: As of 8:25 AM PDT UPS1 & 2 are in Bypass Mode. They will be shut down and the old batteries removed over the next few hours.
Above: Battery technicians from the UPS manufacturer prepare to remove the old batteries from UPS1.
Above: Old batteries coming out, new batteries in the background going in soon.
Above: The old batteries are out, now the new ones are going in.
Once the new batteries are installed we'll test UPS1 under an artificial load. We've brought in a load bank for that purpose. It is basically a heat generator, taking in electricity and outputting heat via elements. It will be stationed outside the building and cabled to the UPS system. The UPS will be brought online, and this load added to it to ensure that it works properly before we transfer the live datacenter load onto it. The load bank itself can be seen here:
Above: The portable load bank.
Our generator has a built-in load bank as well, though it is much larger than this unit: it has to simulate the entire facility load (datacenter, HVAC, etc.), whereas this portable unit only has to simulate the full load of one UPS system. We expect to be testing the UPS under this artificial load in the early afternoon. Stay tuned for updates.
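For readers who want to see the sizing arithmetic, here is a minimal sketch of how a load bank target could be worked out. The 80% test level comes from our maintenance notice; the 200,000 watt capacity and the function name are hypothetical placeholders for illustration, not the actual rating of UPS1.

```python
def load_bank_target_watts(ups_capacity_watts: int, test_percent: int = 80) -> int:
    """Return the artificial load to apply, as a percentage of UPS capacity.

    test_percent defaults to 80, the level named in the maintenance notice.
    """
    if not 0 < test_percent <= 100:
        raise ValueError("test_percent must be in the range 1..100")
    # Integer arithmetic avoids floating-point rounding in the result.
    return ups_capacity_watts * test_percent // 100


# Hypothetical capacity, chosen only to make the example concrete.
HYPOTHETICAL_UPS_CAPACITY_W = 200_000

if __name__ == "__main__":
    target = load_bank_target_watts(HYPOTHETICAL_UPS_CAPACITY_W)
    print(f"Load bank target: {target} W")
```

The real figure depends on the rated capacity of the UPS under test, which this post does not state.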
Above: Facilities Manager Kevin Teker checks the new batteries with the Fluke Multimeter. They all tested out perfectly at around 12.6 V.
Here is a quick movie taken in the Generator Room. The little microphone on the digital camera cannot truly capture how teeth-rattlingly loud it is in here:
That is a 16-cylinder, twin-turbo, 1,850-horsepower, 50.3-liter diesel engine.
The new batteries are now completely installed in UPS1. We're setting up the portable load bank to test the UPS under load.
Above: UPS1, all the new batteries are installed. Testing begins soon.
Update 12:30 PM PDT: Transfer tests have been completed successfully under artificial load. Next step is to prepare for live load transfer.
Above: As a safety precaution, the UPS technicians open the UPS system's main breaker and lock it out while they attach the load directly to the bypass switch's bus bars.
Above: Preparing the load bank for artificial load testing.
Update 12:45 PM PDT: UPS1 (and UPS2) are back online and carrying live datacenter load. The technicians will enjoy a well-deserved lunch break and we'll commence on the UPS3 maintenance afterwards.
Above: Maintenance complete! UPS1 is back online.
The work scheduled for UPS3 involves replacing some capacitors. The UPS manufacturer has recalled these because similar units in other facilities have presented some issues. Since uptime is very important to us, we have elected to have this replacement performed as soon as possible. The sequence of events will closely mirror the maintenance this morning: The UPS will be bypassed, and then shut down. The technicians will open up the unit itself, remove the old capacitors, replace them with new ones, and then reassemble and test. We will conduct a simulated load transfer test with the artificial load bank. Once convinced that the UPS is operating properly, we'll bring it back online.
When all the maintenance is complete we will transfer power back to the electrical grid and after a cool-down period the generator will shut down and be placed back into automatic startup mode.
Above: Maintenance on UPS3 begins. New capacitors are in the boxes at the bottom left of the photo. The technicians are removing cables in order to extract the old capacitors. Note: the blurry techs are a result of shooting the photographs without a flash. I learned a long time ago that UPS technicians do not appreciate flash photography when they're working around potentially high voltages. I'm happy to produce blurry pictures in exchange for their confidence and comfort! The technicians pictured are Steve Cretsinger (left) and Craig Kraft (right) from MGE.
Update 1:50 PM PDT: We've shut off the clean agent fire suppression system in DC1 and the UPS room. The new batteries shipped with a coating of dielectric grease on their terminals and we've noted that it is cooking off under the UPS' ~165,000 watt load. Just to be on the safe side we've disabled the system to prevent a false detection and discharge. The UPS manufacturer feels that it should be clear in a few hours.
Meanwhile the UPS3 maintenance continues.
Above: The old capacitors from UPS3 lie in the foreground as the technicians prepare to install the new ones.
Update 3:00 PM PDT: New capacitors are installed. The UPS is being reassembled. Our ETA for completion is now about 4:00 PM PDT.
Above: The technicians wrap up the installation of the new capacitors.
Above: UPS3 maintenance is finished. ETA for bringing it back online is about 3:50 PM PDT.
Update 3:45 PM PDT: UPS3 is now back online. We'll transfer back to the electrical grid momentarily.
Update 4:00 PM PDT: All scheduled maintenance activities are completed. digital.forest's datacenter has resumed normal operations.
posted by Chuck G. at 07:56 AM on Thursday, March 19, 2009 Categories:Facility Maintenance
On Saturday, March 28th, between 03:00 hrs PST and 05:00 hrs PST, one of our upstream network providers will be performing maintenance on the edge router that we connect to. Our connection to this router will be down for approximately 15 minutes while this maintenance is done. During this period our other upstream network providers will carry our traffic without interruption to services.
While there will be no loss of traffic during this maintenance, there will be periods of latency due to sub-optimal routing while the BGP sessions re-converge.
posted by at 03:06 PM on Monday, March 9, 2009 Categories:Network
******SERVICE IMPACTING UPS SYSTEM MAINTENANCE******
On Thursday, March 19th starting at 07:00 hrs PST and ending at 18:30 hrs PST, we will perform maintenance on our UPS systems. This maintenance will include replacing the batteries in UPS 1 and upgrading the capacitors in UPS 3.
******Impacted Services******
Our FileMaker 3, 4, 5, and 6, Lasso 3, 4, 5, and 6, and legacy Mac shared web hosting environments will be taken offline for 30 minutes in the morning at approximately 08:00 hrs PST and again for 30 minutes at approximately 18:00 hrs PST.
******************
For the duration of this maintenance our datacenter electrical load will be transferred to generator power.
This combined maintenance window was selected to provide the greatest degree of protection for our clients. The deciding factors include the availability of the most senior technicians from our UPS vendor and the ability to mobilize parts and additional service personnel in the very unlikely event of a component malfunction or failure during the maintenance. Additionally, by servicing both UPS systems during the same maintenance window, we eliminate an additional operation of our maintenance bypass switch, thus limiting the exposure of critical datacenter equipment to a potential voltage drop.
The schedule for this maintenance is as follows:
07:00 Datacenter Temperature Reduction: We will force the temperature of the datacenter toward the lower portion of the ASHRAE allowable envelope prior to transferring power to the back-up generator. When the transfer takes place our HVAC systems will power-cycle, causing a small (2 to 4 degree Fahrenheit) temperature rise in the datacenter. By lowering the overall space temperature beforehand, we can be assured that equipment temperatures will not be adversely impacted by the 10-minute restart period of the HVAC systems.
07:00 to 07:30 Generator pre-flight evaluation: Check fluids, connections, air inlets/exhaust, fuel supply, fuel lines, filters and separators. Log results and announce startup to internal staff.
07:30 Generator start-up and warm-up: Start the generator and monitor performance stats, check for fluid leaks, supply artificial load and evaluate voltage, amperage, frequency and engine stats against established baselines.
07:45 FileMaker 3, 4, 5, and 6, Lasso 3, 4, 5, and 6, and legacy Mac shared web hosting environment shutdown: We will begin shutting down the shared hosting environment at this time. The shutdown will be complete by 08:00 hrs and server environment restarts will begin shortly after 08:00 hrs.
08:00 Transfer from grid power to generator power and wrap power around UPS: Manually operate the ATS and UPS maintenance bypass to force the datacenter electrical load to the generator. At this time our HVAC system will power cycle as described above.
08:00 to 12:00 Replace batteries in UPS 1: The UPS system must be completely powered off for life safety while the batteries are removed and replaced. During this operation UPS 3 will remain online and provide UPS power to clients with A+B power.
12:00 to 12:30 Power on UPS system and perform artificial load testing: Following the removal and replacement of components, the UPS systems will be powered on and connected to an artificial load (not datacenter equipment, servers or other infrastructure) for testing. With this equipment we will test the transfer process and mechanism as well as simulate a load of 80% of the UPS system capacity.
13:00 Transfer datacenter load back to UPS 1: Once the process and mechanism have been validated we will transfer the datacenter load back to UPS 1, though power will continue to be supplied to the UPS from the back-up generator.
13:30 to 17:30 Replace capacitors in UPS 3: The UPS system must be completely powered off for life safety while the capacitors are removed and replaced. During this operation UPS 1 will remain online and provide UPS power to clients with A+B power.
17:30 to 18:00 Power on UPS 3 and transfer datacenter load: Once the capacitors have been replaced and charged we will transfer the datacenter load back onto UPS 3.
17:45 FileMaker 3, 4, 5, and 6, Lasso 3, 4, 5, and 6, and legacy Mac shared web hosting environment shutdown: We will begin shutting down the shared hosting environment at this time. The shutdown will be complete by 18:00 hrs and server environment restarts will begin shortly after 18:00 hrs.
18:00 Transfer Datacenter live load back to grid power: Once all testing has been completed and both UPS systems are online and functioning within specifications, we will transfer the datacenter live load back to grid power.
18:00 to 18:30 Generator Cool-Down and Post-Flight Evaluation: Following the operation of our generator we will perform the same evaluations we performed during the pre-flight as well as allow the generator to cool prior to shutting down.
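As a rough illustration, the timed steps above can be written down as data and sanity-checked. This is only a sketch: it covers just the steps that have an explicit start and end time, and the names (STEPS, WINDOW, within_window, overlaps) are our own invention for the example rather than part of any tool we run.

```python
from datetime import datetime


def t(hhmm: str) -> datetime:
    """Parse an 'HH:MM' time of day from the schedule."""
    return datetime.strptime(hhmm, "%H:%M")


# (start, end, label), copied from the steps above that list a time range.
STEPS = [
    ("07:00", "07:30", "Generator pre-flight evaluation"),
    ("08:00", "12:00", "Replace batteries in UPS 1"),
    ("12:00", "12:30", "Power on UPS and artificial load testing"),
    ("13:30", "17:30", "Replace capacitors in UPS 3"),
    ("17:30", "18:00", "Power on UPS 3 and transfer datacenter load"),
    ("18:00", "18:30", "Generator cool-down and post-flight evaluation"),
]

# The announced maintenance window.
WINDOW = ("07:00", "18:30")


def minutes(start: str, end: str) -> int:
    """Length of an interval in whole minutes."""
    return int((t(end) - t(start)).total_seconds() // 60)


def within_window(steps, window) -> bool:
    """True if every step starts and ends inside the announced window."""
    return all(t(window[0]) <= t(s) and t(e) <= t(window[1]) for s, e, _ in steps)


def overlaps(steps):
    """Return pairs of labels for consecutive steps whose time ranges overlap."""
    ordered = sorted(steps, key=lambda step: t(step[0]))
    return [(a[2], b[2]) for a, b in zip(ordered, ordered[1:]) if t(b[0]) < t(a[1])]


if __name__ == "__main__":
    print("All steps inside window:", within_window(STEPS, WINDOW))
    print("Overlapping steps:", overlaps(STEPS))
    # Grid-to-generator transfer at 08:00, transfer back to grid at 18:00.
    print("Planned minutes on generator power:", minutes("08:00", "18:00"))
```

A check like this is mostly useful for catching a mistyped time before a notice goes out; it says nothing about whether the work itself will run to schedule.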
One to two weeks following this maintenance, we will perform a battery preventive maintenance on UPS system 1 to validate the condition of the new batteries and replace any questionable jars. This preventive maintenance will follow the same procedure as above but require significantly less time to complete. A notice will be posted to our support blog prior to this future maintenance.
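To give a sense of what "questionable" might mean in practice, here is a minimal sketch that flags jars whose voltage strays from the rest of the string. The 0.2 V tolerance, the sample readings, and the function name are assumptions made up for this example; the actual acceptance criteria are set by the UPS and battery manufacturers.

```python
from statistics import median


def questionable_jars(voltages, tolerance_v=0.2):
    """Return the indexes of jars whose voltage differs from the string's
    median by more than tolerance_v volts.

    tolerance_v is a hypothetical threshold, not a manufacturer figure.
    """
    mid = median(voltages)
    return [i for i, v in enumerate(voltages) if abs(v - mid) > tolerance_v]


if __name__ == "__main__":
    # Made-up readings: three healthy jars near 12.6 V and one low jar.
    readings = [12.6, 12.61, 12.58, 12.1]
    print("Jars to inspect:", questionable_jars(readings))
```

Comparing against the median rather than a fixed voltage keeps one weak jar from skewing the reference value.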
On Saturday, March 7th, at approximately 10:50 hrs PST, one of our upstream providers lost power to their equipment in the Westin Building. The power outage at the Westin Building lasted for approximately 90 minutes. Following the outage, service was restored and all routes returned to normal.
At no time during this event did digital.forest stop passing traffic.
digital.forest remains committed to providing our customers with the highest level of service, the greatest degree of protection, and the most transparent communications. If you have any questions or concerns about the above event, please contact your account manager directly. Our account management staff is available Monday through Friday from 08:00 hrs PST until 17:00 hrs PST at 877-720-0483 Option 2.
******UPDATE 12:34 hrs PST******
We have confirmed that the network event is the result of a partial power loss in a facility at the Westin Building in Seattle. It appears that power has now been restored and we have reestablished network connectivity.
We will continue to monitor closely and provide any updates here.
******NON-SERVICE IMPACTING EVENT******
At 10:41 hrs PST our AboveNet circuit to the Westin Building in Seattle lost connectivity. Our other carriers continue to carry our network traffic normally.
We're researching the cause of the issue with our upstream providers and will provide an update once we have complete information.
posted by at 12:25 PM on Saturday, March 7, 2009 Categories:Network