digital.forest Technical Support
News archive: June 2007

As part of our ongoing process of upgrading the software and configuration on our FreeBSD hosting servers, Souari and Butternut will be taken down on Sunday evening for an operating system upgrade. We expect the maintenance to begin around 9pm Sunday evening and to last less than two hours.

posted by Bill D. at 03:29 PM on Friday, June 29, 2007
Categories: Hosting Servers

Tonight during our scheduled maintenance window we will be making some changes to our BGP configuration in our continued efforts to better balance traffic and reduce latency. During the window we will be resetting each of our BGP peers. As each peer is reset the other peers will take the traffic so downtime should be no more than a few seconds.

The maintenance will occur between 11:00 pm this evening and 1:00 am tomorrow morning.

posted by Kyle at 03:32 PM on Monday, June 18, 2007
Categories: Network

Tonight during our scheduled maintenance window we will be making some changes to our BGP configuration in our continued efforts to better balance traffic and reduce latency. During the window we will be resetting each of our BGP peers. As each peer is reset the other peers will take the traffic so downtime should be no more than a few seconds.

The maintenance will occur between 11:00 pm this evening and 1:00 am tomorrow morning.

posted by Kyle at 02:57 PM on Friday, June 15, 2007
Categories: Network

Tonight during our scheduled maintenance window we will be making some changes to our BGP configuration in order to better balance traffic and reduce latency. During the window we will be resetting one of our BGP peers. When this peer is reset the other peers will take the traffic so downtime should be no more than a few seconds.

The maintenance will occur between 11:00 pm this evening and 1:00 am tomorrow morning.

posted by Kyle at 09:58 PM on Thursday, June 14, 2007
Categories: Network

One of the distribution switches in our facility, specifically in the rack-colocation area in rows 11 & 12 in DC1, has been showing errors and causing some network issues for servers in that area today. Our switch vendor believes that this is being caused by a bad gigabit port which uplinks that switch to our network core. The other possibility is a bad switch engine. Thankfully the former of these has some built-in redundancy. The latter is an easy card swap, and we have spares. So in a few moments we will be manually failing the gigabit uplink to the redundant port. It is unlikely that this will be any more noticeable than the intermittent issues that servers have seen on this network segment, namely some dropped packets and retransmissions. If that does not improve things, we'll replace the switch engine blade later tonight during a maintenance window.

Update 5:00 PM: The card reset seems to have solved the issue. We'll keep an eye on things over the next few days to be sure. We'll also order a replacement card for the switch. Thanks for your patience.

Above: Network Manager Kyle Murray pulls the problem gigabit card from the switch.

posted by Chuck G. at 08:28 AM on Monday, June 11, 2007
Categories: Emergency Maintenance, Network

Tonight during our scheduled maintenance window we will be making some changes to our BGP configuration in order to better balance traffic and reduce latency. During the window we will be resetting each of our BGP peers. As each peer is reset the other peers will take the traffic so downtime should be no more than a few seconds.

The maintenance will occur between 11:00 pm this evening and 1:00 am tomorrow morning.

posted by Kyle at 04:19 AM on Thursday, June 7, 2007
Categories: Network

We have both our HVAC and Controls Systems vendors on site right now investigating the issue we had over the weekend. They will be shutting off the systems for short periods of time in order to diagnose various system components. Datacenter temperatures may rise as a result. Recovery times should be brief, however, as the outside temperature here in Seattle is 53F with cloudy skies and showers, about 30F/17C cooler than it was on Sunday.

posted by Chuck G. at 04:36 AM on Tuesday, June 5, 2007
Categories: Facility Maintenance

As part of our changes to the FreeBSD hosting servers, Silverpine will be offline late tonight for an upgrade to the Apache web server software. We estimate that the downtime will be less than 2 hours, during which the current configuration will be backed up, and the new configuration will be installed. There will be substantial modifications to the virtual host configuration files. In the vast majority of cases, everything should continue to function after the upgrade as it has until now. However, we ask that you test your sites thoroughly, and report to us as soon as possible if you find any problems so that we can correct them.

posted by Bill D. at 07:50 AM on Monday, June 4, 2007
Categories: Hosting Servers

digital.forest is in the beginning stages of a significant change in the way our FreeBSD hosting servers will function. This change is intended to increase the security of your sites, and to improve our ability to quickly determine the source of a problem, such as an insecure web form being used to send spam.

We are taking care to minimize the impact these changes will have on you, but there may be some cases where problems will arise. In such a situation, we will assist you in making any changes necessary to get your site working with the new configuration.

The first stage of the changes is upgrading the server to the latest version of the FreeBSD operating system. This stage has been completed on Silverpine. Souari and Butternut will follow by the end of this month. We do not anticipate that customers will notice any changes from this upgrade.

The second stage is upgrading Apache, the web server software, to the latest version. This is a more significant change, from a hosting standpoint. Silverpine is scheduled to be upgraded tonight. There will be a period of downtime, estimated to be less than 2 hours, during which the current configuration will be backed up, and the new configuration will be installed. There will be substantial modifications to the virtual host configuration files. In the vast majority of cases, everything should continue to function after the upgrade as it has until now. However, customers will need to test their sites thoroughly, and report to us as soon as possible if they find any problems so that we can correct them.

The third stage will be activating two key configuration changes for each virtual host: suexec and suPHP. Once these are activated, customer CGI and PHP scripts will execute under their own usernames, rather than under the server's username. We will change the ownership on all files and folders in your home directory to your own username, so that if your site needs to write files into your directory, it can still do so. We will again need people to test their sites carefully and make sure everything is working as expected.
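For customers who want to check file ownership themselves once scripts run under their own username, the following is a minimal sketch in Python. It is not part of digital.forest's tooling; the function name files_not_owned_by is made up for illustration, and it assumes you have shell access and a Python interpreter on the server.

```python
import os


def files_not_owned_by(root, uid):
    """Return paths below root whose owner is not uid.

    Hypothetical helper for illustration only. Symbolic links are
    examined themselves (lstat) rather than followed, and root itself
    is not checked, only the entries beneath it.
    """
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched
```

Calling files_not_owned_by(os.path.expanduser("~"), os.getuid()) from your own account lists anything in your home directory that is still owned by another user, such as files created earlier by a script running under the server's username. An empty list suggests the ownership change has been applied throughout.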

The fourth and final stage will be activating suexec and suPHP for the default server, outside the virtual hosts. Before this stage is completed, any customers who are using the server's built-in SSL certificate, rather than one purchased for their own domain, will need to work with us to make changes to their site to work with an alternate secure URL. You'll still be able to use our certificate, but we'll need to set up a subdomain specific to your virtual host to permit you to do so.

We will be moving very slowly and carefully on the Silverpine changes so that we can refine the process for the remaining servers. Your testing and feedback will be a great help in getting any unanticipated problems handled as quickly as possible.

posted by Bill D. at 07:49 AM on Monday, June 4, 2007
Categories: Hosting Servers

The source of our HVAC system issue today was a malfunction in our fire suppression system. The fire system is one that is designed to suppress fires without damaging computer equipment. It works by sealing the datacenter and flooding it with a gas which smothers the fire. The first stage in the process, "sealing the room", is done by shutting off the air conditioning using motorized dampers in the ducting.

For reasons we do not know yet, these dampers closed due to some malfunction.

We were able to open them manually and then bypass the fire suppression system's control over the HVAC system. By this time, however, temperatures in the datacenter were out of the normal range. We elected to shut down as many systems as possible to accelerate the cooling and minimize the heat load.

We will have our Fire Suppression system vendor out to diagnose and correct this malfunction as soon as possible.

The timing of this event coincided with what is predicted to be the last of a streak of relatively hot days here in the Seattle area. We doubt this is linked to the cause, but it certainly contributed to the overall problem and the time taken to recover.

We will continue to update the support blog post below this one with breaking news as it happens. Thank you for your patience during this event.

posted by Chuck G. at 07:41 AM on Sunday, June 3, 2007
Categories: Emergency Maintenance

We are experiencing an HVAC-related issue right now. Our HVAC maintenance vendor is en route. We may be shutting down systems to lower the datacenter temperature. Please stay tuned for more information.

Update: 12:35 PM Vendor is on site, the system is running again, but datacenter temperature is high. We are shutting down as many systems as possible to reduce the heat load on the facility in order to get the temperature back down as swiftly as possible.

Update: 12:45 PM It appears the HVAC was shut down by a malfunction in our fire suppression system. We have bypassed that system's HVAC controls for the time being to prevent it from reoccurring.

Update: 12:50 PM Temperatures are beginning to come back down.

Update: 1:30 PM We have additional staff on-site now to handle phone calls and server shut-downs/startups.

Update: 2:50 PM We will start bringing shared servers online very soon.

Update: 4:30 PM Datacenter temperature has stabilized and is back to normal. We have restarted most servers, but some still remain down due to miscellaneous problems. If any of these appear to have longer-term issues, we will begin to post individual lists and updates separate from this entry.

Thanks for your patience and understanding.


posted by Chuck G. at 05:27 AM on Sunday, June 3, 2007
Categories: Emergency Maintenance