First, I must apologize for not posting an update yesterday. I spent the day on the phone or answering emails about the situation.
Now the update: It looks like our expected FULL recovery time is mid-day Thursday (6/19), though we may begin to restore some directories as early as tonight. From our help desk tickets and your emails, we have compiled a list of directories to prioritize in the recovery. The plan is to create new directories in your home login areas containing the recovered data. We WILL NOT overwrite any changes made by you since Saturday without your prior written permission. This will allow you to integrate the recovered data as needed.
Many of you asked if this has any effect on email. The answer is no; email is handled by different servers.
Many of you have asked how this happened, and why we do not have backups, or backups of our backups. The answer is that we do, but the backup systems (two of them, in fact) failed prior to the disk trouble on butternut. To make a very long story short, our tape backup system suffered a mechanical failure last week, destroying a tape set in the process. Our "backup" backup system was then employed. It uses a large (1 terabyte) disk array, one of a set we use to back up our colocation clients. It started backing up our hosting servers last Wednesday. The backup software reported an error on Friday night that required us to rebuild the catalog from source data. This is a very time-intensive process that can take many days. Of course, the drive in butternut failed less than 24 hours later. So, in the end, two backup systems failed prior to the loss of one of butternut's drives.
Currently we are pursuing multiple avenues for data recovery, and have excellent faith in at least two of them. As soon as we have good data to restore we will post an update here, and we will not leave until butternut is fully restored.
posted by Chuck G. at 10:44 AM on Tuesday, June 17, 2003
Categories: butternut.forest.net