One of our network peers is experiencing problems on their backbone. Our BGP session with them started failing at 4:18 PM PDT, so we shut down that interface to keep their issues from affecting our customers. We are in contact with their Network Operations Center and will re-enable our connection to them as soon as their issues are resolved. At this time they have no ETA for a fix.
Our other circuits are carrying our traffic, so this issue should be largely invisible to our customers.
Update, 5:40 PM PDT: We now have a solid understanding of the upstream network event that occurred a little over an hour ago. One of our neighbor networks experienced a routing problem; over a period of several minutes, the routing table we receive from them shrank from roughly 225,000 entries to 84. Unlike a link failure, this was a circuit whose performance was rapidly degrading while it remained "up", which prevented our normal automatic failover procedures from working. Once we shut down the interface that connects our two networks, traffic failed over gracefully.
That circuit will remain shut down until we have confirmation from them that their network has stabilized.
We will update this post when we have more news.
Update, 8:50 PM PDT: Our upstream peer's problems have been resolved. We will bring our BGP session with them back up tonight at 10:00 PM PDT.
Update, 10:03 PM PDT: The BGP session is up. The peer is taking traffic, and the route table is fully populated.
posted by Chuck G. at 09:48 AM on Wednesday, August 29, 2007
Categories: Network