A network configuration change made during last night's scheduled maintenance has caused a minor issue with clients behind our shared firewall service. A reboot of the firewall will clear this issue and allow proper functioning again. We will reboot the firewall at noon PDT today. Downtime should only be a few seconds while the firewall restarts.
This should have no effect on the rest of our clients or network.
Update, 11:15 AM PDT: The firewall reboot has been cancelled, as it is no longer required. We have addressed the issue by other means.
For the terminally curious, here are the details:
Last night just after midnight, as part of our plan for dramatically increasing our levels of network redundancy, we migrated one of our upstream fiber connections to our second boundary router. We also finished enabling Spanning Tree Protocol on all of our Ethernet switches to recognize redundant trunks we will be deploying in the coming weeks.
In this case, it was the gigabit Ethernet connection from XO Communications (AS 2828) that we moved from our original Cisco 6509 router to our secondary Cisco 6509 router.
When we did this, all network operations appeared to acknowledge the change via the iBGP and OSPF protocols, as expected.
Unfortunately, our managed firewall device did not. We began to get calls from some clients around 6:30 this morning concerning the reachability of certain servers. By 7:30 we had isolated the problem to the change made last night, and we shut down the gigabit connection to XO to guarantee connectivity for shared firewall clients while we worked out how to address the problem with minimal downtime for the affected clients.
We planned a config change and reboot of the firewall for noon today, but in the meantime we were able to forestall that action by redistributing static routes between the firewall and the two different routers via OSPF.
That action was completed at 11 AM PDT today, and it should prevent a recurrence of routing issues like the one we experienced last night.
Please be aware of the following:
This did not mean that servers were "down." The firewall remained up, and all servers behind it were reachable via normal network channels. The issue was that if OUTBOUND traffic from those servers was destined for the XO connection, then the firewall had the incorrect routing information and was unable to send it. XO carries approximately 20-30% of our outbound traffic.
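For readers who prefer to see the failure mode in code, here is a toy Python model of it. It is only a sketch under simplifying assumptions, not our actual firewall or router configuration: the names `router1`, `router2`, `other`, and the `Firewall` class are hypothetical, and the `learn` method is merely a stand-in for routes being distributed via OSPF.

```python
# Toy model only: topology, names, and behavior are illustrative assumptions.

class Firewall:
    """Holds a next-hop router for each upstream link."""

    def __init__(self, next_hops):
        # Copy the mapping so later changes on the routers are NOT seen
        # automatically (this mimics statically configured routes).
        self.next_hops = dict(next_hops)

    def learn(self, links):
        """Refresh next hops from what the routers advertise (OSPF stand-in)."""
        self.next_hops.update(links)

    def can_deliver(self, upstream, links):
        """Outbound traffic succeeds only if the firewall's next hop for an
        upstream matches the router that really terminates that upstream."""
        return self.next_hops.get(upstream) == links[upstream]


# Which router terminates each upstream link before the maintenance.
links = {"XO": "router1", "other": "router1"}
fw = Firewall(links)

# The maintenance: XO moves to the second boundary router.
links["XO"] = "router2"

xo_before_fix = fw.can_deliver("XO", links)        # stale next hop
other_before_fix = fw.can_deliver("other", links)  # unaffected upstream

# The fix: the firewall now learns routes instead of relying on static ones.
fw.learn(links)
xo_after_fix = fw.can_deliver("XO", links)

print(xo_before_fix, other_before_fix, xo_after_fix)
```

In this model only outbound traffic that would leave via XO fails after the move, traffic via the other upstream is untouched, and everything works again once the firewall learns the new next hop.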
posted by Chuck G. at 10:24 AM on Thursday, October 27, 2005
Categories: Emergency Maintenance, Managed Firewall Services, Network