Wednesday, November 14, 2012

Rebooting Routers and Random Search Engine Fodder

I frequently tell junior networking folks that rebooting routers without a good reason is a sign of weakness. All too many people who got their start as Windows sysadmins have learned from experience that the way to fix any weird problem is to reboot. In a Windows environment it usually works.

In the networking world, that doesn't cut it. One of the defining characteristics of a good network engineer is a belief in a "culture of availability", in which you don't take down services in a haphazard, unplanned fashion just to see what happens. Reboot-just-to-see-what-happens isn't a part of that culture.

Unfortunately, software bugs sometimes make reloads necessary, but I prefer to identify them before rebooting. Twice in the last month though, I've run into bugs that required a reboot without much research beforehand. The first one involved a brand-new 2900-series router that needed a reboot before the SRST EULA would show up as accepted: annoying! I opened a TAC case on this one and was advised to try a reboot before anything else; never got a bug ID either.

The second one involved an end-of-support voice gateway that I had to shift from MGCP to H.323 during part of a M&A transition. After making the changes, calls coming in on the PRI would connect, then disconnect immediately before matching any dial-peers: everything in debug isdn q931 looked normal, yet debug voip ccapi showed nothing. The console would log this message:

vnm_dsprm_voice_connect: mismatch htsp

Google was remarkably unhelpful on this, with only a few matches and none that were enlightening. I'm guessing that "vnm" stands for "voice network module", and I know that "htsp" shows up in a lot of voice module debugs, but I don't know much more. The "dsp" part seemed promising, but the gateway had no DSP farm configuration. I'd tried various combinations of "shut/no shut" on the T1 controller, voice port, and D channel with no luck.

With nothing else to go on and end-of-support equipment that had been up for about three years, I decided to give in to weakness and try a reload, just to see what would happen. It immediately started working. Bah.

So, I'm posting this here so that search engines will find it and other people can try a reload if they run into the same situation.