Monday, May 6, 2013

Convert Hex to Decimal in IOS

Lots of IOS commands produce output in hex that I sometimes want to convert to decimal. Common ones for me include:

show ip cache flow
show ip flow top-talkers

and various debug commands. For example:

Router#sh ip cache flow | i Fa0/1.6
Fa0/1.6  10.5.188.158   Tu101*  10.5.24.15  06 0DA7 6D61  345

I can't tell offhand what the source and destination ports in columns six and seven are. Fortunately, if the IOS device has the Tcl or Bash shell available, we can quickly convert them.

Method 1: Tcl Shell

Most routers have the Tcl shell available:

Router#tclsh
Router(tcl)#puts [expr 0xda7]
3495

Router(tcl)#puts [expr 0x6d61]
28001


You could write a callable Tcl script to make this permanently available from normal EXEC mode too.

Method 2: Bash Shell

The Bash shell came out in one of the early IOS 15.0 versions, so you may or may not have it available. You need to explicitly enable it by entering "shell processing full" in global configuration mode.

Router#sh run | i shell
shell processing full < required to enable Bash in IOS 15+
Router#printf "%d" 0xda7
3495
Router#printf "%d" 0x6d61
28001
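If you'd rather do the conversion off-box (say, on output pasted from an SSH session), the same arithmetic is a one-liner with Python's built-in int(). Here's a minimal sketch that assumes the column layout of the "show ip cache flow" line above (protocol, source port, and destination port in columns five through seven, all in hex); the function name decode_flow_line is made up for illustration.

```python
def decode_flow_line(line):
    """Convert the hex protocol and port columns of one 'show ip cache flow'
    row to decimal. Assumes the column order shown in the example above:
    SrcIf SrcIPaddress DstIf DstIPaddress Pr SrcP DstP Pkts."""
    fields = line.split()
    proto, src_port, dst_port = (int(f, 16) for f in fields[4:7])
    return {
        "src": fields[1],
        "dst": fields[3],
        "proto": proto,        # 6 = TCP, 17 = UDP
        "src_port": src_port,
        "dst_port": dst_port,
    }


if __name__ == "__main__":
    row = "Fa0/1.6  10.5.188.158   Tu101*  10.5.24.15  06 0DA7 6D61  345"
    print(decode_flow_line(row))
```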



Friday, April 12, 2013

Building a Ghetto WAN Emulation Network

I wanted a way to do some controlled tests of WAN acceleration products, using a production network. You can buy or rent commercial WAN emulators, but for my purposes it seemed like an improvised solution would suffice. I had a couple of Cisco 2800 routers, a switch, and an ESXi box in my lab that I could press into service, so I built a test network that looks like this:


R1 acts like the WAN router at a branch site. It has a QoS policy with a "shape average" statement on its "WAN" interface to change the bandwidth to whatever we want to test.

R2 simply NATs the test traffic onto an IP address in the production network, since I didn't feel like configuring a new production subnet just for the test.

The ESXi box is where the fun part lives: I created two vSwitches and connected one physical NIC to each. I then spun up a simple Ubuntu 12.04 VM with eth0 and eth1 each attached to one of the two vSwitches, giving me a separate network connected to each Cisco router. Next, I enabled routing on the Linux VM and created the appropriate static routes so the test and production networks could communicate. Finally, I used the "netem" WAN emulator built into the Linux kernel to inject delay, jitter, and packet loss into the network. Voila: a network de-optimizer!

For testing the WAN accelerators, we'll just install one in the test VLAN and one between the Linux router and R2.

Here are the basic steps required to set up the Linux de-optimizer, in case you want to try to build your own:

1) Basic Ubuntu 12.04 VM. I used 4GB RAM and 24GB disk, but you could get away with less.

2) sudo apt-get install openssh-server

Install the SSH server so you can still get to the VM from the test network when you break the rest of the test environment. Do this before you break something... ask me how I know. Don't forget to enable SSH on your lab routers too...

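3) Keep Network Manager from messing with your static addressing and routing config: edit the file /etc/NetworkManager/NetworkManager.conf and make sure the [ifupdown] section says "managed=false" (change it if it says "managed=true"). With managed=false, Network Manager ignores any interface defined in /etc/network/interfaces.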

4) Configure static addressing on your two NICs by editing /etc/network/interfaces:

auto eth0
iface eth0 inet static
  address 1.1.1.2
  netmask 255.255.255.0
 
auto eth1
iface eth1 inet static
  address 2.2.2.2
  netmask 255.255.255.0


5) Reboot or restart networking.

6) Enable IPv4 routing:

sudo sysctl -w net.ipv4.ip_forward=1

7) Configure static routes for the production and test networks:

sudo route add -net 0.0.0.0 netmask 0.0.0.0 gw 2.2.2.1
sudo route add -net 192.168.222.0 netmask 255.255.255.0 gw 1.1.1.1 

In this case 1.1.1.0/24 is the network connecting to R1, 2.2.2.0/24 connects to R2, and 192.168.222.0/24 is the test VLAN.

You may also need to delete other routes if they were autoconfigured.
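To sanity-check the addressing plan before you start breaking things, it can help to model how the VM picks a next hop: the most specific matching route wins, and a gateway has to be a neighbor on one of the directly connected subnets (not one of the VM's own addresses). Here's a minimal Python sketch using the standard ipaddress module. It assumes R2 sits at 2.2.2.1 on the 2.2.2.0/24 link (the VM itself owns 2.2.2.2), and the names CONNECTED, STATIC_ROUTES, and next_hop are hypothetical.

```python
import ipaddress

# Directly connected networks on the Linux VM (from /etc/network/interfaces).
CONNECTED = {
    "eth0": ipaddress.ip_interface("1.1.1.2/24"),  # link to R1
    "eth1": ipaddress.ip_interface("2.2.2.2/24"),  # link to R2
}
LOCAL_ADDRESSES = {iface.ip for iface in CONNECTED.values()}

# Static routes as (destination, gateway). R2 = 2.2.2.1 is an ASSUMPTION.
STATIC_ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("2.2.2.1")),
    (ipaddress.ip_network("192.168.222.0/24"), ipaddress.ip_address("1.1.1.1")),
]


def egress_interface(addr):
    """Return the interface whose connected subnet contains addr, or None."""
    for name, iface in CONNECTED.items():
        if addr in iface.network:
            return name
    return None


def next_hop(destination):
    """Longest-prefix match over connected and static routes.

    Returns (gateway, interface); gateway is None for directly connected hosts.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [(iface.network, None, name) for name, iface in CONNECTED.items()]
    for net, gw in STATIC_ROUTES:
        out = egress_interface(gw)
        if out is None or gw in LOCAL_ADDRESSES:
            raise ValueError(f"gateway {gw} for {net} is not a reachable neighbor")
        candidates.append((net, gw, out))
    matches = [c for c in candidates if dest in c[0]]
    net, gw, out = max(matches, key=lambda c: c[0].prefixlen)
    return (str(gw) if gw is not None else None, out)
```

For example, next_hop("192.168.222.10") should come back as ("1.1.1.1", "eth0"), while anything else falls through to the default route toward R2.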

8) After testing that everything works, add some latency:

sudo tc qdisc add dev eth0 root netem delay 50ms 5ms

Do a search for "Linux netem" for a wider array of commands to change delay, jitter, and packet loss.
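If you're going to be changing these settings a lot during testing, a small wrapper that builds the tc command lines can keep the syntax straight. This is a minimal Python sketch, not a full netem front end: netem_args and clear_args are hypothetical helpers, they only cover delay, jitter, and loss, and they just build the argument list. Uncomment the subprocess call to actually run it (tc needs root).

```python
import subprocess


def netem_args(dev, delay_ms, jitter_ms=0, loss_pct=0, action="add"):
    """Build a 'tc qdisc <action> dev <dev> root netem ...' argument list.

    Use action="add" the first time and action="change" to modify an
    existing netem qdisc in place.
    """
    if action not in ("add", "change"):
        raise ValueError("action must be 'add' or 'change'")
    args = ["tc", "qdisc", action, "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms"]
    if jitter_ms:
        args.append(f"{jitter_ms}ms")   # jitter value follows the delay value
    if loss_pct:
        args += ["loss", f"{loss_pct}%"]
    return args


def clear_args(dev):
    """Remove the root qdisc, and with it all netem impairments."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    cmd = netem_args("eth0", 50, 5)
    print(" ".join(cmd))  # tc qdisc add dev eth0 root netem delay 50ms 5ms
    # subprocess.run(["sudo"] + cmd, check=True)
```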

With this setup, the routing configuration and WAN emulation settings will NOT persist after a reboot, so you can always reboot if you screw something up. Start over at step 6.



Friday, March 29, 2013

Friday Distraction: Who's Leaking >/24 to Global BGP?

[It occurred to me after finishing this that I should have done everything based on ASN, but play time is over for the day...]

An interesting conversation with my friend @denise_donohue led to this question: what providers are leaking prefixes longer than /24 to the global Internet?

Following my continuing theme of "fun stuff you can do by combining IOS and Bash", I ran a two-step process via one of my BGP routers to get the answer:

$ ssh routername 'show ip bgp prefix-list GT24' > /tmp/gt24.txt

$ grep "^*" /tmp/gt24.txt | awk '{print $1}' | sed 's/\*>i//g' | awk -F. '{OFS=".";print $1,$2 ".0.0"}'  | sort -u | xargs -i whois {} | grep netname | sort -u

Here's the breakdown:

Extract just the valid BGP routes (lines beginning with "*") from the router output, keeping only the first field:

grep "^*" /tmp/gt24.txt | awk '{print $1}'

Strip the "*>i" status codes to leave just the prefix itself, substitute ".0.0" for the last two octets (normalizing to the parent /16), then remove duplicates:

| sed 's/\*>i//g' | awk -F. '{OFS=".";print $1,$2 ".0.0"}'  | sort -u

Send those prefixes one-by-one to the "whois" command, extract the "netname" field, and remove duplicates again:

| xargs -i whois {} | grep netname | sort -u
 
Note that this takes a while to run because of the time it takes the Whois server to respond.
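For anyone who'd rather not decode the awk, here's roughly the same normalization step as a Python sketch. It assumes route lines look like the ones the sed expression expects (a status-code field such as "*>i" or "*>" in front of the prefix), it stops before the whois step, and the sample lines use made-up documentation addresses.

```python
import ipaddress
import re

# A status-code field (e.g. "*>i" or "*>") followed by an IPv4 prefix with length.
ROUTE_RE = re.compile(r"^\*\S*?\s*(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})")


def parent_16s(show_output):
    """Return the sorted, de-duplicated parent /16s of every valid route
    in 'show ip bgp' output (the grep/sed/awk/sort -u steps above)."""
    parents = set()
    for line in show_output.splitlines():
        match = ROUTE_RE.match(line)
        if not match:
            continue  # header, continuation line, or non-valid route
        net = ipaddress.ip_network(match.group(1), strict=False)
        if net.prefixlen > 16:
            net = net.supernet(new_prefix=16)
        parents.add(net)
    return sorted(parents)


if __name__ == "__main__":
    sample = """\
*>i203.0.113.128/25  192.0.2.1      0    100      0 64500 i
*>i203.0.113.0/26    192.0.2.1      0    100      0 64500 i
*> 198.51.100.0/27   192.0.2.5                     0 64501 i
"""
    for net in parent_16s(sample):
        print(net.network_address)  # the address you'd hand to whois
```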

The prefix-list that I used to get the output from the first step is as follows:

ip prefix-list GT24 permit 0.0.0.0/0 ge 25

Note that the prefix-list only has to exist in the config so that "show ip bgp" can reference it; I didn't apply it to any BGP neighbor!
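In case the "ge 25" syntax is unfamiliar: an IOS prefix-list entry matches a route when the route falls inside the listed prefix and the route's own length lies between the ge and le values, with le defaulting to 32 when only ge is given (and an exact length match required when neither is given). A minimal Python sketch of that rule for a single entry; the function name is made up.

```python
import ipaddress


def prefix_list_match(route, entry_prefix, ge=None, le=None):
    """Return True if 'route' matches one prefix-list entry.

    The route must fall inside entry_prefix, and its length must satisfy
    ge <= len <= le. With neither ge nor le, only an exact length match
    counts; with only ge, le defaults to the maximum (32 for IPv4).
    """
    net = ipaddress.ip_network(route)
    entry = ipaddress.ip_network(entry_prefix)
    if not net.subnet_of(entry):
        return False
    if ge is None and le is None:
        return net.prefixlen == entry.prefixlen
    low = ge if ge is not None else entry.prefixlen
    high = le if le is not None else net.max_prefixlen
    return low <= net.prefixlen <= high


# "ip prefix-list GT24 permit 0.0.0.0/0 ge 25" matches anything longer than /24.
print(prefix_list_match("203.0.113.128/25", "0.0.0.0/0", ge=25))  # True
print(prefix_list_match("203.0.113.0/24", "0.0.0.0/0", ge=25))    # False
```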

Now, this obviously isn't entirely accurate, because it only shows the providers that are leaking long prefixes that aren't being filtered by any of my providers, but it's interesting. I also searched based only on the parent /16, so there could be lower-level providers that I'm missing.

Some of them are clearly the same provider tagged with different whois records (e.g., "TBROAD" and "TBROAD-KR").



netname:        AFRINIC-NET-TRANSFERRED-200909
netname:        ASI
netname:        ASIANET
netname:        AquaRaySARL-2
netname:        BIDCMain
(about 100 more)

Run it yourself to get the full list from your router's perspective!

Saturday, March 9, 2013

Quick Tip: Improvised File Transfer

The Python module "SimpleHTTPServer" is traditionally a quick and dirty way to test web code without the overhead of installing a full webserver. You can also use it as a quick way to transfer files between systems, if you have Python available:

jswan@ubuntu:~$ python -m SimpleHTTPServer 8080
Serving HTTP on 0.0.0.0 port 8080 ...

This causes Python to make all the files in the current directory available over HTTP on port 8080. From the client system, you can then use a browser, curl, or wget to transfer the file.

No, it's not secure, and yes, it may violate data exfiltration policies (and as an aside, Bro detects this by default). But I've used it fairly often to move files between Windows and Mac or Linux in situations where SCP isn't available and I don't feel like setting up a fileshare.

I've also used this in conjunction with the "time" command to test the effects of latency on various network protocols, and as an improvised way to test WAN optimization software.
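On Python 3 the module was renamed, and the equivalent one-liner is "python3 -m http.server 8080". The same pieces can also be wired together in a script, which is handy for the kind of timing experiments described above. This is a minimal sketch (Python 3.7 or later): it serves a directory on a random free port on localhost, downloads one file with urllib, and reports how long that took. The helper name timed_fetch is hypothetical, and a localhost transfer obviously won't show WAN effects until you point it across a real or emulated link.

```python
import functools
import http.server
import os
import tempfile
import threading
import time
import urllib.request


def timed_fetch(directory, filename):
    """Serve 'directory' over HTTP on localhost, fetch 'filename', and
    return (data, seconds_elapsed)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]  # port 0 means "let the OS pick one"
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        start = time.perf_counter()
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/{filename}") as resp:
            data = resp.read()
        return data, time.perf_counter() - start
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "test.bin"), "wb") as f:
            f.write(os.urandom(1024 * 1024))
        data, elapsed = timed_fetch(tmp, "test.bin")
        print(f"{len(data)} bytes in {elapsed:.3f} s")
```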


Sunday, March 3, 2013

An Operational Reason for Knowing Trivia

I've been largely out of touch with the IT certification scene lately, but I'm sure that people are still complaining incessantly about the fact that they need to memorize "trivia" in order to pass certification tests. Back when I was teaching Cisco classes full-time, my certification-oriented students were particularly bitter about this. Of course, this is a legitimate debate and the definition of "trivia" varies from person to person.

When I saw this article about CloudFlare's world-wide router meltdown, however, I immediately felt a bit smug about all those hours spent learning and teaching about packet-level trivia. If you don't want to read the article, here's the tl;dr:

  • their automated DDoS detection tool detected an attack against a customer using packets sized in the 99,000 byte range.
  • their ops staff pushed rules to their routers to drop those packets
  • their routers crashed and burned
So at this point you should be saying what some of the commenters did: huh? IP packets have a maximum size of 65,535 bytes, because the total length field is 16 bits long.

In order for this meltdown to happen, they had to have a compounded series of errors:

  • the attack detection tool was coded to allow detection of packet sizes that can't actually occur: no bounds checking.
  • the ops staff didn't retain the "trivia" that they learned in Networking 101, and thus couldn't see the problem with the output generated by the detection tool.
  • the router OS didn't do input validation, and blew up when attempting to configure itself to do something crazy.
There's lots of blame to go around here, and my intent isn't to add to that; rather, my point is that everybody who dutifully memorized and retained stuff like the maximum IP packet size should feel good about themselves for a few minutes! And next time you write networking code: do input validation and bounds checking.
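To make the arithmetic concrete: the IPv4 total length field is an unsigned 16-bit integer, so the largest value it can carry is 2^16 - 1 = 65,535, and a 99,000-byte "packet" can't even be encoded in the header. A minimal Python sketch of the kind of bounds check a detection tool could apply before emitting a rule (the function name is hypothetical):

```python
import struct

IPV4_MAX_TOTAL_LENGTH = 2**16 - 1  # 65,535: largest value of a 16-bit field
IPV4_MIN_HEADER = 20               # an IPv4 header with no options


def validate_ipv4_length(length):
    """Reject packet sizes that an IPv4 header cannot express."""
    if not IPV4_MIN_HEADER <= length <= IPV4_MAX_TOTAL_LENGTH:
        raise ValueError(f"impossible IPv4 total length: {length}")
    return length


# struct enforces the same limit when packing an unsigned 16-bit field ("!H").
print(struct.pack("!H", validate_ipv4_length(65535)).hex())  # ffff
try:
    validate_ipv4_length(99000)
except ValueError as err:
    print(err)  # impossible IPv4 total length: 99000
```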

Sunday, January 27, 2013

Baby Bro, Part 3: Containers and Loops

Bro has four main container types, which I'm going to cover in somewhat nontraditional order:
  • tables
  • sets
  • vectors
  • records
Tables
A table is a collection of indexed key-value pairs: the same idea is referred to as a dictionary, associative array, or hash table in other languages. Here's a simple example that pairs letters with their place in the alphabet:


event bro_init()
    {   
    local letters = table([1] = "a", [2] = "b", [3] = "c");
    print letters;
    }   

Running it, we get this:

jswan@so12a:~/bro$ bro tables.bro
{
[3] = c,
[1] = a,
[2] = b
}


 Note that the output isn't in the same order as the script; in Bro, like in most other languages, hash tables are unordered.

Iterating over a table with a "for" loop returns the key, again like other languages:


event bro_init()
    {
    local letters = table([1] = "a", [2] = "b", [3] = "c");
    
    for (key in letters)
        {
        print letters[key];
        }
    }

And the output:

jswan@so12a:~/bro$ bro tables.bro
c
b
a


Because we printed only the value associated with the key, we never see the key in the output.

Sets
It's common in programming to need a data type that allows one to make a collection of distinct objects, without containing multiple instances of identical objects. This is the mathematical notion of a set. Bro has a native set type. Consider an example where you have a large list of IP addresses that you got from some other source: an intel feed, a firewall log, a web server log, etc. You want to get a unique set of addresses that have appeared, but you don't care how many times they appeared. This is the perfect use for a set:


event bro_init()
    {   
    local a1 = 1.1.1.1;
    local a2 = 2.2.2.2;
    local a3 = 3.3.3.3;
    local a4 = [fe80::abcd:1];
    local a5 = 2.2.2.2;
    local a6 = 2.2.2.2;
    local unique = set(a1,a2,a3,a4,a5,a6);
    print unique;
    }   

And the output:

jswan@so12a:~/bro$ bro sets.bro
{
1.1.1.1,
3.3.3.3,
2.2.2.2,
fe80::abcd:1
}


Note that 2.2.2.2 appears only once in the set, despite having been included three times as different variables.

Vectors
A vector is Bro's version of a one-dimensional array. Having spent most of my recent programming time in Python, I was surprised to find that Bro vectors work like hash tables for iteration: when you loop over a vector, you get the index into the vector rather than the object itself. Here's an example:


event bro_init()
    {   
    local animals = vector("cat","dog","dinosaur","rat");
    for (animal in animals)
        {   
        print animal;
        }   
    }   

The output:
jswan@so12a:~/bro$ bro vectors.bro
0
1
2
3


If you want to iterate a vector and get the object, you have to specify the index:

event bro_init()
    {
    local animals = vector("cat","dog","dinosaur","rat");
    for (index in animals)
        {
        print animals[index];
        }
    }

jswan@so12a:~/bro$ bro vectors.bro
cat
dog
dinosaur
rat


Bro's vectors work pretty much exactly like a table with "counts" as keys (a count is yet another native Bro type that we haven't discussed yet; it's the same as an integer but always unsigned, so it can't be negative). In fact, some of the earlier Bro documentation doesn't even show vectors as valid types, so I wonder if they are actually implemented internally as tables.
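Since the comparison above is with Python, here's a rough Python model of that behavior: if a vector is treated like a table keyed by non-negative counts, a bare for loop hands you the keys, and you have to index back in to get the values. This is only an analogy built on that assumption, not how Bro implements vectors internally, and the class name is made up.

```python
class BroStyleVector:
    """Toy model of Bro vector iteration: a table keyed by counts."""

    def __init__(self, *items):
        # keys are 0, 1, 2, ... (unsigned, like Bro's count type)
        self._table = dict(enumerate(items))

    def __iter__(self):
        # like Bro's "for (i in v)", iteration yields the index, not the element
        return iter(self._table)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("a count can't be negative")
        return self._table[index]


animals = BroStyleVector("cat", "dog", "dinosaur", "rat")
print([i for i in animals])           # [0, 1, 2, 3]
print([animals[i] for i in animals])  # ['cat', 'dog', 'dinosaur', 'rat']

# A plain Python list iterates over the elements themselves.
print([a for a in ["cat", "dog", "dinosaur", "rat"]])
```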

Records
The last Bro container type is the record, which is sort of the meat and potatoes of Bro's wonderful logging tools. Records are discussed in detail in most of the other beginner-Bro material out there, so I'm not going to cover them here.

This will probably be the last "Baby Bro" post that I do with just the raw language features demonstrated inside the bro_init() event. Any further Bro posts will probably be using Bro in its intended context. Hope this was helpful!

Thursday, January 24, 2013

Cisco Ironport WSA with WCCP and IP Spoofing

Recently I had to set up a transparent proxy with the Cisco Ironport Web Security Appliance (WSA) using WCCP on a Catalyst 6500 with a Sup720, with IP spoofing and web cache ACLs enabled. As with many technologies, this turned out to be pretty simple, but I couldn't find it documented all in one place. Perfect blog fodder!

The network topology looked like this (simplified, but not by much):



Normally when you set up a transparent proxy with WCCP, the IP address of the proxy server is used as the source of the HTTP requests. The problem in this topology is that I wanted the real source address of the client to appear in the firewall logs. The IP spoofing feature on the WSA allows this to happen, but it requires configuring bidirectional WCCP redirection on the Cat6k. If this had been a Cisco ASA firewall, we could have enabled WCCP there and saved some trouble, but in this case the network was using a firewall from another vendor that didn't support WCCP.

One important thing to realize about WCCP on the Catalyst 6500 with the Sup720 is that WCCP egress redirection is done with software switching rather than in hardware, so if you find yourself wanting to use the command "ip wccp web-cache redirect out", you're virtually guaranteeing that you're going to redline the CPU on your supervisor. Thus, we want to do only ingress redirection.

The 6500 configuration is as follows:

! this ACL prevents web traffic to internal servers (not shown in diagram)
! from being inspected
ip access-list extended ACL_WCCP
 deny   ip any 10.0.0.0 0.255.255.255
 deny   ip any 192.168.0.0 0.0.255.255
 permit ip any any


! define the web cache, referencing the ACL above 
! this WCCP service handles only standard HTTP traffic
ip wccp web-cache redirect-list ACL_WCCP
! a second WCCP service is used for reverse-path redirection, which
! is required for IP spoofing to work
ip wccp 90


interface Vlan 10
 description to client networks
 ip wccp web-cache redirect in

interface Vlan 30
 description to firewall cluster 
 ! WCCP service group 90 is applied inbound on the return path
 ! from the Internet so that IP spoofing will work
 ip wccp 90 redirect in
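The redirect-list behaves like any other IOS extended ACL: entries are checked top to bottom, the first match wins, permitted packets are redirected to the WSA, and denied packets are forwarded normally (and the web-cache service itself only intercepts TCP port 80 in the first place). Here is a minimal Python model of ACL_WCCP that looks only at destination addresses, which is all this ACL does; the should_redirect name is hypothetical, and the implicit deny is included for completeness even though the final permit means it never triggers here.

```python
import ipaddress

# (action, destination network) pairs in ACL order, mirroring ACL_WCCP.
ACL_WCCP = [
    ("deny", ipaddress.ip_network("10.0.0.0/8")),      # 10.0.0.0 0.255.255.255
    ("deny", ipaddress.ip_network("192.168.0.0/16")),  # 192.168.0.0 0.0.255.255
    ("permit", ipaddress.ip_network("0.0.0.0/0")),     # any
]


def should_redirect(dst, acl=ACL_WCCP):
    """First match wins; 'permit' means redirect to the web cache."""
    addr = ipaddress.ip_address(dst)
    for action, net in acl:
        if addr in net:
            return action == "permit"
    return False  # implicit deny at the end of every IOS ACL


print(should_redirect("203.0.113.80"))  # True: Internet web server, proxied
print(should_redirect("10.20.30.40"))   # False: internal server, bypasses WSA
```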

If you have multiple layer 3 interfaces going to client networks, you need to enable WCCP on all of them if reverse-path redirection is enabled. If we weren't using reverse redirection (i.e., if we weren't using IP spoofing), this wouldn't be the case: we could simply leave WCCP disabled on interfaces whose traffic shouldn't be proxied. With reverse redirection, though, the return traffic is always sent to the proxy server; if the proxy server sees the return traffic but not the egress traffic, it gets dropped. If you need to use IP spoofing and still have certain types of web traffic bypass the proxy, you would need to do this with ACLs applied to the WCCP service, rather than simply by leaving WCCP disabled on certain interfaces.

On the WSA, you create two WCCPv2 services under the Network-->Transparent Redirection menu, one with service group 0 (the default), and one with service group 90 (matching the IOS configuration above). On group 90, you enable "redirect based on source port". For both groups, you enter the IP address of the SVI as the upstream router address (the IP address of VLAN 20 in this case). The WCCP service on the WSA then registers automatically with the Sup720.

Under the Security Services--> Web Proxy Settings menu, you enable transparent mode with IP spoofing for transparent connections only.

That's it: now you have a transparent proxy, with IP spoofing so that your firewall logs show accurate client IP addresses. Handling HTTPS traffic or HTTP traffic on non-standard ports is beyond the scope of this post.

Saturday, January 19, 2013

Baby Bro, Part 2: Conditionals, Address Types

Bro has native types for addresses and networks, making it much easier to work with network data. Today's Baby Bro script shows global variable definition, the use of the address and subnet types, and a simple conditional:



# declaring global variables
# no need to put quotes around addr or subnet variable definitions
global ipv4_host:addr = 1.1.1.1; 
global ipv4_net:subnet = 1.1.0.0/16;
event bro_init()
    {   
    if (ipv4_host in ipv4_net)
        {   
        # addr and subnet types are autoconverted to strings with fmt 
        print fmt("%s is in network %s",ipv4_host,ipv4_net);
        }   
    else
        {   
        print fmt("host %s is not in network %s",ipv4_host,ipv4_net);
        }   
    }

Running this from the CLI, we get the expected output:

jswan@so12a:~/bro$ bro addr_net_types.bro
1.1.1.1 is in network 1.1.0.0/16


Bro also has several interesting built-in functions for working with network data that we'll explore in upcoming posts. For now, we'll take a look at the mask_addr function, which allows you to use Bro as an improvised subnet calculator. You can run a Bro micro-script from the CLI with the -e option, just like the -e flag in Perl or the -c flag in Python:

jswan@so12a:~/bro$ bro -e "print mask_addr(10.18.32.199,14);"
10.16.0.0/14
jswan@so12a:~/bro$ bro -e "print mask_addr(10.18.32.199,31);"
10.18.32.198/31


Great for those late-night subnetting sessions after too many microbrews!

Just in case you were wondering: all of this works natively for IPv6, with some changes to the syntax:

jswan@so12a:~/bro$ bro -e "print [fe80::1db9] in [fe80::]/64;"
T
# T is the way Bro outputs "True" in a Boolean test

We'll look at some more IPv6 stuff in an upcoming post.

Tuesday, January 15, 2013

Baby Bro, Part 1: Functions Etc.

[Note: Blogger seems to have done something nasty to my new blog template, so it's back to the old one at least temporarily]

Here's my first "Baby Bro" post. Before getting into using Bro scripting for its intended use of network traffic analysis, I wanted to figure out how to accomplish basic tasks common to most programming languages:

  • Functions
  • Common types and variable definitions
  • Loops
  • Conditionals
  • Iteration of container types
  • Basic string and arithmetic operations
This is the kind of stuff that many programmers can figure out instantly by looking at a language reference sheet, but I think it helps the rest of us to have explicit examples.

I'm not sure if I'll get through all of them in this series, but here's a start: a main dish of functions, with a side of string formatting and concatenation.


# "add_one" is the function name
# (received_value:int) is the parameter name and type passed into the function
# the final "int" is the type returned by the return statement
function add_one(received_value:int): int
    {
    local returned_value = received_value + 1;
    return returned_value;
    }

# this function shows two strings passed in, returning a string
function concat(a:string,b:string): string
    {
    return a + " " + b; # one way of doing string concatenation
    }

event bro_init() # bro_init() fires when Bro starts running
    {
    local x = 3; # defining a local variable
    local y = add_one(x); # using the first function defined above
    print fmt("%d + 1 = %d",x,y); # formatted printing as in printf

    print concat("first","second"); # using the second function defined above
    }

I think this is fairly self explanatory, given the comments. We have two functions:

  • add_one: adds one to whatever integer is passed into the function, and returns the resulting integer.
  • concat: concatenates two strings, separated by a space, and returns the result. There is a built-in string function for this, but I wanted to show that you can also do it with "+".
I also show local variable definition (Bro also has globals, defined with the global keyword) and string formatting. String formatting is basically the same as printf in other languages.

We can run this from the CLI with no PCAP ingestion just to get the standard output:

jswan@so12a:~/bro$ bro test.bro
3 + 1 = 4
first second



Monday, January 14, 2013

Basic Bro Language References

Finding simple examples of Bro language features is somewhat difficult: the scripts that come packaged with Bro are written by experts in the language and are quite idiomatic.

Here are some of the basic Bro language references I've found so far. In upcoming blog posts, I'll show some "Baby Bro" that is even more basic than these examples.

From Ryesecurity by Scott Runnels (@srunnels):
Solving Network Forensic Challenges with Bro, Part 1
Solving Network Forensic Challenges with Bro, Part 2
Solving Network Forensic Challenges with Bro, Part 3
Logging YouTube Titles with Bro

Justin Azoff's (@JustinAzoff) Bro Presentation on Github

The Official Bro Workshop 2011 Pages

The Bro Language Cheat Sheet

Sunday, January 13, 2013

Changes!

Yee-haw! New blog template!

Trying to figure out how to do source code highlighting in a way that doesn't suck or rely on external JavaScript hosting.

Friday, January 11, 2013

"Hello World" in Bro IDS

One of the reasons I don't blog that much is that I generally assume that everything worth blogging has already been done, and that everyone reading is probably smarter than me and doesn't need me to explain things. I'm going to pretend that those are non-issues and try to blog more, no matter how basic the topic. 

I just got back from FloCon 2013, which was quite interesting. The highlight for me was some informal after-hours knowledge-dumps from Seth Hall (@remor) and Liam Randall (@hectaman) on the subject of Bro (@Bro_IDS).

Before these mini-lessons, I didn't really have a good idea of how to get started with Bro scripting. Now I do!

The stuff we covered was actually a lot more complex than Hello World, but in the spirit of beginning coders everywhere, here's how you do "Hello World" in Bro (and a little more):

ubuntu@ip-10-73-25-224:~/bro$ cat hello.bro 
event bro_init()
    {
    print "Hello World!";
    }

event bro_done()
    {
    print "Goodbye World!";
    }

The fundamental idea behind Bro is that it's a scripting language that responds to events that are derived from packet streams. When Bro is monitoring a raw packet feed or ingesting a pcap file, it fires events whenever something interesting happens: FTP sessions, HTTP sessions, SSH sessions, etc.

The script above responds to the two simplest Bro events: the startup and shutdown of the software. If we run Bro from the CLI with a dummy pcap file to ingest, it writes the output to the terminal:

ubuntu@ip-10-73-25-224:~/bro$ bro -C -r foo.pcap hello.bro
Hello World!
Goodbye World!

Of course, this isn't something we'd ever do in a real Bro environment; we'd want to be actually looking at the packet stream and taking actions in response to it. Stay tuned for more.

Wednesday, November 14, 2012

Rebooting Routers and Random Search Engine Fodder

I frequently tell junior networking folks that rebooting routers without a good reason is a sign of weakness. All too many people who got their start as Windows sysadmins have learned from experience that the way to fix any weird problem is to reboot. In a Windows environment it usually works.

In the networking world, that doesn't cut it. One of the defining characteristics of a good network engineer is a belief in a "culture of availability", in which you don't take down services in a haphazard, unplanned fashion just to see what happens. Reboot-just-to-see-what-happens isn't a part of that culture.

Unfortunately, software bugs sometimes make reloads necessary, but I prefer to identify them before rebooting. Twice in the last month though, I've run into bugs that required a reboot without much research beforehand. The first one involved a brand-new 2900-series router that needed a reboot before the SRST EULA would show up as accepted: annoying! I opened a TAC case on this one and was advised to try a reboot before anything else; never got a bug ID either.

The second one involved an end-of-support voice gateway that I had to shift from MGCP to H.323 during part of an M&A transition. After making the changes, calls coming in on the PRI would connect, then disconnect immediately before matching any dial-peers: everything in debug isdn q931 looked normal, yet debug voip ccapi showed nothing. The console would log this message:

vnm_dsprm_voice_connect: mismatch htsp

Google was remarkably unhelpful on this, with only a few matches and none that were enlightening. I'm guessing that "vnm" stands for "voice network module", and I know that "htsp" shows up in a lot of voice module debugs, but I don't know much more. The "dsp" part seemed promising, but the gateway had no DSP farm configuration. I'd tried various combinations of "shut/no shut" on the T1 controller, voice port, and D channel with no luck.

With nothing else to go on and end-of-support equipment that had been up for about three years, I decided to give in to weakness and try a reload, just to see what would happen. It immediately started working. Bah.

So, I'm posting this here so that search engines will find it and other people can try a reload if they run into the same situation.

Wednesday, October 24, 2012

Walk on the Wild Side: VoIP over VPN over Internet

Over the years I've seen or heard a lot of snide or offhand comments (from vendors, at conferences, on Twitter, etc.) regarding running voice over Internet VPNs in the enterprise environment. It's often taken for granted that people will pay for MPLS VPNs just to be able to control voice quality, and people who don't do so are sometimes assumed to be either too dumb to know better, or at least "deserving" of what they get.

At the company for which I work, we've migrated over the last several years from a WAN consisting mostly of leased circuits and MPLS VPNs to running almost entirely on IPSec VPNs over the Internet. The biggest reason is cost: we work in what I only half-jokingly refer to on Twitter as #ExtremeRuralNetworking. Many of our WAN sites are in remote locations served by only one small rural LEC, with extremely long distances from the central office. Provisioning MPLS VPNs through nationwide carriers to these sites can be unbelievably expensive, sometimes 20-30 times the cost of a DSL circuit or an Internet T1, or even more. Frequently the only service available is based on some kind of long-range wireless technology. Sometimes local providers will sell you (or a nationwide MPLS provider) something that claims to be a terrestrial T1 but actually includes microwave hops. Occasionally an MPLS provider's backhaul network doubles or even triples the latency compared to an Internet path.

Our experience has been this: VoIP performance over Internet VPNs is almost as good as over MPLS VPNs with dedicated service planes. There is definitely a small percentage of the time that voice quality suffers, but when we asked our business units if they would rather have better voice quality or pay substantially larger WAN bills, the choice was easy. I would go so far as to say that 99+% of the time, most people can't distinguish between Internet VPN voice versus MPLS VPN voice.

Keeping in mind that we deal with relatively low call volume (i.e., we're not running call centers over Internet VPNs!), here are a few things I've learned in setting up VPNs for the best voice-over-VPN-over-Internet quality:
  • Set up your QoS mostly the same way you would over private circuits or MPLS VPNs: put voice in a priority queue, reserve bandwidth for call control, use a scavenger class, etc.
  • Shape your traffic to the contracted rate of the link, not its physical speed; if you have a 1.5 Mb contract with a 100 Mb physical link, make sure you shape to 1.5 Mb.
  • Avoid radically asymmetric circuits if the "up" speed is very slow. We had several sites with a 12 Mb "down" speed and a 768 kbps "up" speed; this proved to be mostly unworkable given our traffic patterns.
  • Use the same providers where possible, and consider the number of AS hops between you and the target AS. This makes for better consistency between sites and streamlines troubleshooting. It also usually results in small reductions in latency, which make a big difference in voice quality.
  • Latency is usually the biggest variable in VoIP over VPN setups. The modern Internet combined with good voice codecs is surprisingly good at dealing with packet loss and jitter, but latency is often highly variable, and as I said above, makes a bigger difference than I would have thought.
  •  Simple and consistent configurations make things easier, as always. We use Cisco DMVPN, which makes for a pretty easy configuration template.
  • If you have more than one uplink, choose the one with the lowest latency as your primary, and check it periodically. Small providers frequently change their transit providers, and it's not uncommon to see big changes in performance through the same small provider several times a year.
  • Set user expectations. If your business knows you're saving them money and the price is that they have to switch to cell phones, long distance PSTN dialing, or POTS lines a few times a year, they'll deal with it.
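One way to see why a slow upstream hurts voice even when the downstream is fast: a large data packet that is already being transmitted has to finish serializing onto the wire before the voice packet behind it can go. A quick Python calculation of serialization delay (packet bits divided by link rate) for the speeds mentioned above; it ignores IPSec/DMVPN overhead and Layer 2 framing, so treat the numbers as lower bounds rather than measurements.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto a link, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000


for label, rate in [("768 kbps up", 768_000),
                    ("1.5 Mbps shaped", 1_500_000),
                    ("12 Mbps down", 12_000_000)]:
    print(f"{label:>16}: 1500-byte packet = "
          f"{serialization_delay_ms(1500, rate):.1f} ms")
```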
Your mileage may vary.