IPv6 implementation

This post describes my first attempt at implementation of IPv6, a process that took place over a span of a couple of months. After this was done, and was working, a “better way” emerged, which will be the subject of an additional post. I leave this in here for the sake of documenting what I did the first time, but in the unlikely event that anyone finds this while looking on the net for information about implementing a similar arrangement, I urge you to find the other post, and read it as well. This implementation was fragile.

A few weeks back (10 Feb) my friend Mr. G and I exchanged an email in which he said of a possible project “…but this would be an opportunity to learn IPv6”, reminding me that I have for a long time wanted to learn more about IPv6. Part of the genesis of that email conversation was a recent switch by my brother-in-law to a new ISP that employed CGN, so-called Carrier-Grade NAT, which had disrupted arrangements I had in place for reaching my brother-in-law’s home network. Mr. G opined that the move towards CGN, and other things the ISPs were doing, raised the specter that someday, perhaps sooner than we expect, anyone desiring to do more with their network than occasionally use a browser would find themselves having to move to IPv6.

More, I have actually wanted to use IPv6 for a long while, but had been under the impression (erroneously) that Comcast really wasn’t ready for this, that all they would give me was a 6to4 tunnel, which I barely understood anyway.

I began to do some research to learn more about IPv6, and how I could use it in the house. Mr. G helped throughout this effort; we were on the phone a lot and exchanged a lot of email. I could not have succeeded without his help, particularly when we found we had to set up a baby routing protocol in the house to cope with my multi-router setup (more on that later).

I discovered that Comcast would give me a legitimate routable /128 address. It depended upon a setting in my old Netgear router. If it were set to “auto”, as it was, for some reason it would only obtain a 6to4 address. It had been this way for years. I had to figure out that what I had was a 6to4 address, and what that meant, and that I really couldn’t go farther that way. But then I found that if I simply changed the setting to dhcp, Comcast would dutifully give me a real routable /128 address. And furthermore, magically, it would delegate a /64 prefix for the lan side. That is, the wan side of the router had a Comcast /128 address, but the lan side of the router had a different 64-bit network number, which was “all mine” and devices on the lan side could all be given different host numbers. I would be limited to a mere 18,446,744,073,709,551,616 computers in the house. It might just be possible to squeak by.

Initially I thought that I could just use my shiny new /64 and then everything else could use slaac. I didn’t know what slaac was before I started. The interfaces automatically give themselves an address, based on their mac address – thus ensuring uniqueness. It sounded good, but it was wrong. There are several routers in the house, and I don’t want to change the topology — I need all the ipv4 stuff to continue to work properly. Therefore there need to be routable subnets for ipv6 also. Subnetting in ipv6 is all about multiple prefix values in the first 64 bits. Eventually I figured out that I needed to get at least a /60 prefix delegation from Comcast, to give me multiple networks. With a /60 I can have 16 networks (each limited to the aforementioned pittance of 18,446,744,073,709,551,616 hosts).
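To make that arithmetic concrete, here is a minimal Python sketch using only the standard ipaddress module. It assumes the classic EUI-64 form of slaac (many current systems pick randomized or stable-privacy interface identifiers instead), and the 2001:db8 prefixes are documentation placeholders, not my real delegation. The function name is made up for illustration; the mac address is the one for Promontory's switch0 that comes up later in this post.

```python
import ipaddress


def eui64_slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build the classic EUI-64 slaac address for a mac inside a /64 prefix."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("slaac needs a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit mac address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to a 64-bit interface id.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(iid, "big")
    )


# Placeholder /60 standing in for a Comcast delegation.
delegated = ipaddress.IPv6Network("2001:db8:0:10::/60")
subnets = list(delegated.subnets(new_prefix=64))

print(len(subnets))               # number of /64 networks inside a /60
print(subnets[0].num_addresses)   # host numbers available in each /64
print(eui64_slaac_address("2001:db8:0:1f::/64", "b4:fb:e4:b3:7d:c1"))
```

The four extra prefix bits of a /60 are what turn one network into sixteen, each of them still a full /64 that slaac can work with.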

In the end I have to say that what Comcast is actually doing turns out to be fine, but what they are willing or able to tell you about it is pretty abysmal. I searched in vain for any information from Comcast about whether or not it was even possible to get a delegation of other than /64. There is NO data from Comcast on this that we could find anywhere. Mr. G and I were reduced to sending each other links: here is some guy who said back in 2014 that Comcast only allows /64; here is another guy with a post on Reddit that claims he managed to get a /60 but he had to monkey with his DUID, whatever that is (I know what it is now, but didn’t know then). On one day Mr. G spent hours on the phone with Comcast. I felt bad that he had to do it, so the next day I spent hours on the phone with Comcast. I encountered a really helpful young lady named Blythe who didn’t understand the technical question but was willing to try to run interference inside the company to try to get answers if it took all day. She was great. But in the end, even with her help, we couldn’t get anywhere with the simple question: Does Comcast, by policy, limit prefix delegation to /64? There is all kinds of stuff on the internet which alleges this to be true. It turns out to be false, but I was never able to get an answer from Comcast.

One problem that emerged was that I wasn’t sure the old Netgear router was doing the right things, and how could I be? I couldn’t see inside it, so I couldn’t tell what it was doing. It was 8-10 years old and I was suspicious that some of my problem might be owing to it. Eleven days into the struggle, I bit the bullet and replaced it with another Ubiquiti EdgeRouter X (ER-X). I already had an ER-X, thanks to the recommendation of another friend, Mr. A, which was managing the internal network. I replaced the outer network router, the Netgear at the interface to Comcast, with a new Ubiquiti ER-X called chersonese, yet another word that roughly means rock point. The routers are chersonese, promontory (the inner ER-X), and obelisk (a Chinese OpenWrt router always connected to a vpn on the wan side). The old Netgear is still in the system, but only for its radio; it now services IoT things in the house wirelessly. So it is now called “things” and offers a hidden SSID also called “things” for e.g. the thermostat.

Armed with the new external router, I was finally able to do tcpdumps at all interfaces and begin to learn what was going wrong. Little by little, with a lot of detours down various rat-holes (e.g. ipv6 proxy arp daemon, dhcpv6-relay) we got bits of it working, so that by day 18 all the boxes in the house were getting routable ipv6 addresses, and I was able to successfully ping6 google.com from an internal computer. Admittedly, at this point I was using a dhcpv6-pd server in the chersonese router which had hard-coded addresses in it, such that if the prefix delegation from Comcast were to change across, say, an extended power outage, the whole house of cards would collapse. Success at this point also depended upon my having added an additional route to the chersonese routing table, also by hand, which would disappear across a router reboot. But hey, now I knew it could work, if I could solve those problems.

It was at this point that Mr. G was even more essential to the project, because he helped me to set up routing daemons between the routers. Routing protocols are the deepest of dark magic to me, and I would not have attempted this without his guidance. We initially tried to do it with ripng, but after a few days of frustration we switched to ospf and that worked. I still don’t know what the problem was with rip, and probably won’t go back to figure it out. OSPF (or ripng if it had worked) was a battleship blasting a gnat. It simply makes it possible for the inner router to inform the outer router how to route packets to the inner network. With that addition, we solved the problem of needing a hand-entered, hard-coded route, dependent on the allocated prefix, which would disappear across a router reboot, and be wrong if a new prefix were allocated.

Today, after spending a few days reading the code in various perl scripts in the ER-X, I was able to make a small change to one of their scripts to add a single line to one of the config files, such that if the prefix delegated by Comcast changes, so too will the prefixes used on the inner router for computers on the inner network.

Conceptually, here is what is happening.

The outer router (chersonese) has eth0 as the upstream/wan interface, connected to Comcast. Interface eth1 is assigned as a secondary internal lan for some things, but isn’t important for this discussion. The other 3 interfaces (eth2-eth4) are combined as switch0, which is the primary lan interface for this router. On the eth0 interface there is a dhcpv6-pd client, which requests a /60 prefix delegation from Comcast, while also obtaining a /128 address for itself. This pd client then allocates from the /60 pool it receives a /64 prefix (xxx1) to eth1 and another /64 prefix (xxx0) to switch0.

On switch0 (the important one), the dhcpv6-pd configuration assigns a host-address of ::1, a prefix-id of 0 (has to be zero since this is a /64), and designates that switch0 should use the service dhcpv6-stateless. This results in chersonese starting a dhcpv6 server process which services incoming dhcpv6 requests from devices attached to the switch0 interface. I don’t actually use the dhcpv6 server to assign addresses, but I have to have the server there; read on.

Here is where I deviate from a “standard” configuration. I want to arrange for the dhcpv6 server which is handling switch0 to ALSO perform prefix delegation, for routers downstream from it.

I obtained a /60 from Comcast, a 64-bit network number ending in (hex) xxx0, meaning I have 4 bits of prefix which I am allowed to allocate internally: 4 bits, 16 networks. Conceptually, I sub-divide this into two groups. The first group, 0 to 7, can be allocated to the downstream interfaces on chersonese (initially I was only using two of them, xxx0 for switch0 and xxx1 for eth1, leaving xxx2 through xxx7 unused; later I will use one of these). The second group of 8, from xxx8 to xxxF, I reserve as a pool of 8 prefixes to be assigned by the dhcpdv6 server operating on the switch0 interface, to be allocated to routers which are downstream from chersonese, using prefix delegation, just as Comcast does for me. I did various implementations of this, and in actual fact I am only delegating 7 prefixes, from xxx9 to xxxF, for a reason that is no longer important.
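The split can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not anything that runs on the router; the function name is invented and 2001:db8:0:10::/60 is a documentation prefix standing in for the real delegation.

```python
import ipaddress


def split_delegation(delegated: str):
    """Split a delegated /60 into local-interface prefixes and a PD pool."""
    net = ipaddress.IPv6Network(delegated)
    if net.prefixlen != 60:
        raise ValueError("this sketch expects a /60 delegation")
    subnets = list(net.subnets(new_prefix=64))  # xxx0 .. xxxF
    local = subnets[0x0:0x8]    # xxx0..xxx7, for interfaces on the outer router
    pool = subnets[0x9:0x10]    # xxx9..xxxF, offered downstream (xxx8 left out)
    return local, pool


local, pool = split_delegation("2001:db8:0:10::/60")
print("switch0:", local[0])
print("eth1:   ", local[1])
print("pool:   ", pool[0], "through", pool[-1])
```

Whatever prefix Comcast hands out, the same slice positions apply, which is the property that matters when the delegation changes.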

To take advantage of this, the inner router, promontory, also an ER-X, also operates on its eth0 upstream interface a dhcpv6-pd client which requests a prefix delegation from upstream, but in this case it asks for a /64. The request goes to chersonese’s switch0 interface where the dhcpdv6 server there allocates one of its 8 (7 really) pooled prefixes in the response. Promontory asks for and is allocated a full /64 prefix to use on its downstream side. It actually gets the highest one, ending in xxxF, and is configured to assign that prefix to its own switch0 internal interface. By doing this, the switch0 interface issues router advertisements internally containing this prefix xxxF. Thereupon all the boxes on the internal network which are attached to the switch0 network get router advertisements and allocate ipv6 addresses for themselves using this prefix and slaac.

To make this work, the dhcpdv6 server which is running on switch0 of chersonese (a standard built-in feature of the router) must be configured to include a single additional line which, while it is standard to the dhcpdv6 server, is not something the standard ER-X will generate:

prefix6 xxxx:xxxx:xxxx:xxx9:: xxxx:xxxx:xxxx:xxxF:: /64;

where xxxx:xxxx:xxxx:xxx0::/60 is the prefix allocated by Comcast. There is no way to do this in the ER-X through the standard cli or gui tools, but it is a very simple change to a single perl script which is responsible for generating the dhcpdv6 config file for the dhcpdv6 server upon receipt of dhcpv6 messages via the client, from upstream. This script is called dhcpv6-pd-response.pl, and it lives at /opt/vyatta/sbin/ in the ER-X. The script exists specifically to arrange that if the upstream delegation changes, the ER-X will update the downstream dhcpdv6 server and the downstream radvd server (router advertisements) with the new delegated prefix.
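The real change lives in that perl script, which I am not reproducing here. As a rough illustration of what the added line has to contain, here is a Python sketch that derives it from whatever /60 arrives. The function name and the example prefix are hypothetical, and the trailing semicolon assumes the server follows ISC dhcpd.conf statement syntax.

```python
import ipaddress


def prefix6_line(delegated: str, first: int = 0x9, last: int = 0xF) -> str:
    """Return a prefix6 pool statement covering subnets first..last of a /60."""
    net = ipaddress.IPv6Network(delegated)
    if net.prefixlen != 60:
        raise ValueError("this sketch expects a /60 delegation")
    subnets = list(net.subnets(new_prefix=64))
    low = subnets[first].network_address
    high = subnets[last].network_address
    # Assumed form: prefix6 <low> <high> /<bits>;
    return f"prefix6 {low} {high} /64;"


# 2001:db8:0:10::/60 is a documentation prefix, not a real delegation.
print(prefix6_line("2001:db8:0:10::/60"))
```

Because the line is computed from the delegation rather than typed in, a new prefix from Comcast simply produces a new pool the next time the config is regenerated.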

My change is somewhat kludgey… the prefix is ALWAYS put in. Any dhcpdv6 server on ANY of the downstream interfaces WILL offer to delegate prefixes. If I were to put such a server on more than 1 downstream interface they would be allocating the SAME prefixes, and if they were both asked for prefixes, it would almost certainly break. It is up to me not to do that.

There is more to do. The presence of ipv6 caused my redundant (vrrp) pi-holes to go crazy (I’m the master!, no I’m the master!).

Chapter 2: The Piholes

6 March: I need to fix the pi-holes to handle ipv6. As I look into doing this for ipv6 as I did for ipv4, the problem that emerges is assigning an ipv6 address for the pis. Much as I assigned static ipv4 addresses for clove and bayleaf, and another arbitrary static address for the vrrp protocol, so I need to have an ipv6 address I can give out to all the boxes in the house. I had been getting the ipv4 static addresses by way of dhcp, but this proved not a good idea for the ipv6 case.

Mr. G suggested I look at ULA addresses: unique local address, RFC4193. These are globally unique but reusable prefixes from the FC00::/7 range, with the next bit, bit 7, set to 1 to indicate a local address, and the following 40 bits being user-assigned according to a uniqueness algorithm (to minimize duplication of prefixes). Thus these addresses will all start with FDxx indicating local IPv6 unicast addresses. While they are not globally routable, the RFC says “Their limitation is in the routability of the prefixes, which is limited to a “site” and any explicit routing agreements with other sites to propagate them.” There is a suggested algorithm given for generating these prefixes, and I invoked that with the mac address of Promontory’s switch0 interface, which is: b4:fb:e4:b3:7d:c1. I used a page out on the net claiming to generate these prefixes according to the suggested algorithm (https://cd34.com/rfc4193/) which says: “This page uses the first method suggested by IETF using the current timestamp plus the mac address, sha1 hashed, and the lower 40 bits to generate your random ULA. Consequently, if two organizations hit this page within the same second, with the same mac address to generate a ULA, they could have identical ULAs.”

The resulting ULA prefix for my internal network: fd30:1839:ded::/48
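For the curious, the suggested algorithm (RFC 4193 section 3.2.2) is short enough to sketch in Python. This is my own reading of the RFC, not the code behind the web page I used, and the function names are invented. It will not reproduce my prefix, since that depended on the exact timestamp at the moment I clicked the button.

```python
import hashlib
import ipaddress
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def mac_to_eui64(mac: str) -> bytes:
    """Expand a 48-bit mac to a modified EUI-64 (RFC 4291, Appendix A)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit mac address")
    octets[0] ^= 0x02
    return bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])


def generate_ula_prefix(mac: str, unix_time=None) -> ipaddress.IPv6Network:
    """Derive a /48 ULA prefix from a timestamp and a mac address."""
    if unix_time is None:
        unix_time = time.time()
    # 64-bit NTP timestamp: 32 bits of seconds, 32 bits of fraction.
    seconds = (int(unix_time) + NTP_EPOCH_OFFSET) & 0xFFFFFFFF
    fraction = int((unix_time % 1) * 2**32) & 0xFFFFFFFF
    ntp = struct.pack("!II", seconds, fraction)
    digest = hashlib.sha1(ntp + mac_to_eui64(mac)).digest()
    global_id = digest[-5:]  # least significant 40 bits of the hash
    # fc00::/7 with the L bit set gives a first byte of 0xfd.
    packed = b"\xfd" + global_id + bytes(10)
    return ipaddress.IPv6Network((packed, 48))


print(generate_ula_prefix("b4:fb:e4:b3:7d:c1"))
```

Passing unix_time explicitly makes the result repeatable, which is handy for checking the code but defeats the point of the algorithm in real use.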

How this will end up being allocated I am not yet sure. This gives me 65,536 networks I can allocate. Seems it ought to be enough. I guess I will start out by identifying one network upon which I can place the pi-holes. I’m not sure if there is some reason to make this something different from just being the internal 111 subnet. I think I will start by allocating ipv6 network 111 as the internal ipv6 equivalent of the ipv4 111 subnet. If you keep reading you will discover that later I assigned another network, the 122 subnet, for the virtual machines which are hosted on one of the internal machines. All those VMs are on the ipv4 subnet 122, so I chose the same for the ipv6.

So fd30:1839:ded:111::/64 is the internal ipv6 subnet

<internal>::1 promontory switch0

<internal>::2 virtual (vrrp) VIP for dns

<internal>::2 bayleaf

<internal>::3 clove
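As a quick check of the numbering, here is a small Python sketch that carves the 111 subnet out of the /48 and prints the router and dns addresses from it. The helper name is made up; the subnet number is treated as the hex digits 111, which is how it appears in the address.

```python
import ipaddress

ula_site = ipaddress.IPv6Network("fd30:1839:ded::/48")


def ula_subnet(site: ipaddress.IPv6Network, subnet_id: int) -> ipaddress.IPv6Network:
    """Return the /64 inside a /48 site whose 16-bit subnet field is subnet_id."""
    if site.prefixlen != 48 or not 0 <= subnet_id <= 0xFFFF:
        raise ValueError("expected a /48 site and a 16-bit subnet id")
    base = int(site.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))


internal = ula_subnet(ula_site, 0x111)

print(2 ** (64 - ula_site.prefixlen))  # how many /64 networks fit in the /48
print(internal)                        # the internal subnet
print(internal[1])                     # promontory switch0
print(internal[2])                     # the virtual dns address
```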

It took me a while to get this set up correctly. First, I thought of starting a dhcpdv6 server on promontory, giving out these addresses. That proved to be a bad idea, because all the boxes picked up those addresses and stopped using the delegated prefix ones. So I got rid of that.

It is really only these few that actually need to have/use an fd30 ULA address. For everyone else I want them to prefer ipv6 addresses with the delegated prefix. So on the two piholes, instead of using dhcp (I disabled dhcpcd), I have set them up in the old way, with /etc/network/interfaces (actually a file in interfaces.d is the current preferred way). This contains the desired static ip addresses, netmasks, and routers for both ipv4 and ipv6, with auto to do the ifup automatically after a reboot.

Once the addresses were assigned and working, I changed the pihole configuration file setupVars.conf to have the ipv6 addresses, and I also set them to use the Comcast ipv6 dns addresses 2001:558:feed::1 and 2001:558:feed::2 as upstream ipv6 nameservers. While in the neighborhood, I also apt upgraded both of the piholes, updated the pihole software (pihole -up), and updated the gravity databases to be sure I was picking up ipv6 addresses to block also.

Chapter 3: keepalived

The next step was getting the two piholes to fail over correctly under ipv6. There wasn’t a lot of information available on this, but it turns out you have to create separate vrrp_instances for the ipv4 and ipv6 cases, so I have one PIHOLE4 and the other PIHOLE6, pretty much identical except for the unicast_src_ip, unicast_peer, and virtual_ipaddress settings, which in PIHOLE6 are the ipv6 equivalents. Then I created what is called a vrrp_sync_group which lists these two instances. That means that they switch to backup together: if one fails, they both transfer.

Chapter 4: IPv6 nameserver

Now that the piholes were working right, and sharing a virtual ip4 and a virtual ip6 address, I needed to start distributing the new virtual ipv6 name server address fd30:1839:ded:111::2/64 to all internal boxes. Since we are not running a dhcpv6 server, the way I did this was to modify the router advertisement on promontory. In the config tree this is interfaces/switch/switch0/ipv6/router-advert where I entered that address as the name-server.

I think what happened here is that when I made an entry for router-advert under switch0/ipv6, that caused the code to regenerate /etc/radvd.conf and restart radvd. When I looked at the radvd config that was running it was:

 
 interface switch0 {
 #   (comments removed)
     IgnoreIfMissing on;
     AdvSendAdvert on;
     AdvOtherConfigFlag off;
     AdvDefaultLifetime 1800;
     AdvLinkMTU 0;
     AdvCurHopLimit 64;
     AdvReachableTime 0;
     MaxRtrAdvInterval 600;
     MinRtrAdvInterval 198;
     AdvDefaultPreference medium;
     AdvRetransTimer 0;
     AdvManagedFlag off;
     RDNSS fd30:1839:ded:111::2 {
     };
 }; 

Notice there is not a prefix advertisement, but there is a name-server advertisement. I removed the switch0/ipv6/router-advert, and made a little dummy change to the eth0/dhcpv6-pd/pd/0/interface/switch0 so that it would regenerate the radvd.conf, and it created a new one:

 interface switch0 {
 #   (comments removed)
     IgnoreIfMissing on;
     AdvSendAdvert on;
     AdvManagedFlag off;
     AdvOtherConfigFlag off;
     prefix ::/64 {
           AdvOnLink on;
           AdvAutonomous on;
     };
 }; 

Notice here, the prefix advertisement is back, and the name-server advertisement is gone. I think what happens is that when the code processes the router-advert setting for switch0, it re-generates that part of the /etc/radvd.conf for switch0, but doesn’t detect there are settings there generated automatically by the pd, and loses track of them.

As an experiment, I added a setting in interfaces/switch/switch0/ipv6/router-advert with not only the name-server, but also a prefix ::/64. Now I get both things in /etc/radvd.conf:

 interface switch0 {
 #   (comments removed)
     IgnoreIfMissing on;
     AdvSendAdvert on;
     AdvOtherConfigFlag off;
     AdvDefaultLifetime 1800;
     AdvLinkMTU 0;
     AdvCurHopLimit 64;
     AdvReachableTime 0;
     MaxRtrAdvInterval 600;
     MinRtrAdvInterval 198;
     AdvDefaultPreference medium;
     AdvRetransTimer 0;
     AdvManagedFlag off;
     prefix ::/64 {
         AdvPreferredLifetime 604800;
         AdvAutonomous on;
         AdvOnLink on;
         AdvValidLifetime 2592000;
     };
     RDNSS fd30:1839:ded:111::2 {
     };
 }; 

This is better, of course, and is accurate. But it doesn’t change the fact that if for some reason I have to change the switch0 pd specification, it will overwrite this one.

Chapter 5: IPv6 on Amazon

The next step was to get an ipv6 address for tarragon. I thought this would be simpler than it was, because I failed to understand just how rich a network environment I was actually getting with an EC2 instance. Of course I had seen various references to vpcs and subnets, but had never paid much attention. Suddenly it mattered.

With an Amazon EC2 instance, in addition to the region stuff, and the elastic IP stuff, they also place your instance on a subnet (which is one of the many Amazon things with a number that you can manage), and the instance is located on a “vpc” which is a “virtual private cloud”, another thing with a number that you can manage. You don’t have to go out of your way to get these things; when you create an instance it is on a subnet (like subnet-917788ff) and that subnet is on a vpc (like vpc-9fa249fa). A vpc has certain properties, like dhcp options, a routing table, and ipv4 non-routable address ranges (like a /18 group of 4 /20s: 172.31.0.0/20, 172.31.16.0/20, 172.31.32.0/20 and 172.31.48.0/20), and it can also have, if you click the buttons, an ipv6 prefix (like 2600:1f13:18:1400::/56). All you need do is ask, and they give you a /56. You can only have one (I suppose if you are a bigger customer there are ways to get more, but gracious, whatever would I want with more than 256 ipv6 networks!). A /56 already seems excessive, and profligate.

Having obtained the /56 for your vpc, you then assign a /64 to each subnet, and as I have only one subnet I assigned subnet 01 to it (so 2600:1f13:18:1401::/64) by editing the subnet and just clicking a few buttons. Then you run along to your actual instance, and click a few more buttons to modify its interface, and tell it to assign an IPv6 address, and it does so, a /128 within your /64. Bob’s your uncle.
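The same prefix arithmetic applies here, just with 8 spare bits instead of 4. A minimal Python sketch, using the example vpc prefix from above (nothing in it talks to Amazon; it only does the address math):

```python
import ipaddress

vpc_prefix = ipaddress.IPv6Network("2600:1f13:18:1400::/56")

# A /56 leaves 8 bits of subnet number below it: 00 through ff.
vpc_subnets = list(vpc_prefix.subnets(new_prefix=64))
subnet_01 = vpc_subnets[0x01]

print(len(vpc_subnets))  # how many /64s the vpc can hand to subnets
print(subnet_01)         # the /64 for subnet number 01

# Any /128 the instance is given should fall inside that /64.
example_host = subnet_01[0x1234]  # arbitrary host number, for illustration only
print(example_host in subnet_01)
```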

Next you have to add a default ipv6 route to the routing table. A vpc has a routing table associated with it (like rtb-9921cbfc) as well as a dhcp option set (like dopt-55998937). One has to modify the routing table to add a default route ::/0 pointing to the amazon provided internet gateway. This is conveniently provided in a drop down box, and again is a thing with a number you can manage (like igw-57658432), which has various properties, though in this case not many aside from tagging it with various strings that may mean something to you. The igw, which I didn’t actually even know about before, appears to be the gizmo that provides nat on the ipv4 interface.

Chapter 6: Virtual Machines

March 17: The virtual machines on Cinnamon are on ipv4 subnet 192.168.122.0/24, and I already had a static route in Promontory to set next hop for that subnet to Cinnamon. This was working. To get ipv6 to work on the VMs, I had to do some additional things:

  1. I manually (conceptually) assigned prefix 3 as an IPv6 subnet for the virtual machines. Subnet (prefix) xxx0::/64 is switch0 on chersonese (the “external” network), xxx8::/64 is eth1 on chersonese (the DMZ), xxx9 to xxxF are the PD pool assigned to switch0 on chersonese (as described above), which are available for delegation downstream, and of that, only xxxF::/64 is currently in use, and is assigned by promontory to its switch0 interface, and then advertised to the boxes on the internal subnet. So now I have also assigned xxx3::/64 for the virtual machines’ ipv6 subnet.
  2. I added a static ipv6 route into Promontory for the prefix xxx3::/64 using as next hop the ULA address for Cinnamon. This was done in the Promontory config tree through the gui. Caveat: This is not automatically updated if Comcast changes my delegated prefix. That needs to be fixed somehow.
  3. I added the setting “redistribute static” to the OSPFv3 parameters on Promontory (also via the config tree), so that Chersonese would be informed of the route. Once again I relied upon the advice of Mr. G, who told me in one of our email exchanges to do this; I might not have thought to do it.
  4. I started a radvd daemon (with systemd) on Cinnamon to advertise the xxx3::/64 prefix onto the subnet serving the VMs. The config file for radvd was done manually, and needs to be changed/automated as it contains a hard coded prefix which won’t be correct if the delegated prefix from Comcast changes. The radvd daemon also advertises my ULA prefix fd30:1839:ded with the next 16 bits set to 122 into the virbr0 bridge. VMs on the bridge are picking up both RAs.
  5. In order to be able to send RAs to the bridged network, I have to give an ipv6 address to virbr0. This is done by modifying the settings in libvirtd for his “default” network, which is virbr0. He has a file /etc/libvirt/qemu/networks/default.xml containing the parameters for the setup of the default network, which is the bridge. I had to modify that file to add an ipv6 address for the cinnamon connection to the bridge, which I made <prefix>::1 (in similar fashion virbr0 is allocated 192.168.122.1/24 on the ipv4 side, but that is standard libvirt). I did this for both the delegated prefix address and the ULA address. Again, this needs to be manually adjusted if the delegated prefix changes.
  6. When I was first trying to debug this stuff, I had to manually add a static route on cinnamon for the delegated prefix address of virbr0. I think that when I first set it up, I had radvd sending routes specifically to the interfaces named vnet0, vnet1, tap0, etc which were the interfaces that got built on the bridge for each VM that got launched. I wasn’t sure what I was doing at that point, and although sending routes to those interfaces from radvd worked (in the sense of the VMs configuring themselves with the advertised prefixes), it wasn’t until I added the IPv6 address for virbr0 into libvirt that things began to work — and once I did this I realized that I did not need to list every interface in radvd — all I had to do was send the RAs to the address of virbr0 (now that I had one) — since all the VMs are bridged to it. Once I started doing this, I think the need for manually putting in the static route in cinnamon disappeared. It looks to me like when radvd runs and sends RAs into virbr0, the information from those RAs is automatically creating the routes on cinnamon.
  7. For the RAs to be processed, it is also necessary for the VMs to handle the RA correctly. On at least the linux boxes it is necessary to set /proc/sys/net/ipv6/conf/all/accept_ra to 2. I made this change in /etc/sysctl.conf also.
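Several of the steps above carry a hard-coded copy of the delegated prefix. One way the radvd part might be automated is to generate the config from the current delegation, sketched below in Python. This is not what I am running: the function name is invented, the 2001:db8 prefix is a documentation placeholder, and the option set is trimmed to the few radvd.conf directives already seen in the dumps earlier in this post.

```python
import ipaddress


def vm_radvd_conf(delegated: str, ula_site: str, interface: str = "virbr0",
                  delegated_index: int = 0x3, ula_subnet_id: int = 0x122) -> str:
    """Render a radvd.conf stanza advertising the VM prefixes on one interface."""
    pd = ipaddress.IPv6Network(delegated)
    vm_global = list(pd.subnets(new_prefix=64))[delegated_index]  # the xxx3 prefix
    site = ipaddress.IPv6Network(ula_site)
    vm_ula = ipaddress.IPv6Network(
        (int(site.network_address) | (ula_subnet_id << 64), 64)
    )
    blocks = []
    for prefix in (vm_global, vm_ula):
        blocks.append(
            f"    prefix {prefix} {{\n"
            f"        AdvOnLink on;\n"
            f"        AdvAutonomous on;\n"
            f"    }};"
        )
    body = "\n".join(blocks)
    return f"interface {interface} {{\n    AdvSendAdvert on;\n{body}\n}};\n"


conf = vm_radvd_conf("2001:db8:0:10::/60", "fd30:1839:ded::/48")
print(conf)
```

Something would still have to notice that the delegation changed and rerun this, and the static route on Promontory and the libvirt network definition would need the same treatment.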