.eu and EURid: .eu comments on Internet security, IPv6 and new gTLDs

General Manager of EURid, the .eu top-level domain (TLD) registry, Marc Van Wesemael expresses candid opinions on Internet security, IPv6 and the upcoming TLD market liberalisation, ahead of .eu’s fifth birthday on 7 April 2011.

Complete info at TMCnet.

Microsoft in the market for second-hand IPv4

Nortel, in bankruptcy, sells IPv4 address block for $7.5 million.

The March 23 edition of the Dow Jones Daily Bankruptcy Report has reported that Nortel’s block of 666,624 IPv4 addresses was sold for $7.5 million, a price of $11.25 per IP address. The buyer of the addresses was Microsoft. More information is in Nortel’s filing in a Delaware bankruptcy court. Now the interesting question becomes: does the price of IPv4 addresses go up or down from here?

Source: Internet Governance Project

World IPv6 Day

On 8 June, at the initiative of ISOC, a worldwide IPv6 day will be held. As many companies and websites as possible are being asked to offer their content over IPv6 for 24 hours. The goal of this large-scale test is to motivate organisations across the industry (ISPs, hardware manufacturers, vendors and web services) to make their services IPv6-ready and to ensure that a successful transition is possible when the IPv4 addresses run out.

The Dutch IPv6 Task Force calls on all organisations to take part in this initiative and so smooth the transition to IPv6. For more information, click here.

Digging Through the Problem of IPv6 and Email – Part 3

IPv6 changes things because an IP address now has 128 bits. Here’s an example from Wikipedia:

It’s beyond the scope of this post to describe the notation of IPv6, but the practical unit changes: in IPv4 the smallest range is a /32 (a single address), whereas in IPv6 the smallest subnet that is normally handed out is a /64. The size of a standard subnet is 2^64 IP addresses, the square of the number of IPs in all of IPv4. While the planners of IPv6 don’t think that the entire address space will be used, its size is meant to make network routing and management more efficient.
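To make the size comparison concrete, here is a small sketch using Python's standard `ipaddress` module. The prefix `2001:db8::/64` is the documentation prefix and is used purely as an illustrative assumption.

```python
import ipaddress

# Total number of addresses in all of IPv4.
ipv4_total = 2 ** 32

# A single standard IPv6 subnet (documentation prefix, used only as an example).
subnet = ipaddress.ip_network("2001:db8::/64")

# One /64 holds 2^64 addresses, which is the square of the whole IPv4 space.
is_square_of_ipv4 = subnet.num_addresses == ipv4_total ** 2

# A single IPv6 address is a /128, the counterpart of an IPv4 /32.
single = ipaddress.ip_network("2001:db8::1/128")

print(subnet.num_addresses, is_square_of_ipv4, single.num_addresses)
```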

One idea to make the problem of mail more manageable is to restrict the address space that is allowed to send mail. In an ideal world, we’d restrict where mail servers could send mail from. So, if we say that the number of individual mail servers in the world will probably never exceed 32 million (not unreasonable), or 2^25, then what if the 25 least significant bits were reserved for mail servers? Right off the hop, any IP address that tried to connect to you and send mail from outside the range (in hexadecimal) of 0:0:0:0:0:0:0:0 to 0:0:0:0:0:0:0200:0000 (or, :: to ::0200:0000) could automatically be rejected. This would almost be a PBL in reverse. Whereas the PBL lists IPs that should never send mail, this algorithm would say to only accept mail from IPs that are allowed to send it and to reject everything else.
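A minimal sketch of that "PBL in reverse" check, assuming the permitted range is the 2^25 addresses starting at `::`, which is the CIDR block `::/103`. The range and the function name `may_send_mail` are hypothetical; the post only floats the idea. Note that `::0200:0000` itself is the first address past the block, so it is treated as outside.

```python
import ipaddress

# Hypothetical "allowed to send mail" range: the lowest 2^25 IPv6 addresses.
MAIL_RANGE = ipaddress.ip_network("::/103")

def may_send_mail(client_ip: str) -> bool:
    """Accept only connections whose source address is inside the mail range."""
    return ipaddress.ip_address(client_ip) in MAIL_RANGE

print(MAIL_RANGE.num_addresses)        # size of the permitted range
print(may_send_mail("::1ff:ffff"))     # highest address inside the range
print(may_send_mail("2001:db8::25"))   # an ordinary address outside it
```

Everything that fails the check could be rejected at connection time, before any reputation lookup is needed.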

This is actually related to the idea of moving to a whitelist solution: only accepting mail from the servers you want to receive mail from. However, the problem with whitelisting is that you would never be able to hear from a new sender, only pre-existing ones, and that defeats the purpose of email, which is that you can hear from new people that you haven’t previously communicated with. With this idea of mail addressing restriction, you still get to hear from new servers/IPs, because new people you might want to hear from will be sending mail from a permitted set of IP addresses, and you can ignore everything outside that set. All of the standard reputation tracking applies and we have now restricted the amount of space that spammers can hide in. If they want to send spam from machines that traditionally never send mail, they won’t be able to do it because all of the good guys have already set up an agreement that says “If you want to send mail to us, you must do it from this set of IP addresses.” Randomizing the sending IP to one that is outside the pre-agreed range will not make it easier for a spammer to hide because they wouldn’t have been able to send mail from it anyhow. To make an analogy, if you send mail from an IP on the PBL and then switch to another IP on the PBL, it doesn’t matter because in either case, your email would still be rejected.

Now, as it turns out, the least significant 64 bits are actually reserved in IPv6. The first 64 bits of the IPv6 address are the network address (48-bit routing prefix and 16-bit subnet ID), and the last 64 bits are the interface identifier. The 64-bit interface identifier is either automatically generated from the interface’s MAC address using the modified EUI-64 format, obtained from a DHCPv6 server, automatically established randomly, or assigned manually. So, using those least significant 64 bits is going to be problematic because an IP address is how we identify a device attached to the Internet and if they are already predefined by some algorithm, then we can’t use them. In other words, the least significant 25 bits in an IPv6 address are already spoken for. However, we could allocate some other 32 million or so IP addresses (a /103) somewhere that is used for sending mail… couldn’t we?
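For readers who have not seen the modified EUI-64 format, the following sketch shows how a 48-bit MAC address becomes a 64-bit interface identifier: the bytes `ff:fe` are inserted in the middle and the universal/local bit of the first byte is flipped. The MAC address, the prefix and the function names are illustrative assumptions.

```python
import ipaddress

def eui64_interface_id(mac: str) -> int:
    """Build the 64-bit modified EUI-64 interface identifier from a MAC."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert ff:fe between the two halves of the MAC.
    eui = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    # Flip the universal/local bit (second-lowest bit of the first byte).
    eui[0] ^= 0x02
    return int.from_bytes(eui, "big")

def address_from_mac(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with the MAC-derived interface identifier."""
    network = ipaddress.ip_network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    return network[eui64_interface_id(mac)]

# Example values are made up for illustration.
print(address_from_mac("2001:db8:0:1::/64", "00:1a:2b:3c:4d:5e"))
```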

[Side note: because the MAC address of the machine is used to generate the interface identifier in some cases, this makes it easier to reject mail from these servers. You’re no longer blocking an IP address that is subject to change in the case of DHCP, but instead blocking the actual piece of hardware, which cannot easily change its MAC address. It’s a more granular level of block that is more reliable… if we can determine that the IP was generated using the MAC address.]
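One way to approach the "if we can determine" part of the side note is to look for the `ff:fe` marker in the middle of the interface identifier and reverse the modified EUI-64 steps. This is only a heuristic sketch: a randomly generated or manually assigned identifier could contain the same bytes by coincidence, and the function name is hypothetical.

```python
import ipaddress
from typing import Optional

def mac_from_eui64(address: str) -> Optional[str]:
    """Return the MAC implied by an EUI-64 style address, or None if the
    interface identifier does not carry the ff:fe marker."""
    interface_id = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    raw = interface_id.to_bytes(8, "big")
    if raw[3:5] != b"\xff\xfe":
        return None
    mac = bytearray(raw[:3] + raw[5:])
    mac[0] ^= 0x02  # undo the universal/local bit flip
    return ":".join(f"{byte:02x}" for byte in mac)

print(mac_from_eui64("2001:db8:0:1:21a:2bff:fe3c:4d5e"))
print(mac_from_eui64("2001:db8::1"))
```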

While in theory this could work, it would have to be managed and that could sprawl out of control. The reason is this: which block of IP addresses do we reserve for only sending mail? What if that range had to be shared across millions of customers? For example, suppose we had 1024 IP addresses to allocate and we decided to reserve 500-563 (1/16 of the address space) for sending mail. How do we share it? Let’s suppose that there are 10 major regional Internet registries who hand out the IPs to their customers (ISPs, people with their own permanent home Internet connections, etc). Let’s suppose they decided to divide it up manually. RIR 1 gets addresses 0-99, RIR 2 gets 100-199, and so forth up to RIR 10 who gets 900-999 with the final 24 IPs being reserved for special functions. However, RIR 6 has all of the IPs that get to send mail. That’s not fair and nobody would agree to that.

So, we decide to divide things up. RIR 1 gets addresses 0-99 plus 500-504 (5 IP addresses used to send mail). RIR 2 gets 100-199 plus 505-509 (also 5 IP addresses). Thus, each of the registries has to “logically” manage both its allocated range and its special email range. Instead of using CIDR ranges to allocate everything nicely, it has to have a big table of who owns what. This gets very messy when you have to have a lot of different IP ranges, particularly when the universe is as vast as IPv6 is. On the other hand, we’re going to have to manage lots and lots of IP addresses anyhow. If IANA publishes the rules and says these are the designated IP ranges that are used to send mail, and here’s how you apply for them, then everyone is playing by the same set of rules right from the beginning. Not only that, but it’s really not all that different from today. Regional Internet registries (RIRs) already allocate space to local Internet registries (LIRs) who then distribute the blocks down to their customers. When IANA provisions space, it would have to do so in a way that takes the special reserved range for mail into account. Indeed, this is something that it already does today when it provisions IP space as well as geo-allocates it. Smarter people than me could probably figure out the necessary algorithms.
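The toy allocation can be written out as the "big table of who owns what". The sketch below assumes the post's made-up numbers (10 registries, blocks of 100 addresses, 5 mail addresses each starting at 500) and lets the mail assignment override the general block it is carved out of. The function name is hypothetical.

```python
def build_ownership_table(num_rirs=10, block_size=100, mail_start=500, mail_per_rir=5):
    """Map each toy address to (registry, purpose)."""
    owners = {}
    # General-purpose blocks: RIR 1 gets 0-99, RIR 2 gets 100-199, and so on.
    for i in range(num_rirs):
        for address in range(i * block_size, (i + 1) * block_size):
            owners[address] = (f"RIR {i + 1}", "general")
    # Mail-only slices: RIR 1 gets 500-504, RIR 2 gets 505-509, and so on.
    # These override whichever general block they fall inside.
    for i in range(num_rirs):
        first = mail_start + i * mail_per_rir
        for address in range(first, first + mail_per_rir):
            owners[address] = (f"RIR {i + 1}", "mail")
    return owners

table = build_ownership_table()
print(table[150], table[502], table[507])
```

Even in this tiny example, one registry's block ends up with holes owned by the others, which is why the bookkeeping no longer fits neatly into CIDR ranges.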

You can see from the above that doing an even distribution based upon numerical order is not going to work, but reserving IP ranges and then mapping them out and handing them out probably would. Even today, we have reserved IP address space that nobody is supposed to use for ordinary hosts (224.0.0.0 upwards is reserved for multicast, 10.0.0.0/8 is part of RFC 1918’s internal address space, and so forth). The work that needs to be done here is that a committee of people has to sit down, figure out how many IP addresses should be reserved for sending mail (such that we are not likely to run out of space in a couple of decades), and then reserve an appropriate range for it. IANA then has to reserve that space and come up with rules for how to hand that out to the RIRs, who then have to come up with rules for how to allocate it to the LIRs, who then have to figure out how to allocate it to their customers. They then have to manage the infrastructure necessary to maintain the mappings of who owns what.
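As a small illustration of how reserved ranges are already expressed today, the sketch below checks an IPv4 address against a short table of well-known special-purpose blocks (the multicast block and the three RIR-independent RFC 1918 private ranges). The table is deliberately incomplete and the function name is an assumption; a reserved mail range would be one more entry of the same kind.

```python
import ipaddress

# A deliberately incomplete table of special-purpose IPv4 ranges.
SPECIAL_RANGES = {
    "multicast": [ipaddress.ip_network("224.0.0.0/4")],
    "RFC 1918 private": [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ],
}

def classify(ip: str) -> str:
    """Return the label of the special range containing ip, if any."""
    address = ipaddress.ip_address(ip)
    for label, networks in SPECIAL_RANGES.items():
        if any(address in network for network in networks):
            return label
    return "not in this table"

print(classify("224.0.0.251"), classify("10.20.30.40"), classify("8.8.8.8"))
```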

Next, RFCs need to be written on how to send and receive mail over IPv6. Then, software vendors need to write code for IPv6 email transactions that implements these rules. Finally, IP blocklist maintainers need to start populating their lists in IPv6 notation but pursuant to the restrictions that are built into the RFCs.

It’s a ton of work, years of it, but if we want to start receiving mail over IPv6 then that’s what needs to be done.

Click to read Part 1 and Part 2.

Written by Terry Zink, Program Manager

Digging Through the Problem of IPv6 and Email – Part 2

IPv6 multiplies this problem. We have seen that spammers already possess the ability to hop around IP addresses quickly. They do this because once an IP gets blocked, it is no longer useful to them. There are only so many places they can hide, though — 4.2 billion places they can hide. However, in IPv6, if they are able to do the same pattern of sending out mail and hopping around IP addresses the same way they do in IPv4, then there is virtually unlimited space they can hide in. To put it one way, I’ve seen estimates that there are 250 billion spam messages sent out per day. Under IPv6, spammers could send out 1 piece of spam per IPv6 address, discard it and then move on to the next IPv6 address for the next 1000 years and never come close to needing to re-use a previous IPv6 address. A mail server could never load a file big enough even for one day’s IPv6 blocklist if spammers sent every single spam from a unique IPv6 address. Because spammers could hop around so much, IP blocklists could conceivably encounter the following problems:
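The "1000 years" claim is easy to check with back-of-the-envelope arithmetic. The sketch below assumes the quoted estimate of 250 billion spam messages per day, one fresh address per message, and 365-day years.

```python
SPAM_PER_DAY = 250_000_000_000  # estimate quoted in the post
IPV6_TOTAL = 2 ** 128           # size of the whole IPv6 address space
ONE_SUBNET = 2 ** 64            # size of a single standard /64

def addresses_burned(years: int) -> int:
    """Addresses consumed if every spam message uses a brand-new address."""
    return SPAM_PER_DAY * 365 * years

burned = addresses_burned(1000)
print(burned)                    # total addresses used in 1000 years
print(burned < ONE_SUBNET)       # does it even fill one /64?
print(burned / IPV6_TOTAL)       # fraction of all IPv6 space consumed
```

Under these assumptions a thousand years of spam would not exhaust even a single /64, let alone the full address space.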

  1. They would get to be too large for anyone to download, process and upload.
  2. They would be latent since by the time the IP was listed, spammers would have discarded it and moved on to the next IP address.

This is why no mail receivers are thrilled about the idea of using IPv6 to send mail. It means that they have to allow for the worst case scenario, and that worst case scenario is that spammers will overwhelm their mail servers and drain processing power having to deal with a 10x increase in traffic.

So how do we deal with it?

One idea, as referenced by the writers above, is to use whitelists instead of blocklists. Block all mail from everyone and then maintain a central whitelist of good mail servers that send legitimate mail. The weakness here is that it defeats the whole purpose of email. The purpose of email is that you can hear from new people you haven’t heard from before. New mail servers are brought up all of the time. There’s no way for you to know about them, and the process of having to opt people in is a pain and a hassle. This idea could be centralized, but the legitimate mail servers for one set of folks are not going to be legitimate for another set of folks.

Another idea is to take an unmanageable problem and break it down into a manageable one. I haven’t really fleshed this out through any working groups, but let’s go back and take a look at how CIDR notation works and how blocklists take advantage of them. Consider the IP 216.32.180.16. This can be broken down into four 8-bit octets, and then combined to make one 32-bit number:
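As a rough illustration of that conversion, the following sketch splits a dotted-quad address into its four octets, shows each as 8 bits, and shifts them together into one 32-bit number. The function names are made up for this example.

```python
def octet_bits(ip: str) -> str:
    """Show each of the four octets as an 8-bit binary string."""
    return ".".join(f"{int(part):08b}" for part in ip.split("."))

def ipv4_to_int(ip: str) -> int:
    """Combine the four octets into a single 32-bit integer."""
    value = 0
    for part in ip.split("."):
        value = (value << 8) | int(part)
    return value

print(octet_bits("216.32.180.16"))
print(ipv4_to_int("216.32.180.16"))
print(hex(ipv4_to_int("216.32.180.16")))
```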

A CIDR range is defined bit-wise. The number after the slash is the count of leading bits that every IP in the range has in common, so the range contains every IP that shares those first n bits (wow, that didn’t sound very clear). Let me use an example. Let’s take the range 216.32.180.0/24. If we convert this down to the bits that it represents, then this range is any IP that shares its first 24 bits, since the /24 says to take the first 24 bits:

216.32.180.16 is said to fall within the range 216.32.180.0/24 because the first 24 bits of the 32-bit representation of 216.32.180.16 are the same as the first 24 bits of 216.32.180.0/24:

The first 24 bits match, the last 8 do not (illustrated by the 1 in green) but it doesn’t matter because we only need to match the first 24 bits. The red and blue parts match up and therefore 216.32.180.16 falls within the range of 216.32.180.0/24. However, if we take a slightly different IP address, 216.32.181.16, that will have a different 32-bit mapping. It will not fall into the /24 range because the 24th bit (the last bit of the third octet) does not match:
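The prefix comparison can be sketched with a bit mask: keep the first n bits of both the candidate IP and the base of the range, and see whether they are equal. The helper name `in_cidr` is hypothetical; in practice the `ipaddress` module's own containment test does the same job.

```python
import ipaddress

def in_cidr(ip: str, cidr: str) -> bool:
    """True if ip shares its first n bits with the base address of cidr."""
    base, prefix_len = cidr.split("/")
    n = int(prefix_len)
    # A mask with the top n bits set, e.g. /24 -> 0xFFFFFF00.
    mask = (0xFFFFFFFF << (32 - n)) & 0xFFFFFFFF
    ip_value = int(ipaddress.IPv4Address(ip))
    base_value = int(ipaddress.IPv4Address(base))
    return (ip_value & mask) == (base_value & mask)

print(in_cidr("216.32.180.16", "216.32.180.0/24"))
print(in_cidr("216.32.181.16", "216.32.180.0/24"))
```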

You can see that specifying things in CIDR notation is a very quick and easy way to list IPs on a blocklist. It makes sense to us humans reading it because we can interpret the numbers “naturally”, and it works from a technical perspective because it translates into bit-mapping. This is how PBL and some other lists are able to manage so many IPs. The IP range 65.55.0.0/16 lists any IP that matches 65.55.xx.xx; this is 65,536 IP addresses. They all fall into a logical range.

The number of IPs that falls within a CIDR range is evaluated as 2^(32-n) where n is the CIDR range (the number after the slash). So, a /24 (pronounced slash 24) is 2^(32-24) = 2^8 = 256 IPs, a /20 is 2^12 = 4,096 IPs, and so forth. The larger the CIDR range number n, the smaller the range of IPs it covers. To newbies, this is counterintuitive and takes a bit of time to wrap your head around, but after a while you pick up the lingo. The smallest IP range is a /32 (1 IP) whereas the largest is a /0 (every single IP).
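The 2^(32-n) formula is short enough to write down directly. This is a minimal sketch for IPv4 only; the function name is an assumption.

```python
def cidr_size(n: int) -> int:
    """Number of IPv4 addresses covered by a /n range."""
    if not 0 <= n <= 32:
        raise ValueError("IPv4 prefix length must be between 0 and 32")
    return 2 ** (32 - n)

# Larger n means a smaller range.
for n in (32, 24, 16, 8):
    print(f"/{n}: {cidr_size(n)} addresses")
```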

Click to read Part 1 and Part 3.

Written by Terry Zink, Program Manager

Digging Through the Problem of IPv6 and Email – Part 1

Recently, a couple of anti-spam (or at least email security related) bloggers have written some articles about IPv6 and the challenges that the email industry faces regarding it. John Levine, who has written numerous RFCs and a couple of books about spam fighting, writes the following in his article A Politically Incorrect Guide to IPv6, part III:

We will eventually figure out both how people use IPv6 addresses for mail, and how to manage and publish v6 reputation data (I’ve been doing some experiments, which I’ll blog about when I have enough results), but until then, running a mail server on v6 will be a lot harder than running one on v4. And since you’ll be able to handle all the real mail on v4, why bother?

Barry Leiba, another email security writer, writes the following on CircleID on an article entitled IP Blocklists, Email, and IPv6:

John Levine has one approach: leave the email system on IPv4 for the foreseeable future. Even, John points out, when many other services, customer endpoints, mobile and household devices, and the like have been — have to have been — switched to IPv6, we can still run the Internet email infrastructure on IPv4 for a long time, leaving the IP blocklists with v4 addresses, and a system that we’re already managing fine with.

Of course, some day, we’ll want to completely get rid of IPv4 on the Internet, and by then we’ll need to have figured out a replacement for the IP blocklist mechanism. But John’s right that that won’t be happening for many years yet, and he makes a good case for saying that we don’t have to worry about it.

Both writers are saying the same thing, and I have been on discussion threads where the consensus was similar: there is no agreement on how to handle email over IPv6, at least in the short term, but eventually it will probably have to be figured out (some believe mail will never move to IPv6, while others think that it will have to go there one of these days). In the meantime, just use IPv4 to send mail.

To expand a bit on what both writers are saying, the biggest reason why no mail providers are particularly thrilled about using IPv6 to handle email is that there is no way at the moment to deal with the problem of abuse. Today, spammers make extensive use of botnets. Each day, they compromise new machines and start using them to spew out spam. Each of these bots uses a different IP address, and the IP addresses change all of the time. I haven’t done an analysis in a while, but if you had 10,000 IP addresses today that are sending out spam, then tomorrow there would be 10,000 again but at least 9700 of them would be different IP addresses than were there the previous day.

The reason that there is so much rotation in IP addresses is because spam filters today make use of IP blocklists. When a blocklist service detects that an IP is sending spam, it adds it to the blocklist and rejects all mail from it. There are exceptions to this rule such as a legitimate IP that sends a majority of good mail (such as a Hotmail or Gmail IP address), but in general, mail servers reject all mail from blocklisted IPs. The reason they do this is the following:

  1. 90% of all email flowing across the Internet (not including internal mail within an organization) is spam. If a sending IP is on a blocklist, a mail server can reject it in the SMTP transaction and save on all of the processing costs associated with accepting the message and filtering it in the content filter. Many mail servers these days would topple over and crash because they could not keep up with the load if they had to handle all of the mail coming from blocklisted IPs since it would increase the number of total messages to deal with by a factor of 10.
  2. Spam filters get slightly better antispam metrics by using IP blocklists. Content filters are pretty good today, but rejecting 100% of mail from a spamming IP address means that there is no possibility of a false negative from that IP address. By contrast, if a content filter does not use an IP blocklist, the content filter has to learn to recognize the spam coming from that IP address, update the filter and then replicate out the changes. This is almost always slower than pulling down a blocklist and then using it as the first line of defense. Without an IP blocklist, a spam filter might be expected to filter between 80% and 99% of the mail coming from a blocklisted IP. While many spam filters get pretty close to that 99% range, it’s still not 100%.
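The two reasons above can be sketched in a few lines: a connection-time check that rejects listed IPs before any content filtering happens, and the load arithmetic behind the "factor of 10" figure. The blocklist entries are documentation addresses, not real listings, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical blocklist entries (documentation ranges, not real listings).
BLOCKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]

def connection_decision(client_ip: str) -> str:
    """Reject listed IPs during the SMTP transaction; pass the rest on."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKLIST):
        return "reject"          # no message accepted, no filtering cost
    return "content-filter"      # accept and scan the message

def load_multiplier(spam_percent: int) -> float:
    """How many times more mail must be processed if nothing is rejected
    up front, relative to the legitimate mail alone."""
    return 100 / (100 - spam_percent)

print(connection_decision("192.0.2.55"), connection_decision("203.0.113.9"))
print(load_multiplier(90))
```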

Those are the two primary reasons to use IP blocklists. They are essential in blocking spam. Next up, the question is how blocklists are populated, and I’m going to leave that aside because there are resources elsewhere on how to deal with that. Blocklist operators publish their lists in two ways:

  1. They list the individual IP addresses of all the servers that are sending spam, one by one.
  2. They make use of CIDR notation. CIDR, or Classless Inter-Domain Routing, is basically a way to group large blocks of IP addresses. In IP blocklists, a provider would list a larger group of IP addresses in CIDR notation in order to save on space in the file (they don’t have to list them one by one). For example, the XBL is about 7 million entries (lines of text) and is around 100 megs in size. By contrast, the PBL contains 200,000 lines of text (without exceptions in ! notation) and is 6 megs. However, the PBL is represented mostly in CIDR notation. If all of these ranges are expanded, it is over 650 million individual IP addresses. That’s a whole heck of a lot more IPs in the PBL for a whole lot less file size space.
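The space saving from CIDR notation can be seen on a small scale with `ipaddress.collapse_addresses` from the Python standard library, which merges adjacent addresses into the fewest covering ranges. The addresses are documentation ranges chosen for illustration, and the helper name is an assumption.

```python
import ipaddress

def compress(ips):
    """Collapse individually listed IPs into the fewest CIDR ranges."""
    addresses = [ipaddress.ip_address(ip) for ip in ips]
    return [str(network) for network in ipaddress.collapse_addresses(addresses)]

# 256 consecutive addresses listed one by one, plus one stray address.
one_by_one = [f"203.0.113.{i}" for i in range(256)] + ["198.51.100.7"]

compressed = compress(one_by_one)
print(len(one_by_one), "lines become", len(compressed), "lines:", compressed)
```

A list of isolated bots compresses poorly because there is nothing adjacent to merge, while a list of whole address pools compresses very well.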

In terms of effectiveness, we run XBL in front of PBL and XBL blocks about 4 times as much mail as PBL (I don’t know how many would be blocked if we ran them in reverse). The XBL is better at catching individual bots that are sending out spam but are not listed anywhere (they are new IPs) whereas the PBL is better at pre-emptively catching machines that should never send out mail (probable bots but it doesn’t matter because they shouldn’t be sending mail anyhow). They are designed to be used in tandem. However, if we had to list every single PBL IP singly instead of compressing it into CIDR ranges, and if we use about the same ratio of 7 million IPs ~ 100 megs, then the PBL would be 9.4 gigs in total size. 9.4 gigs is a large file size. It isn’t completely unmanageable but it goes from being a minor inconvenience to being a major one. It takes a long time to download/upload/process a 9.4 gig file. It’s also far easier to store the file entries in a database if it is only 500,000 entries (or even 7 million) vs 650 million of them. Databases that large start to run into the problem of scale.

The PBL and XBL are prime examples of why different styles of IP blocklists are required. The PBL lists 650 million IPs and we still have over 7 million IPs on the XBL that aren’t on the PBL. Clearly, spamming bots can and do move around such that they are not listed on the lists that have large swaths listed. Bots are very good at hiding in places that are not called out and blocked yet. If they could not do this they would not be in business, and spammers are still in business. The fact is that given enough space to hide, spammers will hide in that space. The problem that we in the industry face is that as soon as we find a hiding space, we can block it for a bit but the spammer will vacate it, relocate elsewhere and continue to spam.

And therein is the problem of IPv6. An IPv4 IP address consists of 4 octets, and each octet is a number running from 0-255. This means that there are 256 x 256 x 256 x 256 possible IP addresses, which is 4.2 billion possible IP addresses. In reality, there are far fewer than this because there are lots of ranges of IPs that are reserved and not for public consumption. Still, using our formula from above, if you had to list every single IP address singly in a file, then the size of the file would be 61 gigs. 61 gigs is a very large file size and there are very few pieces of hardware that can handle that size of file in memory (whether you are doing IP blocklist look ups in rbldnsd or some other in-memory solution on-the-box). Processing the file and cleaning it up would take a very long time; you simply couldn’t do it in real time where IP blocklists need to be updated frequently (once per hour at a bare minimum).
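The file-size estimates follow from the rough ratio quoted earlier (about 7 million one-per-line entries for about 100 megabytes). The sketch below assumes that ratio, decimal units, and "650 million" as a round figure for the expanded PBL, so the results land near the post's numbers rather than exactly on them.

```python
ENTRIES_PER_SAMPLE = 7_000_000   # roughly the number of XBL lines
SAMPLE_SIZE_MB = 100             # roughly the XBL file size

def estimated_size_gb(entries: int) -> float:
    """Estimate the size of a one-IP-per-line list from the XBL ratio."""
    return entries * SAMPLE_SIZE_MB / ENTRIES_PER_SAMPLE / 1000

pbl_expanded_gb = estimated_size_gb(650_000_000)   # PBL listed one IP per line
all_ipv4_gb = estimated_size_gb(2 ** 32)           # every IPv4 address

print(round(pbl_expanded_gb, 1), round(all_ipv4_gb, 1))
```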

Click to read Part 2 and Part 3.

Written by Terry Zink, Program Manager