All effective spam filters use DNS blacklists or blocklists, known as DNSBLs. They provide an efficient way to publish sets of IP addresses from which the publisher recommends that mail systems not accept mail. A well-run DNSBL can be very effective; the Spamhaus lists typically catch upwards of 80% of incoming spam with a very low error rate.
DNSBLs take advantage of the existing DNS infrastructure to do fast, efficient lookups. A DNS lookup typically goes through three computers: the client, a nearby DNS cache, and the DNSBL’s DNS server.
The client, usually the mail server, asks a nearby DNS cache for the DNSBL entry for the IP address in question. If the cache already has a copy of the entry, it returns it immediately; otherwise it fetches a copy from the DNSBL’s DNS server and returns it. DNSBL lookups, like most other kinds of DNS lookups, tend to be fairly repetitive, since the same IP addresses tend to send multiple messages, so the local cache handles the bulk of the work, limiting the load on the remote server. DNS caches are an essential part of making DNSBLs work, since the remote server (or typically a group of remote servers) has to handle every request for the DNSBL that the caches don’t. Unfortunately, caches don’t work for IPv6 DNSBLs.
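To make the lookup concrete, here is a minimal sketch of how a client could build the name it asks the cache for. It assumes the usual DNSBL convention (the one described in RFC 5782): reverse the IPv4 octets, or the 32 hex nibbles of an IPv6 address, and append the list’s zone. The zone dnsbl.example and the function name are made up for illustration.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a client would look up for `ip` in the DNSBL `zone`.

    `zone` is whatever domain the list is published under; the value used
    below is a placeholder, not a real list.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # IPv4: reverse the four octets, e.g. 192.0.2.1 -> 1.2.0.192
        labels = str(addr).split(".")[::-1]
    else:
        # IPv6: expand to all 32 hex nibbles, then reverse them one per label
        labels = list(addr.exploded.replace(":", ""))[::-1]
    return ".".join(labels) + "." + zone

print(dnsbl_query_name("192.0.2.1", "dnsbl.example"))
print(dnsbl_query_name("2001:db8::1", "dnsbl.example"))
```

Each distinct address produces a distinct name, and the cache stores answers by name, which is why repeated lookups for the same sender are cheap.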
The reason is the vast IPv6 address space. IPv4 addresses are 32 bits long, allowing about 4 billion addresses. That seems like (and is) a lot, but it’s few enough that all the addresses will be handed out by sometime next year, and any given network has only a limited supply of them. This means that a single host usually has a single IPv4 address, or at most a few hundred addresses. IPv6 addresses are much longer, 128 bits long. They are so long that whereas in IPv4 an ISP usually allocates a single IP address to each customer, ISPs will probably allocate a /64 of IPv6 space to each customer, that is, a range with a fixed 64-bit prefix that leaves the remaining 64 bits for the customer to use. While there are sensible technical reasons to do this, it also has the unfortunate effect that a computer can switch to a new IP address each time it sends a new message, and never reuse an address. (As a rough approximation, if you sent a billion messages a second, each with its own address, it would take nearly 600 years to use all the addresses in a /64.)
Blocking addresses one at a time isn’t going to work if a bad guy can pick a new address for each message. The obvious countermeasure is to put ranges of addresses into the DNSBL. That’s already standard practice in IPv4 DNSBLs, where if a bad guy controls a range of addresses, the BL lists the whole range. But the problem with listing IPv6 ranges is that the ranges are so vast that they risk overloading DNS servers and caches. If every spam comes from a different IP address, every DNSBL lookup will require the DNS cache to query the DNSBL server since the answer won’t be in the cache. This will overload the DNSBL servers. Worse, since DNS caches tend to keep the most recent answers around in preference to older ones, the flood of DNSBL data will force all of the other DNS info out of the cache as well. On most systems, DNSBLs use the same cache as all other DNS queries, so it will also increase the load on every other DNS server, re-fetching answers that were flushed out of the cache. Even if the DNSBL servers use a single DNS wildcard record to cover a large range of DNSBL entries, that doesn’t help, because DNS caches can’t tell that a response was created from a wildcard, and so keep a separate entry for each response.
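A toy simulation shows the effect. It assumes a plain least-recently-used cache with a fixed number of slots, which is a simplification: real DNS caches also honor TTLs and use their own eviction policies. The class, the function, and the sizes are all invented for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Toy stand-in for a DNS cache: fixed capacity, evicts the oldest entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0  # each miss is a query sent on to the remote server

    def lookup(self, name: str) -> None:
        if name in self.entries:
            self.entries.move_to_end(name)        # refresh recency on a hit
            return
        self.misses += 1
        self.entries[name] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # drop least recently used

def simulate(dnsbl_names, capacity=1000, other_names=100):
    """Return (DNSBL queries reaching the server, unrelated entries surviving)."""
    cache = LRUCache(capacity)
    others = [f"host{i}.example" for i in range(other_names)]
    for name in others:                           # ordinary DNS data already cached
        cache.lookup(name)
    cache.misses = 0
    for name in dnsbl_names:
        cache.lookup(name)
    surviving = sum(1 for name in others if name in cache.entries)
    return cache.misses, surviving

# IPv4-like traffic: 10,000 messages from only 50 distinct senders.
ipv4_like = [f"sender{i % 50}" for i in range(10_000)]
# IPv6 worst case: every one of 10,000 messages uses a fresh address.
ipv6_like = [f"addr{i}" for i in range(10_000)]

print(simulate(ipv4_like))   # few misses, unrelated entries intact
print(simulate(ipv6_like))   # every lookup misses, unrelated entries evicted
```

In the first run the remote server sees one query per sender and the rest of the cache is untouched; in the second it sees every query, and the unrelated entries are pushed out along the way.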
I see a few possible responses to this situation. One is to switch from DNSBLs to DNS whitelists. The number of legitimate hosts sending e-mail is surprisingly small, probably on the order of 100,000 in the world. Even though the number of whitelist entries is small, that still doesn’t solve the DNS cache problem, since each failed request potentially takes up a cache slot as well, to remember not to retry the request.
The second is to change the DNS to handle DNSBLs with ranges more efficiently. It turns out that DNSSEC, the cryptographic security add-on to the DNS which is finally starting to see broad use, does most of this already. In particular, if a DNS query is satisfied by a wildcard, the DNSSEC information that is sent along with the response identifies the wildcard and the range of queries that the wildcard can answer. As far as I know, caches don’t use this information to generate subsequent wildcard responses themselves, but they could do so without any changes to the DNS or DNSSEC. Also, most DNSBLs and DNSWLs are served using a package called rbldnsd, which doesn’t support DNSSEC and, due to its internal structure, would be hard to modify to do so.
Another is to modify the way that IPv6 DNSBLs work. On the ASRG list we’ve been discussing some possible changes that would improve cache behavior by telling query clients what the granularity of BL entries is, so they can do one query per entry rather than one query per IP address.
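As a rough illustration of the idea, and not of any specific ASRG proposal, suppose the client somehow knows that the list’s entries are /64s. It can then mask each address down to its /64 before querying, so every address in the same range maps to the same lookup and the same cache slot. How the client would learn the granularity is left out; here it is simply a parameter, and the function name is hypothetical.

```python
import ipaddress

def entry_key(ip: str, granularity: int = 64) -> str:
    """Reduce `ip` to the first address of its /`granularity` range.

    Assumption: the DNSBL lists whole ranges of this size, so one query
    per range is enough.
    """
    # strict=False lets us pass a host address and have the host bits zeroed
    net = ipaddress.ip_network(f"{ip}/{granularity}", strict=False)
    return str(net.network_address)

# Two addresses in the same /64 collapse to one key; a different /64 does not.
print(entry_key("2001:db8:1:2::1"))
print(entry_key("2001:db8:1:2:ffff:ffff:ffff:ffff"))
print(entry_key("2001:db8:1:3::1"))
```

With this in place, a sender hopping around inside its /64 generates one cacheable query instead of one per message.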
The last is that for the most part mail systems simply won’t use IPv6 addresses, since all the mail that anyone wants will continue to be sent using IPv4. I will blog about that in a few days.
Written by John Levine, Author, Consultant & Speaker