Same host detection through IP-ID

This page explains one of the privacy risks related to the identification field in the IPv4 header. These issues are not new: shortcomings of the field were known in the '90s, and as a result a few improvements were included in the IPv6 design.

The field is mandatory in IPv4 and thus poses a risk regardless of whether you have any need for the field. The field is only 16 bits long in IPv4, which has proven to be insufficient.

In IPv6 the field is optional, and most IPv6 packets don't carry it. Moreover, the size of the field has been increased to 32 bits, which avoids many of the problems.

How IP-ID can detect IP addresses assigned to the same host

Some networking stacks use a counter to generate IP-ID values. If the host has multiple IP addresses and uses the same counter for all of them, it is possible to inspect traffic from these IP addresses and see that the IP-ID values were generated by a common counter.

Linux uses an array of counters. When an IP-ID value is needed, a hash value is computed to choose an index in this array. The hash input includes the source and destination IP addresses. This reduces the risk of using the same counter for unrelated flows, but it doesn't eliminate the possibility. Older Linux versions used 1024 counters. Newer Linux versions use up to 262144 counters depending on available memory.
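The hash-indexed counter array can be modelled with a few lines of Python. This is only a toy sketch and not the kernel's code: the class name IpIdCounters, the keyed BLAKE2 hash, the secret and the fixed increment of one are all assumptions made for illustration.

```python
import hashlib
import ipaddress


class IpIdCounters:
    """Toy model of a hash-indexed array of IP-ID counters (hypothetical)."""

    def __init__(self, size=1024, secret=b"per-boot secret"):
        self.size = size
        self.secret = secret            # assumption: the hash is keyed
        self.counters = [0] * size

    def index(self, src, dst):
        # Hash the (source, destination) address pair to pick a counter slot.
        data = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
        digest = hashlib.blake2b(data, key=self.secret, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.size

    def next_id(self, src, dst):
        # Advance the chosen counter and return it as a 16-bit IP-ID.
        i = self.index(src, dst)
        self.counters[i] = (self.counters[i] + 1) % 65536
        return self.counters[i]


stack = IpIdCounters()
print(stack.index("172.19.0.2", "100.100.0.224"))
print(stack.index("172.19.0.3", "100.100.1.186"))
```

Two flows with different local addresses usually land in different slots, but with only 1024 slots some pairs of flows will inevitably share one.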

The proof-of-concept provided here pings a pair of IP addresses from many different source addresses, looking for a pair of source addresses that end up using the same IP-ID counter. Due to the birthday paradox, the number of source addresses needed is only about the square root of the number of counters. Thus you would need around 32 IP addresses when targeting an older Linux version (the square root of 1024) and around 512 when targeting a newer Linux version (the square root of 262144). For more reliable detection I opted for twice that number.
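The effect of doubling the number of source addresses can be estimated with a rough calculation. The sketch below assumes that each of n source addresses probes both targets, that the hash spreads flows uniformly and independently over the counters, and that the usual exponential approximation is good enough. The function name is hypothetical.

```python
import math


def cross_collision_probability(sources, counters):
    """Approximate chance that some flow to target A shares a counter
    with some flow to target B, given `sources` flows to each target."""
    pairs = sources * sources       # every A flow paired with every B flow
    return -math.expm1(-pairs / counters)


for counters in (1024, 262144):
    root = math.isqrt(counters)
    for n in (root, 2 * root):
        p = cross_collision_probability(n, counters)
        print(f"{counters} counters, {n} sources: {p:.3f}")
```

Under these assumptions the square root of the number of counters gives a success chance of about 63%, while twice that number raises it to about 98%.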

To use the tool you need a /22 IPv4 address range. The code in the repository hardcodes the client IP range as 100.100.0.0/22, but you only need to edit a few places in the code to use it with a public IPv4 range.
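A /22 is the smallest prefix that holds twice 512 addresses. This can be checked with Python's standard ipaddress module. The variable name client_range is made up for this sketch and is not taken from the repository.

```python
import ipaddress

client_range = ipaddress.ip_network("100.100.0.0/22")
print(client_range.num_addresses)          # 1024
print(client_range[0], client_range[-1])   # 100.100.0.0 100.100.3.255
```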

Example usage

The source can be downloaded through this link or by running the command:

hg clone https://v6tools.kasperd.dk/same-host/

Once you have the source you can run it with two IP addresses as arguments:

# ./same-host.py 172.19.0.2 172.19.0.3
2034
2034 2032
2032 2032
89
2032 2032
2032 2032
67
Evaluating
100.100.1.182 > 172.19.0.3
100.100.0.147 > 172.19.0.2
Evaluating
100.100.2.66 > 172.19.0.3
100.100.1.143 > 172.19.0.2
Evaluating
100.100.3.174 > 172.19.0.2
100.100.3.63 > 172.19.0.3
Evaluating
100.100.2.128 > 172.19.0.2
100.100.2.128 > 172.19.0.3
!!!!!! Found shared IPID counter between 172.19.0.2 and 172.19.0.3 !!!!!!!
Evaluating
100.100.1.203 > 172.19.0.3
100.100.1.245 > 172.19.0.2
Evaluating
100.100.2.11 > 172.19.0.2
100.100.2.248 > 172.19.0.3
Evaluating
100.100.3.241 > 172.19.0.3
100.100.0.49 > 172.19.0.2
Evaluating
100.100.0.97 > 172.19.0.3
100.100.0.230 > 172.19.0.2
Evaluating
100.100.1.250 > 172.19.0.3
100.100.1.50 > 172.19.0.2
Evaluating
100.100.0.224 > 172.19.0.2
100.100.1.186 > 172.19.0.3
!!!!!! Found shared IPID counter between 172.19.0.2 and 172.19.0.3 !!!!!!!
Evaluating
100.100.0.66 > 172.19.0.2
100.100.2.207 > 172.19.0.3
Evaluating
100.100.2.87 > 172.19.0.3
100.100.3.238 > 172.19.0.2
Evaluating
100.100.0.246 > 172.19.0.2
100.100.2.126 > 172.19.0.3
Evaluating
100.100.2.194 > 172.19.0.2
100.100.3.242 > 172.19.0.3

The proof-of-concept code produces somewhat verbose output with information about the steps it is taking. The relevant output is the line printed when IP addresses have been found to be using the same counter:

Evaluating
100.100.0.224 > 172.19.0.2
100.100.1.186 > 172.19.0.3
!!!!!! Found shared IPID counter between 172.19.0.2 and 172.19.0.3 !!!!!!!

This tells us that 172.19.0.2 and 172.19.0.3 are sharing an IP-ID counter, so they must belong to the same host. The output also shows which two client IP addresses were used to hit the same counter.

Several other sets of IP addresses were evaluated along the way when some of the counters happened to have nearby values. But the verification concluded that they were not the same counter after all.
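One way such a verification could work is to alternate probes between the two flows and check that the observed IP-ID values keep advancing as a single sequence. The sketch below is not the logic used in same-host.py. The function name and the max_step threshold are assumptions, and it simply treats any step of zero or a large jump as evidence of two separate counters.

```python
def looks_like_shared_counter(ids_a, ids_b, max_step=16):
    """ids_a and ids_b hold IP-ID values from replies collected
    alternately: a0, b0, a1, b1, ... (hypothetical helper)."""
    interleaved = [value for pair in zip(ids_a, ids_b) for value in pair]
    if len(interleaved) < 4:
        return False                      # too few samples to judge
    for prev, cur in zip(interleaved, interleaved[1:]):
        step = (cur - prev) % 65536       # IP-ID wraps at 16 bits
        if not 1 <= step <= max_step:
            return False                  # stalled or jumped: not one counter
    return True


# One counter serving both flows, including a wrap from 65535 to 0.
print(looks_like_shared_counter([65530, 65532, 65534, 0],
                                [65531, 65533, 65535, 1]))
# Two separate counters that merely happen to have nearby values.
print(looks_like_shared_counter([2032, 2033, 2034],
                                [2033, 2034, 2035]))
```

The second call returns False because two independent counters with nearby values repeat or step backwards once their samples are interleaved.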

How does this impact privacy?

You might be running two applications on one machine which communicate with servers with an expectation that the servers cannot tell that both applications are running on the same host. This could for example be two separate web browsers.

If you open a web page in each browser, those pages could collude to load resources from several different IP addresses in order to detect that your host uses the same IP-ID counter for some of those resources.

Another scenario would be if you are running one website using your regular IP address and a separate website using an IP address you got through a tunnel or VPN. You might not want clients to know that both IP addresses are hosted on the same machine, but by inspecting IP-ID counters they could find out.

What does Linux do about this?

Linux has taken two different approaches to address this issue on IPv4 and IPv6.

Linux has stopped using counters to assign IP-ID values on IPv6. Instead, IP-ID values are generated using a random number generator. This increases the risk of colliding IP-ID values; however, because the field is 32 bits on IPv6, such collisions will still happen less frequently than on IPv4.
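The influence of the field size on random collisions can be illustrated with the standard birthday approximation. The sketch assumes IP-ID values drawn uniformly at random, which describes the IPv6 approach. The 16-bit figure is a hypothetical comparison showing what the same approach would give with the IPv4 field size, not a description of what Linux does on IPv4. The function name and the packet count of 300 are arbitrary choices.

```python
import math


def random_id_collision_probability(packets, bits):
    """Approximate chance that at least two of `packets` uniformly
    random IDs of the given width are equal (birthday approximation)."""
    space = 2 ** bits
    return -math.expm1(-packets * (packets - 1) / (2 * space))


for bits in (16, 32):
    p = random_id_collision_probability(300, bits)
    print(f"{bits}-bit field, 300 packets: {p:.6f}")
```

With 300 packets in flight, a 16-bit random field would collide roughly half of the time, whereas a 32-bit field collides in roughly one case in a hundred thousand.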

On IPv4 the number of counters has been increased from 1024 to as many as 262144 on hosts with lots of memory. However, as demonstrated by this proof-of-concept, that is insufficient to prevent a targeted attacker from arranging for two flows to use the same counter.

Solutions

Workarounds

Non-solutions

How severe is this problem?

The issue is not very severe. It is, however, more severe than some of the excuses sometimes made up for not deploying IPv6. So this page is primarily intended to be used to counter flawed arguments in favor of IPv4. What this proof-of-concept demonstrates is one inherent problem in IPv4 that is mostly fixed by IPv6.