Cloudflare engineers ran into significant obstacles when expanding their use of soft-unicast on Linux, where the company's complex routing and anycast-based redundancy push the networking stack to its limits. Attempts to work around those limits with advanced socket options ultimately led back to a simpler proxy solution. The experience highlights how difficult it is to customize Linux for high-scale networking demands.
Cloudflare's network infrastructure relies on intricate routing configurations that test the boundaries of the Linux networking stack. As detailed in a recent blog post by engineer Chris Branch, the company sought to extend its soft-unicast capabilities, which complement its heavy use of anycast to distribute redundancy across external networks.
The core problem arose in two places during packet rewriting: the Netfilter connection tracking module, known as conntrack, and the Linux socket subsystem. Soft-unicast requires multiple processes to recognize the same connection, but the design of these subsystems prevented packets from being rewritten effectively. Initially, the team implemented a local proxy to handle this, though it introduced performance overhead.
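To make the proxy idea concrete, the following is a minimal sketch of a local TCP proxy in Python. It is not Cloudflare's implementation; the function names (`pipe`, `handle`, `serve_proxy`) and the thread-per-connection design are assumptions chosen for brevity. It shows the general pattern: the proxy terminates the client's TCP connection itself, opens a second connection to a local socket, and copies bytes in both directions.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    except OSError:
        pass  # peer reset, or the socket was closed by the other direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF to the other side
        except OSError:
            pass


def handle(client: socket.socket, upstream_addr: tuple[str, int]) -> None:
    """Terminate the client's connection and relay it to a local upstream socket."""
    with client, socket.create_connection(upstream_addr) as upstream:
        back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
        back.start()
        pipe(client, upstream)
        back.join()


def serve_proxy(listener: socket.socket, upstream_addr: tuple[str, int]) -> None:
    """Accept connections until the listener is closed, relaying each one."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(
            target=handle, args=(client, upstream_addr), daemon=True
        ).start()
```

Every byte in this sketch crosses two sockets and is copied through user space, which illustrates where the overhead of a proxy comes from.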
To address this, engineers explored abusing the TCP_REPAIR socket option, which is typically used for checkpointing and migrating live network connections. This allowed them to fully describe and 'repair' the socket connection state without the kernel performing a handshake. They paired it with TCP Fast Open, using a TFO cookie to bypass the standard handshake. Despite these innovations, issues remained, and an early demux mechanism was proposed as a partial fix.
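As a rough illustration of what 'repairing' a socket involves, the sketch below uses the Linux TCP_REPAIR family of socket options from Python. It is not the code from the blog post: the function name `adopt_connection` and the addresses and sequence numbers passed to it are hypothetical, the numeric constants are fallbacks taken from `<linux/tcp.h>`, and the TCP Fast Open part is not shown. Entering repair mode requires CAP_NET_ADMIN, so the function returns `None` when the kernel refuses.

```python
import socket
import struct

# Option numbers from <linux/tcp.h>, used when this Python build lacks the names.
TCP_REPAIR = getattr(socket, "TCP_REPAIR", 19)
TCP_REPAIR_QUEUE = getattr(socket, "TCP_REPAIR_QUEUE", 20)
TCP_QUEUE_SEQ = getattr(socket, "TCP_QUEUE_SEQ", 21)
TCP_RECV_QUEUE, TCP_SEND_QUEUE = 1, 2  # values for TCP_REPAIR_QUEUE


def adopt_connection(local, remote, snd_seq, rcv_seq):
    """Try to build an ESTABLISHED socket from a description, without a handshake.

    Returns the socket, or None if the kernel rejects any step
    (for example, when the process lacks CAP_NET_ADMIN).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp = socket.IPPROTO_TCP
    try:
        s.setsockopt(tcp, TCP_REPAIR, 1)  # enter repair mode
        s.bind(local)
        # Sequence numbers must be set while the socket is still closed.
        s.setsockopt(tcp, TCP_REPAIR_QUEUE, TCP_SEND_QUEUE)
        s.setsockopt(tcp, TCP_QUEUE_SEQ, struct.pack("=I", snd_seq & 0xFFFFFFFF))
        s.setsockopt(tcp, TCP_REPAIR_QUEUE, TCP_RECV_QUEUE)
        s.setsockopt(tcp, TCP_QUEUE_SEQ, struct.pack("=I", rcv_seq & 0xFFFFFFFF))
        # In repair mode, connect() sends no SYN; it only records the peer
        # and moves the socket to the established state.
        s.connect(remote)
        s.setsockopt(tcp, TCP_REPAIR, 0)  # leave repair mode
        return s
    except OSError:
        s.close()
        return None
```

A real restore would also need to set negotiated options and window state (TCP_REPAIR_OPTIONS, TCP_REPAIR_WINDOW), which gives a sense of how much connection state the caller has to describe correctly.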
In the end, the complexity proved too high. The team opted for the more straightforward local proxy approach, which terminates TCP connections and redirects traffic to a local socket. This decision underscores that fully escaping the Linux networking stack remains a formidable challenge, even for a company like Cloudflare at the forefront of internet infrastructure.