As artificial intelligence infrastructure scales at breakneck speed, outdated assumptions about networking continue to circulate. Many of these myths stem from technologies designed for much smaller clusters, but the game has changed. Today’s AI systems are pushing into hundreds of thousands of GPUs, and soon millions. Old models simply don’t hold up.
Let’s take a closer look at the most persistent misconceptions about AI networking and why Ethernet has clearly established itself as the foundation for modern large-scale training and inference.
Myth #1: Ethernet Can’t Deliver High-Performance AI Networking
This one’s already been disproven. Ethernet is now the standard for AI at scale. Nearly all of the world’s largest GPU clusters built in the past year use Ethernet for scale-out networking.
Why? Because Ethernet now rivals, and often outperforms, alternatives such as InfiniBand, while offering a stronger ecosystem, broader vendor diversity, and faster innovation. InfiniBand wasn’t designed for the extreme scale we see today; Ethernet is thriving, with 51.2 Tb/s switches in production and Broadcom’s new 102.4 Tb/s Tomahawk 6 setting the pace. Massive clusters of 100K GPUs and beyond are already running on Ethernet.
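To put those bandwidth figures in perspective, here’s a rough back-of-envelope sketch (illustrative assumptions only, not a vendor specification) of how switch radix translates into the number of GPUs a two- or three-tier Clos fabric can reach.

```python
# Back-of-envelope fabric sizing (illustrative assumptions, not vendor specs).
# A non-blocking folded-Clos fabric built from identical switches with radix R
# (ports per switch) reaches roughly R^2/2 endpoints in two tiers and R^3/4
# in three tiers.

def switch_radix(switch_tbps: float, port_gbps: int) -> int:
    """Number of ports a switch ASIC can expose at a given port speed."""
    return round(switch_tbps * 1000 / port_gbps)

def max_endpoints(radix: int, tiers: int) -> int:
    """Approximate endpoint count of a non-blocking folded-Clos fabric."""
    if tiers == 2:
        return radix ** 2 // 2
    if tiers == 3:
        return radix ** 3 // 4
    raise ValueError("only 2- or 3-tier fabrics modeled here")

for tbps in (51.2, 102.4):
    r = switch_radix(tbps, port_gbps=800)   # assume 800G GPU-facing ports
    print(f"{tbps}T switch, 800G ports -> radix {r}, "
          f"2-tier ~{max_endpoints(r, 2):,} GPUs, "
          f"3-tier ~{max_endpoints(r, 3):,} GPUs")
```

Under these assumptions, a three-tier fabric of 102.4 Tb/s switches reaches into the hundreds of thousands of endpoints, which is consistent with the 100K-plus GPU clusters already running on Ethernet.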
Myth #2: You Need Separate Networks for Scale-Up and Scale-Out
That was true when GPU nodes were tiny. Legacy scale-up designs worked when you were connecting two or four GPUs. But today’s architectures often include 64, 128, or more GPUs within a single domain.
Using separate networks adds complexity and cost. Ethernet allows you to unify scale-up and scale-out on the same fabric, simplifying operations and enabling interface fungibility. To accelerate this convergence, we introduced the Scale-Up Ethernet (SUE) framework to the Open Compute Project, moving the industry toward a single AI networking standard.
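As a loose illustration of what interface fungibility means in practice (hypothetical numbers, not a reference design), the sketch below models a node with a single pool of Ethernet ports that can be re-split between scale-up and scale-out traffic per workload, instead of being hard-wired into two separate networks.

```python
# Hypothetical illustration of interface fungibility on a unified Ethernet
# fabric: one pool of identical ports is divided between scale-up and
# scale-out traffic per workload, rather than fixed at build time.

from dataclasses import dataclass

@dataclass
class PortPlan:
    scale_up: int    # ports carrying intra-domain (scale-up) traffic
    scale_out: int   # ports carrying inter-domain (scale-out) traffic

def plan_ports(total_ports: int, scale_up_fraction: float) -> PortPlan:
    """Split one pool of Ethernet ports between the two traffic classes."""
    up = round(total_ports * scale_up_fraction)
    return PortPlan(scale_up=up, scale_out=total_ports - up)

# The same hardware serves a communication-heavy training job and a more
# scale-out-oriented inference job simply by re-balancing the split.
print(plan_ports(total_ports=8, scale_up_fraction=0.75))  # training-heavy
print(plan_ports(total_ports=8, scale_up_fraction=0.25))  # inference-heavy
```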
Myth #3: Proprietary Interconnects and Exotic Optics Are Essential
Not anymore. Proprietary approaches may have fit older, fixed systems, but modern AI requires flexibility and openness.
Ethernet provides a broad set of choices: third-generation co-packaged optics (CPO), module-based retimed optics, linear-drive optics, and long-reach passive copper. This flexibility lets you optimize for performance, power, and economics without being locked into a single path.
Myth #4: Proprietary NIC Features Are Required for AI Workloads
Some AI clusters lean on programmable, high-power NICs for features like congestion control. But often, that’s compensating for a weaker switching fabric.
Modern Ethernet switches, including Tomahawk 5 and 6, already embed advanced load balancing, telemetry, and resiliency — reducing cost and power draw while leaving more resources available for GPUs and XPUs. Looking ahead, NIC functions will increasingly integrate into XPUs themselves, reinforcing the strategy of simplifying rather than over-engineering.
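To give a flavor of what in-switch load balancing can look like, here’s a simplified sketch of flowlet-style path selection (a generic technique, not a description of any particular ASIC): a flow sticks to one uplink, but after a long enough idle gap the next burst can be moved to the least-loaded uplink without reordering packets.

```python
# Simplified flowlet-style load balancing (generic technique; not a
# description of any specific switch ASIC). A flow keeps its uplink, but
# after an idle gap longer than FLOWLET_GAP the next burst may be steered
# to whichever uplink currently has the shortest queue.

FLOWLET_GAP = 50e-6  # seconds of idle time that ends a flowlet (assumed)

class FlowletBalancer:
    def __init__(self, num_uplinks: int):
        self.queue_depth = [0] * num_uplinks     # bytes queued per uplink
        self.last_seen = {}                      # flow -> (time, uplink)

    def pick_uplink(self, flow: str, now: float, pkt_bytes: int) -> int:
        prev = self.last_seen.get(flow)
        if prev is not None and now - prev[0] < FLOWLET_GAP:
            uplink = prev[1]                     # same flowlet: keep the path
        else:
            uplink = min(range(len(self.queue_depth)),
                         key=self.queue_depth.__getitem__)  # new flowlet
        self.last_seen[flow] = (now, uplink)
        self.queue_depth[uplink] += pkt_bytes
        return uplink

    def drain(self, uplink: int, pkt_bytes: int) -> None:
        """Called as the uplink transmits, freeing queue space."""
        self.queue_depth[uplink] = max(0, self.queue_depth[uplink] - pkt_bytes)

# Example: packets of one flow 10 us apart stay on the same uplink; a packet
# arriving 200 us later starts a new flowlet and may be re-steered.
lb = FlowletBalancer(num_uplinks=4)
print(lb.pick_uplink("gpu0->gpu7", now=0.0,    pkt_bytes=4096))
print(lb.pick_uplink("gpu0->gpu7", now=10e-6,  pkt_bytes=4096))
print(lb.pick_uplink("gpu0->gpu7", now=210e-6, pkt_bytes=4096))
```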
Myth #5: Your Network Must Match Your GPU Vendor
There’s no reason to tie your network to your GPU supplier. The largest hyperscaler deployments worldwide are built on Ethernet.
Ethernet enables flatter, more efficient topologies, supports workload-specific tuning, and is fully vendor-neutral. With its standards-based ecosystem, AI clusters can scale independently of GPU/XPU choice, ensuring openness, efficiency, and long-term scalability.
The Takeaway:
Networking is no longer a side note; it’s a core driver of AI performance, efficiency, and growth. If your assumptions are rooted in five-year-old architectures, it’s time to update your playbook.
The reality is clear: the future of AI networking is Ethernet, and that future is already here.
(This article has been adapted and modified from Broadcom content.)