MTU controls how much data fits into a single network packet. It affects throughput, fragmentation, CPU overhead, and how predictably your network behaves under load. If you’re running storage replication, virtualization, or iSCSI over 10GbE, MTU is one of those settings that’s easy to leave at default and forget about – until something doesn’t perform the way you expect.
Most admins start thinking about MTU after an iperf test on a 10GbE link comes back well below line rate. That’s usually when someone suggests jumbo frames. Sometimes it helps. Often, the real problem is somewhere else.
What is MTU?
MTU (Maximum Transmission Unit) is the largest payload a single Ethernet frame can carry without fragmentation. The standard default is 1500 bytes. When people say “jumbo frames,” they mean an MTU of 9000 bytes – six times the payload per packet.
One thing to keep straight: MTU is not the total frame size. A standard Ethernet frame includes 14 bytes of header (source/destination MAC addresses and EtherType) plus the payload. So an MTU of 1500 means a total frame of 1514 bytes on the wire (1518 if you count the 4-byte FCS trailer). MTU only refers to the payload portion.
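The payload-versus-frame arithmetic can be sketched in a few lines. This is a minimal illustration using the standard Ethernet II header and FCS sizes, not anything vendor-specific:

```python
# Relate MTU (the Layer 2 payload) to total frame size on the wire.
ETH_HEADER = 14   # dst MAC (6) + src MAC (6) + EtherType (2)
ETH_FCS = 4       # frame check sequence trailer

def frame_size(mtu: int, include_fcs: bool = False) -> int:
    """Total Ethernet frame size for a full-sized payload of `mtu` bytes."""
    return mtu + ETH_HEADER + (ETH_FCS if include_fcs else 0)

print(frame_size(1500))        # 1514 -- the frame size most capture tools report
print(frame_size(1500, True))  # 1518 -- counting the FCS trailer
print(frame_size(9000))        # 9014 -- why some driver UIs show 9014 for MTU 9000
```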
There’s also a distinction between Layer 2 frame size and Layer 3 IP MTU. Switches enforce frame limits at Layer 2, while hosts and routers work with IP MTU at Layer 3. Both need to agree if jumbo frames are going to work. A mismatch at any hop in the path causes fragmentation or drops.
How MTU works in practice
When your application sends data, it passes through the transport layer (TCP or UDP), gets wrapped in an IP header at Layer 3, and is encapsulated in an Ethernet frame at Layer 2. MTU sets the ceiling on that Layer 2 payload. If the IP packet exceeds the link MTU, something has to give – either the packet gets fragmented, or it gets dropped.
TCP handles this more gracefully than UDP. During the handshake, each side advertises an MSS (Maximum Segment Size) derived from its own MTU, and the connection uses the smaller of the two values. If one side is at MTU 9000 and the other at 1500, TCP settles on the 1500 side’s segment size. UDP has no such negotiation – the sender just fires packets, and if they’re too large, they either get fragmented or silently dropped, depending on the DF (Don’t Fragment) bit.
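The MSS math above can be made concrete. This sketch assumes IPv4 with no IP or TCP options (40 bytes of combined headers), which is the common textbook case:

```python
# MSS = MTU minus the IP and TCP headers (assuming IPv4, no options).
IP_HEADER = 20
TCP_HEADER = 20

def mss(mtu: int) -> int:
    """Largest TCP payload a single packet can carry at this MTU."""
    return mtu - IP_HEADER - TCP_HEADER

def negotiated_mss(mtu_a: int, mtu_b: int) -> int:
    """Both peers advertise their MSS; the smaller value wins."""
    return min(mss(mtu_a), mss(mtu_b))

print(negotiated_mss(9000, 1500))  # 1460: the jumbo host falls back to the 1500 side
```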
There’s also Path MTU Discovery (PMTUD), which is supposed to detect the smallest MTU along the entire path. It works by sending packets with the DF bit set and waiting for ICMP “Fragmentation Needed” messages from routers that can’t forward them. The problem is that many firewalls block ICMP entirely, creating what’s called a PMTUD black hole – the sender never learns the packet is too big, keeps retransmitting, and the connection stalls. This is one of the more frustrating network issues to debug because ping works fine (small packets), but bulk transfers hang.
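The PMTUD feedback loop can be modeled as a toy simulation – this is not a real network probe, just the logic: each router that can’t forward a DF packet reports its next-hop MTU via ICMP, and the sender retries at that size:

```python
# Toy PMTUD simulation: link_mtus is the ordered list of link MTUs on the path.
def discover_path_mtu(sender_mtu: int, link_mtus: list[int]) -> int:
    size = sender_mtu
    while True:
        # First hop whose MTU is smaller than the current packet size
        bottleneck = next((m for m in link_mtus if m < size), None)
        if bottleneck is None:
            return size      # packet fits every hop; this is the path MTU
        size = bottleneck    # ICMP "Fragmentation Needed" told us the next-hop MTU

print(discover_path_mtu(9000, [9000, 1500, 9000]))  # 1500
```

If a firewall swallows that ICMP message, the `size = bottleneck` step never happens – the sender keeps retransmitting at the original size, which is exactly the black-hole stall described above.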
Standard MTU values and jumbo frames
1500 bytes has been the Ethernet default for decades. It stuck around because it works reliably with virtually all hardware, and it’s a reasonable balance between efficiency and compatibility. PPPoE connections typically use 1492 bytes due to the 8-byte encapsulation overhead.
Jumbo frames (MTU 9000) are common in storage networks, iSCSI deployments, and HPC clusters. The appeal is straightforward: more data per packet means fewer packets to process. At MTU 1500, transferring 20 GB requires roughly 14.3 million packets. At MTU 9000, that drops to about 2.4 million – an 83% reduction in packet count.
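The packet-count figures above check out with simple division (reading 20 GB as 20 GiB, and ignoring per-packet headers for a rough count):

```python
# Rough packet counts for a 20 GiB transfer at each MTU.
transfer = 20 * 1024**3          # bytes
for mtu in (1500, 9000):
    packets = transfer / mtu     # headers ignored; close enough for a ballpark
    print(f"MTU {mtu}: {packets/1e6:.1f} million packets")
# MTU 1500: 14.3 million packets
# MTU 9000: 2.4 million packets -- an ~83% reduction (1 - 1500/9000)
```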
One detail that trips people up: enabling jumbo frames on a switch only means the switch will forward larger frames. It doesn’t change the MTU on connected hosts. You have to configure both sides. And in Windows, the network adapter might show 9014 bytes while Linux shows 9000 – the difference is just how headers are counted in the driver UI. They’re functionally the same.

MTU and performance: the real numbers
The theoretical overhead saving from jumping to MTU 9000 is about 3-4% – you go from roughly 95% payload efficiency to 99%. That’s real, but it’s not dramatic. The bigger win is in packet rate reduction: fewer packets means fewer interrupts, less buffer churn, and less CPU time spent in the network stack.
In practice, the performance gain depends heavily on your hardware and workload. Community benchmarks on various storage forums and our own experience show a consistent pattern: on 10GbE with modern CPUs and NIC offloading (TCP Segmentation Offload, interrupt coalescing), the difference between MTU 1500 and 9000 is often 5-15% for large sequential transfers, and close to zero for mixed or interactive traffic.
For context: at MTU 1500 on 10GbE, you’re pushing around 810,000 packets per second at line rate. Modern CPUs and NIC offload engines handle that without breaking a sweat. Jumbo frames drop that to about 135,000 pps – a reduction that mattered more ten years ago when CPUs were the bottleneck.
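Both the efficiency and packet-rate figures fall out of the same back-of-envelope math. This sketch assumes full-sized frames, IPv4+TCP headers without options, and counts the Ethernet preamble and inter-frame gap as part of the per-frame wire cost:

```python
# Back-of-envelope pps and payload efficiency at 10GbE line rate.
LINE_RATE = 10e9 / 8       # 10 Gbit/s in bytes per second
WIRE_OVERHEAD = 18 + 20    # Ethernet header+FCS, plus preamble + inter-frame gap
L3L4_HEADERS = 40          # IPv4 + TCP, no options

for mtu in (1500, 9000):
    pps = LINE_RATE / (mtu + WIRE_OVERHEAD)
    goodput = (mtu - L3L4_HEADERS) / (mtu + WIRE_OVERHEAD)
    print(f"MTU {mtu}: {pps:,.0f} pps, {goodput:.1%} payload efficiency")
# MTU 1500: ~813,000 pps at ~94.9% efficiency
# MTU 9000: ~138,000 pps at ~99.1% efficiency
```

The article’s rounded figures (810,000 and 135,000 pps, 95% and 99%) line up with these numbers.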
Bottom line: jumbo frames are worth it for sustained, large-block transfers on dedicated storage networks – especially if your hardware is older or CPU-constrained. For general-purpose networks, web traffic, email, or remote desktops, MTU 1500 is fine.
Also, don’t assume MTU is always the problem when iperf numbers are low. TCP window size, the number of parallel streams (the -P flag in iperf), CPU saturation, and OS network stack tuning are more common culprits. Test those first.
MTU in virtualization, storage, and cloud
Virtualization adds layers where MTU can go wrong. Packets travel from a physical NIC through a virtual switch, into a VM’s virtual NIC. A mismatch at any layer – hypervisor vSwitch, physical switch, or VM NIC – means fragmentation or drops. In mixed environments where you can’t guarantee end-to-end jumbo support, staying at 1500 is the pragmatic choice.
Storage replication is the strongest use case for jumbo frames. SAN traffic, iSCSI sessions, and block-level replication involve continuous, high-volume, large-block transfers – exactly the workload profile where reduced packet overhead matters. If you’re running a dedicated 10GbE (or faster) replication network where you control every hop, MTU 9000 makes sense. The key word is “dedicated” – don’t mix jumbo frame traffic with general LAN traffic on the same VLAN.
Cloud and hybrid environments complicate things. AWS supports MTU 9001 within a single VPC, but VPN and inter-region connections drop back to 1500 (or lower). Azure defaults to 1500 for most networking. GCP uses 1460 over VPN tunnels. In hybrid setups, set your MTU to match the lowest link in the path. Getting this wrong is one of the most common causes of mysteriously slow VPN-connected replication.
Common MTU misconfigurations
MTU problems are frustrating because they’re subtle. Everything looks fine until you start pushing large packets, and then throughput tanks or connections stall under load.
The classic mistake: servers and switches are set to MTU 9000, but one intermediate device – a firewall, a load balancer, a misconfigured VLAN trunk – supports only 1500. Large packets hit that device and either get fragmented (adding overhead and reassembly cost) or silently dropped. The result is unpredictable throughput and latency spikes, which is the opposite of what you were trying to achieve.
Some specific hardware can also cause problems. Certain Intel X710 NICs have shown increased CPU utilization and throughput instability when jumbo frames are enabled. If you see weird behavior after enabling 9000-byte MTU, the NIC driver is worth investigating before blaming the network.
PMTUD black holes (described above) are another common pain point. If your firewall blocks ICMP, PMTUD can’t do its job, and TCP connections hang during data transfer while small pings succeed. This is notoriously hard to diagnose. The fix is to allow ICMP type 3 (Destination Unreachable) through your firewall, or use MSS clamping as a workaround.
How to check and tune MTU size
Rule number one: MTU must be consistent across the entire path. Every NIC, switch, virtual interface, router, and firewall in the path needs the same value. Increasing MTU on the endpoints without checking every hop in between is how you create fragmentation problems.
Before touching MTU at all, verify that your throughput issue is actually MTU-related. Check CPU utilization, NIC offload settings, TCP window size, and your test methodology. MTU should not be the first thing you change.
Check current MTU
Windows: netsh interface ipv4 show subinterfaces
Linux: ip link show
Test maximum MTU along a path
Windows: ping -f -l 8000 <destination IP>
Linux: ping -M do -s 8000 <destination IP>
The -f (Windows) and -M do (Linux) flags set the Don’t Fragment bit. If the packet is too large for the path, you’ll get an error. Decrease the size until it passes to find your path MTU. Keep in mind the size you specify is the ICMP payload, not the MTU itself – add 28 bytes (20-byte IP header plus 8-byte ICMP header) to the largest passing payload to get the actual path MTU.
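The payload-to-MTU conversion is worth spelling out, since it trips people up when interpreting ping results (assuming IPv4 with no IP options):

```python
# The ping size flag sets the ICMP payload; the IP and ICMP headers ride on top.
ICMP_OVERHEAD = 20 + 8  # IPv4 header + ICMP header

def path_mtu_from_payload(largest_passing_payload: int) -> int:
    """Convert the largest DF-set ping payload that passes into a path MTU."""
    return largest_passing_payload + ICMP_OVERHEAD

print(path_mtu_from_payload(1472))  # 1500: a standard path
print(path_mtu_from_payload(8972))  # 9000: jumbo frames working end to end
```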
Tip: when setting switch MTU for jumbo frames, set it slightly higher than 9000 (e.g. 9198 or 9216). This gives headroom for any additional encapsulation headers (VLAN tags, VXLAN, etc.) and prevents edge-case fragmentation.
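A quick headroom check makes the reasoning behind 9216 concrete. The overhead values here are typical examples, not an exhaustive list – VXLAN outer-header size in particular varies with the underlay:

```python
# Worst-case frame size for a 9000-byte payload with common encapsulations.
overheads = {
    "ethernet+fcs": 18,   # standard Ethernet header plus FCS
    "vlan_tag": 4,        # 802.1Q tag
    "vxlan_outer": 50,    # typical VXLAN outer headers (assumed value)
}
worst_case = 9000 + sum(overheads.values())
print(worst_case, worst_case <= 9216)  # 9072 True -- fits a 9216 switch limit
```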
MTU in StarWind storage environments
StarWind Virtual SAN (VSAN) replicates data at the block level between nodes. This is sustained, high-volume traffic – exactly the workload where jumbo frames pay off. On a dedicated 10GbE replication network, setting MTU to 9000 reduces packet rate and lowers CPU overhead during synchronization.
The non-negotiable requirement is end-to-end consistency. Jumbo frames must be enabled on every component in the replication path: NICs, hypervisor virtual switches, physical switches, and VLAN interfaces. If one link in the chain doesn’t support 9000, you’re worse off than staying at 1500. In StarWind deployments, MTU configuration is part of the storage network design from the start – not an afterthought.
Conclusion
For general traffic, leave MTU at 1500. It’s reliable, universally compatible, and modern hardware handles the packet rate without issue. Jumbo frames belong on dedicated, controlled storage or replication networks where every hop is configured consistently and the workload is large sequential transfers.
If you do change MTU, verify every device in the path before and after. One mismatched hop is enough to create fragmentation that negates whatever performance you hoped to gain.
from StarWind Blog https://ift.tt/L5UimDz