It is 2022 and I am still quite surprised at the number of people with a mistaken appreciation of why portfast is good for hypervisor hosts. So, a week ago (26-Feb-2022), I posted a poll on LinkedIn to cast a wider net and see if my observation scales to a larger population. Indeed, the most common choice was “A”. The correct answer is “B”. Read on and I will explain. For those who voted C and left a comment, thank you : ) I hope this post helps.
Let’s start by appreciating the network side of the house a little more. I believe most of us will have experienced some sort of network meltdown caused by a broadcast storm. The simplest kind is due to loops being created in Ethernet networks. To let a network defend itself from such situations, Spanning Tree Protocol (STP) was introduced.
The basics of Spanning Tree Protocol
Essentially, when a switch port is participating in STP, it will always go through a few states: Listening > Learning > Forwarding / Blocking. I’m oversimplifying here, but the first two states are essential. They happen when something has just been connected to the port, and the switch needs to determine whether that connection introduces a loop. It needs to spend time figuring this out by studying the frames that arrive. As a result, during the Listening & Learning states, any Ethernet frames that arrive from the new connection are discarded. Once STP has determined that there is no loop, as should be the case for a host connection, the port enters the Forwarding state. From this point on, all packets from the host will be forwarded. If a loop is detected, the port will remain in the Blocking state.
Let me reiterate this from another angle. When you first connect a port from host to switch, the switch port will NOT FORWARD TRAFFIC. You will see the link light is up, but nothing will flow and pings will time out. It’s a classic “the lights are on, but no one is home” observation. This happens while the switch port is going through the Listening & Learning states. Only once the switch is happy that no loop has been introduced will it allow traffic to pass (Forwarding). Your pings will start working from this point.
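If you want to see this for yourself, you can watch the port’s STP state from the switch side while the host link comes up. A minimal sketch on a Cisco IOS switch, assuming a hypothetical port GigabitEthernet1/0/10 for the host:

```
Switch# show spanning-tree interface GigabitEthernet1/0/10
! Run this a few times right after plugging in the cable.
! The Sts column moves from LIS (Listening) to LRN (Learning)
! and finally FWD (Forwarding); only then do pings get answered.
```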
Typically, end devices want to be on the network as soon as the link is up. We can allow this by enabling Portfast on the particular switch port. This tells the port to skip the Listening & Learning states entirely and go directly to Forwarding. So almost immediately after the link light is on, the port is already forwarding packets. That’s great for hosts, but the risk introduced here is: what if it’s not a host that’s connected, and a genuine loop is formed? You can understand why some network engineers are appropriately hesitant about enabling Portfast.
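For reference, enabling portfast on a typical Cisco IOS access port looks something like the sketch below; the interface name is just an example, and other switch vendors have their own equivalent “edge port” setting:

```
Switch# configure terminal
Switch(config)# interface GigabitEthernet1/0/10
Switch(config-if)# spanning-tree portfast
Switch(config-if)# end
```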
That should cover the basics of STP, Portfast and how they affect traffic on a single port. Let’s take this into a teamed NIC scenario, where the truth gets uncovered.
Defining our ESXi test setup
At this juncture, let’s keep to the scenario where the switch ports have STP enabled and Portfast disabled. Meaning, all ports are participating in STP, and you’ll have that 15-second wait to transition from Link-is-up to Traffic-is-flowing. (15 seconds is just to illustrate the example; some ports will take more time, some less.)
Let’s start with ESXi, which most of us are familiar with. We have a vanilla install with 10 running Windows VMs. The host is configured with the default vSwitch0, two 10G uplinks, a single vmk0, and the default standard vSwitch settings (a quick way to verify these from the CLI is shown after the list below):
- Load balancing – Route based on originating port ID
- Network failure detection – Link status only
- Notify switches – Yes
- Failback – Yes
- Initial state – both uplinks are disconnected; you ping the host from your laptop and get a ping timeout, of course.
- Connect only one of the two cables; you’ll see the link light come on, but pings don’t get answered until at least 15 seconds later.
- Connect the other cable; there are two possible outcomes here: either the pings continue uninterrupted, or they fail again for at least 15 seconds and then recover. (Interesting?)
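As a quick sanity check of the teaming settings listed above, you can query the failover policy of the standard vSwitch from the ESXi shell. A sketch, assuming the default vSwitch0 name:

```
# Show the teaming/failover policy of the standard vSwitch
esxcli network vswitch standard policy failover get -v vSwitch0
# Look for Load Balancing, Network Failure Detection,
# Notify Switches and Failback in the output.
```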
Let’s dive into option A – NIC failover is faster
Let’s first address why option A in the poll is incorrect: simply put, portfast is irrelevant during a NIC failover event. I will explain.
Let’s expand further to illustrate what counts as a NIC failover. Take the current state of our ESXi host: both NICs are already connected and more than 15 seconds have passed. Pings are working fine, and our 10 Windows VMs are fully functional with no network issues.
Let’s say that with the load balancing policy we have, ESXi has placed 4 VMs + vmk0 on uplink A, and 6 VMs on uplink B. Both links have been up for more than 15 seconds, which means that, with or without portfast, both switch ports are already in the Forwarding state.
If we pull out uplink A, ESXi will detect that the link status is down, and so flips the 4 VMs and vmk0 over to uplink B. This failover occurs independently of portfast. It is ESXi that relocates the vNIC MAC addresses from uplink A to uplink B. There is no change in link status for uplink B; it has been online since the beginning. As such, how fast the failover happens has nothing to do with portfast. It is simply ESXi doing what it is supposed to do.
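If you want to observe this placement and the failover for yourself, ESXi can show you which physical uplink each VM port is currently pinned to. A small sketch from the ESXi shell (the world ID below is hypothetical):

```
# List running VMs and their world IDs
esxcli network vm list

# Show the ports of one VM; 12345 is a hypothetical world ID
esxcli network vm port list -w 12345
# The "Team Uplink" field shows which physical NIC (e.g. vmnic0)
# the VM is currently using. Run it again after pulling a cable
# and you should see it move to the surviving uplink.
```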
Got it?
How about option B – NIC failback is clean?
This is the interesting, and rather unintuitive, part. Option B is the right answer, and I’m glad there are a few of you who got it right. Let’s continue with the illustration to explain why.
Back to the scenario, where portfast is still not configured on the switch ports.
We still have uplink A disconnected, all 10 VMs and vmk0 are happily using uplink B.
Let’s reconnect uplink A.
Link status is Up.
ESXi sees the Link Up status and, because of the vSwitch configuration (Failback – Yes), immediately fails the 4 VMs and vmk0 back to uplink A.
Uh oh, those 5 entities are now offline, ping is timing out. Why?
Simply because the switch port is taking 15 seconds to go through the Listening & Learning states. During these states, all network frames are dropped.
Mind you though, the other 6 VMs that started on uplink B remain on uplink B and are still fine. At this point, you’ll see a weird situation: my host is up, the second link has just come back up, yet 50% of my workloads are off the network.
15 seconds later, the switch port for uplink A goes into the Forwarding state, and everything is able to communicate again.
Did you notice the 15 seconds of network blackhole? This is the part where failback happens but it’s not clean, because it results in a network outage for some workloads.
That happens because the switch port, without portfast enabled, blocks traffic for 15 seconds, while ESXi is really eager to fail back, too soon.
This is a classic situation where the two ends are not configured in a fully compatible way.
Digressing to your experience
Has this happened to you before, but you can’t explain why?
When the network engineer is doing planned switch maintenance, somewhere along the way some of the VMs lose network connectivity and some don’t, but then suddenly they recover. The network team claims nothing was wrong, nothing was changed, and their end is healthy. You, the system administrator, also did not change anything, and there are no errors per se on your end. Tickets were opened with Cisco and VMware, and each side confirms nothing is out of the ordinary.
A classic case of siloed investigation. If we had $1 for each of these…
The solution
So, this is truly an integration issue. I’d call it a next-level one, since it goes beyond just getting the network up. You want failure handling and recovery to be smooth as well.
There are two ways to resolve this and avoid the 15 seconds of network blackhole:
- Enable portfast on every switch port that has a connected host
- Disable Failback in the ESXi vSwitch settings
My preference is option 1, to enable portfast. No, it’s not because I’m a systems guy and would rather leave the work to the network engineer. The reason is that with portfast, a link becomes usable much sooner. The workloads that we, system and network engineers alike, are tasked to keep online and operating well get to enjoy the full benefits of multiple uplinks sooner, and for more of the time.
The network engineer may be very hesitant and get sleepless nights because the network is now vulnerable to loops. The more experienced network engineer will sleep better, because he knows he can enable BPDU Guard on his Cisco switch. With that, even with portfast enabled, if a loop does start to form, BPDU Guard will quickly shut the port down to protect the network.
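On a Cisco IOS switch, the combination typically looks something like the sketch below (the interface name is an example); BPDU Guard can also be turned on globally for all portfast-enabled ports:

```
Switch(config)# interface GigabitEthernet1/0/10
Switch(config-if)# spanning-tree portfast
Switch(config-if)# spanning-tree bpduguard enable
Switch(config-if)# exit
! Or enable it globally for every portfast-enabled port:
Switch(config)# spanning-tree portfast bpduguard default
```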
Now, if we are in a situation where there’s a really good reason not to enable portfast, the alternative is the ESXi vSwitch setting Failback – No. In that situation, although the link state is up, because ESXi won’t fail back, nothing gets sent down the blackhole. There is a gap though: let’s say that at the point where uplink A is restored, a VM is powered on. It may get assigned by ESXi to uplink A, and once again it will be blackholed for a period of time. Another reason why enabling portfast is my preferred option.
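If you do go down this path, the Failback setting can be flipped either in the vSphere Client or from the ESXi shell. A sketch for the default vSwitch0:

```
# Disable failback on the standard vSwitch teaming policy
esxcli network vswitch standard policy failover set -v vSwitch0 --failback=false

# Verify the change
esxcli network vswitch standard policy failover get -v vSwitch0
```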
Closing off
The purpose of this post is to share my experience on the significance of using portfast. It is something I have known must be enabled since my early days of designing vSphere, but I never truly appreciated why until 2015, when I saw it for myself.
I hope this post helps both network and system engineers understand both sides of the solution. A fundamental principle I believe in, so that the world can be a better place. : )
In case you are wondering how to validate that portfast is enabled and in effect, I hope an earlier post of mine helps you out. That’s right: no matter what, the proof is in the pudding. Never assume; always test and validate every single port.
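For completeness, on a Cisco IOS switch a quick check looks something along these lines (the interface name is an example, and other platforms word it slightly differently):

```
Switch# show spanning-tree interface GigabitEthernet1/0/10 portfast
Switch# show spanning-tree interface GigabitEthernet1/0/10 detail
! The detail output states whether the port is in portfast mode.
```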