Spanning Tree Protocol

Enterprise growth results in the commissioning of multiple switches to support the interconnectivity of end systems and services required for daily operations. The interconnection of multiple switches, however, brings additional challenges that need to be addressed. Switches may be interconnected over single point-to-point links, via which end systems are able to forward frames to destinations located behind other switches within the broadcast domain. The failure of any such point-to-point link, however, results in the immediate isolation of the downstream switch and all end systems connected to it. In order to resolve this issue, redundancy is highly recommended within any switching network.

Redundant links are therefore generally used on an Ethernet switching network to provide link backup and enhance network reliability. The use of redundant links, however, may produce loops that cause the communication quality to drastically deteriorate, and major interruptions to the communication service to occur.

One of the initial effects of redundant switching loops comes in the form of broadcast storms. This occurs when an end system attempts to discover a destination of which neither the end system itself nor the switches along the switching path are aware. A broadcast is therefore generated by the end system, which is flooded by the receiving switch.

The flooding effect means that the frame is forwarded via all interfaces with the exception of the interface on which the frame was received. In the example, Host A generates a frame, which is received by Switch B and subsequently flooded out of all other interfaces. An instance of the frame is received by the connected switches A and C, which in turn flood the frame out of all other interfaces. The continued flooding effect results in both Switch A and Switch C flooding instances of the frame from one switch to the other, which in turn are flooded back to Switch B, and thus the cycle continues. In addition, the repeated flooding effect results in multiple instances of the frame being received by end stations, effectively causing interrupts and extreme switch performance degradation.
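The flooding cycle can be sketched with a small simulation. The triangle of switches A, B and C follows the example, but the `LINKS` table and the `flood` helper are hypothetical names used purely for illustration, and host-facing ports are left out to keep the sketch short.

```python
# Hypothetical triangle topology: each switch lists its neighbouring switches.
LINKS = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

def flood(first_switch, rounds):
    """Count the frame copies travelling between switches after each round.

    A copy is modelled as (switch_holding_it, switch_it_came_from). In every
    round each switch floods its copy out of all inter-switch links except
    the one on which the copy arrived.
    """
    in_flight = [(first_switch, None)]  # the frame from Host A arrives here
    history = []
    for _ in range(rounds):
        next_round = []
        for switch, came_from in in_flight:
            for neighbour in LINKS[switch]:
                if neighbour != came_from:
                    next_round.append((neighbour, switch))
        in_flight = next_round
        history.append(len(in_flight))
    return history

copies = flood("B", 5)
print(copies)
```

In this sketch the number of circulating copies never drops to zero: one copy travels in each direction around the loop for as long as the simulation runs, which is the behaviour the broadcast storm description refers to.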

Switches must maintain records of the path via which a destination is reachable. This is identified through association of the source MAC address of a frame with the interface on which the frame was received. Only one instance of a MAC address can be stored within the MAC address table of a switch, and where a second instance of the MAC address is received, the more recent information takes precedence.

In the example, Switch B updates the MAC address table with the MAC address of Host A and associates this source with interface G0/0/3, the port interface on which the frame was received. As frames are uncontrollably flooded within the switching network, a frame is again received with the same source MAC address as Host A, however this time the frame is received on interface G0/0/2. Switch B must therefore assume that the host that was originally reachable via interface G0/0/3 is now reachable via G0/0/2, and will update the MAC address table accordingly. The result of this process leads to MAC instability and continues to occur endlessly between both the switch port interfaces connecting to Switch A and Switch C since frames are flooded in both directions as part of the broadcast storm effect.
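The table instability can be illustrated with a minimal sketch of MAC learning. The `learn` helper, the MAC address and the use of G0/0/1 as the second inter-switch port are assumptions made for the illustration; only G0/0/3 and G0/0/2 are named in the example.

```python
def learn(mac_table, src_mac, in_port):
    """Store the newest port for src_mac; return True if the entry moved."""
    moved = src_mac in mac_table and mac_table[src_mac] != in_port
    mac_table[src_mac] = in_port  # the more recent information takes precedence
    return moved

mac_table = {}
HOST_A = "00-01-02-03-04-AA"  # hypothetical MAC address for Host A

# Host A is first learned on G0/0/3; looped copies then arrive on other ports.
arrivals = ["G0/0/3", "G0/0/2", "G0/0/1", "G0/0/2", "G0/0/1"]
flaps = sum(learn(mac_table, HOST_A, port) for port in arrivals)
print(flaps, mac_table[HOST_A])
```

Every looped copy overwrites the entry, so the table never settles on the port to which Host A is actually attached.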

The challenge for the switching network lies in the ability to maintain switching redundancy to avoid isolation of end systems in the event of switch system or link failure, and the capability to avoid the damaging effects of switching loops within a switching topology which implements redundancy. The resulting solution for many years has been to implement the spanning tree protocol (STP) in order to prevent the effects of switching loops. Spanning tree works on the principle that redundant links be logically disabled to provide a loop free topology, whilst being able to dynamically enable secondary links in the event that a failure along the primary switching path occurs, thereby fulfilling the requirement for network redundancy within a loop free topology. The switching devices running STP discover loops on the network by exchanging information with one another, and block certain interfaces to cut off loops. STP has continued to be an important protocol for the LAN for over 20 years.

The removal of any potential for loops serves as the primary goal of spanning tree, for which an inverted tree type architecture is formed. At the base of this logical tree is the root bridge/switch. The root bridge represents the logical center but not necessarily the physical center of the STP-capable network. The root bridge can change dynamically with the network topology, for example where the existing root bridge fails and can no longer operate as the root bridge. Non-root bridges are considered to be downstream from the root bridge, and communication to non-root bridges flows from the root bridge towards all non-root bridges. Only a single root bridge can exist in a converged STP-capable network at any one time.

Discovery of the root bridge for an STP network is a primary task performed in order to form the spanning tree. The STP protocol operates on the basis of election, through which the role of all switches is determined. A bridge ID is defined as the means by which the root bridge is discovered. This comprises two parts, the first being a 16-bit bridge priority and the second a 48-bit MAC address.

The device that is said to contain the highest priority (smallest bridge ID) is elected as the root bridge for the network. The bridge ID comparison takes into account initially the bridge priority, and where this priority value is unable to uniquely identify a root bridge, the MAC address is used as a tie breaker. The bridge ID can be manipulated through alteration to the bridge priority as a means of enabling a given switch to be elected as the root bridge, often in support of an optimized network design.
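The election rule can be expressed as a comparison of (priority, MAC address) pairs, as in the following sketch. The `BridgeID` type, the `elect_root` helper and the switch names, priorities and MAC values are illustrative assumptions rather than values taken from a real network.

```python
from typing import NamedTuple

class BridgeID(NamedTuple):
    priority: int  # 16-bit bridge priority, compared first
    mac: int       # 48-bit MAC address, used only as the tie breaker

def elect_root(bridges):
    """Return the name of the switch with the smallest bridge ID."""
    return min(bridges, key=bridges.get)

# All switches at the default priority: the lowest MAC address wins.
defaults = {
    "SWA": BridgeID(32768, 0x00E0FC000003),
    "SWB": BridgeID(32768, 0x00E0FC000001),
    "SWC": BridgeID(32768, 0x00E0FC000002),
}
# Lowering the priority of SWC makes it the root regardless of its MAC.
tuned = dict(defaults, SWC=BridgeID(4096, 0x00E0FC000002))

print(elect_root(defaults), elect_root(tuned))
```

Because the tuple is ordered priority first, the MAC address only influences the result when the priorities are equal, which mirrors the tie-breaking behaviour described above.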

The spanning tree topology relies on the communication of specific information to determine the role and status of each switch in the network. A Bridge Protocol Data Unit (BPDU) facilitates communication within a spanning tree network. Two forms of BPDU are used within STP. A Configuration BPDU is initially created by the root and propagated downstream to ensure all non-root bridges remain aware of the status of the spanning tree topology and importantly, the root bridge. The TCN BPDU is a second form of BPDU, which propagates information in the upstream direction towards the root and shall be introduced in more detail as part of the topology change process.

Bridge Protocol Data Units are not directly forwarded by switches. Instead, the information that is carried within a BPDU is often used to generate a switch's own BPDU for transmission. A Configuration BPDU carries a number of parameters that are used by a bridge to determine primarily the presence of a root bridge and ensure that the root bridge remains the bridge with the highest priority. Each LAN segment is considered to have a designated switch that is responsible for the propagation of BPDU downstream to non-designated switches.

The Bridge ID field is used to determine the current designated switch from which BPDU are expected to be received. The BPDU is generated and forwarded by the root bridge based on a Hello timer, which is set to 2 seconds by default. As BPDU are received by downstream switches, a new BPDU is generated with locally defined parameters and forwarded to all non-designated switches for the LAN segment.

Another feature of the BPDU is the propagation of two parameters relating to path cost. The root path cost (RPC) is used to measure the cost of the path to the root bridge in order to determine the spanning tree shortest path, and thereby generate a loop free topology. When the bridge is the root bridge, the root path cost is 0.

The path cost (PC) is a value associated with the root port, which is the port on a downstream switch that connects to the LAN segment, on which a designated switch or root bridge resides. This value is used to generate the root path cost for the switch, by adding the path cost to the RPC value that is received from the designated switch in a LAN segment, to define a new root path cost value. This new root path cost value is carried in the BPDU of the designated switch and is used to represent the path cost to the root.
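The accumulation of the root path cost along a path can be sketched as below. The `root_path_cost` helper is hypothetical, and the port cost of 20000 is only an example value (it corresponds to a 1 Gbit/s link under the IEEE 802.1t recommendations, which is an assumption not stated in this section).

```python
def root_path_cost(received_rpc, port_path_cost):
    """RPC of a switch = RPC received on the root port + that port's path cost."""
    return received_rpc + port_path_cost

ROOT_RPC = 0          # the root bridge always advertises a root path cost of 0
PORT_COST = 20000     # example path cost assigned to each root port

rpc_switch_b = root_path_cost(ROOT_RPC, PORT_COST)      # one hop from the root
rpc_switch_c = root_path_cost(rpc_switch_b, PORT_COST)  # two hops from the root
print(rpc_switch_b, rpc_switch_c)
```

Each switch advertises its own result downstream, so the value carried in a BPDU always represents the total cost from the sending switch back to the root.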

Huawei Sx7 series switches support a number of alternative path cost standards that can be implemented based on enterprise requirements, such as where a multi-vendor switching network may exist. The Huawei Sx7 series of switches use the 802.1t path cost standard by default, providing a stronger metric accuracy for path cost calculation.

A converged spanning tree network defines that each interface be assigned a specific port role. Port roles are used to define the behavior of port interfaces that participate within an active spanning tree topology. For the spanning tree protocol, three port roles of designated, root and alternate are defined.

The designated port is associated with a root bridge or a designated bridge of a LAN segment and defines the downstream path via which Configuration BPDU are forwarded. The root bridge is responsible for the generation of configuration BPDU to all downstream switches, and thus root bridge port interfaces always adopt the designated port role.

The root port identifies the port that offers the lowest cost path to the root, based on the root path cost. The example demonstrates the case where two possible paths exist back to the root, however only the port that offers the lowest root path cost is assigned as the root port. Where two or more ports offer equal root path costs, the decision of which port interface will be the root port is determined by comparing the bridge ID in the configuration BPDU that is received on each port.

Any port that is not assigned a designated or root port role is considered an alternate port, and is able to receive BPDU from the designated switch for the LAN segment for the purpose of monitoring the status of the redundant link, but will not process the received BPDU. The IEEE 802.1D-1990 standard for STP originally defined this port role as backup; however, this was amended to become the alternate port role within the IEEE 802.1D-1998 standards revision.

The port ID represents a final means for determining port roles alongside the bridge ID and root path cost mechanism. In scenarios where two or more ports offer a root path cost back to the root that is equal and for which the upstream switch is considered to have a bridge ID that is equal, primarily due to the upstream switch being the same switch for both paths, the port ID must be applied to determine the port roles.

The port ID is tied to each port and comprises a port priority and a port number that associates with the port interface. The port priority is a value in the range of 0 to 240, assigned in increments of 16, and represented by a value of 128 by default. Where both port interfaces offer an equal port priority value, the unique port number is used to determine the port roles. The port with the best port identifier (the numerically lowest value, and therefore the lowest port number where the priorities are equal) is assigned as the root port, with the remaining port defaulting to an alternate port role.
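The order in which the tie breakers are applied (root path cost, then upstream bridge ID, then port ID) can be sketched as an ordered tuple comparison. The `Candidate` type, the `pick_root_port` helper and all values are hypothetical, and the sketch does not distinguish between the port ID carried in the received BPDU and the local port ID.

```python
from typing import NamedTuple, Tuple

class Candidate(NamedTuple):
    root_path_cost: int                 # compared first, lowest wins
    upstream_bridge_id: Tuple[int, int]  # (priority, MAC), compared second
    port_id: Tuple[int, int]             # (port priority, port number), last
    name: str                            # interface name, for display only

def pick_root_port(candidates):
    """Return the interface whose (cost, bridge ID, port ID) is lowest."""
    return min(candidates).name

same_upstream = (32768, 0x00E0FC000001)
parallel_links = [
    Candidate(20000, same_upstream, (128, 2), "G0/0/2"),
    Candidate(20000, same_upstream, (128, 1), "G0/0/1"),
]
cheaper_path = parallel_links + [
    Candidate(19999, (32768, 0x00E0FC000009), (128, 9), "G0/0/3"),
]
print(pick_root_port(parallel_links), pick_root_port(cheaper_path))
```

With two parallel links to the same upstream switch, only the port ID differs, so the lower port number is selected; a lower root path cost on any other port overrides both tie breakers.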

The root bridge is responsible for the generation of configuration BPDU based on a BPDU interval that is defined by a Hello timer. This Hello timer by default represents a period of 2 seconds. A converged spanning tree network must ensure that, in the event of a failure within the network, all switches within the STP enabled network are made aware of the failure. A MAX Age timer is associated with each BPDU and represents the life span of a BPDU from the point of conception by the root bridge, and ultimately controls the validity period of a BPDU before it is considered obsolete. This MAX Age timer by default represents a period of 20 seconds.

Once a configuration BPDU is received from the root bridge, the downstream switch is considered to take approximately 1 second to generate a new BPDU, and propagate the generated BPDU downstream. In order to compensate for this time, a message age (MSG Age) value is applied to each BPDU to represent the offset between the MAX Age and the propagation delay, and for each switch this message age value is incremented by 1.

As BPDU are propagated from the root bridge to the downstream switches, the MAX Age timer is refreshed. The MAX Age timer counts down and expires when the message age exceeds the MAX Age value, to ensure that the lifetime of a BPDU is limited to the MAX Age, as defined by the root bridge. In the event that a BPDU is not received before the MAX Age timer expires, the switch will consider the BPDU information currently held as obsolete and assume an STP network failure has occurred.
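The relationship between the message age and the MAX Age value can be sketched as simple arithmetic. The helper names are hypothetical, and the model assumes, as described above, that the message age starts at 0 at the root and grows by 1 at each switch.

```python
MAX_AGE = 20  # seconds, default value defined by the root bridge

def message_age_after(switch_hops):
    """Message age carried in a BPDU after it has been relayed switch_hops times."""
    return switch_hops  # incremented by 1 for each switch

def remaining_lifetime(message_age, max_age=MAX_AGE):
    """Seconds for which the stored BPDU information remains valid."""
    return max(max_age - message_age, 0)

age_at_third_switch = message_age_after(2)
print(age_at_third_switch, remaining_lifetime(age_at_third_switch))
```

A switch further from the root therefore holds the BPDU information for a slightly shorter time than a switch directly attached to the root, which keeps the total lifetime of the information within MAX Age.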

The spanning tree convergence process is an automated procedure that initiates at the point of switch startup. All switches at startup assume the role of root bridge within the switching network. The default behavior of a root bridge is to assign a designated port role to all port interfaces to enable the forwarding of BPDU via all connected port interfaces. As BPDU are received by peering switches, the bridge ID will be compared to determine whether a better candidate for the role of root bridge exists. In the event that the received BPDU contains an inferior bridge ID with respect to the root ID, the receiving switch will continue to advertise its own configuration BPDU to the neighboring switch.

Where the BPDU is superior, the switch will acknowledge the presence of a better candidate for the role of root bridge, by ceasing to propagate BPDU in the direction from which the superior BPDU was received. The switch will also amend the root ID field of its BPDU to advertise the bridge ID of the root bridge candidate as the current new root bridge.
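The way in which every switch starts as its own root and gradually accepts a superior root ID can be sketched as a round-based exchange. This is a simplified model under stated assumptions: the `converge` helper, the chain topology and the bridge IDs are hypothetical, and BPDU timing, port roles and path costs are ignored.

```python
def converge(bridge_ids, links):
    """Exchange root IDs over each link until no switch changes its view.

    bridge_ids maps switch name -> (priority, MAC); links is a list of
    (switch, switch) pairs. Returns the root ID each switch believes in
    and the number of exchange rounds performed.
    """
    believed_root = dict(bridge_ids)  # at startup every switch claims the root role
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        advertised = dict(believed_root)  # what each switch sends this round
        for a, b in links:
            for receiver, sender in ((a, b), (b, a)):
                if advertised[sender] < believed_root[receiver]:
                    believed_root[receiver] = advertised[sender]  # superior BPDU
                    changed = True
    return believed_root, rounds

ids = {"S1": (32768, 3), "S2": (32768, 2), "S3": (4096, 9)}
chain = [("S1", "S2"), ("S2", "S3")]
view, rounds = converge(ids, chain)
print(view, rounds)
```

In the sketch S1 first accepts S2 as the root and only learns of S3 one round later, once S2 has amended the root ID it advertises.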

An elected root bridge, once established will generate configuration BPDU to all other non-root switches. The BPDU will carry a root path cost that will inform downstream switches of the cost to the root, to allow for the shortest path to be determined. The root path cost carried in the BPDU that is generated by the root bridge always has a value of 0. The receiving downstream switches will then add this cost to the path cost of the port interfaces on which the BPDU was received, and from which a switch is able to identify the root port.

In the case where equal root path costs exist on two or more LAN segments to the same upstream switch, the port ID is used to discover the port roles. Where an equal root path cost exists between two switches as in the given example, the bridge ID is used to determine which switch represents the designated switch for the LAN segment. Where the switch port is neither a root port nor designated port, the port role is assigned as alternate.

As part of the root bridge and port role establishment, each switch will progress through a number of port state transitions. Any port that is administratively disabled will be considered to be in the disabled state. Enabling of a port in the disabled state will see a state transition to the blocking state ①.

Any port considered to be in a blocking state is unable to forward any user traffic, but is capable of receiving BPDU frames. Any BPDU received on a port interface in the blocking state will not be used to populate the MAC address table of the switch, but instead to determine whether a transition to the listening state is necessary. The listening state enables communication of BPDU information, following negotiation of the port role in STP ②, but maintains restriction on the populating of the MAC address table with neighbor information. 

A transition to the blocking state from the listening or other states ③ may occur in the event that the port is changed to an alternate port role. The transition from the listening to the learning state, and from the learning to the forwarding state ④, is greatly dependent on the forward delay timer, which exists to ensure that any propagation of BPDU information to all switches in the spanning tree topology is achievable before the state transition occurs.

The learning state maintains the restriction on user traffic forwarding to prevent any switching loops, but allows for the population of the MAC address table throughout the spanning tree topology to ensure a stable switching network. Following a forward delay period, the forwarding state is reached. The disabled state is applicable at any time during the state transition period through manual intervention (i.e. the shutdown command) ⑤.
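The five numbered transitions can be summarised as a small state machine. The state names follow the text, while the event names and the `next_state` helper are hypothetical labels chosen for this sketch.

```python
# Transitions that depend on both the current state and the event.
TRANSITIONS = {
    ("disabled", "enable"): "blocking",                     # transition 1
    ("blocking", "selected_root_or_designated"): "listening",  # transition 2
    ("listening", "forward_delay_expired"): "learning",     # transition 4
    ("learning", "forward_delay_expired"): "forwarding",    # transition 4
}

def next_state(state, event):
    """Return the port state that follows `state` when `event` occurs."""
    if event == "shutdown":                                  # transition 5
        return "disabled"
    if event == "became_alternate" and state in ("listening", "learning", "forwarding"):
        return "blocking"                                    # transition 3
    return TRANSITIONS.get((state, event), state)            # otherwise unchanged

state = "disabled"
path = [state]
for event in ("enable", "selected_root_or_designated",
              "forward_delay_expired", "forward_delay_expired"):
    state = next_state(state, event)
    path.append(state)
print(path)
```

Events that do not apply to the current state leave it unchanged, so a forward delay expiry on a blocking port, for example, has no effect in this model.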

Events that cause a change in the established spanning tree topology may occur in a variety of ways, for which the spanning tree protocol must react to quickly reestablish a stable and loop free topology. The failure of the root bridge is a primary example of where re-convergence is necessary. Non-root switches rely on the intermittent pulse of BPDU from the root bridge to maintain their individual roles as non-root switches in the STP topology. In the event that the root bridge fails, the downstream switches will fail to receive a BPDU from the root bridge and as such will also cease to propagate any BPDU downstream. The MAX Age timer is typically reset to the set value (20 seconds by default) following the receipt of each BPDU downstream.

With the loss of any BPDU however, the MAX Age timer begins to count down the lifetime for the current BPDU information of each non-root switch, based on the (MAX Age – MSG Age) formula. At the point at which the MSG Age value is greater than the MAX Age timer value, the BPDU information received from the root becomes invalid, and the non-root switches begin to assume the role of root bridge. Configuration BPDU are again forwarded out of all active interfaces in a bid to discover a new root bridge. The failure of the root bridge invokes a recovery duration of approximately 50 seconds due to the Max Age + 2x Forward Delay convergence period.

In the case of an indirect link failure, a switch loses connection with the root bridge due to a failure of the port or media, or due possibly to manual disabling of the interface acting as the root port. The switch itself will become immediately aware of the failure, and since it only receives BPDU from the root in one direction, will assume immediate loss of the root bridge, and assert its position as the new root bridge.

From the example, switch B begins to forward BPDU to switch C to notify of the position of switch B as the new root bridge, however switch C continues to receive BPDU from the original root bridge and therefore ignores any BPDU from switch B. The alternate port will begin to age its state through the MAX Age timer, since the interface no longer receives BPDU containing the root ID of the root bridge.

Following the expiry of the MAX Age timer, switch C will change the port role of the alternate port to that of a designated port and proceed to forward BPDU from the root towards switch B, which will cause switch B to concede its assertion as the root bridge and converge its port interface to the role of root port. This represents only a partial topology failure; however, due to the need to wait for a period equivalent to MAX Age + 2x forward delay, full recovery of the STP topology requires approximately 50 seconds.

A final scenario involving spanning tree convergence recovery occurs where multiple LAN segments are connected between two switch devices, for which one is currently the active link while the other provides an alternate path to the root. Should the switch that is receiving the BPDU detect a loss of connection on its root port, such as where a root port failure or a link failure occurs of which the downstream switch is made immediately aware, the switch can instantly transition the alternate port.

This will begin the transition through the listening, learning and forwarding states and achieve recovery within a 2x forward delay period. In the event of any failure, where the link that provides a better path is reactivated, the spanning tree topology must again re-converge in order to apply the optimal spanning tree topology.
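The recovery times quoted for the three failure scenarios follow directly from the default timers, as the sketch below shows. The scenario labels and the `recovery_seconds` helper are hypothetical names; the timer values are the defaults given in this section.

```python
MAX_AGE = 20        # seconds, default
FORWARD_DELAY = 15  # seconds, default

# Whether the stored BPDU information must first age out (MAX Age) before
# the port can begin the listening and learning states.
WAITS_FOR_MAX_AGE = {
    "root_bridge_failure": True,
    "indirect_link_failure": True,
    "alternate_port_takeover": False,  # failure detected locally on the root port
}

def recovery_seconds(scenario):
    """Approximate recovery time: optional MAX Age wait + listening + learning."""
    wait = MAX_AGE if WAITS_FOR_MAX_AGE[scenario] else 0
    return wait + 2 * FORWARD_DELAY

for name in WAITS_FOR_MAX_AGE:
    print(name, recovery_seconds(name))
```

The only difference between the scenarios is whether the MAX Age period must elapse first; the two forward delay periods apply in every case.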

In a converged spanning tree network, switches maintain filter databases, or MAC address tables to manage the propagation of frames through the spanning tree topology. The entries that provide an association between a MAC destination and the forwarding port interface are stored for a finite period of 300 seconds (5 minutes) by default. A change in the spanning tree topology however means that any existing MAC address table entries are likely to become invalid due to the alteration in the switching path, and therefore must be renewed.

The example demonstrates an existing spanning tree topology for which switch B has entries that allow Host A to be reached via interface Gigabit Ethernet 0/0/3 and Host B via interface Gigabit Ethernet 0/0/2. A failure is simulated on switch C for which the current root port has become inactive. This failure causes a recalculation of the spanning tree topology to begin and predictably the activation of the redundant link between switch C and switch B.

Following the re-convergence however, it is found that frames from Host A to Host B are failing to reach their destination. Since the MAC address table entries have yet to expire based on the 300 second rule, frames reaching switch B that are destined for Host B continue to be forwarded via port interface Gigabit Ethernet 0/0/2, and effectively become black holed as frames are forwarded towards the inactive port interface of switch C.

An additional mechanism must be introduced to handle the MAC entries timeout period issue that results in invalid path entries being maintained following spanning tree convergence. The process implemented is referred to as the Topology Change Notification (TCN) process, and introduces a new form of BPDU to the spanning tree protocol operation.

This new BPDU is referred to as the TCN BPDU and is distinguished from the original STP configuration BPDU through the setting of the BPDU type value to 128 (0x80). The function of the TCN BPDU is to inform the upstream root bridge of any change in the current topology, thereby allowing the root to send a notification within the configuration BPDU to all downstream switches, to reduce the timeout period for MAC address table entries to the equivalent of the forward delay timer, or 15 seconds by default.

The flags field of the configuration BPDU contains two fields for Topology Change (TC) and Topology Change Acknowledgement (TCA). Upon receiving a TCN BPDU, the root bridge will generate a BPDU with both the TC and TCA bits set, to respectively notify of the topology change and to inform the downstream switches that the root bridge has received the TCN BPDU, and therefore transmission of the TCN BPDU should cease.

The TCA bit shall remain active for a period equal to the Hello timer (2 seconds), following which configuration BPDU generated by the root bridge will maintain only the TC bit for a duration of (MAX Age + forward delay), or 35 seconds by default.
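The flag and timer behaviour can be sketched as follows. The bit positions (TC as the least significant bit, TCA as the most significant bit of the flags octet) are taken from the IEEE 802.1D BPDU format rather than from this section, and the helper names are hypothetical. The sketch also assumes that both periods are measured from the moment the root receives the TCN BPDU, which the text does not state explicitly.

```python
TC_FLAG = 0x01   # Topology Change, bit 0 of the flags octet (per IEEE 802.1D)
TCA_FLAG = 0x80  # Topology Change Acknowledgement, bit 7 (per IEEE 802.1D)

HELLO, MAX_AGE, FORWARD_DELAY = 2, 20, 15  # default timers in seconds
DEFAULT_MAC_AGING = 300                     # default MAC entry lifetime in seconds

def root_flags(seconds_since_tcn):
    """Flags set by the root in configuration BPDU after receiving a TCN BPDU."""
    flags = 0
    if seconds_since_tcn < MAX_AGE + FORWARD_DELAY:
        flags |= TC_FLAG
    if seconds_since_tcn < HELLO:
        flags |= TCA_FLAG
    return flags

def mac_aging_time(flags):
    """MAC entry timeout applied by a switch that receives these flags."""
    return FORWARD_DELAY if flags & TC_FLAG else DEFAULT_MAC_AGING

for t in (0, 10, 40):
    print(t, hex(root_flags(t)), mac_aging_time(root_flags(t)))
```

While the TC bit is set, entries age out after the forward delay period instead of the default 300 seconds, which is what clears the stale paths described earlier.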

The effect of the TCN BPDU on the topology change process ensures that the root bridge is notified of any failure within the spanning tree topology, for which the root bridge is able to generate the necessary flags to flush the current MAC address table entries in each of the switches. The example demonstrates the results of the topology change process and the impact on the MAC address table. The entries pertaining to switch B have been flushed, and new updated entries have been discovered for which it is determined that Host B is now reachable via port interface Gigabit Ethernet 0/0/1.

Huawei Sx7 series switches, to which the S5700 series model belongs, are capable of supporting three forms of spanning tree protocol (STP, RSTP and MSTP). Using the stp mode command, a user is able to define the mode of STP that should be applied to an individual switch. The default STP mode for Sx7 series switches is MSTP, and the mode must therefore be reconfigured before STP can be used.

As part of good switch design practice, it is recommended that the root bridge be manually defined. The positioning of the root bridge ensures that the optimal path flow of traffic within the enterprise network can be achieved through configuration of the bridge priority value for the spanning tree protocol. The stp priority [priority] command can be used to define the priority value, where priority refers to an integer value between 0 and 61440, assigned in increments of 4096. This allows for a total of 16 increments, with a default value of 32768. It is also possible to assign the root bridge for the spanning tree through the stp root primary command.
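The permitted priority values can be enumerated with a short check, which may be useful when preparing configuration scripts. The `valid_bridge_priority` helper is a hypothetical name and is not part of any switch software.

```python
PRIORITY_STEP = 4096
PRIORITY_MAX = 61440
DEFAULT_PRIORITY = 32768

def valid_bridge_priority(value):
    """True if value is acceptable as an argument to stp priority."""
    return 0 <= value <= PRIORITY_MAX and value % PRIORITY_STEP == 0

allowed = list(range(0, PRIORITY_MAX + 1, PRIORITY_STEP))
print(len(allowed), allowed[:4], valid_bridge_priority(5000))
```

The list contains the 16 increments mentioned above, and any value that is not a multiple of 4096 is rejected.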

As noted earlier, the Huawei Sx7 series of switches supports three forms of path cost standard in order to provide compatibility where required, but defaults to the 802.1t path cost standard. The path cost standard can be adjusted for a given switch using the stp pathcost-standard { dot1d-1998 | dot1t | legacy } command, where dot1d-1998, dot1t and legacy refer to the IEEE 802.1D-1998, IEEE 802.1t and Huawei legacy path cost standards respectively.

In addition, the path cost of each interface can also be assigned manually to support a means of detailed manipulation of the stp path cost. This method of path cost manipulation should be used with great care however as the path cost standards are designed to implement the optimal spanning tree topology for a given switching network and manipulation of the stp cost may result in the formation of a sub-optimal spanning tree topology.

The command stp cost [cost] is used, for which the cost value should follow the range defined by the path cost standard. If a Huawei legacy standard is used, the path cost ranges from 1 to 200000. If the IEEE 802.1D standard is used, the path cost ranges from 1 to 65535. If the IEEE 802.1t standard is used, the path cost ranges from 1 to 200000000.
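The ranges above can be captured in a small lookup, for example to check a planned cost value before it is applied. The `COST_RANGES` table and the `valid_cost` helper are hypothetical; the keys reuse the keywords of the stp pathcost-standard command.

```python
# Inclusive path cost ranges for each path cost standard.
COST_RANGES = {
    "legacy": (1, 200000),
    "dot1d-1998": (1, 65535),
    "dot1t": (1, 200000000),
}

def valid_cost(standard, cost):
    """True if cost lies within the range defined for the given standard."""
    low, high = COST_RANGES[standard]
    return low <= cost <= high

print(valid_cost("dot1t", 20000), valid_cost("dot1d-1998", 200000))
```

A value that is valid under one standard may be out of range under another, so the cost should be checked against the standard that is actually configured on the switch.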

If the root switch on a network is incorrectly configured or attacked, it may receive a BPDU with a higher priority and thus the root switch becomes a non-root switch, which causes a change of the network topology. As a result, traffic may be switched from high-speed links to low-speed links, causing network congestion.

To address this problem, the switch provides the root protection function. The root protection function protects the role of the root switch by retaining the role of the designated port. When the port receives a BPDU with a higher priority, the port stops forwarding packets and turns to the listening state, but it still retains a designated port role. If the port does not receive any BPDU with a higher priority for a certain period, the port status is restored from the listening state.

The configured root protection is valid only when the port is the designated port and the port maintains that role. If a port is configured as an edge port, or if loop protection is enabled on the port, root protection cannot be enabled on the port.

Using the display stp command, the current STP configuration can be determined. A number of timers exist for managing the spanning tree convergence, including the hello timer, max age timer, and forward delay, for which the values displayed represent the default timer settings, and are recommended to be maintained.

The current bridge ID can be identified for a given switch through the CIST Bridge configuration, comprising the bridge priority and MAC address of the switch. Statistics provide information regarding whether the switch has experienced topology changes, primarily through the TC or TCN received value along with the last occurrence as shown in the time since last TC entry.

 
For individual interfaces on a switch it is possible to display this information via the display stp command to list all interfaces, or using the display stp interface <interface> command to define a specific interface. The state of the interface follows MSTP port states and therefore will display as either Discarding, Learning or Forwarding. Other valid information such as the port role and cost for the port are also displayed, along with any protection mechanisms applied.

SUMMARY

Following the failure of the root bridge for a spanning tree network, the next best candidate will be elected as the root bridge. In the event that the original root bridge becomes active once again in the network, the process of election for the position of root bridge will occur once again. This effectively causes network downtime in the switching network as convergence proceeds.

The Root Path Cost is the cost associated with the path back to the root bridge, whereas the Path Cost refers to the cost value defined for an interface on a switch, which is added to the Root Path Cost, to define the Root Path Cost for the downstream switch.