Archive for December, 2012

On the Premature Death of Spanning Tree and the Indiscriminate Killing of Canaries

I have a bee in my bonnet. After my last post full of love and bromance, this one is full of hate and vitriol – and I don’t apologise! We have all seen many presentations on each vendor’s latest and greatest “fabric” technologies over the past 18 months. It doesn’t matter which vendor, whether the presenter is sales or tech, or even enterprise or service provider focused – at some point almost every one declares that their solution is “the end of spanning tree”. It gets worse when they actively advise that you do not run spanning tree in your environment.

And I don’t buy it 🙁

The Premature Death of Spanning Tree

Spanning Tree: noun – A pox on the house of networking

Everybody loves to hate on Spanning Tree. Haters gonna hate. While we’ve all been bitten by something horrible happening related to spanning tree, I have seen many more things go wrong because people *didn’t* configure Spanning Tree properly.

Vendors knew how painful it was and went to great lengths to ensure that we didn’t need to do anything so that it would “just work”. Which is great… right up until the point that you run into a limitation of the STP PixieDust Mode. Often this comes in VLAN-dense environments when you max out the total number of spanning tree instances that your devices will handle. Oh that’s easy – let’s roll out Multiple Spanning Tree!

I can hear the gasps from many people now. If people hated Spanning Tree, then they have a full lynch mob ready with pitchforks and stakes at the mere mention of poor MSTP. And they hate it for a reason. MSTP makes you think about how spanning tree works again and all the PixieDust goes away. And networks become hard again.

I will let you in on a little secret:

I actually really like MSTP and I implement it every chance I can get.

Yes, it’s a little harder and requires a little more forethought, but I would rather do that at design time than have to overhaul my design later to meet some new need well after my network has reached “critical mass”. I have spent many hours rebuilding spanning tree designs because I needed more than 128 instances. Sadly, in more than one case I needed to work out the best way to deal with a group of Catalyst 3750s in PVST+ with 1000 VLANs configured (and 900 VLANs with STP disabled).
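To put rough numbers on that, here is a minimal sketch of the instance arithmetic. The 128-instance ceiling is the figure from my own pain above; the modulo-based VLAN grouping, the choice of four MST instances and the function names are all made up for illustration, and real platforms have their own limits.

```python
PER_VLAN_INSTANCE_LIMIT = 128  # assumed platform ceiling, taken from the text above


def per_vlan_instances(vlans):
    """Per-VLAN spanning tree: one instance for every VLAN."""
    return len(set(vlans))


def mst_instances(vlan_to_instance):
    """MSTP: many VLANs share an instance, so count the distinct instances."""
    return len(set(vlan_to_instance.values()))


def plan_mst(vlans, groups):
    """Hypothetical mapping that spreads VLANs across `groups` MST instances."""
    return {vlan: (vlan % groups) + 1 for vlan in vlans}


vlans = range(1, 1001)               # 1000 VLANs, as in the 3750 example
mapping = plan_mst(vlans, groups=4)  # 4 instances is an arbitrary choice

needed_pvst = per_vlan_instances(vlans)
needed_mst = mst_instances(mapping)

print(needed_pvst, needed_pvst <= PER_VLAN_INSTANCE_LIMIT)  # 1000 False
print(needed_mst, needed_mst <= PER_VLAN_INSTANCE_LIMIT)    # 4 True
```

In a real design you would group VLANs by the topology you want them to follow rather than by a modulo, which is exactly the forethought MSTP forces on you.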

And things get messy, and things get hard. So let’s find a new solution.

The Magical Healing Powers of Woven Unicorn Hair Fabrics

Somewhere over the rainbow, far beyond the Dark Forest of Broccoli Despair, many magical elves have worked hard to deliver us the perfect solution to the problems I listed above. Vendors have taken this creation and moulded it into their own “Fabric Solutions”. Some created skinny jeans, others an uncomfortable sweater vest. Sadly, most of the time they have just presented us with a sensible pair of slacks that the sales people try to sell as a three piece suit.

A sensible pair of slacks (Unicorn Hair or otherwise) is perfectly apt when used as intended, but if you drape them over your shoulders and call them a shirt then you’re wrong (or a hipster, in which case your cardigan is probably over the top of your shirt-slacks).

And so it is with data centre fabrics. I agree that most of these solutions will allow us to disable spanning tree on our core/fabric facing interfaces. We will get many of the benefits of multi-path layer 2 and sometimes efficiencies gained by avoiding the flooding of L2 addressing information. Turning off spanning tree into the fabric core makes sense. I’m happy with that.

So what about all those edge interfaces?

Do we live in a world where end users never plug two ports together?

A client PC never bridges interfaces?

How about “Oh my VoIP phone has two network ports let me just…. BOOM!”

Maybe you have no requirement to integrate with other networking infrastructure, but end stations can still do bad things, and that’s usually when you don’t want them to.

The Indiscriminate Killing of Canaries

So how do we go about detecting these loops? Well over the past couple of decades we’ve presented ourselves with a whole cage full of canaries that can alert us to loops or other similar problems in the network. These are our early warning signals that “Something bad just happened…” and better yet “… so let me just fix that for you!”. And sadly, many of these have been built around the functionality that Spanning Tree provides.

Let’s take the BPDU Guard feature as an example. BPDU Guard is set on an access port or another port where you do not expect to see Spanning Tree packets (BPDUs). If a BPDU is detected, the switch will usually log a message and take the port out of service (err-disabled, in Cisco terms). In the scenarios listed previously the offending port is now taken out of action and the loop is removed. If we have disabled Spanning Tree on all ports then the BPDU will never be sent or received and our little bridging loop will happily continue. Well, at least until your switch is a bubbling blob on the bottom of your rack.

Another feature available on most switches is the BPDU Filter. With BPDU Filter enabled on a port the switch will pass all traffic on the port but silently drop the BPDU messages. Now I agree that there are certain times when this feature is useful, such as when interconnecting with a 3rd party that you “know” can never form a loop with you and you do not want to either learn an STP root from them or go into blocking due to election issues.

Sadly, our good friends at VMware love to advocate that we implement BPDU Filter on the ports facing our VMware hosts. Unfortunately I have been bitten by loops coming from inside a VMware environment due to a Microsoft guest bridging two vNICs in separate VLANs. A BPDU came in from the physical NIC on VLAN A, out to the vNIC in that VLAN, and back out through the vNIC in VLAN B. Thankfully when this happened, my canary (BPDU Guard) signalled that there was a problem, then promptly died in its cage and disabled the port to the VMware host. Yes, this would have some undesirable effects on all the other guests on that host, but we were alerted to the problem and needed to fix it. In the scenario with BPDU Filter these alerts would have been filtered out and the loop would continue unnoticed.
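The difference between the two canaries is easier to see side by side. This is only a toy model, not any vendor’s implementation: the `EdgePort` class, its fields and the host names are hypothetical, and I have assumed that when both features are set on a port the filter wins because the BPDU is dropped before the guard ever sees it.

```python
from dataclasses import dataclass, field


@dataclass
class EdgePort:
    """Toy model of an edge port (hypothetical, for illustration only)."""
    name: str
    bpdu_guard: bool = False
    bpdu_filter: bool = False
    enabled: bool = True
    log: list = field(default_factory=list)

    def receive_bpdu(self):
        """Model what happens when a BPDU arrives on this port."""
        if not self.enabled:
            return "dropped: port is down"
        if self.bpdu_filter:
            # Assumption: the filter discards the BPDU silently, no log entry.
            return "dropped silently"
        if self.bpdu_guard:
            # The canary dies: log the event and take the port out of service.
            self.enabled = False
            self.log.append(f"{self.name}: BPDU received, port disabled")
            return "port disabled"
        return "processed by spanning tree"


guarded = EdgePort("host-a", bpdu_guard=True)
filtered = EdgePort("host-b", bpdu_filter=True)

guarded_result = guarded.receive_bpdu()
filtered_result = filtered.receive_bpdu()

print(guarded_result, guarded.enabled, guarded.log)
print(filtered_result, filtered.enabled, filtered.log)
```

The guarded port ends up down with a log entry somebody can act on. The filtered port stays up with nothing logged, which is precisely the problem when the BPDU was the only evidence of a loop.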

So what other methods do we have to detect possible bridging loops that do not require Spanning Tree to be operational? I have the following list as a start to some ideas, and I am looking for others that you might know of too:

  • Broadcast Storms
    • Possible Mitigation: Storm Control
  • Multiple MAC Addresses on a Port
    • Possible Mitigation: Max MAC Address restrictions
  • MAC Address Flapping
    • Possible Mitigation: MAC Flap Dampening
  • High CPU Usage (in some cases)

How do we best monitor these details and present them in a useful way to our NOC and Service Desk people so they know when something bad is happening without the tools we originally created? How do we mitigate these issues so that we can maintain some of the “self healing” we have had with our previous tools?
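As one idea for the MAC flapping item, here is a rough sketch of a detector that could sit behind whatever feeds you MAC learn events (syslog, SNMP traps, polling). Everything in it is an assumption for illustration: the `MacFlapDetector` name, the threshold of three moves in ten seconds, and the event format are mine, not any switch’s actual behaviour.

```python
from collections import defaultdict, deque


class MacFlapDetector:
    """Flag a MAC that moves between ports too often within a time window."""

    def __init__(self, max_moves=3, window_seconds=10.0):
        self.max_moves = max_moves       # hypothetical threshold
        self.window = window_seconds     # hypothetical window
        self.last_port = {}              # mac -> port it was last learned on
        self.moves = defaultdict(deque)  # mac -> timestamps of recent moves

    def observe(self, timestamp, mac, port):
        """Record one MAC learn event; return True if the MAC is flapping."""
        previous = self.last_port.get(mac)
        self.last_port[mac] = port
        if previous is None or previous == port:
            return False  # first sighting, or no move
        recent = self.moves[mac]
        recent.append(timestamp)
        # Forget moves that have aged out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.max_moves


detector = MacFlapDetector()
events = [
    (0.0, "aa:aa:aa:aa:aa:aa", "port1"),
    (1.0, "aa:aa:aa:aa:aa:aa", "port2"),
    (2.0, "aa:aa:aa:aa:aa:aa", "port1"),
    (3.0, "aa:aa:aa:aa:aa:aa", "port2"),
]
alerts = [detector.observe(*event) for event in events]
print(alerts)  # [False, False, False, True]
```

A MAC that bounces between two ports three times in a few seconds is a decent hint that something is bridged that should not be. What the NOC does with that alert (shut the port, rate limit it, page a human) is the open question.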

Mop and Bucket

Yes, I’ve written this post at 2am, but it’s been something that I have been thinking about for the past 8 or 9 months.

I can see that Spanning Tree doesn’t have an indefinite future, but calling it dead today is premature. If you are looking at fabric technologies, or worse still you don’t have a newfangled fabric but hate spanning tree so badly that you have just turned it off, then ask yourself how you will detect loops in the edge networks and how you will mitigate them.

Take your canaries with you and let them do their job and don’t strangle them at the top of the mine shaft.

If you do, you might just find that the Emperor’s new clothes are just a sensible pair of slacks!


Mentoring your way to better career happiness

Yesterday was possibly one of the proudest days in my professional life. My very good friend Anthony Burke sent me a text message just after 9am to tell me that he had passed his JNCIA-Junos exam. This is a close tie with when he earned his place as a delegate to Network Field Day 4 back in October.

So why am I so proud about this? Well, Anthony and I have become great friends over the past 18 months, all through the magic of social media. One day for no reason (as is the way of things) we just started talking. Tweets became Skypes and in January, even though we had only ever met face to face once, my wife and I drove the 10 hour trip to Melbourne to attend his wedding. (Sitting at the table answering the question of “So how do you know Anthony?” with “Oh. From the Internet” was a humorous experience.)

So how is this different to any other random person on Twitter? Well, very early on I realised how eager Anthony was to learn more about networking and to soak up knowledge from those willing to share it. I’m not entirely sure if he just caught me on the right day (I’m pretty sure I commenced our first Skype with a rant about something he never saw coming because I needed to vent at somebody). Over the course of the coming months we maintained a Skype IM dialog open most of the day discussing various networking, technology and career issues.

I spent time helping him learn new topics and sharing any possible pointers and tidbits I had that might help him with what he was working on. In return he acted as a sounding board to bounce ideas off and offered up suggestions. Over time I noticed his advice back to me started to surprise me with how much he had already learned.

In my time as a senior engineer and in various management roles I have been required to assist junior staff with both their technical and their career development. I always found this to be a difficult task because I was trying to balance what I wanted out of them as co-workers with what it is they wanted to do with their own careers. I had a financial incentive to push them down the path that benefitted me more than it did them. Each time they moved away from “my chosen path” there was a feeling of disappointment or failure (on my part). Even when I worked with the staff to clarify where they wanted their careers to go, I felt they were trying to give “the right answer” or at least the one they thought I wanted to hear.

I have since learnt that the most personally rewarding mentoring system for me is one where I get no personal benefit from the success or failure of the other participant. Helping them make the decisions that are right for them without those choices directly impacting my day to day work life has been quite a different experience. I am better able to frame my advice so that it combines the benefit of my own experiences with knowing where they want to go professionally.

Mop and Bucket

In a recent email conversation, reference was made to the gutting of the middle ranks of IT. We have a lot of junior staff and a lot of highly skilled engineers, but we are starting to see a thin mid-section (something my mirror cannot attest to). Whether this is due to the focus on specialisation vs generalist IT or not is uncertain to me, but I have seen evidence to suggest this is accurate (or becoming so).

So how do we move junior engineers onto the path of becoming journeymen and future experts? I suggest that each of you should keep your eyes open to “up and comers”. The ones who “have it”. You will be surprised when and where you find them, but they are everywhere. Take the time to help them out when possible. Answer the questions that they ask. Take time to get to know their strengths, their weaknesses and their career aspirations. Put them in contact with your contacts and help build their network of resources. Everyone “knows a guy who knows that thing”. Pass it on. Our contacts are not just secret tools to make us look good.

Once they become a journeyman engineer, encourage them to also look for people to mentor and repeat the process. I have found this process to be very rewarding for me personally, and I plan to continue doing it. Having a mentor (or several) above you and below you can do a lot to encourage and maintain your own growth.

When I first started talking with Anthony, he had passed his CCNA and started studying for his CCNP. Since then he has continued on with increasing his Route, Switch and Security skills. He fell in love with the ASA (urgh) and despite much encouragement I couldn’t talk him around to playing with Junos. In late October he finally got his hands on his own SRX110 and started using Junos for the first time. In just over a month he has learnt more than I learnt in my first 6 months playing with Juniper kit. He passed his JNCIA-Junos yesterday, and is apparently preparing for his JNCIS-SEC.

Taking on new technologies and really understanding them so quickly? This is why I am proud.

If this sounds like a bit of a love letter to a friend then so be it 😀

And Burkey…. #FHP 😉
