DCB: How to Engineer your way out of a poor architecture decision!

I recently gave a presentation to the New Zealand Network Operators Group (NZNOG) 2011 conference on “Data Centre 3.0”. During my research over the last 8 months, coupled with the fact checking I did while creating the slides, I kept asking myself:

“Would we need all these protocols if we, as an industry, had made better technology implementation decisions?”

I understand the background and requirements for some of the different technology proposals, particularly Layer 2 Multi-path and the various Data Centre Bridging (DCB) QoS standards, but I can’t help but feel that we are trying to bring features of the higher layer protocols down into Layer 2.

Back when I started studying networking (probably around mid 2001 when I first obtained my CCNA), the CCNA curriculum was quite clear on the OSI model and how each layer had a very particular purpose, with clear functional definitions:

  • Layer 2 – Communication of hosts on the same media segment
  • Layer 3 – End-to-End addressing and communication
  • Layer 4 – Connection-oriented traffic via TCP (including Window Scaling), or connectionless traffic via UDP
  • Layers 5 to 7 – Application Layer with added session control and possible tracking of per-flow statistics

    From early on we had options for end-to-end communications, we had options for scaling traffic based on network conditions (e.g. TCP Window Scaling), and we have had various iterations of Layer 3 Quality of Service (TOS, DSCP etc).

    As the popularity of Ethernet Switching (Ivan: You know I mean bridging!) continued to grow, and with the majority of Layer 2 networks standardising on Ethernet as the de facto Layer 2 standard, we started to see individual Layer 2 domains span larger and larger areas. No longer were these simply a series of hosts on a shared bus segment (e.g. 10Base2) or even a simple hub and spoke segment on a single hub/bridge (e.g. 10BaseT), but rather a large interconnected mesh of bridges spreading across floors, buildings and campuses.

    Now we needed a way of classifying traffic based on priorities that would be consistent across these large layer 2 domains. This was addressed in the 802.1p standard, which allowed priority classification on 802.1Q trunk links – but did nothing for access ports.
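
    To make the trunk versus access port distinction concrete: the 802.1p priority is carried in three bits (the PCP field) of the 802.1Q tag, so a frame without a tag has nowhere to carry it. Below is a minimal Python sketch of the tag layout. The function names are mine and purely illustrative.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_vlan_tag(pcp: int, vlan_id: int, dei: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: 16-bit TPID followed by the 16-bit TCI."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP (the 802.1p priority) is a 3-bit field, 0-7")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID is a 12-bit field, 0-4095")
    # TCI layout: PCP in the top 3 bits, DEI in the next bit, VLAN ID in the low 12
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> dict:
    """Unpack a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}


tag = build_vlan_tag(pcp=5, vlan_id=100)
print(tag.hex(), parse_vlan_tag(tag))
```

    An untagged frame on an access port simply has no such four bytes, which is why the priority marking stops at the trunk.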

    Various proposals have been put forward in an effort to address the need for end to end QoS control of Ethernet traffic. One of the driving forces behind this is the requirement of “Lossless Ethernet” in converged storage networks.

    The history of SCSI, FibreChannel and FCoE is documented elsewhere, but needless to say some bright spark decided the best solution would be to embed SCSI commands directly into Layer 2 (plus some L2 headers of course), and not build in any error or packet loss checking. Had they chosen instead to use an IP based protocol, they could have easily used the functions already existing in TCP/IP to detect these problems, but instead boffins and propeller heads are now busily creating an array of standards to try and combat the fear of dropped packets in storage networks. All of this adds up to new hardware, new chips, and more places for things to break!

    On top of this, we have the wonderful phenomenon of “Virtualisation”. With the poor architecture choice of a single vendor (and those that copied them), we now have an army of SysAdmins shouting the mantra of Layer 2 Data Centre Interconnect. Not only do we need to have multiple locations for redundancy, but they must be in the same Layer 2 segment for this design to work correctly!

    Traditional (and sensible) network design would put each of these locations into separate IP subnets, and utilise IP routing for clear separation of the distinct networks. Now vendors of network equipment – including load balancers and security devices – are scrambling to re-architect their products to support this new design paradigm.

    Greg Ferro and I were chatting a while back about all the things people are trying to tack onto “Ethernet” – QoS, OAM, end-to-end communication etc, and this question came up:

    How far do you go before it stops being Ethernet?

    Why is it that we are continually making a rod for our own back? When do we stop trying to extend protocols with functions they were not designed for, especially when we already had solutions available to us elsewhere?

    I’m not sure where this is all headed, but with the growth of Layer 2 networks spanning across geographic locations, fuelled by the growth of virtualisation and converged storage networking, are we treading down a well-worn path to failure? What costs will there be for organisations when they need to re-evaluate the designs currently considered “Best Practice” by certain vendors?

    As always, your thoughts (and flames) are welcome 🙂


    RADIUS “Auth-Type” Attribute

    The Intro

    One of the things I do regularly for my clients is to build LNS Infrastructure to take wholesale L2TP handoff. It is not uncommon to take these handoffs from several suppliers covering technologies such as DSL, Dialup ports, Wireless, 3G etc.

    Wholesale pricing in Australia means that it is not unreasonable to take a wholesale L2TP handoff from an Aggregator company and to build your corporate WAN using wholesale DSL services. Often these networks are built for customers whose primary business is not running networks, but rather delivering a service. I have an upcoming series of blog posts that will cover creating these services in more detail.

    Once this infrastructure is built, barring a new wholesale type, there is little more required than to order the service from the wholesaler and add a service to your RADIUS server. Other than the regular maintenance and patching of any network device, things generally go smoothly day-to-day.

    That is until the “provisioning team” from said customer sends out a poorly configured router, and nobody discovers it until it gets on site. Usually this manifests as an incorrect PPP password.

    I won’t say that this happens often, but it happens enough that I need a “fix” ready to apply when the customer calls.

    Isn’t it amazing how a poorly configured CPE is my fault?

    The Fix

    Many RADIUS servers provide us with a server-side check attribute that can be utilised in this situation – the Auth-Type attribute. This attribute can be used to forcibly “Accept” or “Reject” a RADIUS account. All you need to do is to add “Auth-Type := Accept” to your RADIUS Check entry and RADIUS will ignore the password stored.

    A simple RADIUS entry looks something similar to:

    ppp-user        User-Password:= "password"
    		Framed-IP-Address = "10.10.10.10"
    

    With the Auth-Type override it would look like this:

    ppp-user        User-Password:= "password", Auth-Type := "Accept"
    		Framed-IP-Address = "10.10.10.10"
    

    Next time the client tries to re-authenticate, RADIUS will choose to authorise this user even if the password is wrong.

    The Catch

    So after having to once again implement this attribute earlier this week to fix a router that was only sending a CHAP (and therefore hashed) password, I decided to tweet about it. Not long after I had posted this, Ivan Pepelnjak of IOS Hints replied with:

    It's also a terrific security hole. How do U make sure the acct is not used to log into your router?
    

    My first response was that you should use separate AAA servers for Console/Admin authentication and Customer/PPP authentication. Ivan promptly replied with:

    Still, is there a way to prevent that acct from being used as a router login?
    

    Thinking this was some kind of test, I started thinking…

    The Answer

    Before I was able to tweet a reply, Petr Lapukhov replied with:

    @ioshints @networkjanitor hmmm, "autocommand=ppp" for PPP accts? remember doing that back in 98's :)

    Now while I believe Petr’s response will have the desired overall effect, I suggest that it is less than ideal. Essentially the “autocommand” feature is used to run a set command on login of the user and then log them out. Ivan has written some posts on this on his blog. The two shortcomings I see with this method are:

    1. If you try to use this account for console access to the router you are given a session with either a stream of rubbish on the terminal, or “This line may not run PPP.” followed by a disconnection.
    2. The user is actually authenticated “OK” by the RADIUS server, and no denied attempt will be logged for auditing purposes.

    My view is the correct fix to this problem lies in RADIUS Best Practice implementation of the “Service-Type” Attribute.

    (Yes, I purposely left this out of the RADIUS configurations above!)

    When used in the RADIUS Check section, the Service-Type attribute will only allow a match if the correct Service-Type is sent by the client. The two general ones that relate to this post are:

    1. Service-Type = Login-User – Used for console/vty access to a RADIUS client
    2. Service-Type = Framed-User – This is generally the service type that is used for PPPoE and L2TP termination.

    I built a test lab this morning, and tested the above assumption with the following results:

    When configured with the original settings listed above, the “ppp-user” account was able to successfully authenticate and log into the router.

    The modification below successfully stopped the account from being used for vty login to the router, while still allowing it to log in via PPP.

    ppp-user        User-Password:= "password", Service-Type = Framed-User
    		Framed-IP-Address = "10.10.10.10"
    

    And just because I wanted to ensure that “Auth-Type” would not override the “Service-Type” attribute, I tested the below config as well, and confirmed that this also would not allow vty console access to the router.

    ppp-user        User-Password:= "password", Service-Type = Framed-User, Auth-Type := "Accept"
    		Framed-IP-Address = "10.10.10.10"
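
    The lab results above can be summarised with a toy model in Python. This is not how any particular RADIUS server is implemented. The evaluate function, the dict layout and the simplified matching rules are my own assumptions, written only to mirror what I observed.

```python
def evaluate(check_items: dict, request: dict) -> str:
    """Toy model of the behaviour seen in the lab (not real server logic).

    check_items: the user's check line, for example
                 {"User-Password": "password", "Service-Type": "Framed-User"}
    request:     attributes the NAS sent in its Access-Request
    """
    # A Service-Type check item must match what the NAS sent, otherwise the
    # entry does not match at all and the request is rejected.
    wanted = check_items.get("Service-Type")
    if wanted is not None and request.get("Service-Type") != wanted:
        return "Access-Reject"

    # Auth-Type := Accept skips password verification entirely.
    if check_items.get("Auth-Type") == "Accept":
        return "Access-Accept"

    # Otherwise fall back to a plain password comparison.
    if request.get("User-Password") == check_items.get("User-Password"):
        return "Access-Accept"
    return "Access-Reject"


entry = {"User-Password": "password", "Service-Type": "Framed-User", "Auth-Type": "Accept"}

# PPP session with the wrong password: still accepted because of Auth-Type
print(evaluate(entry, {"Service-Type": "Framed-User", "User-Password": "wrong"}))
# vty login with the right password: rejected because Service-Type does not match
print(evaluate(entry, {"Service-Type": "Login-User", "User-Password": "password"}))
```

    The ordering is the point: the Service-Type check happens before Auth-Type gets a chance to wave the user through.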
    

    The Wrap Up

    What was meant to be just an offhand tweet about a useful feature ended up sending me to the lab to test a bunch of assumptions I had been making in the past. I ended up learning a lot about a task I spend a lot of my week doing.

    If you have any other suggestions (or corrections!), then please comment below.

    Stay tuned for the upcoming series on creating your own LNS infrastructure.

    Update: The Forgotten Credits

    I would also like to thank Marko Milivojevic from IPExpert who acted as a sounding board while working through this post, and confirmed some of my suspicions before I went on to lab them. As always Marko, thank you for your assistance!


    Exam Review: JNCIS-ENT (JN0-343)

    As you may have heard, Juniper has been shaking up their certification program – and all I can say is “It’s for the better!”.

    In an effort to consolidate the disparate certification tracks (which were previously product based), they have moved towards being more centered around the market segments (and by extension the careers of the engineers going for the certs).

    The first change was migrating the M track to becoming the Service Provider track. This is actually the track that would have made the most sense for my 9 to 5 (also 5 – 9) job, but as usual I don’t like to follow convention.

    In August, it was announced that the exam I had been studying for (JNCIS-ER) was being retired, and that a new exam track which brought together both the -ER (Routing) and -EX (Switching) in the context of Enterprise Networking was being released. I decided to instead sit the JNCIA-ER exam (which I blogged about previously and received some interesting feedback 😉 ), and then wait until the new JNCIS-ENT course was announced.

    It was an anxious wait with the teasers about the new exam coming out of the JNCP office! In early October the exam objectives were announced, then quickly followed by course material being available on the Juniper Fast Track website. I am fairly sure the fast track information had been up for maybe an hour before I had downloaded it and started working on a study plan.

    At this point I knew that I had about a week before Prometric would allow me to book the exam, and I knew I wanted to be one of the first people with this certification. It’s an ego thing – leave me alone. Unfortunately fate (well, a friend’s wedding) would see me out of town the first 3 days the exam was available to sit, and during this time a friend from the twitterverse managed to beat me to it (JERK!). So instead I booked for the following weekend (October 16th).

    I figured I had all the time in the world to study for this exam (1 week to be exact), and that I would manage to get all of it in with time to spare, and maybe relax with a few dirty chais pretending that I was enjoying it! Once again, fate intervened. After the 3 day weekend interstate for a wedding, I managed to have an extremely busy week preparing for my presentation at the Australian IPv6 Summit, as well as the lab guide for the training I was presenting on the first day (I flew out to the summit the day after I sat my exam).

    Never fear, I can read through 160 pages of routing study guide on the Thursday night, and 140 pages of switching on the Friday. Well, maybe I can read 50 pages on Thursday, but I will head home early on Friday and read all of the remaining sections Friday night. Yeah, that’s a great plan! Let’s move on with that.

    So, 6pm Friday night rolls around, and I’m still in the office. I’m making every effort to be out of there without anybody stopping me! Phone rings – customer has a problem and somehow I’m the only one able to fix it. Joy! I guess I can study while supporting an onsite engineer. Maybe I can get a few pages in while he is moving between locations.

    No. No I couldn’t. By the time I left the office at 21:45, I had read 3 pages and possibly remembered half a sentence at most! Never fear! The exam isn’t until 10am – that’s more than 12 hours away, and I can still read when I get home. My wife had a different opinion on the matter! Never mind, I can get up early, drive the 100+ km back to Sydney, be at the exam centre early and read from the car park.

    3 alarm clocks and two snooze cycles later, I managed to get away on time. I should make it with plenty of time to spare. Wait… what’s that up ahead? Why is there traffic as far as the eye can see? This doesn’t fit at all with my plan.

    So a lesson for life – don’t curse at traffic on the freeway and wish bad thoughts on whoever caused it! If only because you will feel really bad when you see that it’s a Rural Fire Service truck overturned in the middle lane. That moment really snapped my perspective into shape!

    I made it to the exam centre with 30 minutes to spare, so I decided the best bet was to flick through both guides as a refresher, and go in trusting the experience I had gained from past exams and the real world.

    Sign in, sit down, and get underway! Here is my review of the exam:

    • JNCP have put a lot of thought into their new plan and the work has paid off
    • The study material provided on the Fast Track portal is vastly superior to the previous material on the site. If you have previous theory knowledge on the exam objective topic areas, these study guides should be enough to get you through. If not, you should look to supplementing your knowledge with some of the great courses offered by your local Juniper training partners.
    • Between the time I sat my JNCIA-ER and JNCIS-ENT, I have learnt a lot more about the Juniper philosophy of building skills layer on layer throughout the certification process. Unfortunately this makes several of my comments about the JNCIA less valid – such as the focus on “simple” features, J-Web, or the low end focus. Each exam is there to test a level of knowledge, and the following exam builds on that without repeating content.
    • The merging of the Routing and Switching tracks makes a lot of sense, and the content has been very well distributed between the two.
    • Don’t be cocky! You will need to study for this one, and you will need to know the theory behind each of the technologies from the exam objectives. Try and get some hands on with both routers and switches.
    • There were several questions about correct configuration of certain features from the exam objectives. If you know the theory, and have practical experience with the Junos configuration model, it shouldn’t be too hard to pick between the different options.
    • The exam confirmed things I already knew – my skills were strongest in BGP, Protocol Independent Routing and HA. My weakest areas were Spanning Tree (know the default values!) and IS-IS.
    • IS-IS? In an enterprise? It’s not unheard of, but it’s far from common! Given that it is the only scalable IGP other than OSPF on the Junos platform, I can understand why it is in the exam. My previous certification and study experience had given me a basic theory for IS-IS, but I should have focused more on the implementation of this to supplement the theory.
    • While the exam felt heavier in IS-IS and spanning tree questions at the time, when I think back I must admit the questions were evenly placed, but because of my weakness in those areas they really stood out in my mind.
    • The pass mark for this exam was lower than I expected, but this is probably good news for many out there!
    • The discounted price of this exam if you go via the Fast Track program certainly makes it viable to sit the exam to get a feel for it!

    My result? Well I really thought the result could have gone either way once I closed my eyes and hit submit on the final question and review stage. I knew I had nailed some of the questions, but there still felt like many I wasn’t sure about. Nervously I opened my eyes to the news I had passed! It wasn’t the greatest score I have received in a test, but I had pulled off what I had come to do!

    So what’s next? Well, the JNCIP-ENT hasn’t been released yet, but I am certainly keen to get started on that as soon as the exam objectives and format are released some time in the new year! In the meantime I have to pass my CCDA (worst exam in the world to study for as an engineer!), and continue preparing for my upcoming CCIE written exam.

    Special Thanks

    I think special mention needs to be made of Liz Burns and her team at the JNCP. If you aren’t already, you should start following @JuniperCertify on Twitter. Liz provided a lot of information in preparation for the exams, as well as encouragement in the weeks leading up to the exam. The JNCP team are really great community ambassadors for Juniper. Liz and Kieran from the JNCP also featured on PacketPusher’s Runt Packet this last week outlining the future of certification at Juniper.

    Thanks to Nick (@NetDonkey) and Chris (@ccie25655) for the encouragement and friendly competition in getting ready for the exam.

    Lastly, thanks to those three readers of my mindless blog ramblings!


    Wireless Drinking Stories

    I had a “peaceful” weekend pretending to study for my upcoming CCDA exam in a couple of weeks. Trying to convince myself that I cared, I decided the best course of action was to watch a movie! Sounds reasonable, right? I’ve been working my way back through Kevin Smith’s movies over the last couple of nights, and decided “Chasing Amy” was the perfect way to concentrate on what needed to be done.

    There is a scene in this movie where two of the main characters are trading stories about their scars, paying homage to the classic scene in Jaws. At approximately the same time @amyengineer and @NetDonkey were on Twitter discussing geek talking at parties, and this got me thinking about some of the usual stories I tell after a few beers at the different conferences I attend.

    So here it is. Think of this as my “Greatest Hits” of stories so far, these ones in particular relating to my experience as the Network Operations Manager of a wireless ISP in Australia. Alternative title – “When it’s not just an Invisible Blue Cable”.

    The one about the tide

    This particular story occurred within the first few weeks of my job at this company. At this stage we only had wireless coverage in one region of Australia, and were rolling out a test network in a region about 6 hours away using a new technology. In our local network we had a wireless backhaul that used Motorola Canopy in a PtP arrangement across Brisbane Waters bay that backed onto another 900MHz Canopy AP.

    The engineers who had been troubleshooting this problem on and off since before I started explained the situation as such:

    “At random times our signal strength goes out the window, and we get to the point where we lose signal to the point that the PtP link drops out”.

    I started troubleshooting this myself with another engineer who gave me my first lesson in real world wireless deployments:

    “Sometimes it’s not just an invisible blue cable”

    We had the device in MRTG and were reviewing historical signal strength readings when it started to occur to us that the signal strength issues followed a 6 hour cycle. Upon further investigation we realised that this link that went across a reasonable expanse of water was being affected by the change in sea level as the tides came in and out.

    As I soon learnt there is a piece of critical wireless theory called “The Fresnel Zone”. Basically this is an ellipsoid-shaped region of space between two radios, and its size depends on both the distance between them and the frequency of the link. Generally you don’t want to have more than 20% of the Fresnel zone obstructed, though up to 40% is often acceptable. There is plenty of material on this piece of wireless theory, but as long as you remember 60% clearance and the fact it is pronounced “Fray-nel” not “Frez-nal” you should be able to fool your way past many RF guys and other lesser beings 😉
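
    For the curious, the radius of the first Fresnel zone at a point along the path is sqrt(wavelength × d1 × d2 / (d1 + d2)), where d1 and d2 are the distances to each radio. Here is a small Python sketch of that formula and the 60% rule. The function names are my own, and the link length and frequency in the example are made-up illustration values, not the figures from our actual link.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def fresnel_radius_m(freq_hz: float, d1_m: float, d2_m: float, zone: int = 1) -> float:
    """Radius of the nth Fresnel zone at a point d1 from one radio and d2 from the other."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return math.sqrt(zone * wavelength * d1_m * d2_m / (d1_m + d2_m))


def required_clearance_m(freq_hz: float, link_m: float, fraction: float = 0.6) -> float:
    """Clearance needed at mid-path (where the zone is widest) for the given fraction."""
    half = link_m / 2
    return fraction * fresnel_radius_m(freq_hz, half, half)


# Hypothetical link: 4 km across water at 5.8 GHz
radius = fresnel_radius_m(5.8e9, 2000, 2000)
print(f"First Fresnel zone radius at mid-path: {radius:.2f} m")
print(f"60% clearance needed: {required_clearance_m(5.8e9, 4000):.2f} m")
```

    With numbers like these the zone is only a few metres deep at mid-path, so it does not take much of a change in water level to eat into a marginal clearance.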

    In our story the original link must have had marginal clearance within the Fresnel zone and the mere impact of the tides was enough to throw our signal strength out the window! Simple things like this can make your day disappear real quick.

    The one about the GPS

    In our second network we rolled out a (for the time) new and fancy wireless mesh product from SkyPilot Networks. On paper this product seemed to tick all the boxes for me – Single Channel for every repeater in the mesh, 360 degree coverage, Auto-provisioning, 802.1q and QoS capable and the ability to perform signal testing from the repeater.

    In a SkyPilot network you generally have three types of devices:

    1. The SkyPilot Gateway – Looks much like an upturned garbage can, but is the brains of the mesh network. This device has a built-in GPS for coordinating communication within the mesh. This device is what you would plug into your traditional networking equipment in the datacenter / Point of Presence.

    2. The SkyPilot Repeater – exactly the same look as the gateway, but a bit less smarts. This box needs to talk either directly to a Gateway device, or another repeater that is somehow connected to the gateway.

    3. The SkyPilot Connector – This is the end customer CPE. Connects to either a Repeater or a Gateway on the wireless side, and Cat5 down to the customer’s computer via a PoE adapter on the LAN side. You can set the 802.1q VLAN id associated with the LAN interface per radio.

    The basic concept is that each Gateway and Repeater device has 8 directional antennas arranged internally to provide 360 degree coverage. All sorts of wonderful maths comes into play, but suffice it to say that the Gateway device uses timing sequences from the built-in GPS to (try to) ensure that no two panels that are facing each other are transmitting at the same time, in an effort to reduce back chatter and radio interference. In a perfect world! All of this means that without correct GPS sync the device goes down the toilet, taking your mesh with it. If the device detects a problem with the GPS for a continued period of time, the gateway will reboot… once again taking your mesh with it.

    I came across a problem where at a “random” interval one of our gateway devices would reboot, and when it finally came back online (sometimes 15 – 45 minutes later), the reboot cause was listed as “Lost GPS Sync”. Much troubleshooting went on over quite a period of time.

    The vendor suggested ensuring correct grounding of the device, because a significant difference in the electrical ground provided to the grounding lug on the base of the gateway unit can be enough to distort the signal and cause a reboot. This did reduce the frequency of the reboots, but they were still occurring.

    Eventually on our third RF check using expensive equipment we discovered that there was a Pager service on the water tower at the end of the block that was used as part of council surveying procedures. This device broadcast on (iirc) 450MHz and had its antenna pointed almost directly into our Gateway device. It was merely coincidence that we were there at exactly the same time as the signal was being sent, but we caught it.

    The lesson we took away from this:

    “When troubleshooting problems with wireless, be aware of not just your broadcast frequency, but that of your GPS, and the various half, quarter and other offset frequencies for your application”.

    Forgive me if my science is a little off on this, but while I spent 2 years working there, I was an “IP Guy” and not an “RF Guy”. Once again proves there is more than just an invisible blue cable!
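
    In the spirit of that lesson, here is a rough Python sketch that lists which whole and half multiples of a transmitter frequency land near the GPS L1 carrier (1575.42MHz). The function name, the tolerance and the choice of half steps are my own assumptions, and whether a given multiple actually causes trouble depends on the hardware involved. This only shows the arithmetic.

```python
GPS_L1_MHZ = 1575.42  # GPS L1 carrier frequency


def near_multiples(tx_mhz: float, target_mhz: float = GPS_L1_MHZ,
                   tolerance_mhz: float = 5.0, max_multiple: float = 8.0,
                   step: float = 0.5) -> list:
    """Return (multiple, offset in MHz) pairs where multiple * tx_mhz is near the target."""
    hits = []
    for i in range(1, int(max_multiple / step) + 1):
        k = i * step  # 0.5, 1.0, 1.5, ... up to max_multiple
        offset = k * tx_mhz - target_mhz
        if abs(offset) <= tolerance_mhz:
            hits.append((k, round(offset, 2)))
    return hits


# 450 MHz is my recollection of the pager frequency, not a measured value
for multiple, offset in near_multiples(450.0):
    print(f"{multiple} x 450 MHz is {offset:+.2f} MHz from GPS L1")
```

    With the 450MHz I remember for the pager, 3.5 times the carrier lands within half a megahertz of GPS L1.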

    The one about the barn

    This is one of my favourites. It has become much like one of those fishing stories where the size and height of the barn get bigger with each retelling. I love to fish, but I rarely catch anything so this is the closest I get. I like to tell this one to younger staff who complain when I ask them to get on a ladder or crimp a cable from the comfort of their desks.

    The setting is the third region we rolled out into. I had never been to this region, let alone known the location of the sites in relation to each other – but I had a map, and the guy who chose the sites. While standing on the top of a mountain with beautiful scenery all around me I was given the following instructions:

    “OK, you see those mountains on the horizon? They are about 8km away. You see that dip where my finger is pointing? OK, well another 8km on the other side of that dip is where the other side of this link is going”.

    At this stage we had a discussion about parallax error and the fact that “where my finger is pointing” is not an exact instruction. Eventually I put together which “dip” in the mountains we were talking about and proceeded to mark out the point where the mast would be erected. We then drove to the proposed location of the other side of the link.

    When I got there, I was informed that we were going to erect a 10m mast on the top of a barn that just happened to be in the right spot to see “through the dip in the mountains”, as well as line of sight back into town where a usual fibre backhaul provider had a POP at the local hospital.

    After a day and a half of troubleshooting bringing up this link and taking both masts up and down several times, we discovered that our two antennas were assembled incorrectly. Our first few lessons from this trip:

    “When installing any wireless link, ensure that you have matching polarity in your antenna.”

    “Just because someone more knowledgeable than you assembled your antenna back in the office and shipped them to site, don’t just blindly trust that they did it correctly”

    “One antenna set to horizontal polarity and one antenna set to vertical polarity can cause enough loss to make sure you never see signal from the other side (especially when you’re aiming at a non-descript dip)”.

    After another day of troubleshooting the long haul link we finally got a good signal and locked off the mast where it was. Now to focus on the 1500m short hop into the hospital. This is about the same time that you learn that the cable leading up to the AP was crimped incorrectly and the device is not powering up. Not long after this you learn that the riggers who have been helping you erect the mast refuse to take it down again so you can fix up a cabling problem.

    How does one solve this problem? That’s easy! You borrow an extension ladder and straddle it either side of the peak of the roof while somebody who pretends not to look holds it steady. Climb said ladder and unclip the cover off the AP at almost full stretch of your arms. Remove the cable and cut the RJ45 end off. Recrimp the cable above your head while trying not to think about the fact you are standing atop a barn that is about 8m high to the guttering, another 3m to the peak, atop a 2m ladder leaning on a flimsy pole. So that’s White-Orange/Orange/White-Green/Blue/White-Blue/Green/White-Brown/Brown, yeah? This lesson:

    “Always crimp your cable correctly. Then check it, and check it again”

    “Test your cabling while everything is still on the ground”

    “If you do something like this, don’t post it publicly where WorkCover can read about it… oops!”
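
    The colour order I was muttering to myself on the ladder is the T568B pinout. If you would rather check it on the ground, a few lines of Python will do it. The helper name and the input format are my own invention for this sketch.

```python
# T568B colour order for pins 1 to 8 of an RJ45 plug
T568B = ["White-Orange", "Orange", "White-Green", "Blue",
         "White-Blue", "Green", "White-Brown", "Brown"]


def miswired_pins(observed: list, standard: list = T568B) -> list:
    """Return the 1-based pin numbers whose colour differs from the standard."""
    if len(observed) != len(standard):
        raise ValueError("an RJ45 plug has exactly 8 positions")
    return [pin
            for pin, (got, want) in enumerate(zip(observed, standard), start=1)
            if got.strip().lower() != want.lower()]


crimped = "White-Orange/Orange/White-Green/Blue/White-Blue/Green/White-Brown/Brown".split("/")
print(miswired_pins(crimped))  # an empty list means every pin matches T568B
```

    An empty list means the plug matches, and anything else tells you which pins to look at before the mast goes up.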

    Eventually I got the whole thing sorted out, but not without a lot of wasted time because of silly assumptions. This leads to my final lesson from the trip, which I call “Kurt’s Law of Away Trips”:

    “When somebody else spec’s a job and insists it’s a 2 day job, pack for a whole week just in case and take a good book!”

    The one about the Boat

    This next story begins with me driving down the freeway with a family member when I got a series of alerts from our NMS indicating that we had just lost the primary hop between one of our POPs and the first AP up on the hill about 2km away.

    Being the always on call network ninja that I am, I started SSH’ing from one smartphone, while RDP’ing into the NMS server from another. Things didn’t look good – there was lots of red. Red is never good! I called the local support engineer who informed me that there was a storm in the area, but he would make the trip up to the tower to check on everything. He braved the rain on the way up the mountain, and I braved the local Chinese take away.

    I got a call about 45 minutes later informing me that everything was powered up correctly, but there was still no link. At about this time our conversation went a little like this:

    Engineer: Oh! Could it be the boat?

    Me: What boat?

    Engineer: The boat that pulled into port today?

    Me: What?

    Engineer: A big container ship pulled into port today and it’s about 2 blocks from the POP and smack bang between the POP and the tower!

    Me: Umm… what?

    As it turns out, that is exactly what the problem was! Much like the first story, when the tide was at its highest (coupled with the storm) the boat was blocking enough of the signal to cause an outage. And three days later, when the boat left, signal was restored. 5 9’s anyone? I took a few lessons from this:

    “Network documentation for remote wireless hops needs more than some Visio diagrams. Get photos in both directions as well as the general area. It will come in very handy when trouble shooting”

    “Don’t let people without the technical ability determine where sites will be located”

    “When the name of the region is something like “Port X”, there is a pretty good chance a big damn boat is gonna end up nearby so make sure that doesn’t ruin your weekend!”
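
    For anyone wondering how far off five nines that outage was, the arithmetic is short. A quick Python sketch follows. The function names are made up for illustration and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget for a given availability target (0.99999 = five nines)."""
    return (1 - availability) * MINUTES_PER_YEAR


def availability_after_outage(outage_minutes: float) -> float:
    """Best possible yearly availability after losing the given number of minutes."""
    return 1 - outage_minutes / MINUTES_PER_YEAR


three_days = 3 * 24 * 60  # the boat stayed in port for three days
print(f"Five nines budget: {allowed_downtime_minutes(0.99999):.2f} minutes per year")
print(f"Best case after the boat: {availability_after_outage(three_days):.4%}")
```

    Five nines leaves a little over five minutes of downtime a year, so one boat used up several centuries of budget.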

    The one where my car fell off a mountain

    Funnily enough, this is the one my old work mates like to remind me of! One of our main tower sites for the wireless network on the Central Coast was only about 10 minutes drive from the office. I decided on the way home one night (spur of the moment decision) to drive up to the tower and try to replace a management card in a UPS that had been giving us grief.

    In theory this should have been a simple 5 minute stop off – and I guess it was. The real problem occurred as I was walking back to my car. While I had been inside it had started to rain ever so lightly. This was the first rain we had had in about 2-3 months at that stage so the (very well sealed) road was covered in a fine layer of dirt and once dried leaves.

    The road was kind of winding, but with a max speed of 45km/hr we arent really talking about rally driving here. coming around the last bend before hitting the actual public road, I hit a patch of leaves and my car decided to move sideways off the road towards the down hill side of the mountain. Nothing here happened fast. I had a good 15 to 20 seconds of sideways drift as my car decided to continue with no traction off the road and down the side. The add insult to injury, when my car finally came to rest it was stopped by a tree no thicker than my arm!

    Upon getting out of the car and surveying the situation, I realised there was no way I was getting myself out of this one alone. I had to admit the defeat. I called our Operations Manager who lived a couple of blocks away and who owned a 4WD. The phone call went something like this:

    Me: Ralph? Do you love me?

    Ralph: What? Why?

    Me: I drove off a cliff!

    Ralph: …

    Me: …

    Ralph: BAHAHAHA! I’ll be there in 5!

    When he finally arrived, recovered from the sight of me and my car sitting on the side of a mountain, and took enough photos to make fun of me the next day, we hooked up a tow strap to the back of the fourby and towed my car out. I busted up the front grille of my car and put a small ding in the bonnet, but other than that the Lancer came out OK. I, on the other hand, am still held accountable for this story when enough alcohol is applied!

    The one where you got bored and stopped reading

    Well I guess I will leave it there for now. I’m sure you all have much better things to spend your time doing, and as it is I have just given most of my best stories away. I guess this means that if we ever do meet, we will have to talk about the weather or some such.

    So this wasn’t the most technical blog post on the face of the earth, but lessons learned in the field are the ones that seem to stick! Feel free to share any interesting stories you have that may or may not be related. In the meantime, I am going to go back to studying for this stupid exam!

    Shout out: Junos Firewall Filters by Robert Juric

    In my previous blog post about Juniper Training, I discussed how Juniper Firewall Filters were quite interesting and new to me because I have been using SRX since I started with Juniper equipment 12 months ago.

    Robert Juric (@robertjuric) has written two really good blog posts that provide a solid overview of the topic. Robert is currently studying for his JNCIA-EX exam and has written several articles about Junos configuration.

    If you are new to Juniper and Junos, then you really should check out the following two articles:

    CCIE BootCamp World Tour!

    Living in Australia, I have gotten used to hearing about “once in a lifetime opportunities” and expecting they will always be on the other side of the world. This is a nice way of protecting myself from the inevitable disappointment that follows… BUT NOT TODAY!

    The news is out that Emmanuel Conde from CCIE Flyer has brought together two of the biggest names in CCIE training and taken the show on the road. Narbik Kocharians and Scott Morris have joined together to create a 12-day tag-team CCIE R&S Bootcamp. Details are on his website.

    Currently the proposed “Tour Dates” include Bangalore/India in January 2011, Sydney/Australia in April (Woohoo!), Milton Keynes/UK in July and Wilmington/Delaware in October.

    Now I just need to put my CCIE prep into high gear to ensure I am in an optimal position by the time they come to Sydney!

    Stay tuned for more information or send an email to [email protected] and he “will be happy to share more details as the planning is finalized!”

    Exam Review – JNCIA-ER (JN0-342)

    As a Network Janitor, I spend a lot of time mopping up other people’s messes! When called in for a consulting job, it doesn’t pay to be a vendor bigot. This is why we decided that staff at my company would need to get trained in the key vendors in the networking space. We identified our first 3 targets as Cisco, Juniper and HP. We then started working towards improving our partner levels with each of these vendors, and this is a process that is still underway.

    The partner process opens up the requirement for X number of individuals with A, B and C qualifications – Juniper is no different. There was a requirement for at least one career certified individual (along with the obligatory sales and SE “certification”). Being no stranger to certification, I felt I should at least attempt to meet all three requirements. Passing the Sales and SE certifications was somewhat trivial, helped by the many good resources Juniper provides for this in their Partner Portal.

    I had registered with the Juniper Fast Track program back in 2008, but had not really attempted to complete the process – I guess I took the slower track?! Back in 2008 my Account Manager sent me on the Junos as a Second Language course as an incentive to buy more Juniper kit. They threw in a copy of “Junos Enterprise Routing” and a Juniper Sports bag!

    I had started to study for the JNCIS-ER (second level) exam when I saw the announcement from Juniper that they had decided to retire the JNCIx-ER and -EX certifications and replace them with a single -ENT course. I decided to “wimp out” and instead sit the JNCIA-ER (entry level exam), as this still met the requirements for my partner status. I made this decision on Wednesday, sat the Fast Track prelim exam online, and booked the exam for 10am last Saturday.

    I arrived an hour early (I live about an hour out of Sydney so I like to leave plenty of time), and after the usual pre-exam processing and ritual emptying of pockets, I made my way to my assigned seat.

    What follows are my cliff notes from the JNCIA-ER:

    • The first thing I noticed was that I was able to go back and change questions after submitting them in the exam. This really took me by surprise after so many Cisco exams. I really had to resist the urge to swap and change my answers. I did give in at the end and did a complete review of the exam.
    • My allocated question set included 60 questions in 90 minutes. All multiple choice. No Lab/Sim questions.
    • There seemed to be a lot of product specific questions – “What is the default setting for X on the M Series Platform” etc.
    • Very few of my questions were protocol or technology specific, but rather “Which command implements feature Y”.
    • There seemed to be a surprising number of questions relating to the J-Web interface. “Where would you configure Z in the J-Web Interface”. Now Juniper have spent a lot of time making J-Web pretty and functional, but to be completely honest I had never logged into this interface in the 12 months I have played on Juniper kit (Well… until I got back from the exam at least!). I’m a Network Engineer, not a Windows Admin 😉 I do everything from the CLI.
    • If you managed to work your way through the Fast Track material, and were able to get some hands-on time with the Junos platform, you should not have a problem passing this exam. (Let’s just say I had more than two thirds of my allotted time left over when I left the room, much to the annoyance of the other candidates who started at the same time I did).

    In the end I passed this exam, and am actually looking forward to reading what the curriculum is for the JNCIS-ENT certification, and would like to make a start on that soon after it is announced. Part of my certification road map has the JNCIE-ER (-ENT?) as a probability within the next 18-24 months, so I plan to put a lot more effort into the Juniper product portfolio.

    Reflections on Juniper Training

    If you follow my @networkjanitor twitter feed then you may know that I spent 3 days last week in training provided by Juniper and the local distributor Avnet.

    In the old tradition of “free training for channel partners”, I signed up for “Junos Routing Essentials (JRE)” and “Junos for Security Platforms”. There was an “Introduction to Junos Software” course on the Monday that I sent one of my engineers along to, but I didn’t attend personally. I have included below my review of the two courses.

    Junos Routing Essentials (JRE)

    This course was a one day course aimed at engineers who may or may not already understand the theory behind various routing protocols and processes.

    There was a brief overview of how a routing table works and how the forwarding table is produced from this, which felt a little redundant at first, but led into further discussion about the various routing tables used within Junos and their functions. There were a few things I had not picked up working on Juniper kit that were handy here.
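    As a toy illustration of the routing table to forwarding table step, the sketch below keeps one active route per prefix and copies only that route into the forwarding table. It is a simplification: the preference values are the Junos defaults as I understand them (lower wins), real route selection has many more tie-breakers, and the `Route` class, function name and addresses are invented for the example.

```python
from dataclasses import dataclass

# Default Junos route preferences as commonly documented (lower wins).
# Treat these as illustrative and check the documentation for your release.
DEFAULT_PREFERENCE = {"direct": 0, "static": 5, "ospf": 10, "bgp": 170}


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "10.1.0.0/16"
    protocol: str    # key into DEFAULT_PREFERENCE
    next_hop: str    # hypothetical next-hop address


def build_forwarding_table(routing_table: list[Route]) -> dict[str, str]:
    """Pick the active route per prefix and return prefix -> next hop."""
    active: dict[str, Route] = {}
    for route in routing_table:
        current = active.get(route.prefix)
        if current is None or (
            DEFAULT_PREFERENCE[route.protocol]
            < DEFAULT_PREFERENCE[current.protocol]
        ):
            active[route.prefix] = route
    # Only the winning (active) route for each prefix is installed.
    return {prefix: route.next_hop for prefix, route in active.items()}


if __name__ == "__main__":
    rib = [
        Route("10.1.0.0/16", "bgp", "192.0.2.1"),
        Route("10.1.0.0/16", "ospf", "192.0.2.9"),
        Route("0.0.0.0/0", "static", "192.0.2.254"),
    ]
    for prefix, next_hop in build_forwarding_table(rib).items():
        print(prefix, "->", next_hop)
```

    In this example the OSPF route beats the BGP route for the same prefix, so only the OSPF next hop ends up in the forwarding table.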

    Quite a bit of this course was devoted to routing policy and how to import and export using Junos routing policies. This makes sense as once you understand the routing policy structures within Junos you open the doorway to some of the true power of the design inherent in Junos. There are quite a lot of match options available to routing policies that make life much easier (especially if you come from a Cisco background). I am working on a separate blog post to discuss this topic further, as I feel there is a lot to point out.

    The section on (stateless) firewall rules was interesting for me because I am used to working on SRX series routers which use zone based / stateful firewalls. To date the extent of my firewall policies was around rules on the loopback to control access to SSH/Telnet/SNMP etc.
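    To make the “stateless” part concrete, here is a rough model of how such a filter behaves: each packet is checked against an ordered list of terms, the first match decides, and anything unmatched is dropped (Junos filters end with an implicit discard). The `Term` class, the filter contents and the addresses are hypothetical, and this is a sketch of the idea rather than Junos syntax.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    name: str
    source: str                 # source prefix the term matches
    protocol: str               # "tcp" or "udp"
    dest_port: Optional[int]    # None matches any destination port
    action: str                 # "accept" or "discard"


def evaluate(terms: list[Term], src_ip: str, protocol: str, dest_port: int) -> str:
    """Return the action of the first matching term, else discard.

    No connection state is kept: every packet is judged on its own headers.
    """
    address = ipaddress.ip_address(src_ip)
    for term in terms:
        if address not in ipaddress.ip_network(term.source):
            continue
        if term.protocol != protocol:
            continue
        if term.dest_port is not None and term.dest_port != dest_port:
            continue
        return term.action
    return "discard"  # models the implicit discard at the end of a filter


# Hypothetical loopback filter: the management subnet may use SSH and SNMP.
PROTECT_RE = [
    Term("allow-ssh", "10.0.0.0/24", "tcp", 22, "accept"),
    Term("allow-snmp", "10.0.0.0/24", "udp", 161, "accept"),
]

if __name__ == "__main__":
    print(evaluate(PROTECT_RE, "10.0.0.5", "tcp", 22))
    print(evaluate(PROTECT_RE, "203.0.113.7", "tcp", 22))
```

    Because nothing is remembered between packets, return traffic for sessions the router itself initiates needs its own term, which is the main practical difference from the zone based policies on the SRX.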

    The Class of Service section was brief but gave an overview of how to build policies to control different CoS settings. You really would want some kind of previous exposure to QoS/CoS to supplement this module, but that really is the point of these condensed courses.

    Junos For Security Platforms (SEC)

    This course was a two day course focusing on the SRX series routers. Most of my Juniper experience has been on SRX240s, so I felt quite comfortable in this class. As always, I taught myself to do exactly what I needed to do to get the job done, so learning the ins and outs of how and why the platform works the way it does was insightful.

    The opening module discusses the benefits and features of a converged router/firewall device and its superiority over traditional disparate devices. Mostly a lot of “my product is better”, but there is a little dive into how traffic flows through the hardware of the SRX platform. The module finishes up discussing the modular design of the Junos OS and further discussion of the flow based processing that is the foundation of the SRX platform (and shows its lineage from ScreenOS products).

    The next two sections discussed the advantages of Zone based firewalling and how to build security policies to implement your goal. Discussion of the scheduling feature of policies to enforce time of day or day of week style firewall rules was an interesting design I had never really looked at, but has obvious uses within an enterprise type environment.

    Firewall authentication, which is the ability to auth against the firewall to open up a particular set of firewall policies, was interesting. I have seen similar things when I used to use OpenBSD as a firewall, but I felt the uses were fairly restrictive and somewhat limited. If I really wanted something like this, the SRX is perfectly suited to operate as a VPN device and provide even greater functionality to boot.

    Given the lineage from ScreenOS, the SRX platform has inherited a series of SCREEN features to filter broad-stroke denial of service attempts as well as handling suspicious traffic. We discussed when to use SCREEN versus some of the optional IDP features or using firewall policies.

    NAT on the SRX platform is somewhat different from both traditional Junos as well as from ScreenOS. The usual list of features is supported: static 1:1 NAT, destination NAT (port forwards etc), and source NAT (both with and without PAT). Two interesting gotchas with the NAT implementation:

    a) Security policy is evaluated after static and destination NAT translation. Say you have a static 1:1 NAT arrangement: you would still apply your zone based rules to traffic arriving on the outside interface, but the destination you reference is the internal (translated) address, as opposed to the public IP address the remote host connects to. This makes sense after a while, but took some time to get my head around.

    b) Like other firewalls, the SRX will happily snatch and translate any traffic routed through it according to any NAT rules that are configured. If on the other hand you have, say, a large subnet on the “untrust” side of the firewall, and you try to make some NAT rules using some of those additional addresses, you will need to tell the router to Proxy ARP those addresses. I had been caught out on this one on a previous job, and felt a little foolish when they brought it up on the course. I won’t be forgetting this one.
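    Both gotchas can be sketched in a few lines. This is a toy model under stated assumptions, not how the SRX is implemented: the NAT table, the policy set, the function names and every address are hypothetical. The first function translates the destination before the policy lookup, so only a rule written against the internal address can match. The second checks whether a NAT address sits inside the subnet of the outside interface without being the interface’s own address, which is the situation where Proxy ARP is needed.

```python
import ipaddress

# Hypothetical static NAT table: public address -> internal address.
STATIC_NAT = {"203.0.113.10": "192.168.1.10"}

# Hypothetical untrust-to-trust policy, written against the INTERNAL address.
ALLOWED = {("192.168.1.10", 443)}


def process_inbound(dst_ip: str, dst_port: int) -> str:
    """Translate the destination first, then run the policy lookup."""
    translated = STATIC_NAT.get(dst_ip, dst_ip)
    if (translated, dst_port) in ALLOWED:
        return f"permit -> {translated}:{dst_port}"
    return "deny"


def needs_proxy_arp(nat_address: str, interface: str) -> bool:
    """True if the NAT address is on the interface subnet but is not the
    interface's own address, so nothing would answer ARP for it by default.

    `interface` is given as address/prefix, e.g. "203.0.113.1/24".
    """
    iface = ipaddress.ip_interface(interface)
    address = ipaddress.ip_address(nat_address)
    return address in iface.network and address != iface.ip


if __name__ == "__main__":
    print(process_inbound("203.0.113.10", 443))
    print(needs_proxy_arp("203.0.113.10", "203.0.113.1/24"))
    print(needs_proxy_arp("198.51.100.5", "203.0.113.1/24"))
```

    In the last call the address is off-subnet, so upstream routers would reach it by routing through the firewall and no Proxy ARP is required.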

    As one would expect from a modern firewall product, the SRX supports IPSec VPNs, which were covered quite well in the course. There are two types of IPSec VPNs – policy based and route based. Essentially policy based uses security policies to determine which traffic gets handled by IPSec. Anyone familiar with Cisco IPSec implementations should understand this concept. The other option is route based, which configures a new interface on the router (st0.x) that is used as the tunnel between two VPN gateways. You can assign IP addresses to these interfaces and route traffic across them (or use a dynamic routing protocol) just like any other interface type. It feels somewhat like a GRE tunnel in IOS, but with the added benefit of IPSec encryption and integrity checks.
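    The route based idea boils down to “whatever the routing table sends to st0.x gets encrypted”. The sketch below models that with a plain longest-prefix match; the route table, prefixes and physical interface name are hypothetical, and it deliberately ignores everything about IKE and the tunnel itself.

```python
import ipaddress

# Hypothetical routes: prefix -> egress interface. st0.0 is the tunnel interface.
ROUTES = {
    "0.0.0.0/0": "ge-0/0/0.0",
    "10.20.0.0/16": "st0.0",
}


def egress_interface(dst_ip: str) -> str:
    """Longest-prefix match over ROUTES (the default route always matches)."""
    address = ipaddress.ip_address(dst_ip)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in ROUTES
        if address in ipaddress.ip_network(prefix)
    ]
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTES[str(best)]


def is_tunnelled(dst_ip: str) -> bool:
    """In a route based VPN, traffic is protected when its route points at st0.x."""
    return egress_interface(dst_ip).startswith("st0.")


if __name__ == "__main__":
    print(egress_interface("10.20.3.4"), is_tunnelled("10.20.3.4"))
    print(egress_interface("8.8.8.8"), is_tunnelled("8.8.8.8"))
```

    With a policy based VPN the equivalent decision would be made by the security policy match instead of the route lookup, which is why the route based flavour pairs so naturally with dynamic routing protocols.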

    A very brief look at the Intrusion Detection and Prevention features of the SRX was given, but this could have been a whole course on its own, not to mention this is a licensed feature of the SRX. A lot of interesting features, but not as powerful as a dedicated IPS/IDS solution. Worth considering for a branch deployment though, which is where this feature is aimed.

    The last section covered an area of the SRX that I have spent some time on – High Availability. One of the great features of the SRX platform is that you can implement an Active/Active zone based firewall solution even on the smaller branch/appliance series of devices. I have implemented a HA pair of SRX240s for a customer and have been quite happy with the result (though I suggest you lab this heavily before implementing due to instability issues on certain Junos versions).

    In HA mode, you configure a set of redundancy groups and weightings for device failover triggers. There is a bit of fiddling to get some of these groups configured the way you expect them, but this is mostly due to the fact that both devices in the cluster have active data planes, and you need to know which interfaces (and on which device) traffic will ingress and egress.
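    The weighting logic can be sketched roughly as follows. This reflects my understanding of the documented behaviour, so treat it as an assumption: each redundancy group starts with a threshold of 255, every monitored interface carries a weight, a failed interface has its weight subtracted, and the group fails over once the threshold reaches zero. The interface names and weights are hypothetical.

```python
FAILOVER_THRESHOLD = 255  # assumed starting threshold per redundancy group


def should_fail_over(weights: dict[str, int], down: set[str]) -> bool:
    """True once the weights of failed monitored interfaces use up the threshold."""
    lost = sum(weight for name, weight in weights.items() if name in down)
    return FAILOVER_THRESHOLD - lost <= 0


# Hypothetical redundancy group: losing the uplink alone triggers failover,
# while both LAN links must fail together to do the same.
RG1_MONITORS = {"ge-0/0/1": 255, "ge-0/0/2": 128, "ge-0/0/3": 128}

if __name__ == "__main__":
    print(should_fail_over(RG1_MONITORS, {"ge-0/0/1"}))
    print(should_fail_over(RG1_MONITORS, {"ge-0/0/2"}))
    print(should_fail_over(RG1_MONITORS, {"ge-0/0/2", "ge-0/0/3"}))
```

    Playing with the weights on paper like this before touching the cluster makes it easier to predict which combination of link failures will move a group to the other node.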

    HA on SRX Platforms could take another whole blog entry, which I am happy to go into if there is enough interest – so let me know if you want to hear more.

    Final Thoughts?

    So, after 3 days of training I walked away feeling that I had managed to learn quite a bit even though I have been working with Juniper equipment for 12 months now. The theory was aimed at engineers who already understood core concepts and routing protocol requirements, but even a junior engineer would learn a lot from these courses. There was a lot of hands-on lab exposure to teach you the ins and outs of the theory – it certainly made sure you learnt the material.

    If you can get your account rep to organise the training (or have a company who will pay for it for you), then this is certainly worth spending some time on.

    Hope this helps someone out there who is starting to look into Juniper as an alternative network vendor. Please let me know if you want me to follow up anything here, or would like me to show some further examples of the Juniper solutions.

    CCIE Assault – Part 1

    I’m currently working on my plan of attack on the CCIE R&S and I need some advice. I finally finished my CCNP in June after years of putting it off (I first got my CCNA in 2001!), and now I am trying to determine the best course of action moving forward.

    So far my plan is this:

    1. Buy CCIE Written Certification Guide – Check!
    2. Improve my Skills in QoS, BGP and MPLS – Sit each of the CCIP exams associated with these subjects as confirmation of understanding of the base knowledge.
    3. Review each major section of the CCIE R&S Blueprint, and read books from the CCIE Recommended Reading list.
    4. Purchase the IP Expert (at this stage) self-study package and study from the video and audio material
    5. Sit CCIE Written
    6. Continue deeper study of each topic from the Blueprint
    7. Work through practice lab exams from IP Expert and other online sources
    8. Book and Sit Lab exam
    9. Repeat #8 until I pass!

    So does the above plan sound reasonable? Should I attempt the written exam earlier and spend more time focusing on the lab preparation? Are there other resources that you have found worked well for you? Should I alter some of these steps?

    Let me know your thoughts.

    Packet Pushers – Making my commute educational!

    During my research into the new world of storage networking and all the wonders contained within, I stumbled across a really great podcast – The Packet Pushers Podcast!

    I think the tagline of the series says it all for me: “Where too MUCH networking is NEVER enough”.

    Now I’m a network geek and this has provided something I have been looking for – a networking specific podcast that is entertaining, informative, and not specifically vendor biased. Sure there is a tendency to be Cisco-centric when dealing with networking, but there is a line between discussing networking at a technical depth and being a vendor bigot. This team has pulled that off quite well.

    The range of guest speakers/hosts they have brought in already also adds to the quality of discussions.

    With my 200km round trip commute every day, having another great podcast series to listen to is certainly a welcome addition!

    PS. Hearing an Australian accent saying “rooters” is still humorous – I’m used to it from English/European engineers.
