Monday 14 December 2009

Some things are just not worth releasing

I am a bit of a fan of film. If I had seen this either in the cinema or on television I am not really sure what my reaction would have been, beyond sheer horror!

Video is via this link (youtube).

This is possibly the worst example of modern special effects I have seen to date.

I think I would not be able to show my face to my peers if I did something like this in my profession.

Thursday 8 October 2009

Communicate! For the Love of Sales Communicate!

Imagine.

You are pitching for a multi-million dollar/pound/euro deal. You are in the boardroom of your customer. The CEO, CFO and various other board members are in attendance.

You turn on your laptop to start your presentation. You insert your thumb drive with the presentation on it.

Access denied.

I have to admit to being thankful that it was not me in that situation.

Unfortunately I took the call from the poor bugger though.

That morning I had been politely informed by the corporate security team that a new security tool had already been pushed to all laptops which locked down functionality in terms of USB/CD-ROM/DVD access.

Salesguy had been in the office and logged on, and the tool had been pushed to his laptop before he went to the customer's site.

Bad timing.

So along with this person's issue I had calls that swamped my service desk from all sorts of people who needed access to these devices due to their work.

A simple communication letting us know that this was happening would have prevented the loss of a rather large sale.

Using the principles of Release and Change Management would certainly assist in reducing these issues. Needless to say the actual tool was buggy and we had an awful lot of work that needed to be done to recover. Luckily I had a great ICT support team and we managed to get through the issue but not without an awful lot of pain. I mean A LOT OF PAIN.

It's not fun getting chewed out for something that you had no idea about and little to no control over.

Communication is vital.

Wednesday 7 October 2009

Hello? Anyone There? We Heard There Was A Riot!

How to deal with the unexpected.

We get a trouble ticket stating that a remote site in a third world country had no network connectivity. This was reported to us from an automated system rather than a human.

So we start the usual investigations - ping and trace routes.

Always timed out with the last hop being a local ISP. Very odd.

We tried to get in touch with relevant persons on site and at the ISP. Absolutely no joy. What could we do but wait for someone to arrive on site and contact us? (We left numerous messages and emails.)

During our lunch break one of the lads working with me had a news aggregator open and the city where our site was located appeared in it. Large scale riots reported.

Madness, Mayhem & Anarchy apparently.

So we knew that it was going to be a very long day waiting for someone to arrive on site. Eventually (I think possibly the following morning) we get a message from the site manager.

"Unable to log on to network, office broken into and damaged"

We then activate our local external engineer who arrives on site and calls us back.

Engineer "I found the problem, the network is gone"

Me "Ok...just reboot the switches and servers and let's see where we are"

Engineer "That could be a problem"

Me "?"

Engineer "Well...unless we can get data across two tin cans and a long length of string I am going to need a lot of CAT5"

Me "?"

Engineer "Yes...the robbers took all the copper cabling from the building"

Me "!"

Sometimes....things happen you have no control over.

Tuesday 6 October 2009

Dr No's Secret Lair!

I have worked in many different places and environments but this...this is awesome.

Link to the Lair of Dr No.


Yes. I want one. I don't know what I would do with one (well actually I do but for the sake of a cheap joke let's ignore this part) if I had one. Maybe take over the world à la Pinky and the Brain?

Jokes aside I wonder how many other shelters are available and in a position to be built out as data centres. Certainly this facility would have many inherent advantages for high availability data centres that you would not find elsewhere. One downside I can think of is the usual remoteness of these places.

I would also find it hard to resist playing the role of Q in such a place!

Tuesday 1 September 2009

Help! I Need Somebody!

Apologies for the lack of posts recently. I have been interviewing and researching for upcoming interviews.

Which (oddly enough!) leads me to a story from when I was a manager hiring a new support engineer. I am a pretty plain speaking kind of guy and when I interview potential new team members I make it clear what the work is like.

When working in support you need the right kind of people who can handle dealing with problems and dealing with people with problems on a constant basis and still be professional and pleasant. It's not easy and not for everyone. You need the technical skills, the ability to communicate with non-technical people and of course the ability to deal with stress. I prefer good humour to combat stress but of course tech humour can be quite dark. I think spending most of your working life dealing with problems lends itself to a certain jaundiced view of life.

So one of the things I make quite clear is the expectation of having to deal with stress and problems. How do you (as the candidate) deal with stress and the like?

Well I was interviewing one person who mentioned that while for the most part they are able to deal with stress they do have a breaking point and when that point is reached, chances are that this candidate will resort to some form of violence.

I don't think I have ever really been speechless before...Still on the plus side the direct questions seem to have uncovered this candidate's violent tendencies.

Lessons learnt - when interviewing
  • Be Honest
  • Be Truthful
  • Be Direct
  • Set Expectations
I have found that by following these four simple concepts I have managed to recruit some top notch support engineers during my career so far.

Be Honest - if it's an intensive role and fraught with stress...tell the candidate. I'd rather have someone think about this and reject the role than join and find out 6 months down the line that the job is not for them.

Be Truthful - there is nothing actually wrong with admitting that there are problems as long as you are looking to correct them. In this case I would be truthful to the candidate and make clear what the candidate brings to the role and business as well as what challenges the candidate and team face.

Be Direct - too much time and effort is wasted with running around in circles. I much prefer to be direct and give concrete reasons or answers to candidate questions. The interview is not only for me as hiring manager to make sure a candidate is suitable but also for the candidate to ascertain whether the role or the team is for them.

Set Expectations - This is really important as far as I am concerned. Of course this needs to be in line with the needs of the team and business, which I think the vast majority of candidates understand, especially in today's world of fluctuating economics. Having said that, I do find it helps to be able to tell the candidate, if not specifically then generally, what the expectations are from me (hiring manager/team manager) and the business.

I'm not a professional interviewer and I am sure that some of my techniques give professional interviewers the heebie-jeebies but I have had a reasonable success rate. I know team members I have recruited appreciate this.

Monday 24 August 2009

Dammit Jim! I'm a Hardware Engineer not a miracle worker!

I really enjoy working with people from different cultures and countries. Not only does it widen my horizons but also affords the opportunity to meet new people with very different points of view.

However there are some truths that are universal.

I was dealing with a desktop machine that had lost its networking capability. A pretty easy fix...just need to install a new step card and install the drivers. This card was not on the NT4 HCL (hardware compatibility list) and therefore the device drivers had to be installed by hand.

As I was several thousands of miles distant from the machine I needed to engage with the local hardware engineers. So I call them and state what the issue was and what I needed to have done. Which was really to install the step card and put a floppy (install disk) in the disk drive.

They were more than happy to do the install. No problems. Until I came to the part about the floppy.

'Sorry but we are hardware engineers. We don't 'do' software'.

Picking myself up off the ground after hearing this, I replied that that was a very interesting answer and that while yes indeed they were hardware engineers driver installs are a part and parcel of the job.

Much to-ing and fro-ing ensued regarding this. The engineers were resolute that they should not be installing any software or middleware or any other kind of ware.

Eventually the end user herself installed the entire thing including drivers in about ten minutes.

Last I heard there were two hardware engineers looking for work. They did not realise that the machine they needed to fix was actually their boss's. Oops!

Lessons learnt – well for the hardware engineers I'd say the top one is to find out who has a problem before playing games.

Tuesday 18 August 2009

Plug the Hole! The Birds! The Birds! The Horror!

When you are looking to build out a Data Centre there are usually a few options one should consider from the start.

Are you looking to build a brand new facility or looking for an existing building? This is a good start and as with most things one needs to understand the impact of that decision. In this post I am looking more at pre-existing facilities.

There are many factors to consider when looking at a possible Data Centre facility. Location, security, size, access, utilities, wildlife...all must be considered.

Some very brief examples (I know there is much more but for the sake of my fingers I will jump to the chase)
  • Location - this comes with a price tag. Central London is expensive but siting your DC out in the middle of nowhere also has costs.
  • Security - is it located near an airport? What is the surrounding area like? You don't really want to place your DC in a high crime area.
  • Size - This is a good one. Can your Data Centre withstand growth if needed? Is there space?
  • Access - No bloody stairs!* How easy is it for kit to be dropped off, staged and rolled out into the actual machine room?
  • Utilities - Does the area suffer from brown/black outs? Who do you share your mains power with? How old and how well maintained are local water services?
  • Wildlife - When buildout has completed make sure there are no creatures left behind.
Ok yes the last one is slightly odd I admit. However there is a story about this. At the time not amusing for all concerned but as with many harshly learnt lessons, time lends a hand to soften the blow.

I was visiting a DC when there was a commotion in the server room. I noticed a couple of engineers running past the window into the server room. I looked at my hosts; the DC Manager and the DC Director looked back at me with looks of 'no...we are not crazy here!'. Obviously I don't have much of a poker face!

So DC Manager rushes into the server room while Director and I make idle chat. After about 5 minutes of watching the activity in the server room and making small talk, the Director goes into the server room.

The next sight was one I really would never have thought I would see. There was the DC Director using their jacket to catch something. So I amble closer to the window and peer in.

Birds. Well pigeons. Flying rats. Two of them had somehow managed to get in this room. That is supposed to be sealed. So I was given a top class show on how geeks catch birds and how managers deal with the supervising aspect of such a project. I was pretty floored and counted my lucky stars that I was not involved. Mainly as I am sure I'd have flapped around like a headless chicken as well. Still there was no way I was going to leave!

I think eventually the birds got tired and the good Data Centre folks caught them and released them outside. Then they spent a goodly time trying to figure out what happened. While they were poking around the server room one of the engineers spotted a pigeon's head poking through from the outside wall.

Seems that there was a smallish hole (you would not have seen it unless you looked straight at it) that these birds were using to get into the server room. Yes...they eventually found the nest as well.

Lessons learned here are to really make sure that your site is actually secure. That includes the facility's integrity. This means that while the buildout of the server room is being done there are checks to ensure that there are no holes.

* http://tftsr.blogspot.com/2009/08/you-gotta-be-strong-in-it.html <--- I hate stairs. No really.

Sunday 16 August 2009

You Gotta Be Strong In IT

Bingo. A national pastime for many people. I was tasked to help roll out a bunch of Domino servers across the UK. In reality I was there as muscle.

It was a technology re-fresh meaning all brand new kit. Including server racks. Which tend to be quite weighty. Make that very weighty. Now one would think that that ought not to be a problem as they do have wheels/coasters. And you would be quite correct.

However many of this organisation's Bingo halls were located in Victorian era buildings. No elevators but lots and lots of stairs. Sometimes narrow but always steep. Not fun with only two people. The Domino server install engineer and myself. Luckily my partner was built like a gorilla (and a top notch techie to boot!) and I'm not exactly a lightweight either. Still these things were a real struggle and a real pain to carry.

So one day we arrive at a new site and do a quick very cursory survey and realise with heavy sinking hearts that the server room is not only on the top fifth floor but also that it may as well have been the attic. We didn't look into the room as the manager was not on site yet - as usual the cleaners had let us in.

It took us an entire day just to get the rack up there. The real killer was the last spiral stair case. At least three times I faced a squishing and my colleague a hernia or a snapped back.

So we finally get the thing onto the top floor next to the room. We left then and there for a well deserved beer or two and dinner at our hotel.

We arrive back on site and carry up the rest of the kit - hub, switch, two servers and a tape unit plus cable. On the first trip up the manager is waiting for us by the rack we'd carried up.

He quizzically asks "what is this thing?"

I reply that it's the replacement rack. For some reason I had a not so good feeling about this.

He turns to the door to the server room and says "Well I've never seen one of those! Must have been fun carrying up those stairs though..."

The door opens and yes. There was the old server, switch and hub...sitting on a table.

Thursday 13 August 2009

Ten Thousand Fatal Alerts

Sometimes, even with the best will and intention, things just go wrong.

One night when I was working in what in essence was a NOC our Tivoli (which I do really rather like!) tool started to generate some fatal traps. Normally we would delete these as they were false positives.

So after deleting these traps (about 5 I seem to remember) I went off to make a coffee...well it was about 1am!

I get back to my workstation some ten minutes later or so and my console was full of these alerts and it was rising! Within a half hour or so I had something like ten thousand of these alerts. And not all were false positives.

You know that sinking feeling you get when you know that things are really not going well at all? Well I had that feeling in spades. I really had little idea of how to tackle this (not being a Tivoli expert) so I called and woke up my team lead.

15 minutes later he's in the office and gets cracking on solving the problem. By the end of the night the system had generated multiple tens of thousands of these alerts. Amazingly enough my team lead managed to resolve the issue (bad config somewhere in the infrastructure) by about half two. He then hunkered down under his desk just in case it came back.

I know this is not a particularly amusing incident but there is a lesson and a positive one - getting the right kind of team lead is important. Not only for being able to lead the team but also being the technical lead.

Top marks for this team lead!

Wednesday 12 August 2009

You Need A Larger Mouse Mat!

I once worked on a project that quite possibly was one of the best run change/transformation projects I have seen and had the pleasure to work on.

I'll not go into the massively well designed infrastructure and the other technical stuff; suffice it to say it brought tears to this ICT support veteran's eyes.

One of the more interesting areas was the engagement of the user community and how the change would impact them. This is one of the areas that can make or break a project and frankly is not an easy part of a project to manage. However with planning as well as understanding the needs of the user community you can do much to limit any user type issues. In this case an intensive 3 week IT training course for the users.

However there are always some...

A colleague had a call from a user who complained that the mouse would not move across the screen properly. So the usual diagnostics...is the mouse cable plugged all the way in, is there any slack or give in the cable...all questions we have either asked or been asked.

In this case everything was fine so we requested a reboot of the machine. Of course this did not solve the problem. So we engaged an on site engineer to have a look with the user.

We paid little heed to the case until it was closed and we received the resolution details from the onsite engineer.

"User requires larger mouse mat".

In other words the user just moved the mouse to the edge of the mouse mat and then stopped and somehow expected the on screen pointer to move over the required icon.

Lessons learnt - even with the best will in the world you will encounter issues like this.

Monday 10 August 2009

When Production Gets Re-formatted

During one of my shifts working on an outsourced helpdesk (one of many help desks I have had the pleasure to work with) I noticed that I was getting alerts from a remote Oracle Server.

We had a batch utility that basically pinged the servers and if there was a time out we'd get an alert to our desktop to open a helpdesk case as a P1. As the helpdesk agent I would retain the case but engage the various technical resources available.
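For anyone who has not seen this kind of tool, here is a minimal sketch of the idea in Python. It is not the utility we actually ran; the names `Alert`, `ping_once` and `check_servers` and the host names are made up for illustration, and the ping flags assume a Linux or Windows style `ping` command.

```python
import platform
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Alert:
    """A hypothetical alert record that a helpdesk agent would turn into a case."""
    host: str
    priority: str
    message: str


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ping to `host` succeeds.

    Shells out to the system ping command. The flags below are the usual
    Linux and Windows ones; other platforms may interpret -W differently,
    so the subprocess timeout acts as a backstop.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_servers(
    hosts: Iterable[str],
    ping: Callable[[str], bool] = ping_once,
) -> List[Alert]:
    """Ping every host and return a P1 alert for each one that does not answer."""
    alerts = []
    for host in hosts:
        if not ping(host):
            alerts.append(
                Alert(host, "P1", f"{host} did not answer ping; open a helpdesk case")
            )
    return alerts


if __name__ == "__main__":
    # Placeholder host names; a real run would read them from an inventory.
    for alert in check_servers(["oracle-branch-01.example", "file-branch-01.example"]):
        print(f"[{alert.priority}] {alert.message}")
```

Passing the ping function in as a parameter is only a convenience so the check can be exercised without a network. A real monitor would probably also retry before raising a P1, so that one dropped packet does not wake anyone up.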

As we ran a 24/7 operation we had tight deadlines with things like back ups. Every on site (we supported a nationwide financial organisation) server needed to be backed up before business opened. It was preferable that the backups had completed about an hour before that time.

When I was alerted it was, needless to say, about an hour before the business started for the day. As the case developed we realised that the server was completely gone. Without the Oracle server this office was out of action and losing large amounts of money.

We had contacted the on site engineer who was pretty sheepish. Seems that he was building a server and had come in early to start. Of course he started the install correctly. Insert a bootable floppy so he could format the drives and then create the file system and partitions. All via the install disk.

Only problem is that he put the disk in the wrong machine and had forgotten that he had not turned the machine on. So Oracle server gets shut down ungracefully and then auto boots into formatting the drives. Lovely!

The engineer learned from the mistake...I doubt he will ever do something like that again. Mistakes happen. Sure there was a chewing out but last I heard this engineer is very dedicated and working on some very nice projects. Lesson learned.

I learned much from this incident as well. I learned that you cannot remove all risk but you can put processes in place to mitigate as much risk as possible.

  • Do not have staging areas in the server room
  • Use a worksheet detailing not only the work to be done but also all the other info needed (asset tags, user group owner, configuration info)
  • Follow a defined release management process including install sign off, testing sign off and placing into production sign off
  • Any automated system that requires a major change in the system state needs human approval before continuing (No booting up and straight into formatting the drives)

It's all about mitigation!
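To make the human approval point concrete, here is a rough sketch in Python of what such a gate could look like in an install script. Everything in it is hypothetical (the `run_destructive_step` function, the `ApprovalRequired` exception and the asset tag `SRV-0042`). The idea is simply that the operator has to retype the asset tag from the worksheet before anything destructive runs, so a disk in the wrong machine stops at a prompt instead of at a formatted drive.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ApprovalRequired(Exception):
    """Raised when the operator did not confirm the destructive step."""


def run_destructive_step(
    description: str,
    target: str,
    action: Callable[[], T],
    confirm: Callable[[str], str] = input,
) -> T:
    """Run `action` only after a human retypes the target's asset tag.

    `target` is whatever identifier is on the worksheet (an asset tag here).
    A plain yes or no is deliberately not accepted: retyping the tag forces
    the operator to check which machine they are actually standing at.
    """
    prompt = (
        f"About to {description} on '{target}'. "
        "Type the asset tag to continue: "
    )
    answer = confirm(prompt)
    if answer.strip() != target:
        raise ApprovalRequired(
            f"Refusing to {description}: expected '{target}', got '{answer.strip()}'"
        )
    return action()


if __name__ == "__main__":
    def format_drives() -> str:
        # Stand-in for the real work; this sketch formats nothing.
        return "drives formatted"

    try:
        print(run_destructive_step("format all drives", "SRV-0042", format_drives))
    except ApprovalRequired as err:
        print(err)
```

The `confirm` parameter defaults to `input` so the script blocks at the console. It is passed in only so the gate can be tested without someone typing.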

Friday 7 August 2009

Uses For A Car Jack In A Server Room

When I take on new ICT management roles I usually have a look at the facility's IT infrastructure with an emphasis on the server room or data centre.

This happened on my first day.

One design scheme in data centres uses what we call a raised floor. This means that there is a space of about a foot between the concrete floor and the tiles, which rest on a metal framework. Of course the design has an intrinsic flaw and that is how much weight it can handle before collapsing, which could be a bit of a problem. Especially when the weight is from one of those big Airedale* air con units leaning forwards by about 7 or 10 degrees off centre.

I turned to the facilities guy and quietly say - ok...move back slowly...not a sound. He looks at me and says "oh that's ok, look" and proceeds to walk in front of this beast of a device. I swear I could see the thing swaying. I scanned the trajectory and realised that the Symmetrix storage array (several hundreds of thousands of UK pounds worth of kit) was just plumb in the way. More a disaster than a problem.

So I go to my car, get my car jack as well as his, and we jacked the unit upright. It was a bit iffy to start with I have to admit.

I never did get my jack back!

* not really sure of the exact weight but certainly in the region of a ton. And I have to say they really did the air con job well.

Lessons Learnt?

While reading posts on another website I realised that we all have our horror stories of ICT support. So I thought as I have quite a few accounts to relate that I may as well give it a go by using a blog.

Yes. I am a blog virgin. So this should be a wild ride. I will take you from the dizzying heights of bizarre user requests to the lows of some pretty wild asks from businesses. Such as the company that insisted that all mice were asset tagged and logged on an asset register. The fact that the tag actually made the mouse unusable did not seem to be a concern.

You...yes you! You too are a part of this and I would enjoy hearing from you. Please feel free to send me any stories - nfmueller at yahoo. co. uk

Please note - this is not to be used to publicise any grievance towards any organisation or persons by myself or any other participant. This blog is to relate amusing stories that also can be learned from. And as an added bonus technical subjects will crop up here and there :)

Welcome and let's record those lessons!