On January 22, U.S. travelers heard the unwelcome news that all domestic United Airlines flights had been grounded because of a computer glitch. According to Reuters, flights were grounded for only about an hour, but on a Sunday evening that was enough to strand thousands of travelers. CNN reported that the outage involved the airline's communication system that transmits data such as weight and balance figures, which pilots need to calculate the proper takeoff speed. Although this data can be calculated by hand, the airline halted flights as a precaution.
Every organization increasingly needs skilled IT pros on hand who can not only select the best equipment and configure it properly, but also keep it running. In a recent CompTIA survey of IT pros, 35% said the rise of cloud computing has led to greater expectations around uptime.
Operating in an always-connected environment means that service disruptions can have a serious impact. However, casual use of cloud services may lead executives to assume that high reliability is simply built into modern systems, without realizing the serious investment needed to achieve "five nines" (99.999%) uptime.
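To put "five nines" in concrete terms, the short Python sketch below converts an availability percentage into the yearly downtime it allows. It assumes a 365-day year and ignores leap years and planned maintenance windows; the function name `allowed_downtime_minutes` is purely illustrative.

```python
# Convert an availability target (e.g. 99.999%) into the maximum downtime
# it permits per year. Assumption: a 365-day year, no leap days, and no
# carve-outs for planned maintenance.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Print the budget for a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime -> {minutes:.2f} minutes of downtime per year")
```

At 99.999%, the budget works out to roughly 5.26 minutes per year, so a single hour-long outage like the one United experienced would exceed a full year's five-nines allowance more than tenfold.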
Dwight Thomas, Jr., has first-hand experience with making constant uptime a reality. As an industrial network architect for Enbridge Energy, an oil and gas pipeline company, he’s responsible for networking all the devices that operate a pipeline so that they can be controlled remotely.
“We know that ultimately that we cannot protect everything 100% of the time,” Thomas said in a CompTIA webinar last fall. “There are things that happen out of our control. We plan the best that we can, but we look at it practically. We know we’re going to have issues, so we need to decide, how much of a problem can we tolerate?”
Thomas shared the story of one outage he experienced. He got a call in the middle of the night that communication with some of his remote valve sites had been lost. Because the valves sit in extremely remote locations, they rely on cellular networks to communicate. The cellular network had gone down, so the valves could not report to the control center and the operator could not monitor how much oil was flowing in and out. In response, Thomas added a third backup cellular vendor to the two he already had in place, further reducing the chance of a repeat outage.
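Why does a third carrier help? A rough way to reason about it is to ask how likely it is that every link is down at the same moment. The Python sketch below does that under a strong simplifying assumption: each vendor fails independently with the same unavailability. The 1% figure is a hypothetical illustration, not a number from Thomas or Enbridge.

```python
# Probability that every redundant cellular link is down simultaneously.
# ASSUMPTION: vendors fail independently with identical unavailability.
# Real carriers may share towers, backhaul, or power, so their failures can
# be correlated; treat this as a best-case estimate.


def prob_all_down(unavailability: float, vendors: int) -> float:
    """Chance that all `vendors` links are unavailable at the same time."""
    return unavailability ** vendors


HYPOTHETICAL_UNAVAILABILITY = 0.01  # illustrative 1% per carrier, not from the article

# Compare one, two, and three independent carriers.
for n in (1, 2, 3):
    p = prob_all_down(HYPOTHETICAL_UNAVAILABILITY, n)
    print(f"{n} vendor(s): P(all down) = {p:.0e}")
```

Under these assumptions, each added vendor multiplies the chance of a total outage by the per-vendor unavailability. The independence assumption is the weak point: if carriers share infrastructure, one fault can take several down together, which is why diversity across providers matters as much as the raw count.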
“These are not some of the prettiest times, but they happen,” Thomas said. “You have to be very meticulous in your design and communicate well with your stakeholders and local carriers that you depend upon, the third parties, to get the best service.”
To hear more from Thomas, listen to the on-demand webinar, “Uptime All the Time: Tips, Tricks and Traps for Providing Mission-Critical Services.”