In this episode:
In May 1996, ValuJet Flight 592 crashed into the Florida Everglades six minutes after takeoff from Miami, killing all 110 people on board. Investigators traced the fire to chemical oxygen generators loaded into the forward cargo hold without safety caps on their firing pins. What they could not trace was a single point of failure, because there was not one.
We work through the layered collapse William Langewiesche documented in his landmark 1998 Atlantic article: work orders written in language the mechanics could not parse, safety caps that did not exist anywhere in the shop, paperwork signed off on work that was never done, a shipping clerk who put quotation marks around the word "Empty" on the manifest, and a copilot who recognized what he was looking at and said nothing. Drawing on Charles Perrow's theory of the normal accident and Diane Vaughan's concept of the normalization of deviance, we examine how the same mechanism that produced Challenger produced this, and where the two failures diverge.
The deeper question the episode keeps returning to is one Langewiesche raises and does not fully resolve: if the failure emerged from the gaps between organizations rather than from within any one of them, and if adding more procedure to a system can increase the complexity that makes these accidents possible, what actually closes the gap between what the paperwork says and what happened on the floor?
Links from the discussion:
"The Lessons of ValuJet 592" by William Langewiesche, The Atlantic (1998):
https://www.theatlantic.com/past/docs/issues/98mar/valujet1.htm
Normal Accidents: Living With High-Risk Technologies by Charles Perrow: https://press.princeton.edu/books/paperback/9780691004129/normal-accidents
The Limits of Safety: Organizations, Accidents, and Nuclear Weapons by Scott Sagan: https://press.princeton.edu/books/paperback/9780691021010/the-limits-of-safety
The Challenger Launch Decision by Diane Vaughan:
https://press.uchicago.edu/ucp/books/book/chicago/C/bo22781921.html
Transcript:
Welcome to Go/No-Go. I'm Jon Bruner.
And I'm Alex Hao.
This is a podcast about manufacturing, engineering, design, and the calls that can make or break great products. In this episode we're doing one of our recurring segments called Reconstructions, where we look back at an event in the history of engineering or manufacturing and product development that changed the approaches to those fields forever. Today's topic is ValuJet Flight 592 — an aviation disaster that's reverberated for decades, and that, beyond its technical lessons, teaches us a lot about organizational dynamics and technical communication. Before we get into that, a brief reminder that Go/No-Go is brought to you by Lumafield, which makes a manufacturing intelligence platform that gives engineers total visibility into their processes and complete confidence in the products that they ship. Alex, did you ever fly on ValuJet?
No. I had never heard of ValuJet before I read this article.
I flew on ValuJet. I was about ten years old and I was being sent to Space Camp in Florida. My parents drove my brother and me to the airport outside of Washington and put us on a ValuJet flight to Orlando. I don't remember much detail about the flight itself, but the airplane had this silly, friendly cartoon airplane on the side, which I thought was cute. And it was just very casual — in an era when you still had to walk up to the check-in desk and hand over a paper voucher, with a lot of stapling and carbon printing and dot matrix output from the computer, the ValuJet check-in involved handing over your voucher and getting not a real boarding pass back, but a laminated card that was taken from you at the door of the airplane and then reused.
That's really hard to imagine.
It was a very odd thing. And it comes up periodically when ValuJet is discussed. ValuJet was a pioneer in the discount airline market. It existed for only a few years in the 1990s and was really seen at the time as pushing the limits of how you could strip costs out of the airline business. It eventually merged with another airline called AirTran and renamed itself AirTran — I have also flown on AirTran. That was around for quite a bit longer. It was finally acquired by Southwest in 2011. So the legacy of ValuJet is in some ways in Southwest Airlines.
That makes sense.
So let's talk about what happened.
We're going to talk specifically about ValuJet Flight 592, and this is really based on an extensive piece by William Langewiesche, who I know you and I are both big fans of.
Langewiesche stans.
Yeah. And you were saying earlier this was the first Langewiesche piece you read.
That's right. My dad's a retired business school professor, and he recommended it to me as an important reflection on organizational dynamics and how standards can be upheld or not across large groups of people. I probably read it for the first time in my early 20s — that's how I became aware of Langewiesche.
That's such a big theme throughout all of his work. The first piece of his I read was on the MH370 disappearance.
And you got hooked.
Yeah. So ValuJet 592. It was May 11, 1996. Langewiesche's article opens with a fisherman seeing an enormous plane crashing into the Florida Everglades. Very quickly that sets off a large investigation to figure out why this occurred. The flight had only been airborne for about six minutes out of Miami before it crashed. Investigators were able to identify very quickly that what brought the plane down was a fire — and not an electrical fire or a problem with the plane itself, but oxygen generators that were in the hold. They had been removed from other ValuJet planes whose generators were being replaced, and they were combustible. That led to the toxic smoke that ultimately incapacitated everyone on board. The question then became: how did this happen? How did obviously hazardous cargo end up on a passenger flight?
The airline industry has been really aggressive, along with its regulators, at eliminating the obvious technical causes of disasters — things like metal fatigue and pilot error in different kinds of weather. The way these oxygen generators got on the plane is illustrative of a different type of failure.
Langewiesche lays out three kinds of airplane accidents. The most common ones are procedural — you make a single mistake, and you can understand what the mistake was and how it led to that outcome. The second kind is engineered: a wiring fault in the plane, or a fatigue element somewhere, like missing screws in a wing. This particular accident represents a system accident. Many, many things had to happen and had to go wrong for it to ultimately lead to this outcome.
Langewiesche lays out exactly what happened in order, and it's a lot of different things, none of which in itself seems catastrophic. Each of these, in isolation, would not have caused this plane to go down. They were all highly routine, mundane moments in the lives of the people who bore some responsibility for the accident.
An author Langewiesche loves to quote across all of his work is Charles Perrow, who wrote Normal Accidents. It's about normal accident theory — essentially, that as systems become increasingly complex, accidents are inevitable because there are so many tightly coupled parts interacting in unexpected ways. Perrow's key observation is that generally, things that can go wrong will go right, and that lulls people into a false sense of security. That's what happened with the Challenger, where the O-rings had flown and gotten damaged so many times that engineers became complacent: they hadn't failed catastrophically yet, so they probably wouldn't fail next time either.
Let's walk through what actually happened. It involves oxygen generators, which are a little different from oxygen canisters. Either can be used in airplanes — above the ceiling of the passenger compartment, connected to those masks that drop in an emergency. At the time of the crash, ValuJet's fleet was mostly made up of DC-9s, which even in the mid-90s were very old airplanes — a model manufactured between 1965 and 1982, so at least 14 years old when ValuJet was operating them, and many quite a bit older. ValuJet had recently bought a handful of newer MD-80s and was having them refurbished by a contractor called SabreTech at Miami's airport. As part of the refurbishment, the oxygen generators attached to the emergency oxygen masks needed to be replaced — they had expired. The way these generators work is that they have a solid chemical inside which is ignited by a small explosive when you tug down on the emergency oxygen mask, and that sets off a reaction that generates oxygen along with quite a lot of heat. When installed properly, the heat is ventilated off. But if they're not handled properly, they can generate enormous heat and set off a more problematic chain reaction.
When they're used and installed correctly, the outsides of those canisters reach 500 degrees Fahrenheit. If they're in a box with a bunch of other canisters, they get even hotter. ValuJet had given instructions to SabreTech that they needed to put plastic safety caps on these canisters to prevent the firing pins from going off. But a couple of issues arose. First, there was the verbal nuance between "expired" and "expended": nearly all of the generators were expired, meaning past their approved service life, but only some of them were expended, meaning already fired. An expired generator can still fire. And you have this very low-cost workforce, where SabreTech even had contractors of their own, everyone in a big rush without necessarily time to carefully parse the nuances of the language. The other problem was that SabreTech wasn't provided with the safety caps. The instructions said to put the cap on, but there were no caps. So rather than thinking too hard about it, the technicians were removing the lanyards — the cords you traditionally pull to set off the oxygen in the plane — thinking that would prevent the generators from going off. But that wasn't true.
That doesn't prevent them from being jostled and fired that way. As you suggest, a lot of people in this process maybe could have read the technical fine print — although Langewiesche quotes quite a bit of it, and it is truly impenetrable. The technical manuals and work orders are very densely written in technically correct but very difficult language. And even if you do read them, you're still not incentivized to solve a particular technical problem. You're just being pushed to get to the point where you can pencil-whip the process — sign off in a cursory way on the forms and make the problem someone else's.
Another thing he mentions is that these containers should have had more aggressive labeling about the fire hazard. But there were so many things in that workspace labeled as hazardous that you become desensitized to it. Everything is a hazard. So you see that something is flammable, and the result is maybe just to move the box a little further away from the planes rather than to really think through the severity of that particular hazard.
And to your point about things that can go wrong usually going right — if you're one of these mechanics or technicians or shipping agents, you've probably bent a rule before and it's been fine. Maybe you've even been applauded for moving quickly and getting past a problem.
The SabreTech technicians put all these oxygen generators in boxes — five boxes — and left them on the floor. But SabreTech had another customer coming in for a tour, so the facilities managers were told to clean up so they'd be ready for the visit. The boxes were moved to the shipping area because that was where ValuJet's stuff was generally held. And of course, because it's in a shipping area, the employees there naturally assumed it needed to be shipped.
That's what you do if you're a shipping agent. You ship things that are in boxes.
Exactly. If you receive boxes without clear labels, you see they belong to ValuJet, and you're meant to get rid of the stuff — naturally your instinct is to return it to ValuJet in Atlanta.
Something that had been done before, and was always fine.
And this is another person who would need to navigate that linguistic sensitivity around "expired" and "expended," never mind the fact that things may have been mislabeled within the boxes themselves.
The technicians who packed the boxes had used inconsistent labeling as to what was in them. They also used green tags on the generators, which in other uses convey that a part is out of service or needs maintenance — not that it's hazardous or needs to be disposed of.
And naturally it would make sense for the shipping clerk to assume that if someone puts something in shipping, it is safe to ship. Everyone assumes someone else is making that critical safety decision. So they prepare the shipment. And there were many points where this could have been caught. "Oxygen canisters" was written on the manifest, even if the notation that they were empty wasn't accurate. The people loading the plane, as well as one of the pilots who discussed it with the person loading the boxes, should have noticed and made the call that it wasn't appropriate.
Ultimately the co-pilot had a discussion with the ramp agent about putting these boxes on the plane and quietly signed off on it. This is a workforce that even at the level of the pilots had been through a lot. Many of the pilots at ValuJet were refugees from the collapse of the regulated-era airlines — Pan Am, National Airlines, Eastern Airlines — big, well-compensated legacy carriers that had gone under in the late 80s and early 90s. A lot of their staff had moved on to other airlines where they were working longer hours with less training and less oversight. These pilots were under a lot of pressure and probably not incentivized to raise a flag and say, hey, we should talk to a supervisor about whether these generators are safe to go on a plane. You just want to keep things moving.
We know the oxygen generators were responsible for the fire. But it becomes less clear whose fault it is, who to blame. You can blame all the people who signed off along the way, but they were just doing what was expected of them. It really is more of a systemic set of values and priorities that leads to this outcome rather than one particular mistake or act of malice. Have you read Normal Accidents by Charles Perrow?
No, I haven't. Have you?
It's one of my favorite books.
Tell me about it.
All of the books Langewiesche cites in this article — Normal Accidents by Charles Perrow, The Limits of Safety by Scott Sagan, and the Diane Vaughan book we also referenced in the Challenger episode — I read all of them because of this article.
Really?
I bought a used copy of The Limits of Safety by Scott Sagan on Amazon. It was signed by him. It had been gifted to someone and I guess that person sold it.
Now it's yours. It's an artifact. Does Normal Accidents have any prescriptions in it, or is Perrow fatalistic — that it's impossible for an organization like an airline to ever truly be free of accidents?
I wouldn't say it's fatalistic. The key point Perrow makes is that you have some systems that are highly complex and tightly coupled — which means when something is going wrong, the operators have no way of understanding what it is or responding quickly. Airplanes are one example, nuclear weapons another. There are so many elements, and you can't keep track of all of them, and the cadence at which things happen is so quick that accidents become somewhat inevitable. But he's not super negative about it. One thing he points out that becomes really relevant here is that safety devices often actually create more problems. At Chernobyl and Three Mile Island, safety devices were implicated in the accidents, because they become one more level of complexity in a system that needs to be monitored but can't be effectively, and you don't expect the backup to go wrong. That ultimately makes the system more incomprehensible and its failures harder to prevent. Even in ValuJet — the oxygen generators themselves are a safety device that then killed 110 people.
It's especially difficult to maintain a culture of safety in an environment like the modern American aviation system, which is exceptionally safe. In the 1960s and 70s, when airline accidents were quite frequent, you had a constant feedback loop. You could actually measure whether things were making the system safer because there was real variation — after you addressed a technical issue or revised pilot training, you saw outcomes. Plus the participants in the system were constantly reminded of how dangerous aviation could be. There have been a few accidents on regional jets in the US in the last few years, including one near National Airport outside Washington last year. But the last fatal accident on an American mainline carrier was in 2001 — American Airlines Flight 587, which crashed in the Belle Harbor neighborhood of Queens in November of that year. It's been almost 25 years since a full-size American mainline jet has had a fatal crash. That's very hard to use as a feedback loop, and it can create complacency.
And even when things are going wrong in a minor way, it can be hard to understand where exactly that's coming from. The instinct after any kind of crash is to ask what you can add, what additional regulations or tools can make things even safer. But you have to be conscious that can actually create more complexity and make things a little less safe in the long run if those measures aren't implemented well or aren't understood, or are simply overwhelming for the people operating the aircraft or running another complex piece of technology.
To your point about safety systems contributing to danger — this is a theme in Langewiesche's write-up of the ValuJet incident. If you have a system that requires paperwork as an outcome of work, you will get paperwork, and you might not actually get the things the paperwork is meant to enforce. This accident involved a number of people who were responsible for some part of the process, truly name-on-the-line, and they were asked to sign paperwork meant to enforce that they'd do the right thing. But in fact all it enforced was that they signed the paperwork. It's a pernicious feature of safety paperwork that people are tempted to just fly through it. Checklists are notoriously bad at managing processes because the more checklists you have, the more tiresome they become, and the more likely people are to just fill out the checklist rather than actually do what they're supposed to do.
The phrase Langewiesche uses for checklist compliance is pencil-whipping. He writes: "Good old-fashioned pencil-whipping is perhaps the most widespread form of Vaughan's normalization of deviance. The falsification they committed was part of a larger deception — the creation of an entire pretend reality that includes unworkable chains of command, unlearnable training programs, unreadable manuals, and the fiction of regulations, checks, and controls."
Something interesting to reflect on after reading an account written in 1998 about an airline accident in 1996: what's changed since then? Is this still how the industrial world works?
I don't know if you've ever seen that show Air Disasters.
When was it on? I stopped watching TV ten years ago.
There are like 24 seasons. It's one of those things that's always on — dramatic reenactments of air disasters, then they walk through the investigative process and resolve the cause. I just always end up watching these. You can put it on in the background all weekend. And it's pretty clear there have been incidents before and since.
Another example that comes to mind is American Airlines Flight 191, the crash in 1979 caused by one of the engines basically falling off the plane in flight. American Airlines had developed a shortcut for maintaining the engines on that plane. When you remove an engine for maintenance, you're supposed to position a specialized lift under it, detach the engine from its pylon, then detach the pylon from the wing separately, and work through it piece by piece. The technique American Airlines had developed used a forklift braced under the jet engine to remove the engine and pylon together. Because you're not using a specialized lift and doing this in stages, the whole thing depends on the forklift operator positioning exactly right. If it moves slightly during removal, you wrench the engine around and strain the connection between the engine, the pylon, and the wing. That's what happened — it created metal fatigue in the pylon and its mating with the wing, and when the plane took off and stress was applied, the engine dropped off. Shortcuts throughout the system. It's very difficult to build a cultural aversion to them.
Especially if they work for a long time — you've kind of proven them through experience.
That's right. To come back to the question of checklists — my hope is that in the modern era, as things become more digital, it becomes easier to actually inspect products and processes and have a source of truth outside the checklist. Computer vision, X-ray technology, AI to do quick analysis and understand whether work was done correctly — a digital backstop to what human workers will inevitably hedge on.
You think back to the paper manual those technicians were faced with — they didn't have a Ctrl+F to search for keywords about oxygen generators.
Exactly. They probably flipped through and shrugged and gave up. More importantly, there was no impersonal way of knowing whether the work was done correctly. No computer system able to detect an inconsistency in the way work was done. Just a hierarchy of technicians and managers prone to pencil-whipping.
Manufacturing is a great example of a process that's highly complex, but maybe less tightly coupled than aviation. You're able to stop the line and identify things at different points. But that's exactly why there's a real opportunity to focus on quality and safety there — so that by the time something gets into a tightly coupled system where things could go catastrophically wrong, you've hedged your bets beforehand.
And this is where X-rays are a promising way to look inside things as they move through factories. A lot of industrial inspection is superficial — you're looking for a second-order sign that something's been done right. The cap has been placed on the bottle and rotated far enough to suggest it's been screwed on properly. But you don't actually know if it's been screwed on properly. Newer digital tools starting to roll out in factories are promising as a way to actually know that something is correct. The question is whether any of this can be translated back to airlines, where you're talking about enormous pieces of capital equipment, thousands of workers at different levels of training and motivation, and a lot of pressure to get things out on time without disruption.
Something I've learned from watching too many hours of Air Disasters — how many things come from using the wrong size screw, or wiring something backwards and just resetting it a couple of times. It works the first two times, and then the third time it catches fire. Or covering up a weld area with a panel so no one else can inspect it and see that the weld is deficient.
Just trying to avoid something being your problem any longer. Even in this ValuJet accident, Langewiesche mentions that there had been a couple of electrical problems on the plane that same day. Maintenance overcame them by switching the breaker a couple of times, and it held after being switched back and forth. Good enough. Because the breaker had been faulty at a very mild level — nothing catastrophic — the pilots are heard on the voice recorder saying the breaker's acting up again. They were lulled into a false sense of security by knowing that this earlier problem had been there and had been overcome, but not very convincingly.
We're both Langewiesche fans. Do you still fly on discount airlines now that you're buying your own tickets and it's not your parents buying them for you?
I have to ask my parents whether they remember sending me off on that ValuJet flight. I did fly on AirTran a few times after they merged, when I was in college. I do fly on discount carriers sometimes. What about you?
I don't really. My stepdad was actually in a plane crash in the 80s.
No kidding?
He was on World Airways Flight 30 that crashed in Boston. It was going from Oakland, landed in Newark, then Newark to Boston, which is where he was based. The runway was super icy. Fortunately the pilots were able to identify there was a problem and steer the plane to avoid hitting a lighting tower at the end of the runway, because that would have probably killed everybody. They managed to turn the plane and it went off into the water — Logan's right on the water — and the whole front cockpit area cracked off.
Whoa.
At first they thought no one had died. Unfortunately there were two people who hadn't been on the manifest, who had been in the first passenger rows, who disappeared and were never recovered. But it's a really interesting story, even just the casualness of how he describes it. He was a wine salesperson, so he said first the flight attendants directed everyone off the plane on the wrong side — the plane is in the water, but they directed people into the water instead of onto the dry side. A lot of people got soaking wet and had to get back on the plane. He opened the overhead bin and grabbed his case of wine. Then they went off on the correct side of the plane, and he slipped on the icy runway and dropped his wine. My brother and I always ask him: did anyone compensate you? He says no, they didn't even get a voucher.
They didn't get like 5,000 World Airways points?
Nothing at all.
Was World Airways a discount carrier?
I don't think so. It was 1982 — I don't think it was a particularly discount airline.
Deregulation came in 1978, so 1982 would have been the very beginning of the newer configuration of airlines, just after the regulated era ended.
It was interesting because he flew a lot of small planes — he was selling wine between the Cape and the islands, so he took a lot of tiny planes, and he said some of those were pretty scary. But the plane crash he was in was actually on a major airline in a huge plane in Boston.
Growing up in a smaller city in Virginia, I've also spent a lot of time on commuter planes with one or two seats on each side of the aisle and propellers. The operations of those commuter airlines can feel quite casual. The same employee is doing the check-in and loading the baggage and holding the wands to direct the pilot. Things may or may not take off within 30 minutes of what the screen shows. You don't expect that level of looseness at a mainline.
Every time he and my mom get on a small plane, he says, don't worry, I've already been in a plane crash — so we should be fine. Statistically it won't happen again.
Part of me wonders why people keep founding airlines. It is a legendarily unprofitable industry. Warren Buffett famously said that cumulatively across the entire existence of the US airline industry, from the Wright Brothers' first flight to today, it has lost money. The major airlines have been profitable for the last several years, but they've gone through decades of just hemorrhaging money. You have to raise so much to buy airplanes and secure routings and airport access. Most of them go under. But there are some habitual discount airline founders — David Neeleman, the founder of JetBlue, just can't stop himself. He founded one called Breeze Airways that's starting to expand. It's just not a very profitable business, and people are drawn to it by something other than the profitability — the romance and the fun of building something.
And now there are all these headlines about oil prices really hitting the airlines as well.
The airline business is structurally set up to be brutally competitive and not very profitable. You have a business with extremely high fixed costs — a lot of which is debt service on loans to buy passenger airliners that might cost a couple hundred million dollars apiece for a big widebody — and then very low marginal costs. The cost of letting one more person onto a plane that's already flying somewhere is very, very low. In a competitive environment you're always tempted to drop your prices to attract that one more fare-paying passenger, because after the plane takes off, your product has spoiled. There's nothing more you can do with that seat on that flight. The airline industry has competed itself out of profitability by offering this largely commodity service with high fixed costs and very low marginal costs, tending to compete each other down to the point of charging marginal cost for a ticket. Recently the airlines have wound up with less inventory after Covid, which has helped them push prices back up and achieve some profitability.
Another theme throughout a lot of Langewiesche's articles on planes is that companies may say safety is their priority, but their priority is trying to make money because it's a business. Safety is obviously important to be able to continue running — no one's going to fly on your planes if they're not safe. But it's finding that balance of what's acceptable, because there is so much margin pressure.
This ValuJet accident raised for some people a concern about how the FAA had a dual mandate at the time to both enforce air safety and promote the airline industry. Congress made a gesture at removing the promotion mandate. But something Langewiesche points out is that you need a regulator to enforce certain safety standards on the airlines. As much as airlines have it as a priority not to have a fatal crash because it's really terrible for their business, you can't compete with airlines that are cutting a little bit more out of their safety budget — that forces you to do the same. The airlines actually benefit from having a regulator that enforces a cost floor on safety measures. If listeners want to find the William Langewiesche article we've been discussing, we're going to link to it in the show notes along with all the books. The article is a classic — available on a very odd, archived, dusty corner of the Atlantic's website, a journey back into what the internet felt like in 1998. William Langewiesche died unfortunately just last year. His obituary in the New York Times is titled "The Steve McQueen of Journalism," which is pretty epic. As a former journalist myself, I was devastated to learn that title is taken. Of the books we've discussed, which is your favorite?
Normal Accidents is still my favorite. It's really the foundation of the others. If you want more of a story-based book, Scott Sagan's book has a lot of interesting material about our nuclear arsenal and all the times we almost got into nuclear war. It's a great, more concrete application of the theories laid out in Perrow's Normal Accidents.
Would you say you've applied learnings from any of those to your daily life? Do you get on a BART train in the morning and think, I hope the complex organization made up of the train driver and dispatchers and police holds it together?
Just today. I don't think it makes me nervous, but it makes me really appreciate the things we work on — being part of a system and playing a role in that kind of industrial and technological complex.
Well put. I should read that book. We'll wrap up this episode of Go/No-Go Reconstructions there. Be sure to tune in to our next episode, which will feature headlines from the worlds of manufacturing, engineering, and product development. Go/No-Go is brought to you by Lumafield, which makes a manufacturing intelligence platform — we use X-ray CT scans and artificial intelligence to give you complete confidence in the products you're shipping. You can learn more about Lumafield and see many of the X-ray CT scans we've done over the years at lumafield.com. And if you want to reach Alex and me, you can find us at gng@lumafield.com — we love getting your letters and we do reply. If you enjoyed this episode, leave us a review or a star rating on the podcast app of your choice. For Go/No-Go, I'm Jon Bruner.
And I'm Alex Hao.
Go/No-Go is brought to you by Lumafield, which was founded to upgrade manufacturing. You can learn more at lumafield.com. Go/No-Go is produced by Austin Carder and edited by Brian Tran, with additional assistance from Eric Petralia.