
Lessons from Japan’s crippled Fukushima Daiichi nuclear plant

First things first: the purpose of this blog is NOT to blame anyone for the disaster.

Anyone familiar with major failures knows that they usually involve a series of events rather than a single cause. In fact we are quite lucky that things don’t go wrong more often. We also know that many system failures, such as the recent financial crisis, involve group-think and an unwillingness to listen to dissenting voices. We depend on things going right, but we should also plan for things going wrong. The better our planning for disaster, the better our recovery and the smaller the side effects and consequences.

In fact the Japanese people have dealt with the initial earthquake, the tsunami and the following earthquakes extremely well. This is grace under pressure, and I admire how the Japanese people have dealt with adversity.

A good system of governance always builds in scope for dissenting views, be it the functioning of a jury system in court or the democratic process. For a truly robust system, consensus needs to be open to dissenting views.

Nuclear disasters can happen with less severe global consequences. For example, when the initial torpedo blew up inside the Russian nuclear submarine Kursk, we know that there were a large number of failures (in humans and systems), but we can be thankful that the reactor shut down safely. In fact, the lack of power contributed to the death of the remaining sailors who had survived the numerous explosions and fires that followed the initial explosion (at a force of 4.5 on the Richter scale).

One important approach is a management and governance structure that makes it easier for a dissenting voice to be heard.  Groups need strong leadership and a willingness to listen to alternative views.

Black Swan events, i.e. events that may have a low probability of happening but have catastrophic consequences, require a different approach in planning. In addition, when you become significantly dependent on a resource (e.g. nuclear power), your planning and risk management need to be different. This applies as much to financial systems and nuclear power systems as to airplanes or computer and online systems. In these systems even five nines (i.e. 99.999%) reliability may not be sufficient.
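
To put a number on that, here is a small sketch of the arithmetic behind availability targets. It assumes a non-leap year and treats the percentage as uptime; the function name is mine, purely for illustration.

```python
# Illustrative arithmetic only: yearly downtime allowed by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return how many minutes per year a system may be unavailable."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes/year")
```

Five nines works out to a little over five minutes of downtime a year. That sounds tiny, but the figure is an average and says nothing about what happens during those minutes. If the consequence of one failure is catastrophic, the average is not the number to plan around.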

In addition, one needs a change in the way we design critical systems.

There are two approaches, and they can be complementary: Fail Safe and Safe Fail.

Fail Safe often incorporates a defence in depth approach, where multiple systems are in place to take over when a primary system fails. So if power to the primary system fails, a secondary system of diesel generators takes over, and if that fails, a battery backup takes over, and so on. The failure of this approach, e.g. in the case of Bhopal (where the backups were seen as “spare” and used for other purposes), shows that “defence in depth” can actually lead to complacency. Now no one is saying that you should not put in as many checks and balances and backup systems as possible, but there is one other approach to add alongside them.
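
As a rough sketch of the defence in depth idea, the chain of backups can be modelled as a list of power sources tried in priority order. Everything here (the function name, the source names, the True/False checks) is a hypothetical illustration, not how any real plant is controlled.

```python
from typing import Callable, Optional, Sequence, Tuple

# (name, check) pairs; check() returns True if that source is delivering power.
PowerSource = Tuple[str, Callable[[], bool]]


def first_working_source(sources: Sequence[PowerSource]) -> Optional[str]:
    """Try each source in priority order and return the first that works."""
    for name, is_delivering in sources:
        try:
            if is_delivering():
                return name
        except Exception:
            # A source whose check errors is treated as failed; try the next layer.
            continue
    return None  # every layer has failed: the depth is exhausted


# Hypothetical scenario: grid and diesel are down, the battery still works.
sources = [
    ("grid", lambda: False),
    ("diesel generators", lambda: False),
    ("battery", lambda: True),
]
print(first_working_source(sources))
```

The line to look at is the final `return None`. However many layers you add, there is always a case where all of them fail together, and a single event (or a backup quietly borrowed for something else) can take out several layers at once.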

Safe Fail. This approach assumes a failure will happen, but that when it does you can still be safe. In getting a pilot licence we are taught how to recover from stalls, spins and engine failure. In other words, while everything is done to maintain the plane and to train the pilot to fly safely, you also build in the critical assumption that the worst will happen and that you can still recover.
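
In software terms, one way to sketch Safe Fail is a controller that drops to a known safe state whenever it cannot trust its inputs. The valve, the pressure limits and the choice of “closed” as the safe state are all assumptions made up for this example.

```python
from enum import Enum
from typing import Callable, Optional


class ValveState(Enum):
    OPEN = "open"
    CLOSED = "closed"


SAFE_STATE = ValveState.CLOSED  # assumed safe state for this sketch
MIN_PRESSURE = 0.0              # assumed operating envelope (arbitrary units)
MAX_PRESSURE = 100.0


def next_valve_state(read_sensor: Callable[[], Optional[float]]) -> ValveState:
    """Return OPEN only for a trusted reading; any failure yields the safe state."""
    try:
        pressure = read_sensor()
    except Exception:
        return SAFE_STATE  # sensor failed: do not guess, go safe
    if pressure is None or not (MIN_PRESSURE <= pressure <= MAX_PRESSURE):
        return SAFE_STATE  # missing or out-of-envelope reading
    return ValveState.OPEN


def broken_sensor() -> Optional[float]:
    raise RuntimeError("sensor offline")


print(next_valve_state(lambda: 42.0))   # normal reading
print(next_valve_state(broken_sensor))  # failed sensor
```

The design choice is that failure is treated as an expected input rather than an exception to the plan, much as a pilot practises stall recovery instead of assuming the engine will always run.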

So to avoid systemic failure one has to assume the worst of all scenarios, just as, when the internet was originally conceived, it was assumed that a nuclear war would happen. The choice of distributed vs centralised, chain of command, defence in depth and safe fail should all be considered when building, operating, maintaining and upgrading any system, process, product or organisation.

We need to understand that our operating model only works within certain parameters and envelopes.

We all like to think that everything works first time like clockwork, but things are often the result of trial and error.  Each product improves in every iteration.
