
Health and Safety Management Systems

(Source: The Manager Online Magazine, May 2000)
 

In the foyer outside the lift-well at the corporate head office of the petrochemical company Orica is an electronic board listing the number of days since the last injury resulting in lost time. The last time I checked, it was over two hundred.

To the organisational psychologist, Professor James Reason, it sends all the wrong messages. First, it suggests that safety may be equated with the avoidance of lost-time injuries. Second, it acts as a powerful deterrent to anyone reporting a lost-time injury.

A focus on lost-time injuries developed in the eighties in response to an undoubtedly correct perception that prevailing rates were too high. Du Pont was seen as the model. It had cut its rate to a negligible level and companies trooped to Delaware to learn its methods.

"The problem is that you get the rate down to a level where you dont get signals about your performance, you only get noise. Individual accidents are very different from the accidents that happen to systems."

Reason notes that at the Moura coal mine in Australia, the injury rate had been halved in the year before the disaster there. Lowering injury rates is a worthy goal, but it is not the same thing as operating a safe system.

Injury rates are a negative measure. "The other side of safety is the resilience to survive the slings and arrows of your business. One of the problems is that people tend to steer by negative outcomes." Resilience is a powerful concept in the small academic industry growing up around the study of safety. It describes an organisation's capacity to recover when things go wrong.

Reason is a member of a team that has recently completed a study of neonatal cardio-thoracic surgery. The procedure examined was the arterial switch, in which the heart's two major arteries, the pulmonary artery and the aorta, are relocated to correct a congenital defect. The procedure was at the centre of a scandal at Bristol Hospital, where attempts were made to cover up the fact that two surgeons had abnormally high fatality rates when performing it.

A member of Reason's team observed 165 operations over an 18-month period. Reason says the procedure is very difficult. The coronary arteries, which supply the heart muscle itself with blood, are less than a millimetre in diameter and must also be relocated. There is considerable variation in anatomy from one patient to the next. The researcher found an average of seven adverse events per operation, of which one was life-threatening. The others were minor events, such as a nurse supplying the wrong implement at the wrong moment, or difficulty in locating an artery.

The overall fatality rate was 6%. The difference, literally, between life and death was not whether a life-threatening event occurred; such events were normal. The difference lay in the surgeon's ability to compensate for adverse events. The more minor things went wrong, the less able the surgeon was to compensate for the major events. In all, 50% of the major events were not compensated for, while comparatively few of the minor events were compensated for.

Reason cites another study suggesting that, around the world, there are about 100 million errors on the flight decks of passenger aircraft each year. Yet there are, on average, only 25 "hull losses", in which an aircraft is destroyed or damaged beyond repair. The key to safety is the ability of flight crews to overcome those errors. Trying to reduce the error rate to zero is futile. In any industrial environment there will always be hazards, such as gravity, storms and hydrocarbons, and there will always be human factors.

Between 80% and 90% of major accidents involve human error, so it is not surprising that technical and operational managers see their job as reducing the scope for human error. They do so by reducing variability to ensure consistency of action.

Reason says there is a paradox here: the variability that produces errors also produces the innovation that overcomes them. He cites the case of the Boeing 767 that became known as the Gimli Glider after it ran out of fuel on a flight from Montreal to Edmonton. A string of errors, including confusion between imperial and metric measures and fuel gauges that were not working, led to the wrong amount of fuel being loaded. On average, about 4.5 separate problems occur simultaneously to cause an accident in a large aircraft.

The plane was fortunate to have a first officer who recalled that there was a disused military airstrip nearby, at a place called Gimli. It was also fortunate that the captain was a skilled glider pilot. The plane could make only one approach to the runway, and it had to succeed. The challenge was to lose enough height to land without gathering too much speed on the way down. The captain used a gliding manoeuvre known as a "side slip", in which the aircraft descends partly side-on, losing height quickly without going into a dive, and then the tail is kicked back into line for landing. It worked, although the youths using the airstrip for drag racing at the time got a surprise. The manoeuvre certainly never appeared in any Boeing procedure manual.

Managing for reliability

The typical response of an organisation to a major accident is to find someone to blame, ideally the operator. The operator is the most immediate cause, and blaming the operator therefore also appeals to the deterministic logic of judicial investigators. In the wake of the Chernobyl disaster, the chief investigator, Valeri Legasov, blamed the operators. Two years later he committed suicide, leaving behind a tape recording in which he admitted he had been wrong: the fault lay with the Soviet system. That may be true, but it is hardly helpful. The task is to find the right level at which to intervene in the system.

Reason says there are three types of organisational response to danger. First, there are organisations with a pathological approach to safety: they don't want to know. For example, following the Chernobyl accident, the head of the British nuclear industry said it could never happen in England because British reactors were much safer and their operating staff were superbly trained.

Then there are the bureaucratic organisations that play things by the book. They will pay consultants to perform comprehensive safety audits and will place large signs in prominent places urging staff to think about safety. They will sleep well at night knowing they are doing all the right things.

Finally, there are organisations that never sleep comfortably. They are constantly preoccupied with the possibility of failure. Reason cites the work of Karl Weick, who describes reliability as a "dynamic non-event". The conditions that produce reliability must constantly be worked at.

Reason says there are two aids for a company striving for reliability. The first are reactive measures, in which a company tracks near-misses, incidents and exceedances. Exceedances occur when prescribed tolerances are exceeded, and they can often be measured automatically. Airlines routinely download data from flight recorders to check such things as whether landings have been too short or too heavy. Although such data may not provide any contextual information, they can highlight areas where problems cluster. For example, British Rail measured instances of its signals not being obeyed and found that 93% of its 36,000 signals caused no problems, while 0.3% caused 15% of the problems. Measuring exceedances in this way highlighted what Reason calls "error traps".

Valuable contextual information comes from the analysis of incidents and near-misses. "Incidents can tell you about the breaches in your defences and about your organisational failures. However, incidents have a real problem: they are dependent on people telling you, and mostly they don't tell you because they will get punished."

Reason says a safe culture is one that is informed. People have to know where the edge is without falling over it. A safe culture must also be just. People must know that they can admit honest mistakes without the least fear of punishment. At the same time, there must be clear procedures for dealing with flagrant breaches of guidelines. It is a difficult balance that can only be achieved if there is an agreed line between the acceptable and the unacceptable.

The second aid is a proactive process of identifying the conditions most in need of correction, leading to steady gains in a company's resistance to danger. Reason likens it to a long-term fitness program. The correction programs focus on the most common causes of failure, such as hardware, design, maintenance management, procedures, error-enforcing conditions, housekeeping, incompatible goals, communications, organisation, training and defences.

Reason does not believe that efforts to measure the economic benefit of "the accidents you don't have" will produce the right strategies. Although safety may well be good for profits, the stronger dynamic is that the forces of production and the forces of protection pull in opposite directions. The forces of production are managed with detailed measures, procedures and rewards. They are intrinsically more powerful than the forces of protection, which typically produce few direct measures. It is for this reason that companies have focussed on the lost-time injury rate. Reason contends that a better approach is to treat safety as an exercise in due diligence: ensuring that the conditions for the future survival of the business are preserved.
