Friday, March 2, 2012

Reactor accidents due to human failures

A–1. From time to time reports on incidents or reviews of a plant or utility serve as a reminder of the continuing need for vigilance and the contribution that all managerial levels make to achieving safe operation. In the first of the two sections below, two examples which illustrate this have been drawn from published major plant/utility reviews. The final section provides specific examples linked to elements of the model presented in Fig. 2 which have been obtained from two IAEA sources: the IRS database and the OSART review findings. They all serve to demonstrate the continuing need to learn from the experience of others in relation to human and organizational issues as well as in technical and engineering areas.
A–2. In focusing on problems, there is a danger of forgetting the many positive successes achieved over 9000 reactor-years of experience accumulated worldwide. The international nuclear community must continue to seek replication of the many excellent practices that have led to these successful achievements as well as learning from well publicized shortfalls. Both can serve as an impetus and as a motivation for change. For those Member States with established nuclear power programmes and systems to support their safe operation, it is frequently in factors associated with organization and human behaviour that significant further improvements can be gained.
A–3. The 1997 report of a review of a utility and its nuclear power plants identified management, process and equipment problems that had adversely impacted the performance of the organization and its operating stations. Although incidents and poor performance tend to focus attention at the plant operational level, they often arise as a result of weaknesses stemming from the higher organizational level, i.e. those responsible for defining the organization and specifying safety requirements. In this respect the review team found problems with organizational structures, practices, policies and systems.
A–4. These shortcomings inevitably had an adverse impact at the working level of the plants. Specific shortcomings in the planning, control and support activities were found and it was noted that ‘personnel have not incorporated an adequate safety culture into their normal activities’. One vital ingredient in an effective safety management system, namely an effective audit, review and feedback process, was also found not to be working satisfactorily. The utility has made a very positive response to the findings.
A–5. An evaluation (also in 1997) of a nuclear power plant in another Member State by a team from the national regulatory body followed a decline in performance that the regulator had noted and drawn to the attention of the plant a year earlier. Although the robust nature of the plant design, its relative newness and the limited period over which performance had declined were considered to be major factors in preventing significant degradation of plant equipment or an event of more serious consequence, a number of important deficiencies in the safety management system were identified.
A–6. The review concluded that management and leadership were generally ineffective in establishing expectations, communications, independent oversight, performance measurement and monitoring, decision making and human resource management. Programmes, processes and procedures were generally ineffective in self-assessments, corrective actions, root cause analyses, planning, prioritization and scheduling. Human performance was found to be weak in procedural adherence, resource allocation, time management and prioritization.
A–7. The root causes of the problems were determined by the team to be:
- management generally did not establish and implement effective performance standards;
- the plant’s programmes, processes and procedures did not consistently provide defence in depth to assure plant activities were conducted in a safe manner;
- problem identification was inconsistent and evaluation and corrective actions were generally ineffective;
- management did not ensure that the infrastructure was suitable to support the major changes which the management were seeking to implement.
The plant and utility have embarked on a plan to address the issues and their root causes.
A–8. The experiences of both utilities in implementing their recovery programmes provide valuable lessons to the international nuclear community.
A–9. Examples of incidents (as reported to the IRS) and weaknesses in systems that might become the direct or root cause of a future incident (OSART review findings) are provided as a reminder of the need to remain vigilant and to avoid complacency. The latter, in particular, also serves to demonstrate the benefits of periodic external review.
Definition of safety requirements and organization
Statements of safety policy (including standards, resources and targets)
A–10. In an OSART review it was found that many of the rules for the technical specifications of safety equipment surveillance which were in force had been submitted to the safety authorities but had not yet been authorized by them. For example, a programme of diesel generator tests had been submitted to the regulatory authorities by the utility in 1992 but, by the time of the review in 1998, had not yet been approved. A batch of plant modifications had been approved for implementation by the regulator, but not the corresponding changes in specifications for surveillance tests. In those cases where the changes had not been approved, the plant implemented the surveillance proposed to the regulator so that there would be no ambiguity for operators. Some defence in depth was lost because the external review had not taken place. In addition, the use of surveillance tests that had not been approved by the regulator constituted a further loss of defence in depth as a result of a failure to comply with procedural requirements.
Planning, control and support
Control of safety related activities
A–11. During commissioning work on a hot cell in a nuclear power plant, a real spent fuel assembly was disassembled by mistake instead of a dummy fuel assembly. Three members of the maintenance staff received external radiation doses in excess of the dose limit. This event occurred as a result of work being poorly organized. The permit for work did not mention the need for a comprehensive programme of testing and the acceptance testing of the equipment in the disassembly section. Nor was a copy of these programmes attached to the permit, and the members of the team were not informed about them. Nobody in authority had checked that the permit had been correctly drawn up. The permit also made no mention of measures to prepare the workplace (i.e. preparation of the dummy fuel assembly in the hot cell). Therefore the senior mechanical engineer allowed his team to start the work covered by the permit without performing the official procedure for handover of the workplace, including reporting to the works manager. The relevant team began to work under the impression that the hot cell transport apparatus contained a dummy fuel assembly.
A–12. In another example of failures of control, an Assessment of Safety Significant Events Team (ASSET) mission reviewed an incident in which, during a cold scram test after the refuelling outage, one scram group failed to work. The line-up checks of the valves belonging to that scram group were missed after maintenance had been performed. In addition, a second independent position check of all valves before plant startup also did not detect the wrong system line-up. This event occurred directly as a result of the lack of a rigorous and questioning approach in respect of the maintenance of safety related systems. The root cause was a lack of emphasis by plant management on ensuring adequate control when dealing with safety related systems. There was no planned management or supervisory intervention to verify the stringency of the valve line-up checks. It was noted that management were seldom seen to be visibly endorsing the importance of a rigorous approach when dealing with safety related systems.
Ensuring competence
A–13. In an OSART review in 1997, it was found that material used for training people in the plant was not being systematically reviewed and revised. Most of the existing lecture materials had been developed between 1978 and 1984 and had not been revised to include necessary changes such as plant modifications, operating experience information or procedure changes. The OSART team commented that the use of training notes that are not up to date could result in trainees receiving incorrect information and could lead to mistakes.
Communication and team support
A–14. A reactor startup was terminated and a reactor shutdown was commenced to repair a leaking safety relief valve. The reactor began to depressurize because decay heat was insufficient to supply all auxiliary loads. As the reactor depressurized, the reactor coolant temperature decreased, adding positive reactivity. As long as the operator continued to insert control rods, the reactor was maintained subcritical. The operator stopped inserting control rods to review plant conditions and the reactor scrammed about a minute later. The licensee attributed the event to the control room team failing to recognize the actual plant conditions.
A–15. The examples in this category relate to human errors and are thus sometimes simply ascribed to individual failures. However, such issues frequently have their roots in organizational shortcomings which, if addressed, can minimize the extent of such human errors.
Questioning attitude
A–16. During a refuelling outage, one loop was isolated and drained to allow automatic in-service inspection of the steam generator tubes. In parallel, maintenance of the hot and cold leg main isolating valves (MIVs), gearboxes and electrical systems was in progress. One of the maintenance personnel noticed that the position indicator of the MIV wedge did not indicate the fully closed position. As the MIV was not fully secured against movement, he tried to close it. As he could not move the valve towards the closed position, he mistakenly turned the wedge in the open direction. Water was then able to flow from the refuelling/spent fuel pool through the MIV and onto the floor through the open steam generator manhole. The refuelling pool level dropped by approximately 27 cm. Refuelling was stopped immediately and the MIV closed. About 16.6 m³ of water was lost from the refuelling pool. This incident illustrates a failure to question the safety significance of a course of action when faced with difficult or ambiguous circumstances.
Rigorous and prudent approach
A–17. An OSART review found that the alarm response by reactor operators and radioactive waste control room operators at a plant was deficient. It was noted that several alarms were silenced, then allowed to flash for extended periods of time, including the power range monitor upscales and rod blocks, and alarms on a fire system panel. It was judged that this practice might have arisen because of the large number of alarms. For example, it was noted that over 50 alarms were lit in the radioactive waste control room.
A–18. Alarms are one of the first indications of a problem. Without an adequate response, degradation of plant systems may go undetected. The OSART team recommended that operations management should continuously reinforce expectations to improve operator alarm response. These expectations should include referring to alarm response procedures when an alarm is received, at least the first time an individual alarm is received on a shift. They recommended that efforts to achieve a ‘black board’ concept should continue in order to reduce the number of distracting alarms.
A–19. Poor communication is very frequently an important contributor to incidents. In one example, preparation for refuelling was being performed and the reactor cavity was being filled with water. An examination of the sump area was planned by looking through the access door only. A worker was provided with a key to the sump area and was cautioned not to enter the sump area. The task was delayed until the next shift. The key was passed on but the caution was not. Two workers entered the sump area in spite of the warning on the door. One worker received a dose of 13 mSv (whole body) and the other received a dose of more than 2 mSv.
Audit, review and feedback
Measuring performance
A–20. An ASSET review found that on several occasions the unexpected activation of the reactor protection system occurred when the reactor coolant pump was put into operation at 50% reactor power. The reactor power controller repeatedly failed to compensate for the reactivity increase induced by startup of the reactor coolant pump, allowing the rate of increase of the neutron flux to exceed trip settings. This situation occurred on several occasions over a period of time but ineffective performance measurement meant that plant management did not take appropriate and timely measures to avoid recurrence. In particular, no thorough analysis verifying the exact cause of the event was performed and no changes to reactor coolant pump procedures or to the reactor power controller design response were considered.
A–21. Another ASSET mission found that the inoperability of a diesel generator due to oil cooler leakage was unnecessarily repeated. When the first oil cooler leakage occurred, the plant management decided to replace the tube bundle with one of similar material. A neighbouring power station had, however, previously suffered from the same problem and had demonstrated that the only solution was to replace the cooler with one of stainless steel. This information had been relayed to the original station but they still replaced the tube bundle with the original material and this again failed after a short time in operation.
Corrective actions and improvements
A–22. Several ASSET reviews have found corrective actions not being implemented in a timely manner, leading to numerous repeat events. The plants often have excellent computerized systems to store event databases and to analyse events systematically. However, the analysis of failures is often focused mainly on the direct cause and often only a specific area of the root cause is identified for correction. The specific corrective actions to eliminate the individual problem are implemented, but the broader generic lessons remain uncorrected.
