Reliability Growth Analysis

An Overview of Basic Concepts

What is Reliability Growth?

In general, the first prototypes produced during the development of a new complex system will contain design, manufacturing and/or engineering deficiencies. Because of these deficiencies the initial reliability of the prototypes may be below the system's reliability goal or requirement. In order to identify and correct these deficiencies, the prototypes are often subjected to a rigorous testing program. During testing, problem areas are identified and appropriate corrective actions (or redesign) are taken. Reliability growth is the improvement in the reliability of a product (component, subsystem or system) over a period of time due to changes in the product's design and/or the manufacturing process.

The concept of reliability growth is not just theoretical or absolute. Reliability growth is related to factors such as the management strategy toward taking corrective actions, effectiveness of the fixes, reliability requirements, the initial reliability level, reliability funding and competitive factors. For example, one management team may take corrective actions for 90% of the failures seen during testing, while another management team with the same design and test information may take corrective actions on only 65% of the failures seen during testing. Different management strategies may therefore attain different reliability values with the same basic design. The effectiveness of the corrective actions is also relative to the initial reliability at the beginning of testing. A 400% improvement in reliability for equipment that initially had one-tenth of the reliability goal is not as significant as a 50% improvement in reliability for a system that initially had one-half of the reliability goal.
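The comparison at the end of the preceding paragraph can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that an "X% improvement" means the reliability metric increases by X% of its starting value; the function name is hypothetical and chosen only for illustration.

```python
def fraction_of_goal_after_improvement(initial_fraction, improvement_pct):
    """Return the reliability, as a fraction of the goal, after an improvement.

    Assumption: an improvement of P% multiplies the starting value by (1 + P/100).
    """
    return initial_fraction * (1.0 + improvement_pct / 100.0)


# Case A: starts at one-tenth of the goal, improves by 400%.
case_a = fraction_of_goal_after_improvement(0.10, 400)

# Case B: starts at one-half of the goal, improves by 50%.
case_b = fraction_of_goal_after_improvement(0.50, 50)

print(f"Case A reaches {case_a:.2f} of the goal")
print(f"Case B reaches {case_b:.2f} of the goal")
```

Under this assumption, the 400% improvement still leaves the equipment at only half of the goal, while the 50% improvement brings the system to three-quarters of the goal.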

Elements of a Reliability Growth Program

In a formal reliability growth program, a reliability goal (or goals) is set and should be achieved during the development testing program with the necessary allocation or reallocation of resources. Planning and evaluation are therefore essential elements of a reliability growth program. A comprehensive reliability growth program needs well-structured planning of the assessment techniques. A reliability growth program differs from a conventional reliability program in that there is a more objectively developed growth standard against which assessments are compared. A comparison between the assessed value and the planned value provides a good indication of whether or not the program is progressing as scheduled. If the program does not progress as planned, then new strategies should be considered. For example, a reexamination of the problem areas may result in changing the management strategy so that more of the problem failure modes surfaced during testing actually receive a corrective action instead of a repair. Several important factors for an effective reliability growth program are:

Management strategy: decisions to correct problems or not correct problems and the effectiveness of the corrective actions
Testing: provides opportunities to identify the weaknesses and failure modes in the design and manufacturing process
Failure mode root cause identification: funding, personnel and procedures are provided to analyze, isolate and identify the cause of failures
Corrective action effectiveness: design resources to implement corrective actions that are effective and support attainment of the reliability goals
Valid reliability assessments

The management strategy may be driven by budget and schedule but it is defined by the actual actions of management in correcting reliability problems. If the reliability of a failure mode is known through analysis or testing, then management makes the decision either not to fix (no corrective action) or to fix (implement a corrective action) that failure mode. Generally, if the reliability of the failure mode meets the expectations of management, then no corrective actions would be expected. If the reliability of the failure mode is below expectations, the management strategy would generally call for the implementation of a corrective action.

Another part of the management strategy is the effectiveness of the corrective actions. A corrective action typically does not prevent a failure mode from occurring again. It simply reduces its rate of occurrence. A corrective action, or fix, for a problem failure mode typically removes a certain amount of the failure mode's failure intensity, but a certain amount will remain in the system. The fractional decrease in the problem mode failure intensity due to the corrective action is called the effectiveness factor (EF). The EF will vary from failure mode to failure mode, but a typical average for government and industry systems has been reported to be about 0.70. With an EF equal to 0.70, a corrective action for a failure mode removes about 70% of the failure intensity, but 30% remains in the system.
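The effect of the EF on a single failure mode can be illustrated with a short calculation. This is a minimal sketch: the function name and the starting failure intensity of 0.010 failures per hour are hypothetical values chosen for illustration, not figures from any particular test.

```python
def remaining_failure_intensity(mode_intensity, ef):
    """Failure intensity left in the system after a corrective action.

    mode_intensity: failure intensity of the mode before the fix
    ef: effectiveness factor, the fraction of that intensity removed (0 to 1)
    """
    if not 0.0 <= ef <= 1.0:
        raise ValueError("effectiveness factor must be between 0 and 1")
    return (1.0 - ef) * mode_intensity


initial_intensity = 0.010  # hypothetical value, failures per hour
ef = 0.70                  # the reported typical average EF

remaining = remaining_failure_intensity(initial_intensity, ef)
removed = initial_intensity - remaining

print(f"removed:   {removed:.4f} failures per hour")
print(f"remaining: {remaining:.4f} failures per hour")
```

With an EF of 0.70, 70% of the mode's failure intensity is removed and 30% remains, as described above.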

Corrective action implementation raises the following question: "What if some of the fixes cannot be incorporated during testing?" It is possible that only some fixes can be incorporated into the product during testing. Others may be delayed until the end of the test because it may be too expensive to stop and then restart the test, or because the equipment may be too complex for a complete teardown. Implementing delayed fixes usually results in a distinct jump in the reliability of the system at the end of the test phase. For corrective actions implemented during testing, the follow-on testing provides feedback on how effective the corrective actions are and provides an opportunity to uncover additional problems to correct.

Evaluation of the delayed corrective actions is provided by projected reliability values. The demonstrated reliability is based on the actual current system performance and estimates the system reliability due to corrective actions incorporated during testing. The projected reliability is based on the impact of the delayed fixes that will be incorporated at the end of the test or between test phases.
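The difference between demonstrated and projected values can be sketched numerically. The example below is a deliberately simplified illustration, not a formal projection model: it assumes the system failure intensity is the sum of the failure mode intensities, that each delayed fix scales its mode's intensity by (1 - EF), and that MTBF is the reciprocal of a constant failure intensity. Formal growth projection models may include further terms (for example, for failure modes not yet observed), which this sketch ignores. All mode names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    intensity: float    # failures per hour observed during test (hypothetical)
    delayed_fix: bool   # True if a fix will be applied after the test phase
    ef: float = 0.0     # assumed effectiveness factor of the delayed fix


def demonstrated_intensity(modes):
    """System failure intensity as tested, before any delayed fixes."""
    return sum(m.intensity for m in modes)


def projected_intensity(modes):
    """System failure intensity after delayed fixes (simplified assumption)."""
    return sum(
        (1.0 - m.ef) * m.intensity if m.delayed_fix else m.intensity
        for m in modes
    )


modes = [
    FailureMode("seal leak", 0.004, delayed_fix=True, ef=0.70),
    FailureMode("connector fatigue", 0.002, delayed_fix=True, ef=0.70),
    FailureMode("random part failure", 0.004, delayed_fix=False),
]

demo = demonstrated_intensity(modes)
proj = projected_intensity(modes)

print(f"demonstrated MTBF: {1.0 / demo:.1f} hours")
print(f"projected MTBF:    {1.0 / proj:.1f} hours")
```

In this sketch the projected MTBF is higher than the demonstrated MTBF, which mirrors the jump in reliability expected when delayed fixes are incorporated at the end of a test phase.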

When does a reliability growth program take place in the development process? There is more than one answer to this question. The modern approach to reliability recognizes that typical reliability tasks often do not yield a system that has attained the reliability goals or the cost-effective reliability potential of the system. Therefore, reliability growth may start very early in a program by utilizing Integrated Reliability Growth Testing (IRGT). This approach recognizes that reliability problems often surface early in engineering tests. The focus of these engineering tests is typically on performance and not reliability. IRGT simply piggybacks reliability failure reporting, in an informal fashion, on all engineering tests. When a potential reliability problem is observed, reliability engineering is notified and appropriate design action is taken. IRGT will usually be implemented at the same time as the basic reliability tasks. In addition to IRGT, reliability growth may take place during early prototype testing, during dedicated system testing, during production testing, and through feedback from any manufacturing or quality testing or inspections. The formal dedicated testing, or reliability growth development testing (RGDT), will typically take place after the basic reliability tasks have been completed.

Note that when testing and assessing against a product's specifications, the test environment must be consistent with the specified environmental conditions under which the product specifications are defined. In addition, when testing subsystems it is important to realize that interaction failure modes may not be generated until the subsystems are integrated into the total system.

Reliability Growth Analysis

Reliability growth analysis is the process of collecting, modeling, analyzing and interpreting data from the reliability growth development test program (development testing). In addition, reliability growth analysis can be performed on data collected from the field (fielded systems). Fielded systems analysis also includes the ability to analyze data from complex repairable systems. Depending on the metric(s) of interest and the data collection method, different models can be utilized (or developed) to analyze the growth processes.

For complete details see ReliaSoft's Reliability Growth and Repairable System Analysis Reference.