Determining Reliability for Complex Systems
Part 1 - Analytical Techniques

A complex system is one that cannot be broken down into groups of series and parallel components. In many cases it is not easy to recognize which components are in series and which are in parallel in a complex system. The following network is a good example of such a complex system:

[Figure: complex system diagram]

As the figure illustrates, this system cannot be broken down into a group of series and parallel systems. This complicates the problem of determining the system's reliability. If the system can be broken down into series/parallel configurations, it is a relatively simple matter to determine the analytical formula that describes the system's reliability. For a complex system, however, determining the system reliability becomes more involved.

In this article, we will look at some of the techniques that can be employed to determine the mathematical expression that gives the reliability of the system in terms of the reliabilities of its components. It is assumed that the reliability values for the components have been determined using standard (or accelerated) life data analysis techniques, so that the reliability function for each component is known. With this component-level reliability information available, it then becomes necessary to determine how these component reliability values are combined to obtain the reliability function for the overall system.

There are a number of advantages to using analytical techniques to determine system reliability, as opposed to the more common method of using simulation. The primary advantage of the analytical solution is that a mathematical expression that describes the reliability of the system is obtained. Once the system's reliability function has been determined, other calculations on the system can be performed. Such calculations include:

  • Determination of the system's pdf.
  • Determination of the warranty period.
  • Determination of the system's failure rate.
  • Determination of the system's MTTF.
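As a rough illustration of how these quantities follow from a reliability function, the sketch below assumes a hypothetical system of two independent units in series with exponentially distributed lifetimes. The failure rates, the function names, and the reading of "warranty period" as the time at which reliability falls to a target value are all assumptions made for this example, not part of the original analysis.

```python
import math

# Hypothetical constant failure rates (per hour) for two units in series.
LAMBDA_1 = 1e-4
LAMBDA_2 = 2e-4

def system_reliability(t):
    """R_s(t) for two independent exponential units in series."""
    return math.exp(-LAMBDA_1 * t) * math.exp(-LAMBDA_2 * t)

def system_pdf(t, h=1e-3):
    """f(t) = -dR/dt, approximated by a central difference."""
    return -(system_reliability(t + h) - system_reliability(t - h)) / (2 * h)

def failure_rate(t):
    """lambda(t) = f(t) / R(t)."""
    return system_pdf(t) / system_reliability(t)

def mttf(upper=200_000.0, steps=200_000):
    """MTTF = integral of R(t) from 0 to infinity (trapezoidal rule, truncated at `upper`)."""
    dt = upper / steps
    total = 0.5 * (system_reliability(0.0) + system_reliability(upper))
    for i in range(1, steps):
        total += system_reliability(i * dt)
    return total * dt

def warranty_time(target, lo=0.0, hi=1e6):
    """Time at which R(t) drops to `target`, found by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if system_reliability(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(failure_rate(1000.0), mttf(), warranty_time(0.90))
```

For this assumed system the failure rate is constant at the sum of the two rates and the MTTF is its reciprocal, which gives a quick sanity check on the numerical results.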

In addition, optimization and reliability allocation techniques can be utilized to aid engineers in their design improvement efforts. Another advantage of using analytical techniques is the ability to perform static calculations and analyze systems with a mixture of static and time-dependent components. Finally, the reliability importance of components over time can be calculated with this methodology.

Several methods exist for analytically obtaining the reliability of a complex system:

  • Decomposition method
  • Event space method
  • Path-tracing method

We will examine each of these methods, illustrating the techniques involved with simple system examples.

Decomposition Method

The decomposition method is an application of the law of total probability. It involves choosing a "key" component and then calculating the reliability of the system twice: once as if the key component failed (R=0) and once as if the key component succeeded (R=1). These two probabilities are then combined to obtain the reliability of the system, since at any given time the key component will be failed or operating. Using probability theory, the equation is:

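$$P(s) = P(s \mid A)\,P(A) + P(s \mid \overline{A})\,P(\overline{A})$$

where $A$ is the event that the key component succeeds and $\overline{A}$ is the event that it fails. Assuming that the components are statistically independent, this reduces to:

$$R_s = R_{key}\,R_s(R_{key} = 1) + (1 - R_{key})\,R_s(R_{key} = 0)$$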

Consider three units in series.

  • A is the event of Unit 1 success
  • B is the event of Unit 2 success
  • C is the event of Unit 3 success
  • s is the event of system success

First select a "key" component for the system. Selecting Unit 1, the probability of success of the system is:

$$P(s) = P(s \mid A)\,P(A) + P(s \mid \overline{A})\,P(\overline{A})$$

If Unit 1 survives, then: 

$$P(s \mid A) = P(BC)$$

That is, if Unit 1 is operating, the probability of the success of the system is the probability of Units 2 and 3 succeeding.

If Unit 1 fails, then: 

$$P(s \mid \overline{A}) = 0$$

That is, if Unit 1 is not operating, the system has failed since a series system requires all of the components to be operating for the system to operate.

Thus the reliability of the system is:

$$P(s) = P(BC)\,P(A) + 0 \cdot P(\overline{A}) = P(A)\,P(B)\,P(C) = R_1 R_2 R_3$$
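The series result can be checked with a minimal sketch of the decomposition step. The helper name `decompose` and the numeric reliabilities are hypothetical, chosen only for illustration, and the components are assumed to be statistically independent.

```python
def decompose(r_key, rel_if_key_up, rel_if_key_down):
    """Law of total probability, conditioning on the key component."""
    return rel_if_key_up * r_key + rel_if_key_down * (1.0 - r_key)

def series_reliability(r1, r2, r3):
    """Three independent units in series, with Unit 1 as the key component."""
    rel_if_unit1_up = r2 * r3    # P(s|A): Units 2 and 3 must both succeed
    rel_if_unit1_down = 0.0      # P(s|not A): a series system has failed
    return decompose(r1, rel_if_unit1_up, rel_if_unit1_down)

# Hypothetical component reliabilities, chosen only for illustration.
print(series_reliability(0.9, 0.8, 0.7))
```

The value returned equals the plain product of the three reliabilities, as expected for a series system.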

Another Illustration of the Decomposition Method

Consider the following system:

[Figure: simple system with Units 1 and 2 in series, and Unit 3 in parallel with that pair]

  • A is the event of Unit 1 success
  • B is the event of Unit 2 success
  • C is the event of Unit 3 success
  • s is the event of system success

Selecting Unit 3 as the key, the system reliability is: 

$$P(s) = P(s \mid C)\,P(C) + P(s \mid \overline{C})\,P(\overline{C})$$

If Unit 3 survives, then: 

$$P(s \mid C) = 1$$

That is, since Unit 3 represents half of the parallel section of the system, as long as it is operating, the entire system operates.

If Unit 3 fails, then the system is reduced to:

$$P(s \mid \overline{C}) = P(AB)$$

The reliability of the system is given by: 

$$P(s) = 1 \cdot P(C) + P(AB)\,P(\overline{C})$$

or: 

$$R_s = R_3 + R_1 R_2 (1 - R_3)$$
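The same conditioning step can be applied repeatedly, choosing a new key component each time until every component's state is fixed. The sketch below is one possible way to code that idea, not a definitive implementation. It assumes independent components and the layout implied by the text (Units 1 and 2 in series, in parallel with Unit 3); the function names and the dictionary keys are hypothetical.

```python
def reliability_by_decomposition(system_works, reliabilities, states=None):
    """Recursively condition on a key component being up (True) or down (False)."""
    states = dict(states or {})
    undecided = [name for name in reliabilities if name not in states]
    if not undecided:
        # Every component state is fixed: the system either works or it does not.
        return 1.0 if system_works(states) else 0.0
    key = undecided[-1]  # pick the last undecided unit as the key component
    r_key = reliabilities[key]
    rel_if_up = reliability_by_decomposition(
        system_works, reliabilities, {**states, key: True})
    rel_if_down = reliability_by_decomposition(
        system_works, reliabilities, {**states, key: False})
    return rel_if_up * r_key + rel_if_down * (1.0 - r_key)

def works(states):
    """Assumed layout: Units 1 and 2 in series, in parallel with Unit 3."""
    return (states["unit1"] and states["unit2"]) or states["unit3"]

# Hypothetical component reliabilities, chosen only for illustration.
r_s = reliability_by_decomposition(
    works, {"unit1": 0.9, "unit2": 0.8, "unit3": 0.7})
print(r_s)
```

Because Unit 3 is listed last, it is chosen as the first key component, mirroring the hand calculation above. The result should agree with the closed-form expression R3 + R1 R2 (1 - R3).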

Event Space Method

The event space method is an application of the mutually exclusive events axiom. All mutually exclusive events are determined, and those which result in system success are considered. The reliability of the system is simply the probability of the union of all mutually exclusive events that yield a system success. Similarly, the unreliability is the probability of the union of all mutually exclusive events that yield a system failure. This is illustrated in the following example.

Consider the following system, with reliabilities R1, R2, and R3 for a given time:

[Figure: simple system with Units 1 and 2 in series, and Unit 3 in parallel with that pair]

  • A is the event of Unit 1 success
  • B is the event of Unit 2 success
  • C is the event of Unit 3 success

The mutually exclusive system events are:

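$X_1 = ABC$ - all units succeed

$X_2 = \overline{A}BC$ - only Unit 1 fails

$X_3 = A\overline{B}C$ - only Unit 2 fails

$X_4 = AB\overline{C}$ - only Unit 3 fails

$X_5 = \overline{A}\,\overline{B}C$ - Units 1 and 2 fail

$X_6 = \overline{A}B\overline{C}$ - Units 1 and 3 fail

$X_7 = A\overline{B}\,\overline{C}$ - Units 2 and 3 fail

$X_8 = \overline{A}\,\overline{B}\,\overline{C}$ - all units fail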

System events X6, X7, and X8 result in system failure. Thus the probability of failure of the system is: 

$$P_F = P(X_6 \cup X_7 \cup X_8)$$

Since events X6, X7, and X8 are mutually exclusive, then: 

$$P_F = P(X_6) + P(X_7) + P(X_8)$$

And:

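$$P(X_6) = (1 - R_1)\,R_2\,(1 - R_3)$$

$$P(X_7) = R_1\,(1 - R_2)\,(1 - R_3)$$

$$P(X_8) = (1 - R_1)(1 - R_2)(1 - R_3)$$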

Combining terms yields: 

$$P_F = (1 - R_3)\left[(1 - R_1)R_2 + R_1(1 - R_2) + (1 - R_1)(1 - R_2)\right] = (1 - R_3)(1 - R_1 R_2)$$

Since: 

$$R_s = 1 - P_F$$

then:

$$R_s = 1 - (1 - R_3)(1 - R_1 R_2) = R_3 + R_1 R_2 (1 - R_3)$$

This is of course the same result as the one obtained previously using the decomposition method.

If R1 = 99.5%, R2 = 98.7%, and R3 = 97.3%, then:

$$R_s = 0.973 + (0.995)(0.987)(1 - 0.973) = 0.9995$$

or Rs = 99.95%.
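The enumeration above can be reproduced with a short sketch that lists all 2^n mutually exclusive events and sums the probabilities of those that yield system success. The function names and dictionary keys are hypothetical, the components are assumed to be independent, and the `works` function encodes the layout implied by the text (Units 1 and 2 in series, in parallel with Unit 3).

```python
from itertools import product

def event_space_reliability(reliabilities, system_works):
    """Sum the probabilities of all mutually exclusive events in which the system works."""
    names = list(reliabilities)
    total = 0.0
    for outcome in product([True, False], repeat=len(names)):
        event = dict(zip(names, outcome))
        if system_works(event):
            prob = 1.0
            for name, is_up in event.items():
                r = reliabilities[name]
                prob *= r if is_up else (1.0 - r)
            total += prob
    return total

def works(event):
    """Assumed layout: Units 1 and 2 in series, in parallel with Unit 3."""
    return (event["unit1"] and event["unit2"]) or event["unit3"]

r_s = event_space_reliability(
    {"unit1": 0.995, "unit2": 0.987, "unit3": 0.973}, works)
print(round(r_s, 4))  # approximately 0.9995, matching the value in the text
```

Since the number of events doubles with every added component, this brute-force approach is only practical for small systems.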

Path-Tracing Method

With this method, every path from a starting point to an ending point is considered. System success requires at least one available path from one end of the Reliability Block Diagram (RBD) to the other; as long as one such path exists, the system has not failed. One could consider the RBD to be a plumbing schematic. If a component in the system fails, the "water" can no longer flow through it. As long as there is at least one path for the "water" to flow from the start to the end of the system, the system is successful. This method involves identifying all of the paths the "water" could take and calculating the reliability of each path based on the components that lie along it. The reliability of the system is simply the probability of the union of these paths. To keep the analysis consistent, starting and ending blocks for the system must be defined.

Consider the following system:

[Figure: system with block A in series with the parallel pair B and C, followed by block D in series]

The successful paths for this system are X1 = ABD and X2 = ACD. The reliability of the system is simply the probability of the union of these paths.

$$R_s = P(X_1 \cup X_2)$$

$$R_s = P(ABD) + P(ACD) - P(ABCD)$$

Thus:

$$R_s = R_A R_B R_D + R_A R_C R_D - R_A R_B R_C R_D$$
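The probability of the union of paths can be computed by inclusion-exclusion over the path sets. The sketch below is a minimal illustration that assumes independent components; the function name and the numeric reliabilities are hypothetical.

```python
from itertools import combinations

def path_union_reliability(paths, reliabilities):
    """Probability that at least one path works, by inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(paths, k):
            # All k paths work only if every component on any of them works.
            components = set().union(*group)
            prob = 1.0
            for name in components:
                prob *= reliabilities[name]
            total += sign * prob
    return total

# Paths X1 = ABD and X2 = ACD, with hypothetical component reliabilities.
paths = [{"A", "B", "D"}, {"A", "C", "D"}]
rel = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.95}
print(path_union_reliability(paths, rel))
```

Components shared by several paths (here A and D) are counted once in each intersection term, which is why the union of component sets is taken rather than a product of path reliabilities.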

In the following system, a starting and an ending node must be defined.

[Figure: system diagram]

Assume the following starting and ending nodes:

[Figure: system diagram with start and end nodes]

The paths for this system are X1 = 1,2 and X2 = 3. The probability of success for the system is given by:

$$R_s = P(X_1 \cup X_2) = P(X_1) + P(X_2) - P(X_1 \cap X_2)$$

or:

$$R_s = R_1 R_2 + R_3 - R_1 R_2 R_3$$
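Finding the paths themselves can be automated once the diagram is written as a directed graph with explicit start and end nodes. The sketch below uses a plain depth-first search followed by inclusion-exclusion. It is only an illustration under stated assumptions and is not the modified method used by any particular software package: the graph encoding, the node names, and the function names are hypothetical, the start and end nodes are assumed never to fail, and the components are assumed to be independent.

```python
from itertools import combinations

def find_paths(graph, node, end, path=()):
    """Depth-first enumeration of all simple paths from `node` to `end`."""
    path = path + (node,)
    if node == end:
        return [path]
    found = []
    for nxt in graph.get(node, ()):
        if nxt not in path:  # never revisit a block on the same path
            found.extend(find_paths(graph, nxt, end, path))
    return found

def path_union_reliability(paths, reliabilities):
    """Probability that at least one path works, by inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(paths, k):
            prob = 1.0
            for name in set().union(*group):
                prob *= reliabilities[name]
            total += sign * prob
    return total

# Assumed encoding of the diagram: start -> 1 -> 2 -> end and start -> 3 -> end.
rbd = {"start": ["unit1", "unit3"], "unit1": ["unit2"],
       "unit2": ["end"], "unit3": ["end"]}

# Drop the start and end nodes, which are assumed to be perfectly reliable.
paths = [frozenset(p[1:-1]) for p in find_paths(rbd, "start", "end")]
r_s = path_union_reliability(
    paths, {"unit1": 0.995, "unit2": 0.987, "unit3": 0.973})
print(sorted(sorted(p) for p in paths), round(r_s, 4))
```

With the reliabilities from the event space example, the result should agree with the value obtained there.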

A modified version of this method is used by ReliaSoft BlockSim to calculate the analytical solution to system reliability diagrams.

The examples used to illustrate these techniques were kept fairly simple to limit the mathematics involved. The same techniques can be used to determine the reliability of more complex systems. The expressions for the system reliability get larger as the number of components in the system increases. The way the components are arranged reliability-wise will also have an effect on the size of the final system reliability expression. In fact, even moderately sized complex systems can prove too unwieldy to solve by hand. Computer programs can be employed to solve these large complex systems, but to the best of our knowledge, BlockSim is the only software package available that is capable of this type of analysis.

While these analytical techniques for determining system reliability can yield results not available with other techniques, there are also some drawbacks. The biggest disadvantage of the analytical method is that formulations can become very complicated. The more complicated a system is, the larger and more difficult it will be to analytically formulate an expression for the system's reliability. For particularly detailed systems, this process can be quite time-consuming, even with the use of computers. Furthermore, when the maintainability of the system or some of its components must be taken into consideration, an analytical solution may be impossible to compute. In these situations, the use of simulation methods may be more advantageous than attempting to develop a solution analytically. We will take a look at these simulation methods in the next article in this series.