Sources of Reliability Data
Part 1 - Reliability Testing Basics

Reliability testing is the cornerstone of a reliability engineering program. It provides the most detailed form of life data because the conditions under which the data are collected can be carefully controlled and monitored. Furthermore, reliability tests can be designed to uncover particular suspected failure modes and other problems. The type of reliability testing a product undergoes will change at different points in its life-cycle, but the overriding goal is to ensure that data from all or most of the tests are generated under similar enough conditions that an "apples-to-apples" comparison can be made of the product's reliability characteristics at different points in the product's life.

A properly designed series of tests, particularly during the product's earlier design stages, can generate data that would be useful in the implementation of a reliability growth tracking program. This will provide information helpful in making management decisions regarding scheduling, development cost projections and so forth. This information will also be useful in planning the development cycle of future products.
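To make the growth-tracking idea concrete, the short sketch below fits a Crow-AMSAA (NHPP) growth model, one common choice for this kind of tracking, to invented cumulative failure data using a simple log-log least-squares fit; production tools generally use maximum likelihood instead, and every name and data value here is hypothetical.

```python
import math

# Hypothetical development test history: cumulative test hours at each
# failure and the cumulative failure count at that point.
times = [120.0, 350.0, 900.0, 2100.0, 4800.0, 9500.0]
counts = [1, 2, 3, 4, 5, 6]

# Crow-AMSAA (NHPP) growth model: E[N(T)] = lam * T**beta, fitted here
# with an ordinary least-squares line on the log-log scale.
xs = [math.log(t) for t in times]
ys = [math.log(n) for n in counts]
m = len(xs)
xbar, ybar = sum(xs) / m, sum(ys) / m
beta = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        / sum((x - xbar) ** 2 for x in xs))
lam = math.exp(ybar - beta * xbar)

# beta < 1 indicates a decreasing failure intensity, i.e. reliability
# growth as design fixes accumulate during development.
print(f"growth exponent beta = {beta:.2f}, scale lambda = {lam:.4f}")
```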

Reliability Test Design

Designing reliability tests can sometimes lead to a catch-22 situation, in that a certain amount of information about the life of a product is required in order to design the most efficient life tests. Often, reliability or test engineers are looking for a magic formula that will allow them to obtain precise, accurate information on the life of their products by testing small numbers of units for short periods of time. Unfortunately, no test plan can meet all of these requirements. It must be kept in mind that skimping on test units or test time will almost always result in greater uncertainty in the results of the test.

Ideally, a reliability test is one in which a relatively large number of units are tested to failure. Although the concept of a large number of failures on test may be anathema to design engineers, the information that these tests produce is necessary to successfully model the life behavior of the product. The more failures a reliability test produces, the more precise the results of the analysis will be. This is especially important for products that are new or otherwise have little historical information about their reliability. When developing tests for such products, it is advisable to test as many units as is feasible in order to obtain large quantities of information. This will help ensure a precise early estimate of the product's reliability, which may make it possible to reduce the scope of testing later in the development process.

Once detailed initial reliability information has been collected and analyzed, it can be used to design reliability acceptance or demonstration tests. These tests, which usually occur later in the development process, are used to demonstrate that the reliability of a product is no worse than a certain level, and it is normally assumed that no failures will occur during such tests. However, in order to design these tests effectively, a certain amount of information about the product under test is required. At a minimum, one must be able to estimate the distribution that the life of the product follows and the value of the shape parameter of that distribution. With this information, one can design a test that will demonstrate that the product has met a minimum reliability requirement at a given confidence level, provided that there are no unanticipated failures during the test.
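For illustration only, the minimal sketch below applies the standard zero-failure (success-run) sample size relationship under these assumptions: the life distribution is Weibull, its shape parameter is known, and no failures occur during the test. The function name and the example numbers are hypothetical.

```python
import math

def zero_failure_sample_size(r_demo, conf, t_demo, t_test, beta):
    """Units required to demonstrate reliability r_demo at time t_demo
    with confidence conf, assuming a Weibull life distribution with
    shape parameter beta and a zero-failure test of length t_test."""
    # Scale the demonstration target to the test duration via the
    # Weibull shape parameter, then apply the binomial (success-run)
    # relationship (1 - conf) = R(t_test)**n.
    ratio = (t_test / t_demo) ** beta
    n = math.log(1.0 - conf) / (ratio * math.log(r_demo))
    return math.ceil(n)

# Example: demonstrate 90% reliability at 1,000 hours with 95%
# confidence, testing each unit for 2,000 hours, assuming beta = 2.
print(zero_failure_sample_size(0.90, 0.95, 1000.0, 2000.0, 2.0))  # -> 8
```

Testing each unit for longer than the demonstration time (here, twice as long) reduces the number of units required, which is one reason the assumed shape parameter matters so much.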

Customer Usage Profiling

An important requirement for designing useful reliability tests is to have a good idea of how the product is actually going to be used in the field. The tests should be based on a realistic expectation of customer usage, rather than on estimates or "gut feelings" about the way the customer will use the product. Tests based on mere speculation may leave the product inadequately tested, and it may consequently run into operational difficulties when use stress levels turn out to be higher than anticipated. On the other hand, tests that are designed with a strong basis of information on how the product will be used will be more realistic and will result in an optimized design that exhibits fewer failures in the field.

Customer usage profiles can be set up to actively gather information on how customers are actually using an organization's product. This can range from a simple questionnaire to sophisticated instrumentation of the product that feeds back detailed information about its operation. An incentive is often useful to get customers to sign on for a usage measurement program, particularly if it is an intrusive process that involves the installation of data collection equipment. However, customers are often eager to participate, knowing that the information they provide will ultimately result in a more reliable and user-friendly product.

Test Types

In many cases, the type of testing that a product undergoes will change as the product's design becomes mature and the product moves from the initial design stages to final design release and production. Nevertheless, it is a good practice to continue to collect internally-generated data concerning the product's reliability performance throughout the life-cycle of the product. This will strengthen the reliability growth analysis and help provide correlation between internal test results and field data. A brief summary of various types of reliability tests is presented next.

Development Testing

Development testing occurs during the early phases of the product's life-cycle, usually from project inception to product design release. It is vital to be able to characterize the reliability of the product as it progresses through its initial design stages so that the reliability specifications will be met by the time the product is ready for release. With a multitude of design stages and changes that could affect the product's reliability, it is necessary to closely monitor how the product's reliability grows and changes as the product design matures. There are a number of different test types that can be run during this phase of a product's life-cycle to provide useful reliability information:

  • Component-level Testing - Although component-level testing can continue throughout the development phase of a product, it is most likely to occur very early on. This may be due to the lack of availability of parts in the early stages of the development program. There may also be special interest in the performance of a specific component if it has been radically redesigned, or if there is a separate or individual reliability specification for that component. In many cases, component-level testing is undertaken to begin characterizing a product's reliability even though full system-level test units are unavailable or prohibitively expensive. Nevertheless, system-level reliability can still be characterized through component-level testing, provided that sufficient understanding exists to characterize the interactions of the components. If this is the case, the system-level reliability can be modeled based on the configuration of the components and the results of component reliability testing, using such tools as ReliaSoft's BlockSim (a simple sketch of this kind of component-to-system roll-up appears after this list).
  • System-level Testing - Although the results of component-level tests can be used to characterize the reliability of the entire system, there is no substitute for testing the entire system, particularly if that is how the reliability is specified. That is, if the technical specifications state a reliability goal for a specific system or configuration of components, that entire system or configuration should be tested to compare the actual performance with the stated goal. Although early system-level test units may be difficult to obtain, it is advisable to perform reliability tests at the system level as early as possible. At the very least, comprehensive system-level testing should be performed immediately prior to the product's release for manufacturing, in order to verify design reliability. During such system-level reliability testing, the units under test should be from a homogeneous population and should be devoted solely to the reliability test. The results of the reliability test could be skewed or confounded by "piggybacking" other tests along with it, and this practice should be avoided. A properly run system-level reliability test will provide valuable engineering information above and beyond the raw reliability data.
  • Environmental and Accelerated Testing - It may be necessary in some cases to institute a series of tests in which the system is tested at extreme environmental conditions, or with other stress factors accelerated above the normal levels of use. It may be that the product would not normally fail within the time constraints of the test, and the stress factors need to be accelerated in order to get any meaningful data within a reasonable time. In other cases, it may be necessary to simulate different operating environments based on where the product is intended to be sold or operated. Regardless of the reason, tests like these should be designed, implemented and analyzed with care. Depending on the nature of the accelerating stress factors, it is easy to draw incorrect conclusions from the results of these tests. A good understanding of the proper accelerating stresses and the design limits of the product is necessary to implement a meaningful accelerated reliability test. For example, one would not want to design an accelerated test that overstresses the product and introduces failure modes that would never be encountered in the field. Given the many incredible claims that have been made about the capability of accelerated testing and the improbably high acceleration factors that can supposedly be produced, care needs to be taken when setting up this type of reliability testing program (an illustrative acceleration factor calculation appears after this list). (SHAMELESS PLUG: ReliaSoft's ALTA software is one of the few applications solely dedicated to the analysis of accelerated test data. The new version, ALTA PRO, is the only commercial software capable of providing analytical results for time-varying stress tests, such as step-stress tests. For more information on ALTA and ALTA PRO, click here.)
  • Shipping Tests - Although shipping tests do not necessarily qualify as reliability tests per se, shipping tests or simulations should be a prerequisite to reliability testing. This is because the effects of shipping will often have an impact on the reliability of the product that the customer experiences. As such, it may be useful to incorporate shipping tests alongside the normal reliability testing. For example, it may be a good idea to put the units of a final design release reliability test through a non-destructive shipping test prior to the actual reliability testing in order to better simulate actual use conditions.
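As mentioned in the component-level testing item above, component test results can be rolled up into a system estimate when the configuration is understood. The sketch below shows the idea for a hypothetical series system with one redundant pair, using assumed Weibull component parameters; tools such as BlockSim handle far more complex configurations, and every name and value here is invented for illustration.

```python
import math

def weibull_rel(t, beta, eta):
    """Weibull component reliability at time t."""
    return math.exp(-((t / eta) ** beta))

def series(*rels):
    """Series configuration: every component must survive."""
    out = 1.0
    for r in rels:
        out *= r
    return out

def parallel(*rels):
    """Parallel (redundant) configuration: at least one must survive."""
    out = 1.0
    for r in rels:
        out *= (1.0 - r)
    return 1.0 - out

# Hypothetical system: a pump and a controller in series with a
# redundant pair of sensors, evaluated at t = 500 hours.
t = 500.0
pump = weibull_rel(t, beta=1.8, eta=4000.0)
controller = weibull_rel(t, beta=1.2, eta=9000.0)
sensor = weibull_rel(t, beta=2.0, eta=3000.0)

system = series(pump, controller, parallel(sensor, sensor))
print(f"System reliability at {t:.0f} hours: {system:.4f}")
```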
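For the accelerated testing item above, a common way to relate accelerated conditions to use conditions is a stress-life model such as the Arrhenius relationship for temperature. The sketch below computes an acceleration factor under assumed values for the activation energy and temperatures; it only illustrates the arithmetic and is not a substitute for fitting an accelerated life model (as ALTA does) to actual test data.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, use_temp_c, accel_temp_c):
    """Arrhenius acceleration factor between a use temperature and an
    accelerated test temperature (both in Celsius), for an assumed
    activation energy ea_ev in electron-volts."""
    t_use = use_temp_c + 273.15
    t_accel = accel_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_accel))

# Assumed example: 0.7 eV activation energy, 40 C use vs. 85 C test.
print(f"Acceleration factor: {arrhenius_af(0.7, 40.0, 85.0):.1f}")  # ~26
```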

Manufacturing Testing

The testing that goes on after a product design has been released for production generally tends to measure the process rather than the product, under the assumption that the released product design is final and good. However, this is not necessarily the case, as post-release design changes or feature additions are not uncommon. That notwithstanding, it is still possible to obtain useful reliability information from manufacturing-type testing without diluting any of the process-oriented information that these tests are designed to produce.

  • Functionality Testing & Burn-In - This type of testing usually falls under the category of operation verification. A large proportion, if not all, of the products coming off the assembly line are put on a very short test in order to verify that they are functioning. In some situations, they may be run for a predetermined "burn-in" time in order to weed out those units that would otherwise fail early in the field (infant mortality failures). Although it may not be possible to collect detailed reliability information from this type of testing, what is lost in quality is made up for in quantity. With the proper structuring, these tests can provide a fairly good picture of the early-life reliability behavior of the product (a short sketch of the burn-in effect appears after this list).
  • Extended Post-Production Testing - This type of testing is usually implemented near the end of, or shortly after, the product design's release to production. It is useful to structure these tests to be identical to the final reliability verification tests conducted at the end of the design phase, so that the effects of the production process on the reliability of the product can be assessed. In many cases, the test units that undergo reliability testing prior to the onset of actual production are hand-built or carefully adjusted before the reliability tests begin. By replicating these tests with actual production units, potential problems in the manufacturing process can be identified before many units are shipped.
  • Design/Process Change Verification - This type of testing is similar to the extended post-production testing in that it should closely emulate the reliability verification testing that takes place at the end of the design phase. This type of testing should occur at regular intervals during production, or immediately following a post-release design change or a change in the manufacturing process. These changes can have a potentially large effect on the reliability of the product, and these tests should be adequate - in terms of duration and sample size - to detect such changes.
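To illustrate the burn-in reasoning in the functionality testing item above, the sketch below compares mission reliability with and without a short burn-in under a Weibull model whose shape parameter is less than one (a decreasing failure rate, typical of early-life failures). All parameter values are assumed for illustration.

```python
import math

def weibull_rel(t, beta, eta):
    """Weibull reliability at time t."""
    return math.exp(-((t / eta) ** beta))

def rel_after_burn_in(mission, burn_in, beta, eta):
    """Mission reliability for a unit that has already survived the
    burn-in period (conditional reliability)."""
    return weibull_rel(mission + burn_in, beta, eta) / weibull_rel(burn_in, beta, eta)

# Assumed early-life behavior: beta < 1 gives a decreasing failure rate.
beta, eta = 0.5, 20000.0
mission = 1000.0
print(f"No burn-in:      {weibull_rel(mission, beta, eta):.4f}")
print(f"48-hour burn-in: {rel_after_burn_in(mission, 48.0, beta, eta):.4f}")
```

Under these assumed parameters the burn-in raises the mission reliability of the shipped units, because the weakest units fail during the burn-in rather than in the field.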

This gives just a brief overview of some of the aspects of reliability testing. When it comes to designing and implementing these tests, the philosophy that "more is better" holds true: more units on test, and more units run until failure. With more data, the results will be more precise, with less uncertainty. This will allow the engineer to make a better estimate of the product's "real life" behavior. However, there is only one way to characterize a product's true behavior, and that is to investigate the reliability of the product in the hands of the users (your customers). In a subsequent article, we will look at sources of field data.