AIAG/VDA’s FMEA Manual Is a Major Advance
New process is qualitative rather than quantitative
The Automotive Industry Action Group’s (AIAG’s) and German Association of the Automotive Industry’s (VDA’s) new Failure Mode and Effects Analysis Handbook (AIAG, 2019) offers significant advances over FMEA as practiced 15 or 20 years ago.1 The publication is definitely worth buying because the new approach includes valuable methodology; this article will cover the most important points and highlights.
New features
The new process is qualitative rather than quantitative, which overcomes a major drawback of the previous approach. The older occurrence ratings were based on the probability of a failure, and the older AIAG manuals even tabulated recommended nonconforming fraction ranges. If, for example, the failure was 50 percent or more likely, the occurrence rating was 10 (worst possible on a 1 to 10 scale), while one or fewer per 1.5 million opportunities earned a rating of 1. These probabilities can be estimated from a process capability study, assuming that one is available; otherwise, one might easily have to guess.
Qualitative does not, however, mean totally subjective. The new occurrence and detection ratings are based on the kinds of prevention and detection controls that are available. The manual makes it clear that engineering controls (or technical and machine-based controls) are superior to administrative or behavioral controls that rely on compliance and worker vigilance. Even the best behavioral (i.e., administrative) controls cannot earn an occurrence rating of less than two in Table P2, PFMEA Occurrence. Human inspection cannot earn a detection rating of less than six in Table P3, PFMEA Detection.
The new approach also dispenses with the risk priority number (RPN), which is the product of three ordinal numbers: the severity, occurrence, and detection ratings.2 Suppose we have two failure modes, both of which have RPNs of 240. The first has S = 10, O = 8, and D = 3, and the second has S = 3, O = 8, and D = 10. Although the second will generate a lot of nonconformances (O = 8), few of which will be detected before they get shipped to the customer (D = 10), the worst they can do is cause the customer a minor annoyance (S = 3). However, while almost all the nonconformances from the first failure mode will be caught before shipment (D = 3), those that escape can kill the customer. The RPN nevertheless ranks the two as equal. The previous approach did recognize this issue and defined any failure mode with a severity of 9 or 10 as being in the “legal zone,” for which attention was required regardless of the RPN.3 The new AIAG/VDA publication uses an action priority matrix that gives the most weight to the severity rating, followed by occurrence and then detection.
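The arithmetic behind this comparison can be checked with a short Python sketch. The function names are hypothetical, and the final sort is only a simplified stand-in for severity-first weighting (compare S, then O, then D); it is not the handbook's action priority table.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return the risk priority number, the product of three 1-10 ratings."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("each rating must be between 1 and 10")
    return severity * occurrence * detection


def in_legal_zone(severity: int) -> bool:
    """Older practice: severity 9 or 10 demands attention regardless of RPN."""
    return severity >= 9


# (S, O, D) for the two failure modes discussed above.
failure_modes = {
    "first": (10, 8, 3),
    "second": (3, 8, 10),
}

for name, (s, o, d) in failure_modes.items():
    print(name, "RPN =", rpn(s, o, d), "legal zone:", in_legal_zone(s))

# ASSUMPTION: a plain tuple comparison (S first, then O, then D) used only to
# illustrate severity-first weighting; the real Table AP is a lookup table.
ranked = sorted(failure_modes, key=lambda name: failure_modes[name], reverse=True)
print(ranked)
```

Both failure modes produce the same RPN, yet only the first falls in the legal zone, and a severity-first comparison places it ahead of the second.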
The PFMEA process
Developing the process FMEA consists of seven steps, for which the AIAG/VDA manual provides extensive details; the following is a list of their key deliverables.
1. Planning and preparation: Scope of the FMEA, and identification of foundation or generic FMEAs based on product families or group technology
2. Structure analysis: Breakdown of the process into its operations (process steps) and identification of the factors (work elements or process elements) that might affect the process steps. This information can be presented as a tree structure.
3. Function analysis: Identification of the functions of the process, its operations, and the process elements, i.e., what they are supposed to do. This provides the basis for the failure analysis, or identifying what does not happen that should, and why. Process characteristics and product characteristics also are identified in this step, along with a parameter diagram that helps identify process inputs, outputs, and potential noise (i.e., variation) factors.
• The function of the process item (the highest-level element of the structure tree) is the overall objective or deliverable of the process. Failure to achieve a process objective is the failure effect.
• The function of the process step (or operation) is how it contributes to achieving the process objective. Failure to achieve the process step, or the negative of the process step, is the failure mode.
• The function of the process work element (manpower, machine, material, environment, and also method and measurement) is what it does to enable completion of the process step. Failure of the process work element is the failure cause (formerly known as the “failure mechanism”).
4. Failure analysis: Identification of failure causes, failure modes, and failure effects
• The failure mode (considered first) is how the process step fails to achieve its function in fulfilling the objective of the process item. This is the focus element or starting point from which we identify the failure effect and failure cause.
• The failure cause is why the failure mode occurs.
• The failure effect is what happens that should not happen, or does not happen that should, regarding the process’s objectives and deliverables.
5. Risk analysis: Identification of prevention controls and detection controls, assignment of severity, occurrence, and detection ratings, and determination of the action priority
6. Optimization: Actions to remove or mitigate risks, and recalculation of the risk ratings and action priority
7. Documentation: The FMEA becomes a quality record and part of organizational knowledge, and lessons learned are deployed to similar processes.
Planning and preparation
The first step includes defining the FMEA’s scope, which can include occupational health and safety (OH&S) and continuity of operations as well as quality. Foundation or generic FMEAs offer a good starting point and are based on past experience for similar products. The concept of group technology or product families is important here. A member of a product family can be manufactured with a common set of tools and process steps, which means the risks and controls associated with those process steps are already familiar.
Structure analysis
Structure analysis is synergistic with the process approach of ISO 9001 and the related IATF 16949 automotive quality standard. The process flowchart, a basic quality tool, plays a central role in this activity. Its purpose is to identify the process items, process steps, and process elements for subsequent risk analysis.
The process item defines what the process is supposed to achieve, and it is the highest-level element of the tree structure. The latter resembles a cause-and-effect diagram for each of the process steps, in which the planning process examines the process work elements that might cause the process step to fail to perform its function. The AIAG/VDA manual cites manpower, machine, material, and environment, but method and measurement also can be addressed.
Consider, for example, a process described by Shigeo Shingo4 in which a multi-axis drill press is supposed to make four reference holes in a flange. (Note: Shingo does not use this as a PFMEA example, or even mention PFMEA; I am using it as an illustrative case study, with inferences that do not appear in the book.) Improper alignment of the flange in the jig results in placement of the holes in the wrong locations, which presumably makes the flanges unusable in subsequent processes. Shingo explains, “Faulty mounting was prevented through worker vigilance.” As soon as we see the phrase “worker vigilance” in a Shingo case study, we know that defects are being produced, because administrative controls are of limited effectiveness regardless of how vigilant workers might be. Even if the flange is set correctly in the jig, it might leave the desired reference plane prior to contact with the drills, which would result in the holes being in the wrong positions.
The process item is therefore “Drill reference holes in a flange,” because this is what the process is supposed to accomplish. The process steps are 1) place flange in jig and ensure it contacts the three reference planes in the jig; 2) drill the holes; and 3) remove the finished part. The process elements for each step are manpower (the worker), machine (the jig and drill press, respectively), material (the flange), and environment (temperature and humidity). This gives us the following tree structure (figure 1). We are not going to pay much attention to the last step, removal of the part, because it is not a trouble source; if parts could be somehow damaged during handling, though, we would.
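The structure just described can also be written down as a small data structure. The following Python sketch is only an illustration: the class and field names are hypothetical, and the work elements for the third step are left empty because the example sets that step aside.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    name: str
    # Maps a work-element category (manpower, machine, ...) to its element.
    work_elements: dict[str, str] = field(default_factory=dict)


@dataclass
class ProcessItem:
    name: str
    steps: list[ProcessStep] = field(default_factory=list)

    def render(self) -> str:
        """Return the tree as indented text, one line per node."""
        lines = [self.name]
        for number, step in enumerate(self.steps, start=1):
            lines.append(f"  {number}. {step.name}")
            for category, element in step.work_elements.items():
                lines.append(f"       {category}: {element}")
        return "\n".join(lines)


tree = ProcessItem(
    name="Drill reference holes in a flange",
    steps=[
        ProcessStep(
            "Place flange in jig so it contacts the three reference planes",
            {"manpower": "worker", "machine": "jig",
             "material": "flange", "environment": "temperature and humidity"},
        ),
        ProcessStep(
            "Drill the holes",
            {"manpower": "worker", "machine": "drill press",
             "material": "flange", "environment": "temperature and humidity"},
        ),
        # Not analyzed further in this example, so no work elements are listed.
        ProcessStep("Remove the finished part"),
    ],
)

print(tree.render())
```

Each process step carries its own set of work elements, which is what allows every step to be examined later as a separate cause-and-effect problem.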
Note the resemblance of the tree structure to a set of cause-and-effect diagrams, in which the negative of the process step (e.g., “reference holes not drilled in the correct positions”) becomes the problem description with the head of the “fish” being at the left rather than the right of the traditional fishbone diagram (figure 2). This information would also be entered in the appropriate columns of the PFMEA.
Function analysis
This step defines the function of the process and process steps. The function of the process item describes what the overall process is supposed to achieve, which is in this case “drill reference holes (in correct positions).” The AIAG/VDA manual suggests using a verb followed by an object. The negative of the process item is the failure effect, e.g., “holes not drilled in the correct positions.”
Similarly, the function of the process step is how the step realizes the objective of the process, and its negative is the failure mode, i.e., how it fails to fulfill its intended function. The first process step is to place the flange in the jig so it contacts the three reference planes. If the flange is not placed in the jig so it contacts, and remains in contact with, the three reference planes, this is the failure mode that results in the failure effect.
The function of the process work element is what the work element does to complete the process step. The worker (manpower) puts the flange (material) in the jig (machine) so it contacts the reference planes. If the worker does not do this, or if the flange does not remain in contact with the reference planes, this is the failure cause (known previously as the failure mechanism). The failure cause is why the failure happens.
In summary:
• Failure effect = What happens that should not happen, or what doesn’t happen that should
• Failure mode = How the failure effect occurs
• Failure cause = Why the failure mode occurs (which is incidentally an open invitation to apply the 5 Whys to each potential failure cause)
The process step, and therefore the failure mode, is the focus element or starting point for the function analysis. We start by asking how the process step can fail to deliver the intended results and proceed from there to ask why this might happen (the failure cause) and what happens as a result (the failure effect). In this case, we begin with the first process step that says the flange must be placed in the jig so it contacts the three reference planes and remains in contact with them. The negative of this situation, that the flange does not contact the reference planes, is the failure mode. The failure effect is that the holes are not drilled in the right places. The potential failure causes include manpower (worker not vigilant) and machine (the jig does not hold the work correctly or reliably).
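The effect, mode, and cause relationship for this first process step can be captured in a compact record. This is a minimal Python sketch with hypothetical names, populated only with the statements made above; an actual PFMEA form holds more columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureChain:
    """One focus element (failure mode) with its effect and causes."""
    mode: str      # how the process step fails
    effect: str    # what happens as a result
    causes: tuple  # why the failure mode occurs

    def summary(self) -> str:
        cause_text = "; ".join(self.causes)
        return (f"Mode: {self.mode}\n"
                f"Effect: {self.effect}\n"
                f"Causes: {cause_text}")


flange_chain = FailureChain(
    mode="Flange does not contact the three reference planes",
    effect="Holes are not drilled in the right places",
    causes=(
        "manpower: worker not vigilant",
        "machine: jig does not hold the work correctly or reliably",
    ),
)

print(flange_chain.summary())
```

Starting from the failure mode and attaching the effect and the causes to it mirrors the focus-element approach: one question points downstream (what happens), the other upstream (why).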
The AIAG/VDA manual also cites for this step the process characteristics and product characteristics, which tie in with the control plan described in AIAG’s APQP manual5 and Dean H. Stamatis’ book, Advanced Quality Planning (CRC Press, 2018).6 Process characteristics are measurable and/or controllable aspects of the process, such as tool speed, feed rate, pressure, and temperature. Product characteristics are measurable or otherwise appraisable aspects of the product, such as a dimension, surface finish, or presence (or absence) of a required feature. Process characteristics are therefore usually subject to prevention controls, or controls that prevent the creation of nonconforming work. Product characteristics are usually subject to detection controls that detect nonconforming work, preferably before it can reach the next internal or external customer.
If, however, feedback can be obtained from the product and used as a process control, this becomes a prevention control. This is common in the chemical process industry, where various product characteristics, such as temperature, pressure, and composition, are measurable during product realization and are usable as feedback by automatic process controllers that adjust the process characteristics accordingly.
These principles also tie in with Shigeo Shingo’s Zero Quality Control (Routledge, 1986), as well as Ford Motor Co.’s “don’t take it, don’t make it, don’t pass it along,” with “it” relating to poor quality.7 Similarly, AIAG CQI-18 (p. 41) says to 1) not accept a defect; 2) not create a defect; and 3) not pass a defect.8 Shingo’s vertical-source inspection checks incoming work to make sure it is acceptable. Prevention controls address the second item, which is to not create a defect. Detection controls support “don’t pass it along.” Figure 3 illustrates the concept. “Inspection” can be machine-based or automated; remember that inspections by people are often labor-intensive and also of limited reliability.
Risk analysis
This step identifies prevention and detection controls, and assigns severity, occurrence, and detection ratings to each failure mode and its associated cause. Severities can be considered not only for quality but also for continuity of operations; that is, can the failure shut down our factory or the customer’s factory? If a failure mode can injure a worker, it gets a very high severity rating: 10 for “acute” and eight for “chronic,” per Table P1, PFMEA Severity. My personal preference (not formal engineering advice) would be to assign a nine for anything capable of harming a worker, and AIAG’s previous FMEA manual (fourth edition, 2008) assigned a nine to process failure modes capable of injuring a worker “with warning” while 10 was for the same “without warning.”
In this case, failure to drill the reference holes in the correct locations will result in inability to assemble the flange into a product, or even worse, assemble it out of position and thus make the entire product nonconforming. Consult Table P1, PFMEA Severity, of the AIAG/VDA manual to get a severity of roughly seven, which means a portion of the product may have to be scrapped (as it is difficult to envision somehow filling in the improperly placed holes); if the defective flanges reach the customer, that may result in a line shutdown. Table P2, PFMEA Occurrence, yields an occurrence rating of about eight because the administrative control, which relies on worker vigilance, is “somewhat effective” in preventing the failure cause (worker does not place the flange properly, or the flange does not remain in place). Words like “somewhat” and “should” are not very reassuring.
The detection rating is from Table P3, PFMEA Detection. This looks like about eight because there appears to be no machine-based inspection or autonomation to catch nonconforming work, and human inspection “should” detect the problem, which is no more reassuring than a statement that errors or defects “were prevented through worker vigilance” in a Shingo case study. The next step is to look at Table AP, Action Priority for DFMEA and PFMEA. S = 7, O = 8, and D = 8 yields an action priority of high on a scale of low, medium, and high. The ratings are telling us essentially that, if the failure mode happens, the product will be unusable and nonreworkable, and is likely to cause trouble further down the line. In addition, the existing prevention controls (worker vigilance) cannot be relied upon to prevent the failure, and there are no really good detection controls, either. This combination means we have an obvious problem, so the action priority is high.
Optimization
Optimization means changes that 1) eliminate the failure effect; 2) reduce the occurrence rating; or 3) reduce the detection rating. Shingo describes how electric wires were connected, via bolts, to the three reference planes in the jig. If current did not flow through all three reference planes, it meant the part was not aligned properly, and the drill would therefore not operate. This is a form of jidoka or autonomation, in which the machine can detect an abnormal or undesirable condition and stop itself until the problem is corrected. This reduces the detection rating to two, which requires a machine-based method (i.e., an engineering control as opposed to an administrative control) that will detect the failure cause automatically and prevent generation of the nonconformance. This detection control cost, incidentally, about $25 in the money of the 1980s.
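The logic of this interlock is simple enough to state in a few lines. The device Shingo describes is an electrical circuit, not software, so the following Python sketch is only a logical model of it, with hypothetical names throughout.

```python
from typing import Sequence

REQUIRED_PLANES = 3  # the jig has three reference planes


def drill_may_run(plane_contacts: Sequence[bool]) -> bool:
    """Model of the interlock: True only if current flows through every plane.

    Each entry in plane_contacts is True when the flange touches that
    reference plane and therefore completes its part of the circuit.
    """
    if len(plane_contacts) != REQUIRED_PLANES:
        raise ValueError(f"expected {REQUIRED_PLANES} contact readings")
    return all(plane_contacts)


print(drill_may_run([True, True, True]))   # flange seated: drill operates
print(drill_may_run([True, False, True]))  # one plane open: drill is inhibited
```

The point of the model is that the machine, not the worker, decides whether drilling may start, which is what distinguishes an engineering control from an administrative one.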
The detection rating has, however, the least effect on the action priority. The new ratings (S = 7, O = 8, D = 2) still result in a high action priority even though the chance of generating, much less shipping, nonconforming work is now negligible. Shingo reported, in fact, “mounting errors disappeared,” which suggests that the detection control was 100-percent effective in preventing the generation of nonconforming work.
From what I can infer from the case study, a remaining issue might have been the need for the worker to intervene and reposition flanges that did not contact the three reference planes. This is a form of what Gen. Carl von Clausewitz called friction,9 a seemingly minor annoyance or inefficiency whose sufficiently frequent repetition will degrade organizational performance. This concept carries over into minor stoppages that, if sufficiently frequent, have major and undesirable effects on performance and efficiencies, and are otherwise known as muda or waste. James Halpin described the issue as “…the little things that get under a worker’s skin but are never quite important enough to make him come to management for a change.”10
The fact that the process no longer generates nonconforming work might therefore lull people into ignoring the worker’s frequent interventions to correct misalignment of the part in the jig. The action priority table guards against this, because it won’t give us a low result if the occurrence rating remains unacceptable. The action priority table will, however, return a low action priority when O = 1 (“Failure mode cannot be physically produced due to the failure cause”) even for S = 10 (the failure mode can cause death or serious injury) and D = 10 (the failure mode cannot be detected). This underscores the much greater weight the action priority table gives the prevention controls over the detection controls. If the failure mode cannot be generated (as in a probability of zero), we can ignore it safely. The nice thing about many forms of error-proofing, such as part geometries that make it physically impossible to assemble parts backward, is their ability to achieve this O = 1 criterion.
Suppose, for example, mechanically operated plungers can be added to press the flange from two directions to force it into contact with the reference planes, which would confirm contact through the detection control that Shingo described. This would ideally reduce O to two or three (“highly effective” in preventing the failure cause), which in combination with D = 2, reduces the action priority to low.
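The progression of ratings in this example can be tabulated with a short Python sketch. The lookup below is deliberately incomplete: it contains only the (S, O, D) combinations whose action priority is stated in this article, and the function name is hypothetical. Any other combination must be read from Table AP in the AIAG/VDA manual. The RPN is printed alongside purely for comparison with the older method.

```python
# Action priorities exactly as stated in this article; NOT the full Table AP.
STATED_ACTION_PRIORITY = {
    (7, 8, 8): "high",   # worker vigilance only
    (7, 8, 2): "high",   # after the electrical contact interlock
    (7, 3, 2): "low",    # after plungers, if O falls to three
    (7, 2, 2): "low",    # after plungers, if O falls to two
    (10, 1, 10): "low",  # failure mode cannot be physically produced
}


def action_priority(s: int, o: int, d: int) -> str:
    """Look up a stated action priority; refuse to guess anything else."""
    try:
        return STATED_ACTION_PRIORITY[(s, o, d)]
    except KeyError:
        raise KeyError(f"S={s}, O={o}, D={d}: consult Table AP") from None


history = [
    ("initial state", (7, 8, 8)),
    ("contact interlock added", (7, 8, 2)),
    ("plungers added", (7, 2, 2)),
]

for label, (s, o, d) in history:
    print(f"{label}: S={s} O={o} D={d} "
          f"RPN={s * o * d} AP={action_priority(s, o, d)}")
```

Note that the interlock alone cuts the RPN from 448 to 112 while the action priority stays high, and that S = 10, O = 1, D = 10 has a similar RPN of 100 yet a low action priority. The two methods can therefore lead to different conclusions about where to act.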
Documentation
The completed PFMEA becomes a quality record that supports ISO 9001:2015, Clause 7.1.6, Organizational knowledge. Any lessons learned should also be deployed to similar processes. The Shingo example addresses a problem with the reliable positioning of a part on a jig. Does the factory have other machining processes that rely on work-holding jigs, clamps, or fixtures? The Shingo reference, in fact, has a case study (example 87, p. 235) in which the operator of a press-punching machine had to set the tool position manually. Hole slippage resulted from incorrect positioning of the arms of the press-punching die. Adding a positioning jig (a prevention control) prevented the slippage, and correct positioning was announced with a signal light (visual detection control). Shingo’s statement that nonconforming work was eliminated suggests that the occurrence rating was reduced to three or less (“highly effective” or even “failure mode cannot be physically produced”).
In addition, the PFMEA must be reassessed when anything new or different is added to the process. The words “new,” “changed,” or “different” are warnings that new failure modes and failure causes may now be present.
Tie-in with other processes of the quality management system
PFMEA is synergistic with other processes of a quality management system, including the control plan that is required by IATF 16949:2016, Clause 8.5.1.1, which relates to ISO 9001:2015’s Clause 8.5.1, Control of production and service provision. Clause 8.5.1.1 is not found in ISO 9001, which is a strong argument for users of this standard to purchase IATF 16949 and identify the numerous clauses that might enhance the effectiveness of their quality management system.11 A control plan can be appended to a PFMEA to create a dynamic control plan. The control plan, like the PFMEA, addresses process and product characteristics as well as associated prevention and detection controls (in the form of control methods). Another column is reserved for the reaction plan (known as the out-of-control action plan, or OCAP, in statistical process control), which tells the worker what to do if there is a problem.
There is also a connection with corrective and preventive action (CAPA). AIAG’s Effective Problem Solving Guide (AIAG, 2018)12 says to look for not one but three root causes for any problem:
• The occurrence root cause is why the problem happened, which suggests inadequate prevention controls. The occurrence root cause also relates to “don’t make it” with “it” meaning poor quality. If the corrective action is to apply a new or better prevention control, the PFMEA should be updated accordingly.
• The escape root cause is why the nonconformance reached the next internal or external customer (if it did), which relates to the detection controls, as in “don’t pass it along.” Corrective action for the escape root cause is likely to involve a new or better detection control, for which the PFMEA also requires modification.
• The systemic root cause is why the planning process did not anticipate the problem in the first place, and this could include (as but one example) failure to share lessons learned with related activities.
Improved controls will also improve the risk ratings, and therefore possibly the action priority. In addition, if the CAPA process discovers a new failure mode or cause that did not appear in the original PFMEA, the PFMEA must be updated to include it.
Source: Quality Digest