Failure Mode & Effects Analysis [FMEA]
FMEA is a procedure in product development and operations management for analyzing potential failure modes within a system and classifying them by the severity and likelihood of the failures. A successful FMEA activity helps a team identify potential failure modes based on past experience with similar products or processes, enabling the team to design those failures out of the system with minimal effort and resource expenditure, thereby reducing development time and costs. It is widely used in manufacturing industries in various phases of the product life cycle and is increasingly finding use in the service industry. Failure modes are any errors or defects in a process, design, or item, especially those that affect the customer, and can be potential or actual. Effects analysis refers to studying the consequences of those failures.
In FMEA, failures are prioritized according to how serious their consequences are, how frequently they occur and how easily they can be detected. An FMEA also documents current knowledge and actions about the risks of failures for use in continuous improvement. FMEA is used during the design stage with an aim to avoid future failures (sometimes called DFMEA in that case). Later it is used for process control, before and during ongoing operation of the process. Ideally, FMEA begins during the earliest conceptual stages of design and continues throughout the life of the product or service.
The outcome of an FMEA is a set of actions to prevent failures or to reduce their severity or likelihood, starting with the highest-priority ones. It may be used to evaluate risk management priorities for mitigating known threat vulnerabilities. FMEA helps select remedial actions that reduce the cumulative impacts of life-cycle consequences (risks) from a system failure (fault).
PRE-WORK
The process for conducting an FMEA is straightforward. It is developed in three main phases, in which appropriate actions need to be defined. But before starting with an FMEA, it is important to complete some pre-work to confirm that robustness and past history are included in the analysis.
A robustness analysis can be obtained from interface matrices, boundary diagrams, and parameter diagrams. Many failures are due to noise factors and shared interfaces with other parts and/or systems, because engineers tend to focus on what they control directly.
To start, it is necessary to describe the system and its function. A good understanding simplifies further analysis, because it lets an engineer see which uses of the system are desirable and which are not. It is important to consider both intentional and unintentional uses. Unintentional uses are a form of hostile environment.
Then, a block diagram of the system needs to be created. This diagram gives an overview of the major components or process steps and how they are related. These are called logical relations around which the FMEA can be developed. It is useful to create a coding system to identify the different system elements. The block diagram should always be included with the FMEA.
Before starting the actual FMEA, a worksheet needs to be created, which contains the important information about the system, such as the revision date or the names of the components. On this worksheet all the items or functions of the subject should be listed in a logical manner, based on the block diagram.
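As a rough illustration, the worksheet can be thought of as a header plus a list of rows, one per item or function taken from the block diagram. The Python sketch below is only one possible layout: the class names, the field names (system, revision_date, item_code and so on) and the example entries are hypothetical and are not part of any FMEA standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class WorksheetRow:
    """One worksheet line: an item or function from the block diagram."""
    item_code: str   # identifier from the (hypothetical) coding system
    item_name: str
    function: str


@dataclass
class FmeaWorksheet:
    """Header information plus rows listed in block-diagram order."""
    system: str
    revision_date: date
    rows: List[WorksheetRow] = field(default_factory=list)

    def add_row(self, item_code: str, item_name: str, function: str) -> None:
        self.rows.append(WorksheetRow(item_code, item_name, function))


# Hypothetical example system, for illustration only.
worksheet = FmeaWorksheet(system="Electric kettle", revision_date=date(2024, 1, 15))
worksheet.add_row("A1", "Heating element", "Heat water to boiling point")
worksheet.add_row("A2", "Thermostat", "Cut power when water boils")

for row in worksheet.rows:
    print(f"{row.item_code}  {row.item_name}: {row.function}")
```

Keeping the rows in the same order as the block diagram makes it easier to check that no component or process step has been left out.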
STEP 1 : OCCURRENCE
In this step it is necessary to look at the cause of a failure mode and how often it occurs. This can be done by looking at similar products or processes and the failure modes that have been documented for them. A failure cause is regarded as a design weakness. All the potential causes for a failure mode should be identified and documented in technical terms. Examples of causes are erroneous algorithms, excessive voltage or improper operating conditions. Each failure mode is given an occurrence ranking (O) from 1 to 10. Actions need to be determined if the occurrence is high (meaning > 4 for non-safety failure modes and > 1 when the severity number from Step 2 is 9 or 10). This step is called the detailed development section of the FMEA process. Occurrence can also be expressed as a percentage: for example, a non-safety issue that occurs less than 1% of the time can be given a ranking of 1. The exact thresholds depend on the product and the customer specification.
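The action rule for occurrence can be written down as a small check. The sketch below is a minimal Python version of the thresholds quoted above (O > 4 for non-safety failure modes, O > 1 when severity is 9 or 10); the function name and the input validation are assumptions added for illustration, and real thresholds would come from the product and customer specification.

```python
def occurrence_needs_action(occurrence: int, severity: int) -> bool:
    """Return True when the occurrence ranking counts as 'high'.

    Thresholds taken from the text:
      - severity 9 or 10 (safety-related): action if occurrence > 1
      - otherwise (non-safety):            action if occurrence > 4
    """
    if not (1 <= occurrence <= 10 and 1 <= severity <= 10):
        raise ValueError("rankings must be between 1 and 10")
    threshold = 1 if severity >= 9 else 4
    return occurrence > threshold


print(occurrence_needs_action(occurrence=5, severity=3))   # non-safety, O above 4
print(occurrence_needs_action(occurrence=2, severity=10))  # safety-related, O above 1
```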
STEP 2 : SEVERITY
Determine all failure modes based on the functional requirements and their effects. Examples of failure modes are electrical short-circuiting, corrosion or deformation. A failure mode in one component can lead to a failure mode in another component, so each failure mode should be listed in technical terms and by function. Next, the ultimate effect of each failure mode needs to be considered. A failure effect is defined as the result of a failure mode on the function of the system as perceived by the user, so it is convenient to write these effects down in terms of what the user might see or experience. Examples of failure effects are degraded performance, noise or even injury to a user. Each effect is given a severity number (S) from 1 (no danger) to 10 (critical). These numbers help an engineer to prioritize the failure modes and their effects. If the severity of an effect is 9 or 10, actions are considered to change the design by eliminating the failure mode, if possible, or by protecting the user from the effect. A severity rating of 9 or 10 is generally reserved for effects that would cause injury to a user or otherwise result in litigation.
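A short Python sketch of this step: each failure mode is paired with its effect and a severity number, and the effects rated 9 or 10 are singled out for design action. The severity values assigned to the example failure modes are made up for illustration, and the names FailureEffect and needs_design_action are hypothetical.

```python
from typing import List, NamedTuple


class FailureEffect(NamedTuple):
    failure_mode: str
    effect: str      # described as the user would see or experience it
    severity: int    # S, from 1 (no danger) to 10 (critical)


def needs_design_action(effects: List[FailureEffect]) -> List[FailureEffect]:
    """Return the effects whose severity is 9 or 10."""
    return [e for e in effects if e.severity >= 9]


# Illustrative rankings only; real values come from the team's analysis.
effects = [
    FailureEffect("corrosion", "degraded performance", 4),
    FailureEffect("deformation", "noise", 3),
    FailureEffect("electrical short-circuiting", "injury to a user", 10),
]

for e in needs_design_action(effects):
    print(f"{e.failure_mode}: {e.effect} (S = {e.severity})")
```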
STEP 3 : DETECTION
When appropriate actions are determined, it is necessary to test their effectiveness. In addition, design verification is needed, and the proper inspection methods need to be chosen. First, an engineer should look at the current controls of the system that prevent failure modes from occurring or that detect the failure before it reaches the customer. Next, one should identify testing, analysis, monitoring and other techniques that can be or have been used on similar systems to detect failures. From these controls an engineer can learn how likely it is for a failure to be identified or detected. Each combination from the previous two steps receives a detection number (D). This ranks the ability of planned tests and inspections to remove defects or detect failure modes in time. The assigned detection number measures the risk that the failure will escape detection.
A high detection number indicates that the chances are high that the failure will escape detection, or in other words, that the chances of detection are low.
RISK PRIORITY NUMBERS
Risk priority numbers (RPNs) play an important part in the choice of an action against failure modes. They serve as threshold values in the evaluation of these actions.
After ranking severity, occurrence and detectability, the RPN is calculated by multiplying the three numbers:
RPN = S × O × D
This has to be done for the entire process and/or design. Once this is done, it is easy to determine the areas of greatest concern. The failure modes that have the highest RPN should be given the highest priority for corrective action. This means it is not always the failure modes with the highest severity numbers that should be treated first. There could be less severe failures that occur more often and are less detectable.
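The calculation and the resulting priority order can be sketched in a few lines of Python. The rankings below are invented for illustration, and the FailureMode class is a hypothetical helper; the point is only that a moderately severe failure that is frequent and hard to detect can end up ahead of a more severe one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # S
    occurrence: int  # O
    detection: int   # D

    @property
    def rpn(self) -> int:
        """Risk priority number: RPN = S x O x D."""
        return self.severity * self.occurrence * self.detection


# Illustrative rankings only.
modes = [
    FailureMode("electrical short-circuiting", severity=9, occurrence=2, detection=2),
    FailureMode("corrosion", severity=5, occurrence=6, detection=7),
    FailureMode("deformation", severity=3, occurrence=4, detection=2),
]

# Highest RPN first: these get priority for corrective action.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```

Here corrosion (S = 5) ranks above electrical short-circuiting (S = 9) because it occurs more often and is harder to detect.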
After these values are allocated, recommended actions with targets, responsibility and dates of implementation are noted. These actions can include specific inspection, testing or quality procedures, redesign (such as selection of new components), adding more redundancy and limiting environmental stresses or operating range. Once the actions have been implemented in the design/process, the new RPN should be checked, to confirm the improvements. These tests are often put in graphs, for easy visualization. Whenever a design or a process changes, an FMEA should be updated.
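Checking the new RPN after an action has been implemented amounts to recomputing the product with the revised rankings and comparing it with the old value. A minimal Python sketch, with hypothetical function names and made-up rankings:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = S x O x D."""
    return severity * occurrence * detection


def compare_rpn(before, after):
    """Compare the RPN before and after a corrective action.

    `before` and `after` are (S, O, D) tuples.
    """
    old, new = rpn(*before), rpn(*after)
    return {"before": old, "after": new, "improved": new < old}


# Illustrative case: an added inspection improves detection (7 -> 4)
# and a redesign lowers occurrence (6 -> 3); severity is unchanged.
result = compare_rpn(before=(5, 6, 7), after=(5, 3, 4))
print(result)
```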
A few logical but important thoughts come to mind:
- Try to eliminate the failure mode (some failures are more preventable than others)
- Minimize the severity of the failure
- Reduce the occurrence of the failure mode
- Improve the detection
Uses of FMEA:
- Development of system requirements that minimize the likelihood of failures.
- Development of methods to design and test systems to ensure that the failures have been eliminated.
- Evaluation of the requirements of the customer to ensure that those do not give rise to potential failures.
- Identification of design characteristics that contribute to failures, and minimization or elimination of their effects.
- Tracking and managing potential risks in the design. This helps avoid the same failures in future projects.
- Ensuring that any failure that could occur will not injure the customer or seriously impact a system.
- Production of world-class quality products.
Advantages of using FMEA:
- Improve the quality, reliability and safety of a product/process
- Improve company image and competitiveness
- Increase user satisfaction
- Reduce system development time and cost
- Collect information to reduce future failures and capture engineering knowledge
- Reduce the potential for warranty concerns
- Identify and eliminate potential failure modes early
- Emphasize problem prevention
- Minimize late changes and associated cost
- Act as a catalyst for teamwork and idea exchange between functions
- Reduce the possibility of the same kind of failure in the future
- Reduce the impact on the company's profit margin
- Reduce possible scrap in production
Limitations of FMEA:
Since FMEA depends on the members of the committee that examines product failures, it is limited by their experience of previous failures. If a failure mode cannot be identified, then external help is needed from consultants who are aware of the many different types of product failure. FMEA is thus part of a larger system of quality control, where documentation is vital to implementation. General texts and detailed publications are available in forensic engineering and failure analysis. Many national and international standards require that FMEA be used in evaluating product integrity.
Additionally, the multiplication of the severity, occurrence and detection rankings may result in rank reversals, where a less serious failure mode receives a higher RPN than a more serious failure mode. The reason for this is that the rankings are ordinal scale numbers, and multiplication is not defined for ordinal numbers. The ordinal rankings only say that one ranking is better or worse than another, but not by how much. For instance, a ranking of "2" may not be twice as bad as a ranking of "1," or an "8" may not be twice as bad as a "4," but multiplication treats them as though they are.
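One way to see the problem, sketched in Python under illustrative assumptions: because the scales are ordinal, relabeling every ranking with any order-preserving mapping (here, simply adding 10) leaves the meaning of the rankings unchanged, yet it can flip which failure mode has the higher product. The two sets of rankings and the shift are made up for the demonstration.

```python
from math import prod


def rpn(rankings, relabel=lambda x: x):
    """Multiply the (optionally relabeled) S, O and D rankings."""
    return prod(relabel(r) for r in rankings)


# (S, O, D) for two hypothetical failure modes.
a = (8, 2, 2)   # more severe
b = (4, 3, 3)   # less severe


def shift(x):
    """Order-preserving relabeling: 1..10 becomes 11..20."""
    return x + 10


print(rpn(a), rpn(b))                  # original labels
print(rpn(a, shift), rpn(b, shift))    # shifted labels, same ordinal meaning
```

With the original labels the less severe mode b has the higher RPN (36 against 32); with the shifted labels the order reverses (2366 against 2592), even though no ranking changed its position on the scale.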
In any case, these are just tools: we should use them for improvement, not let them limit our creativity.