In the development of complex electronics (or any system), several processes and activities are performed to support the development effort. Each supporting process plays an integral role, has a distinct purpose, and affects the overall success and quality of the complex electronics. Each applies across the entire life cycle and supports the whole development effort.
The supporting processes for complex electronics development are:
The purpose of a system safety process is to analyze a device or system proactively in order to prevent accidents. "System Safety is the application of engineering and management principles, criteria, and techniques to optimize all aspects of safety." [1] System Safety is a deliberate decision by management to prevent accidents in an operation, and it is intended to lead to a safer, more effective organization.
The system safety process is applied from the start of system development, beginning with the identification of all potential hazards. The design can then be modified to eliminate those hazards through engineering changes, or the hazards can be addressed through administrative controls. Once the hazards are identified, hazard controls can be implemented, which may involve creating and placing safety devices or warning devices. Some hazards cannot be eliminated, but they can still be addressed with engineering controls that prevent an accident.
For complex electronics, the system safety engineer may not be aware that the device is "soft hardware." It is important for the system safety engineer to analyze the functions the device will perform as part of the hazard analysis. If the complex electronics is used as part of a hazard control, the assurance engineer needs to know this. Communication between the system safety and assurance engineers is vital to ensuring a safe system.
Whenever the complex electronics is significantly changed, the system safety engineer should be informed so that they can assess whether the change adds or modifies a hazard or affects a hazard control. The assurance engineer is likely to be the first to know about potential changes to the design or to how the device will be used. Any safety concerns the assurance engineer has about the complex electronics development or design should be shared with system safety.
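One way to support this notification is to record which device functions each hazard control depends on. A design change can then be screened against that record. The minimal Python sketch below is hypothetical on two points: the mapping of function names to design revisions, and the `HazardControl` record. Neither comes from any particular hazard-tracking tool. The sketch reports two things:
- the hazards whose controls use a changed function;
- any changed functions that no control references yet, which are candidates for a new or modified hazard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardControl:
    hazard_id: str              # hypothetical hazard report identifier
    control: str                # short description of the control
    functions: frozenset[str]   # device functions the control depends on


def changed_functions(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Functions added, removed, or whose design revision differs."""
    changed = set(old) ^ set(new)   # added or removed
    changed |= {f for f in set(old) & set(new) if old[f] != new[f]}
    return changed


def change_impact(old, new, controls):
    """Return (hazards whose controls rely on a changed function,
    changed functions that no existing control references)."""
    changed = changed_functions(old, new)
    affected = sorted({c.hazard_id for c in controls if c.functions & changed})
    referenced = set().union(*(c.functions for c in controls))
    return affected, sorted(changed - referenced)


# Hypothetical function-to-revision maps before and after a design change.
old = {"overtemp_shutdown": "A", "telemetry": "A"}
new = {"overtemp_shutdown": "B", "telemetry": "A", "self_test": "A"}
controls = [
    HazardControl("HR-01", "Heater cutoff on overtemperature",
                  frozenset({"overtemp_shutdown"})),
    HazardControl("HR-02", "Status reporting to ground", frozenset({"telemetry"})),
]
print(change_impact(old, new, controls))  # (['HR-01'], ['self_test'])
```

Both results would go to system safety. In this example, the revised `overtemp_shutdown` function flags HR-01 for re-assessment. The new `self_test` function is not yet covered by any hazard analysis.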
The assurance engineer should review the outputs of the system safety process, such as the Preliminary Hazard Analysis and subsequent analyses, and look for any elements (components) that interface with the complex electronics. If the complex electronics fails or has a design flaw, can it affect a safety control? Analyses performed by the assurance engineer, such as criticality mapping and traceability analysis, can feed information back to system safety for incorporation into the system-level analyses.
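The interface review described above can be screened mechanically once interfaces and hazard controls are recorded. The following Python sketch is hypothetical: the component names, the pair-list interface format, and the control-to-component mapping are all assumptions for illustration. It flags hazard controls that are implemented either by the complex electronics itself or by a component that directly interfaces with it.

```python
def controls_near_device(device, interfaces, controls):
    """Hazard controls implemented by `device` or by a component that
    directly interfaces with it.

    interfaces: iterable of (component_a, component_b) pairs
    controls:   dict mapping control ID -> set of implementing components
    """
    nearby = {device}
    for a, b in interfaces:
        if a == device:
            nearby.add(b)
        elif b == device:
            nearby.add(a)
    return sorted(cid for cid, comps in controls.items() if comps & nearby)


# Hypothetical system: an FPGA commands a valve driver, which drives a valve.
interfaces = [("FPGA-1", "valve driver"), ("valve driver", "valve"),
              ("FPGA-1", "telemetry bus")]
controls = {
    "HC-1": {"valve driver"},           # inhibit on the valve drive line
    "HC-2": {"pressure relief valve"},  # purely mechanical control
    "HC-3": {"FPGA-1"},                 # overpressure logic inside the FPGA
}
print(controls_near_device("FPGA-1", interfaces, controls))  # ['HC-1', 'HC-3']
```

This checks only direct (one-hop) interfaces. A component reached through an intermediate element, such as a valve behind a valve driver, would need a graph traversal to be flagged.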
[1] Harold E. Roland and Brian Moriarty (1990). System Safety Engineering and Management (2nd ed.). John Wiley & Sons, Inc.
NPR 7120.5C, NASA Program and Project Management Processes and Requirements, defines Risk Management as "An organized, systematic decision making process that efficiently identifies, analyzes, plans, tracks, controls, communicates, and documents risk to increase the likelihood of achieving program/project goals." Risk Management is a project-wide process that considers the most important risks to the success of the project and works to eliminate or mitigate them.
Risk management is an important tool that projects can use to reduce the probability or impact of risks. Complex electronics has some similarities to software, including:
- the fluidity of the requirements;
- interface problems with other elements of the system;
- integration issues (often a result of the interface problems);
- the need to create a complex "program" within a defined period of time.

These kinds of issues are well suited to mitigation through risk management.
As the project moves through the life cycle, the risks will change. For complex electronics, the first risks relate to the choice of technology and tools. Will the device be adequate for the currently known and anticipated functions? What problems might the choice of tools cause? These include schedule problems due to a steep learning curve, and tool-induced inefficiencies or errors in the netlist. Other risks become concerns at various points in the project life cycle, such as the inability to test in an operational mode due to hardware limitations, or the addition of new requirements. Risks need to be evaluated periodically and their mitigations updated.
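A simple risk register illustrates the periodic re-evaluation described above. This Python sketch rests on three assumptions, none of which is NASA policy:
- a 5-by-5 likelihood and consequence scale;
- ranking risks by the product of the two values;
- a 90-day review interval.

An actual project would use its own risk matrix and review cadence. The example risks are hypothetical, drawn from the kinds of risks listed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Risk:
    title: str
    likelihood: int     # 1 (very unlikely) to 5 (near certain); assumed scale
    consequence: int    # 1 (minimal) to 5 (severe); assumed scale
    mitigation: str
    last_reviewed: date

    def score(self) -> int:
        # Simplification: real risk matrices classify by cell, not by product.
        return self.likelihood * self.consequence


def rank(risks: list[Risk]) -> list[Risk]:
    """Highest score first; ties keep their original order (sort is stable)."""
    return sorted(risks, key=Risk.score, reverse=True)


def overdue(risks: list[Risk], today: date, max_age_days: int = 90) -> list[str]:
    """Titles of risks not re-evaluated within `max_age_days` (assumed cadence)."""
    return [r.title for r in risks if (today - r.last_reviewed).days > max_age_days]


risks = [
    Risk("Tool learning curve delays schedule", 4, 3,
         "Train early; build a small pilot design", date(2024, 1, 10)),
    Risk("Device too small for anticipated functions", 2, 5,
         "Select a part with spare logic capacity", date(2024, 3, 1)),
    Risk("Cannot test in an operational mode", 3, 4,
         "Plan a hardware test fixture", date(2024, 3, 15)),
]
for r in rank(risks):
    print(r.score(), r.title)
print(overdue(risks, today=date(2024, 4, 20)))
```

Keeping the review date on each record makes stale risks visible. That is one concrete way to act on the point that risks change as the project moves through the life cycle.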
More information on Risk Management within NASA can be found at http://pbma.hq.nasa.gov.
Configuration management (CM) is, unfortunately, not often applied to complex electronics design artifacts. The final design is usually saved, but the intermediate development artifacts remain under the control of the designer. Formal configuration management might not be necessary until the design is finalized (baselined). Even so, some form of informal control (e.g., use of a version management system) is recommended. Being able to revert to a previous version of the design is useful when problems are discovered during development. Being able to recreate earlier versions can also help narrow down when a problem was introduced.
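When intermediate versions are retained, narrowing down when a problem was introduced becomes a binary search over the version history. This is the same idea behind `git bisect`. In this Python sketch, `passes` is a hypothetical callback that would, for example, rerun a simulation testbench against one saved version of the design. The sketch assumes the versions are in chronological order. It also assumes that once the problem appears, it persists in every later version.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(versions: Sequence[str],
                      passes: Callable[[str], bool]) -> Optional[str]:
    """Return the earliest version that fails `passes`, or None if all pass.

    Assumes chronological order and that a problem, once introduced,
    persists in every later version (the same assumption git bisect makes).
    """
    if not versions or passes(versions[-1]):
        return None                  # latest version is fine
    if not passes(versions[0]):
        return versions[0]           # problem predates the history we kept
    lo, hi = 0, len(versions) - 1    # invariant: versions[lo] passes, versions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Hypothetical history: rev1..rev10, with the problem introduced in rev7.
history = [f"rev{i}" for i in range(1, 11)]
print(first_bad_version(history, lambda v: int(v[3:]) < 7))  # rev7
```

With ten saved versions, only a handful of test runs are needed instead of ten. This is why keeping intermediate versions pays off even before the design is baselined.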
Once the design is baselined, formal configuration management should be applied to it. CM includes change control: a process that requires every change to be approved before it is implemented. Often a Configuration Control Board or an engineering board reviews and approves (or disallows) the changes. Change control assures that:
For complex electronics, the following information and documentation should be configuration managed:
The purpose of problem reporting, tracking, and corrective action is to record problems and ensure correct disposition and resolution. Problems may be process oriented (non-compliance with plans and standards or deficiencies in life cycle process outputs) or product oriented (anomalous behavior of products, defects in the tools). Having a problem reporting and resolution system provides
Problem reporting and resolution is a project-wide function. Complex electronics may be identified as the cause of a problem. It may also be modified to work around problems that are too expensive to fix in other parts of the system. The assurance engineer's responsibility is to review the problem database, identify problems that relate to the complex electronics, and verify that they are correctly dispositioned. Trend analysis may reveal trouble spots in the system. If the complex electronics is a trouble spot, additional assurance effort should be focused on it to ensure that it is designed, programmed, and tested sufficiently.
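The review and trend analysis described above can be sketched over an export of the problem database. The Python below makes two assumptions:
- hypothetical report fields (`pr_id`, `component`, `disposition`);
- a threshold of three reports for a trouble spot.

Real problem reporting systems use their own schemas. The sketch cannot judge whether a disposition is correct; it can only flag reports with none recorded. The assurance engineer still reviews the rest.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProblemReport:
    pr_id: str
    component: str     # element identified as the cause or changed as a workaround
    disposition: str   # empty if not yet dispositioned


def ce_related(reports, ce_components):
    """Reports that involve any complex electronics component."""
    return [r for r in reports if r.component in ce_components]


def missing_disposition(reports):
    """IDs of reports with no recorded disposition."""
    return [r.pr_id for r in reports if not r.disposition.strip()]


def trouble_spots(reports, min_count=3):
    """Components with at least `min_count` reports, most frequent first."""
    counts = Counter(r.component for r in reports)
    return [(comp, n) for comp, n in counts.most_common() if n >= min_count]


# Hypothetical database export.
reports = [
    ProblemReport("PR-101", "FPGA-1", "Use as is; timing margin acceptable"),
    ProblemReport("PR-102", "FPGA-1", ""),
    ProblemReport("PR-103", "power supply", "Rework board"),
    ProblemReport("PR-104", "FPGA-1", "Fixed in design revision C"),
    ProblemReport("PR-105", "flight software", "Fixed in next build"),
]
ce = ce_related(reports, {"FPGA-1"})
print(missing_disposition(ce))   # ['PR-102']
print(trouble_spots(reports))    # [('FPGA-1', 3)]
```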
Most reliability studies look at the hardware failure rates of the devices in a system. The failure rate of the physical device (e.g., an FPGA) can be known. Failures caused by design errors, or by unexpected interactions within the FPGA once it is programmed, are much harder to determine. Most reliability evaluations ignore software for this same reason.
While there is currently no good way to assess the reliability of a complex electronics design, the reliability engineer should consider the possibility of design errors. At a minimum, that possibility lowers the confidence in the resulting numbers (mean time to failure, system reliability).
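The hardware side of a typical reliability estimate is a short calculation. For parts with constant failure rates in series:
- the failure rates add to give a system rate λ;
- system reliability over a mission time t is exp(-λt);
- MTTF is 1/λ.

The Python sketch below uses made-up part failure rates. There is no accepted way to assign a rate to design errors. The loop therefore adds a few arbitrary, assumed design-fault rates, purely to show how sensitive the result is to that unknown term. It is not a method for estimating design reliability.

```python
import math


def series_failure_rate(rates_per_hour):
    """Constant failure rates of parts in series simply add."""
    return sum(rates_per_hour)


def reliability(total_rate, hours):
    """Probability of surviving `hours` with no failure, exponential model."""
    return math.exp(-total_rate * hours)


def mttf(total_rate):
    """Mean time to failure for a constant failure rate."""
    return 1.0 / total_rate


# Hypothetical part failure rates (failures per hour), not real data.
hardware_rates = [2e-7, 5e-7, 3e-7]
lam_hw = series_failure_rate(hardware_rates)

# Arbitrary assumed design-fault rates: the true value is unknown.
for lam_design in (0.0, 1e-7, 1e-6):
    lam = lam_hw + lam_design
    print(f"design rate {lam_design:.0e}/h: MTTF {mttf(lam):,.0f} h, "
          f"R(1 year) {reliability(lam, 8760):.4f}")
```

The spread across the assumed values is one way to make the reduced confidence visible rather than reporting a single hardware-only number.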