
FMEA for Spacecraft Programs: A Practical Guide

Failure Modes and Effects Analysis is required on every formal space program. Most are useless. Here is how to run an FMEA that actually surfaces real risks.

Failure Modes and Effects Analysis (FMEA) is a structured technique for identifying every way a system can fail and assessing the consequences of each failure. Done well, it produces a prioritized list of design improvements before you build anything. Done poorly, it produces a 200-row spreadsheet that no one reads, satisfies a contractual deliverable, and contributes nothing to mission success.

The goal of FMEA is not the document. The goal is the conversation. The right FMEA is a structured group exercise where the people who designed each subsystem walk through every component, name every failure mode they can imagine, and rate how bad each failure would be and how likely it is. The output is a list of failures that need design changes, additional testing, or operational mitigation. The document is just the artifact that captures the conversation.

The most common FMEA failure mode is doing it too late. By the Critical Design Review (CDR), the design is largely frozen. Identifying a risky failure mode at CDR means you either accept the risk or absorb a costly redesign. The right time to start FMEA is at the Preliminary Design Review (PDR), when the architecture is set but the details are still negotiable. Run a draft FMEA at PDR, refine it through CDR, and update it at the Test Readiness Review (TRR) with as-built information.

Start with the functional decomposition. Every FMEA needs a clear hierarchy: subsystem → assembly → component → function. For each function, ask: what happens if this function fails? Then drill into the failure modes that could cause each functional failure. A reaction wheel has the function "produce torque on command." Failure modes include: bearing seizure, motor winding failure, encoder failure, electronics failure, lubrication degradation. Each failure mode has a different cause, a different likelihood, and a different mitigation.
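The hierarchy above can be captured in a small data model before any ratings are assigned. A minimal Python sketch, using hypothetical class and field names (not from any standard FMEA tool), with the reaction wheel example from the text:

```python
from dataclasses import dataclass, field

@dataclass
class FailureMode:
    name: str    # e.g. "bearing seizure"
    cause: str   # physical mechanism behind the failure
    effect: str  # consequence at the function level

@dataclass
class Function:
    name: str                                          # e.g. "produce torque on command"
    failure_modes: list[FailureMode] = field(default_factory=list)

@dataclass
class Component:
    name: str        # e.g. "RW-1"
    subsystem: str   # e.g. "ADCS"
    assembly: str    # e.g. "reaction wheel assembly"
    functions: list[Function] = field(default_factory=list)

# Worked example: the reaction wheel's torque function and three of its modes.
wheel = Component(
    name="RW-1", subsystem="ADCS", assembly="Reaction Wheel Assembly",
    functions=[Function(
        name="produce torque on command",
        failure_modes=[
            FailureMode("bearing seizure", "lubricant loss or debris", "no torque"),
            FailureMode("motor winding failure", "insulation breakdown", "no torque"),
            FailureMode("encoder failure", "optical disc contamination", "wrong torque"),
        ],
    )],
)
```

The point of the structure is that every failure mode hangs off a named function, so "what happens if this function fails?" is asked for every function, not just the ones the team happens to remember.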

Severity rating is a 1-5 or 1-10 scale describing how bad the consequence is. Standard categories: catastrophic (loss of mission), critical (loss of major capability), marginal (loss of redundancy or performance), minor (no operational impact). Pick a scale and be consistent. The exact numbers matter less than applying them consistently across the whole FMEA so that high-severity items stand out.

Likelihood rating is the probability that the failure occurs over the mission lifetime. Categories typically range from "frequent" (expected to occur multiple times) through "remote" (possible but unlikely) to "improbable" (cannot be predicted to occur). Likelihood is the hardest part of FMEA to do well, because you usually do not have flight data for new components. Use vendor failure rates, MIL-HDBK-217 estimates, or analogy to similar components on prior missions.

Detection rating describes how likely the failure is to be caught before it causes mission impact. An easily detected failure can be mitigated operationally; a hard-to-detect failure progresses silently and may cascade. Some FMEA frameworks multiply severity, likelihood, and detection ratings into a Risk Priority Number (RPN); FMECA extends FMEA with a criticality analysis that ranks items by severity and probability of occurrence. Either approach works as long as it is applied consistently. Note that in numeric RPN scales, detection is conventionally scored so that a higher number means harder to detect.
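As a concrete illustration of the RPN approach, here is a minimal sketch with made-up line items and ratings. The 1-10 scales and the convention that a higher detection score means harder to detect follow common RPN practice; the component names and numbers are illustrative only:

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number on 1-10 scales.

    By convention, detection is scored so that 10 means 'hardest to
    detect': a silent failure raises the RPN rather than lowering it.
    """
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be on a 1-10 scale")
    return severity * occurrence * detection

# Hypothetical reaction-wheel line items: (failure mode, S, O, D)
items = [
    ("bearing seizure",         9, 3, 8),
    ("encoder failure",         6, 4, 3),
    ("lubrication degradation", 7, 5, 9),
]

# Rank worst-first so the review board sees the top risks immediately.
ranked = sorted(items, key=lambda item: rpn(*item[1:]), reverse=True)
```

With these numbers, lubrication degradation (RPN 315) outranks bearing seizure (RPN 216) even though seizure is more severe, because degradation is both more likely and harder to detect. That interaction is exactly what a multiplicative score is meant to surface.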

Single-point failures are the most important output of any spacecraft FMEA. A single-point failure is a failure of a single component that causes mission loss with no redundancy. NASA-STD-8729.1 requires every single-point failure to be either eliminated through redundancy or formally accepted with a justified rationale. Identifying single-point failures at PDR is non-negotiable for crewed and high-value robotic missions.

Common spacecraft single-point failures to look for: mechanical deployment mechanisms (one-shot, no second chance), launch vehicle separation systems, primary battery cells (in series), main propulsion valves, single-string flight computers in early designs, antenna feeds, and solar array drive mechanisms. Each of these requires explicit risk acceptance or a redundant design path.
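The eliminate-or-formally-accept rule for single-point failures is easy to enforce mechanically. A sketch of a risk-register query, with hypothetical field names, that returns every catastrophic item lacking both redundancy and a signed acceptance rationale:

```python
from dataclasses import dataclass

@dataclass
class LineItem:
    component: str
    failure_mode: str
    severity: str        # "catastrophic", "critical", "marginal", "minor"
    redundant: bool      # is there a redundant design path?
    risk_accepted: bool  # has a formal acceptance rationale been signed?

def open_single_point_failures(fmea: list[LineItem]) -> list[LineItem]:
    """Catastrophic failures with no redundancy and no signed acceptance."""
    return [item for item in fmea
            if item.severity == "catastrophic"
            and not item.redundant
            and not item.risk_accepted]

# Illustrative register: only the first item should flag as open.
fmea = [
    LineItem("solar array drive", "mechanism jam",      "catastrophic", False, False),
    LineItem("flight computer",   "processor latch-up", "catastrophic", True,  False),
    LineItem("separation system", "pyro no-fire",       "catastrophic", False, True),
]
open_spfs = open_single_point_failures(fmea)
```

Anything this query returns at PDR is an action item, not a discussion point: add redundancy, or get the acceptance rationale written and signed.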

Use the FMEA to drive verification planning. Every high-severity failure mode should have a corresponding verification activity: a qualification test, an analysis with conservative margins, or an inspection. The verification matrix should be cross-referenced to the FMEA so reviewers can see that every credible failure has been addressed. This is what review boards look for in FMEA — not the document itself, but evidence that the failures it identifies have been mitigated or accepted.
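The cross-reference check itself can be automated. A sketch, assuming the FMEA items and the verification matrix share item IDs (the IDs, severities, and activity names here are illustrative):

```python
def uncovered_high_severity(fmea_ids: list[str],
                            severities: dict[str, str],
                            verification_matrix: dict[str, list[str]]) -> list[str]:
    """Return high-severity FMEA item IDs with no linked verification activity.

    severities maps item ID -> severity category; verification_matrix maps
    item ID -> list of tests, analyses, or inspections covering it.
    """
    high = {"catastrophic", "critical"}
    return [fid for fid in fmea_ids
            if severities[fid] in high and not verification_matrix.get(fid)]

# Illustrative data: FM-001 is catastrophic but has no verification linked.
severities = {"FM-001": "catastrophic", "FM-002": "marginal"}
vmatrix = {"FM-002": ["thermal vacuum test"]}
gaps = uncovered_high_severity(["FM-001", "FM-002"], severities, vmatrix)
```

Running a check like this before every review turns "is every credible failure addressed?" from a reviewer's spot-check into a reproducible report.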

Common FMEA pitfalls: rating severity based on the worst-case consequence rather than the likely one (causing every line item to be "catastrophic"); missing failure modes because the FMEA team did not include the people who built the hardware; treating FMEA as a one-time deliverable rather than a living document; failing to close the loop by updating the design when high-RPN items are identified.

Software FMEA deserves special attention. Software failures do not have the same stochastic likelihood as hardware — software either has a bug or it does not. A useful software FMEA approach is to identify functions, identify the inputs that could cause incorrect outputs, and require code review and testing for each. Coverage targets (e.g., 100% MC/DC for safety-critical paths) replace the likelihood column.
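Replacing the likelihood column with coverage targets reduces to a simple gate: compare measured coverage against the required level per function. A sketch with hypothetical function names, targets, and measurements:

```python
# Coverage targets replace the likelihood column: each safety-critical
# software function carries a required structural-coverage level.
targets = {                          # function -> required MC/DC coverage (%)
    "thruster_fire_command": 100.0,
    "telemetry_downlink":     80.0,
}
actuals = {                          # function -> measured coverage (%)
    "thruster_fire_command":  97.5,
    "telemetry_downlink":     85.0,
}

# Any function below its target is a shortfall: (measured, required).
shortfalls = {fn: (actuals.get(fn, 0.0), required)
              for fn, required in targets.items()
              if actuals.get(fn, 0.0) < required}
```

Here `thruster_fire_command` falls short (97.5% measured against a 100% target), so the software FMEA stays open for that function until the missing conditions are tested or formally dispositioned.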

FMEA is not a substitute for testing. It is a planning tool that tells you where to focus testing effort. A good FMEA identifies the failures most likely to cause mission impact and ensures that the test program covers them. Programs that rely on FMEA alone, without environmental qualification and acceptance testing, will discover failure modes during integration that the FMEA missed.

SMAD Portal's Risk Register and FMEA modules let you maintain failure modes alongside the requirements they trace to. Each risk has a likelihood, consequence, mitigation plan, and owner. When a requirement changes, the connected risks are flagged for review. The risk matrix is generated automatically from the live data, so the picture you show at PDR and the picture you show at CDR come from the same source, only updated.
