Authors: Kunj Patel and Vishvam Mazumdar
Platform: STM32F407VGT6 (Cortex-M4F) with FreeRTOS
Original Context: MTE 546 Labs 1-3 (Multi-Sensor Fusion Course)
This document presents a comprehensive technical exploration of a real-time multi-sensor data fusion system, originally developed as part of the MTE 546 course (Multi-Sensor Fusion) and subsequently ported to the STM32F407 microcontroller platform. The project demonstrates the complete pipeline from theoretical foundations to embedded firmware implementation, bridging control theory, stochastic estimation, experimental sensor characterization, and real-time systems engineering.
Project Evolution: The original work consisted of three laboratory experiments conducted in MATLAB, progressing through the Joint Directors of Laboratories (JDL) Data Fusion Model:
- Lab 1 (JDL Level 0): Sensor characterization and empirical modeling of Sharp IR distance sensors
- Lab 2 (JDL Level 1): Extended Kalman Filter implementation for dynamic state estimation
- Lab 3 (JDL Level 2): Occupancy grid mapping with heterogeneous sensor fusion
The STM32 port translates these MATLAB simulations into production-quality embedded firmware, addressing the critical challenge of maintaining mathematical validity under real-time constraints on resource-limited hardware.
Core Technical Achievement: The system fuses heterogeneous sensor data from Sharp IR distance sensors (geometric information) and thermocouples (thermal information) to accomplish two objectives: (1) estimate the dynamic state (position and velocity) of a moving object using an Extended Kalman Filter, and (2) construct a spatial occupancy probability map representing environmental knowledge. The embedded implementation ensures deterministic execution timing, manages asynchronous sensor streams through queue-based inter-process communication, optimizes floating-point operations for ARM Cortex-M4F hardware, and implements robust fault handling mechanisms.
From MATLAB Validity to Embedded Correctness:
The core challenge of porting this project from MATLAB to an STM32 microcontroller was not computational feasibility but preserving mathematical validity under real-time constraints. In MATLAB, the Extended Kalman Filter implicitly assumes a perfectly uniform sampling interval. On embedded hardware, this assumption becomes fragile: execution jitter directly invalidates the state transition model. The STM32 implementation therefore treats timing determinism as a correctness requirement rather than an optimization. By using a fixed 20 ms update period enforced with osDelayUntil, the EKF's discrete-time dynamics remain physically consistent with the constant-velocity model validated in Lab 2. This preserves equivalence between the original simulation and the embedded implementation despite radically different execution environments.
The Joint Directors of Laboratories (JDL) Data Fusion Model provides a hierarchical framework for organizing fusion systems into manageable layers:
- Level 0 (Sub-Object Data Assessment): Sensor characterization, signal processing, and feature extraction. Raw measurements are converted into calibrated, meaningful quantities.
- Level 1 (Object Assessment): State estimation and tracking. Combines observations over time to estimate object position, velocity, and identity.
- Level 2 (Situation Assessment): Environmental mapping and context understanding. Aggregates object-level information into a coherent spatial representation.
- Level 3 (Impact Assessment): Threat evaluation and projection of future states.
- Level 4 (Process Refinement): Meta-level optimization of sensor management and fusion strategies.
This project implements Levels 0, 1, and 2, progressing from raw sensor voltages to calibrated distance measurements to state estimates to occupancy maps.
Hardware Configuration: The original laboratory experiments used a controlled testbed consisting of:
Sharp IR Distance Sensors (3 variants):
- Short Range: GP2Y0A41SK0F (4-30 cm operating range)
- Medium Range: GP2Y0A21YK0F (10-80 cm operating range)
- Long Range: GP2Y0A02YK0F (20-150 cm operating range)
- All sensors output analog voltage proportional to distance via optical triangulation
Thermocouples (5 sensors):
- Type-K thermocouples measuring surface temperature
- Positioned at strategic locations: four corners and center of a 10×10 cm aluminum plate
- Used to detect heat source location through thermal gradient analysis
Test Object:
- Wooden or aluminum block placed on the aluminum plate
- Block manually moved along a linear track for Lab 2 (state estimation experiments)
- Block positioned at various locations for Lab 3 (mapping experiments)
Data Acquisition:
- MATLAB-based data collection system
- Sensors sampled at different rates based on their natural bandwidth
- IR sensors: ~50 Hz (limited by ADC and processing)
- Thermocouples: ~5 Hz (limited by thermal inertia)
Experimental Progression:
Lab 1 Objective: Characterize the non-linear voltage-distance relationship of IR sensors and quantify measurement noise. This establishes the observation model required for the Extended Kalman Filter.
Lab 2 Objective: Implement an Extended Kalman Filter in MATLAB to track a moving block using fused IR sensor data. Validate filter performance under different tuning parameters ($Q$, $R$, and state initialization).
Lab 3 Objective: Construct occupancy grid maps by fusing IR geometric data with thermocouple thermal data. Demonstrate that heterogeneous fusion resolves ambiguities that neither sensor alone can solve (e.g., distinguishing between two thermally equidistant locations using IR geometry).
Why Multi-Sensor Fusion? The necessity for fusion arises from fundamental sensor limitations:
- IR sensors provide high-speed, spatially precise distance measurements but suffer from non-linear response curves, sensitivity to surface reflectivity (albedo), line-of-sight occlusion, and Gaussian noise that increases with distance.
- Thermocouples provide robust environmental data immune to lighting and occlusion but exhibit extremely slow response times (thermal inertia of the aluminum plate causes 25-160 second delays), low spatial precision (temperature gradients are diffuse), and provide only indirect positional information.
By fusing these orthogonal modalities—geometry from optics and thermodynamics from contact sensors—the system overcomes the spatial sparsity of thermal data and the noise/occlusion limitations of optical ranging, achieving robustness impossible with single-source architectures.
The foundation of any robust estimator is an accurate sensor model. Lab 1 focused on characterizing the relationship between measured voltage and physical distance for three Sharp IR sensor variants, deriving mathematical models suitable for Extended Kalman Filter implementation.
Sharp IR sensors operate via optical triangulation. An infrared LED emits a narrow beam that reflects off an object and impinges on a Position Sensitive Detector (PSD) located at a fixed baseline distance from the emitter. The voltage output is a function of the angle of incidence on the PSD.
Geometrically, the position of the reflected spot on the PSD encodes the object distance through similar triangles, so the output voltage varies roughly inversely with distance.
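The ideal similar-triangles relationship can be written as follows (a textbook sketch; the symbols $B$, $f$, and $x_{\mathrm{PSD}}$ are illustrative and not defined in the original labs):

```latex
% Optical triangulation: the reflected spot strikes the PSD at offset x_PSD.
% By similar triangles between the emitter-detector baseline B and the
% lens focal length f, the object distance d satisfies:
d = \frac{f \, B}{x_{\mathrm{PSD}}}
% Since the output voltage grows with x_PSD, V varies roughly as 1/d,
% which motivates the hyperbolic (Power 2) fits used in Lab 1.
```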
However, empirical data deviates from pure geometric models due to:
- Electronic non-linearities in signal processing circuitry
- Surface reflectivity variations (albedo effects)
- Ambient IR interference
- ADC quantization (12-bit resolution)
- Thermal drift in analog components
Therefore, a data-driven empirical approach is necessary.
Data collection involved placing each sensor at precisely measured distances and recording voltage outputs over hundreds of samples:
Short Range Sensor (GP2Y0A41SK0F):
- Range: 4 cm to 30 cm
- Voltage: 2.34 V (at 4 cm) → 0.34 V (at 30 cm)
- Observations: Sharp voltage drop at close range, pronounced "knee" in curve
Medium Range Sensor (GP2Y0A21YK0F):
- Range: 10 cm to 80 cm
- Voltage: 1.99 V (at 10 cm) → 0.14 V (at 80 cm)
- Observations: Most consistent Gaussian noise characteristics
Long Range Sensor (GP2Y0A02YK0F):
- Range: 20 cm to 150 cm
- Voltage: 2.50 V (at 20 cm) → 0.40 V (at 150 cm)
- Observations: Highest noise variance at maximum range
Multiple regression models were evaluated:
| Model | Mathematical Form | Advantages | Disadvantages |
|---|---|---|---|
| Linear | $V = a d + b$ | Computationally trivial | Completely fails for the hyperbolic response |
| Inverse Linear | $V = a / d$ | Captures the general trend | Poor fit at extremes, no asymptotic offset |
| Polynomial | $V = \sum_i a_i d^i$ | Fast (MAC instructions) | Runge's phenomenon (oscillation at boundaries) |
| Power 2 | $V = a d^b + c$ | Physics-motivated, excellent fit | Requires powf (50-100 CPU cycles) |
The Power 2 model, $V = a\,d^{b} + c$, was selected as optimal because:
- Physical Justification: The exponent $b \approx -1$ aligns with the inverse geometric relationship
- Three Degrees of Freedom: Scaling ($a$), curvature ($b$), and offset ($c$) allow precise fitting
- Asymptotic Behavior: The offset $c$ correctly models the sensor's quiescent voltage at infinite distance
Non-linear least squares regression yielded Power 2 coefficients $(a, b, c)$ for each of the three sensors; the fitted exponents were roughly $b \approx -0.71$ for the medium-range and $b \approx -0.87$ for the long-range sensor, consistent with the Jacobians used in Lab 2.
Key Observations:
- Negative exponents ($b < 0$) confirm the inverse relationship
- Exponents deviate from the ideal $-1.0$ due to lens distortion and PSD non-linearities
- Some negative offsets ($c < 0$) are mathematical artifacts; firmware clamps voltages to 0-3.3 V
The Extended Kalman Filter assumes measurement noise is zero-mean Gaussian: $v_k \sim \mathcal{N}(0, \sigma^2)$.
Residual Analysis:
Residuals $e = V_{measured} - V_{model}(d)$ were computed across the calibration distances:
| Sensor | Mean Error | Max Error | Std Dev ($\sigma$) |
|---|---|---|---|
| Long Range | ≈0 V | 0.15 V | 0.089 V |
| Medium Range | ≈0 V | 0.03 V | 0.015 V |
| Short Range | ≈0 V | 0.034 V | 0.026 V |
Near-zero mean validates that the model is unbiased. The standard deviations provide values for the measurement noise covariance matrix $R$.
Histogram Analysis:
- Medium/Long Range: Bell-shaped distributions supporting Gaussian assumption
- Short Range Anomaly: Bimodal "two-spike" distribution for measurements at 24 cm, outlier at 12 cm
- Interpretation: Sensor toggles between discrete states (edge effects or ADC aliasing)
- Implication: Violates unimodal Gaussian assumption; requires outlier rejection (innovation gating)
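The innovation gating recommended above can be sketched as a normalized-innovation-squared test; this is a minimal illustration (function name and threshold are illustrative, not taken from the project firmware):

```c
#include <stdbool.h>

/* Accept a measurement only if the normalized innovation squared (NIS)
 * y^2 / S falls below a chi-square gate. For a scalar measurement,
 * gate = 9.0 corresponds to a 3-sigma test: outliers from bimodal or
 * heavy-tailed noise are rejected before they corrupt the EKF update. */
static bool innovation_gate(float y, float S, float gate)
{
    if (S <= 0.0f) {          /* degenerate innovation covariance: reject */
        return false;
    }
    return (y * y) / S < gate;
}
```

A gated update simply skips the correction step when `innovation_gate` returns false, falling back on the prediction for that cycle.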
Quantile-Quantile (QQ) Plots: QQ plots compared empirical error distributions against the standard normal distribution:
- Linear alignment along $y = x$ within ±1σ indicates normality
- Tail deviations ("heavy tails") indicate extreme noise events occur more frequently than a pure Gaussian predicts
- Conclusion: The Gaussian approximation is sufficient for the EKF, but gating is recommended
Chi-Square Goodness of Fit: Chi-square tests formally tested normality:
- Passed (test result $h = 0$, normality not rejected) at many distances
- Failed ($h = 1$) at the range extremes
- Conclusion: While a useful engineering approximation, the true noise is likely a mixture of Gaussian and systematic effects (quantization, non-linearity)
For EKF implementation, the measurement Jacobian $H = \partial h / \partial x$ must be derived analytically.

For $V = a\,d^{b} + c$, the derivative with respect to distance is $\frac{\partial V}{\partial d} = a\,b\,d^{\,b-1}$.

Long Range Sensor Jacobian: $\frac{\partial V_{long}}{\partial d} = -29.088 \cdot d^{-1.873}$

Medium Range Sensor Jacobian: $\frac{\partial V_{medium}}{\partial d} = -8.594 \cdot d^{-1.7114}$

Physical Interpretation: The Jacobian represents sensor sensitivity:
- At close range ($d$ small): $|H|$ is large → high sensitivity → measurements carry high information
- At long range ($d$ large): $|H| \to 0$ → low sensitivity → measurements automatically down-weighted by the EKF
Embedded Optimization:
Note that for $V = a\,d^{b} + c$, the Jacobian can be rewritten as $\frac{\partial V}{\partial d} = \frac{b\,(V - c)}{d}$, replacing a second powf() call with a multiply and a divide and reducing CPU cycles significantly.
Lab 2 implemented an Extended Kalman Filter in MATLAB to track the position and velocity of a moving block using the sensor models characterized in Lab 1. The lab explored the effects of the tuning parameters ($Q$, $R$) and of state initialization on filter behavior.
State Vector Definition:
The state captures the kinematics of the moving block:
$$x_k = \begin{bmatrix} d_k \\ \dot{d}_k \end{bmatrix}$$
where $d_k$ is the distance to the tracked object and $\dot{d}_k$ its velocity at step $k$.
Process Model (Prediction Step):
Assumes a Constant Velocity Model discretized with time step $T$:
- Lab 2 (MATLAB): $T = 0.1$ s (100 ms period, 10 Hz)
- STM32 Implementation: $T = 0.02$ s (20 ms period, 50 Hz)
State Transition Matrix: $$A = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}$$
For Lab 2: $A = \begin{bmatrix} 1 & 0.1 \\ 0 & 1 \end{bmatrix}$
For STM32: $A = \begin{bmatrix} 1 & 0.02 \\ 0 & 1 \end{bmatrix}$
Process Noise Covariance Matrix $Q$ captures uncertainty in the constant-velocity assumption (unmodeled accelerations).
Lab 2 explored values:
- Initial guess: $Q = \begin{bmatrix} 0.2 & 0 \\ 0 & 0.3 \end{bmatrix}$
- Optimized: $Q = \begin{bmatrix} 0.002 & 0 \\ 0 & 0.03 \end{bmatrix}$
Physical Justification: The higher velocity variance ($Q_{22} > Q_{11}$) reflects that unmodeled accelerations perturb velocity directly, while position changes only through integrated velocity.
The system uses two sensors simultaneously (Medium and Long Range IR) to correct state estimates.
Observation Function (Non-Linear): each measurement channel is the Power 2 model evaluated at the estimated distance: $$h(x_k) = \begin{bmatrix} a_{long}\, d_k^{\,b_{long}} + c_{long} \\ a_{med}\, d_k^{\,b_{med}} + c_{med} \end{bmatrix}$$
Jacobian Matrix (Linearization): $$H_k = \begin{bmatrix} \frac{\partial V_{long}}{\partial d} & \frac{\partial V_{long}}{\partial \dot{d}} \\ \frac{\partial V_{medium}}{\partial d} & \frac{\partial V_{medium}}{\partial \dot{d}} \end{bmatrix} = \begin{bmatrix} -29.088 \cdot d^{-1.873} & 0 \\ -8.594 \cdot d^{-1.7114} & 0 \end{bmatrix}$$
Second column is zero because voltage depends only on distance, not velocity.
Measurement Noise Covariance Matrix $R$: a 2×2 diagonal matrix whose entries are the per-sensor voltage variances from the Lab 1 residual analysis.
Diagonal structure assumes sensors are uncorrelated (independent hardware with separate signal paths).
Prediction Step:
- Propagate state: $\hat{x}_k^- = A \hat{x}_{k-1}$
- Propagate covariance: $P_k^- = A P_{k-1} A^T + Q$

Update Step:
- Compute innovation: $y_k = z_k - h(\hat{x}_k^-)$
- Linearize measurement: $H_k = \frac{\partial h}{\partial x}\big|_{\hat{x}_k^-}$
- Compute innovation covariance: $S_k = H_k P_k^- H_k^T + R$
- Compute Kalman gain: $K_k = P_k^- H_k^T S_k^{-1}$
- Update state: $\hat{x}_k = \hat{x}_k^- + K_k y_k$
- Update covariance: $P_k = (I - K_k H_k) P_k^-$ (or Joseph form for stability)
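The recursion above can be sketched in C for the 2-state constant-velocity case, applying each sensor as a scalar update so the 1×1 innovation covariance $S_k$ needs no matrix inversion. All names here are illustrative sketches under the stated model, not the project firmware:

```c
/* 2-state EKF: x = [d, d_dot], constant-velocity model. */
typedef struct {
    float x[2];       /* state estimate */
    float P[2][2];    /* covariance */
} CvEkf;

/* Prediction: x <- A x, P <- A P A^T + Q, with A = [[1, T], [0, 1]]. */
static void cv_ekf_predict(CvEkf *f, float T, float q_pos, float q_vel)
{
    f->x[0] += T * f->x[1];

    float p00 = f->P[0][0], p01 = f->P[0][1];
    float p10 = f->P[1][0], p11 = f->P[1][1];
    f->P[0][0] = p00 + T * (p10 + p01) + T * T * p11 + q_pos;
    f->P[0][1] = p01 + T * p11;
    f->P[1][0] = p10 + T * p11;
    f->P[1][1] = p11 + q_vel;
}

/* Scalar update for one sensor: z is the measured voltage, h the
 * predicted voltage at x[0], H = dV/dd (the velocity entry is 0). */
static void cv_ekf_update(CvEkf *f, float z, float h, float H, float R)
{
    float y = z - h;                           /* innovation */
    float S = H * f->P[0][0] * H + R;          /* innovation covariance */
    float K0 = f->P[0][0] * H / S;             /* Kalman gain */
    float K1 = f->P[1][0] * H / S;

    f->x[0] += K0 * y;
    f->x[1] += K1 * y;

    float p00 = f->P[0][0], p01 = f->P[0][1];  /* P <- (I - K H) P */
    f->P[0][0] = (1.0f - K0 * H) * p00;
    f->P[0][1] = (1.0f - K0 * H) * p01;
    f->P[1][0] -= K1 * H * p00;
    f->P[1][1] -= K1 * H * p01;
}
```

With two sensors, `cv_ekf_update` is simply called twice per cycle, once with each sensor's predicted voltage, Jacobian, and variance.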
Scenario 1: Constant Velocity Motion (Good Parameters)
- Parameters: $Q \approx \begin{bmatrix} 0.002 & 0 \\ 0 & 0.03 \end{bmatrix}$, $R \approx \begin{bmatrix} 0.0006 & 0 \\ 0 & 0.0006 \end{bmatrix}$
- Result: EKF effectively tracked object, smoothing sensor noise while maintaining correct trajectory
- Observation: Filter effectively averaged two sensors, providing better estimate than either alone
Scenario 2: High Sensor Noise (Inflated $R$)
- Parameters: $R = \begin{bmatrix} 0.09 & 0 \\ 0 & 0.1 \end{bmatrix}$ (150x increase)
- Result: Filter heavily relies on prediction, placing less trust in measurements
- Effect: Smoother estimates (less jitter) but introduces phase lag
- Implication: Slow reaction to real velocity changes—catastrophic for real-time collision avoidance
- Kalman Gain: $K$ decreases → measurements down-weighted
Scenario 3: High Process Noise (Inflated $Q$)
- Parameters: $Q = \begin{bmatrix} 0.6 & 0 \\ 0 & 0.85 \end{bmatrix}$ (300x increase)
- Result: Filter distrusts physics model, heavily weights incoming measurements
- Effect: Jittery estimates tracking sensor noise rather than true state
- Implication: Filter degrades to raw sensor pass-through, losing smoothing benefits
- Kalman Gain: $K$ increases → measurements heavily weighted
Scenario 4: Stationary Object (Poor Initialization)
- Initialization: $x_{init} = \begin{bmatrix} 50 \\ 0 \end{bmatrix}$ cm (True position: 70 cm)
- Result: EKF rapidly converged to true position
- Mechanism: Large innovation (difference between measured and predicted voltage) drove correction
- Observation: Position error corrected first, velocity error converged to zero more slowly
Scenario 5: Random Motion (Back-and-Forth)
- Motion: Manual random movement along track
- Result: EKF successfully tracked motion
- Artifact: Noticeable time delay between true motion and estimate
- Cause: Low-pass filtering characteristic—filter requires consistent measurements to "believe" direction change
- Explanation: Constant velocity model explicitly predicts no direction change; filter resists contradicting model until evidence accumulates
- $Q$/$R$ Ratio Controls Responsiveness vs. Smoothness:
  - Low $Q$ / high $R$ → Trust model → Smooth but laggy
  - High $Q$ / low $R$ → Trust sensors → Responsive but noisy
- Dual-Sensor Fusion Improves Accuracy:
  - Fusing Long Range + Medium Range provides redundancy
  - Outliers from one sensor are rejected when the other sensor disagrees
  - Information gain is maximized when sensors have orthogonal error characteristics
- EKF Handles Non-Linearity Well:
  - Despite severe non-linearity ($V \propto d^{-0.87}$), local linearization via the Jacobian is sufficient
  - The filter remains stable and converges correctly
- Inherent Phase Lag is Unavoidable:
  - All causal filters exhibit delay between input change and output response
  - The trade-off between lag and noise rejection is fundamental
  - It cannot be eliminated, only tuned via $Q$ and $R$
Lab 3 transitioned from temporal state estimation (tracking a single object over time) to spatial situation assessment (mapping the environment). The objective was to fuse IR geometric data with thermocouple thermal data to construct a probabilistic occupancy grid representing where objects are located in 2D space.
Grid Configuration: The experiment utilized a 10×10 occupancy grid with the origin at the top-left corner (0,0). Each cell represents a 1 cm × 1 cm area, covering the entire 10 cm × 10 cm aluminum plate.
Goal: Estimate the posterior occupancy probability of the map given all measurements, $p(m \mid z_{1:t})$.
Decomposition Assumption:
Due to high dimensionality (thousands of cells in general maps), the posterior is decomposed into independent cells: $$p(m \mid z_{1:t}) = \prod_i p(m_i \mid z_{1:t})$$
Each cell $m_i$ is treated as a binary random variable (occupied or free) and updated independently.
Log-Odds Representation: Standard Bayesian updates multiply probabilities, leading to numerical underflow on embedded systems. The log-odds transformation $l(m_i) = \log\frac{p(m_i)}{1 - p(m_i)}$ converts multiplication to addition.
Recursive Update Rule: $$l_t(m_i) = l_{t-1}(m_i) + l(m_i \mid z_t) - l_0$$
where $l(m_i \mid z_t)$ is the inverse sensor model evidence and $l_0$ is the prior log-odds ($l_0 = 0$ for the uniform prior $p = 0.5$).
Why Log-Odds for Embedded Systems?
- Speed: Addition is significantly faster than floating-point division/multiplication
- Numerical Range: Log-odds span $-\infty$ to $+\infty$, avoiding saturation near 0 and 1
- Memory: Can be stored as `int8_t` (-127 to +127), compressing maps by 4x vs `float`
Inverse sensor models map measurements to occupancy probability updates for each grid cell.
IR Sensor Inverse Model (Geometric):
The IR sensor reports a distance $d$ along its beam; each grid cell is updated according to where it lies relative to the measured endpoint:
- Cells at distance $d$ (Endpoint): High probability of occupancy ($P = 0.9$, log-odds $\approx +2.2$)
- Cells $< d$ (Free Space): Low probability of occupancy ($P = 0.1$, log-odds $\approx -2.2$)
  - Ray Tracing: Mark all cells along the beam from sensor to endpoint as free
- Cells $> d$ (Occluded/Unknown): No information ($P = 0.5$, log-odds $= 0$)
Lab 3 Simplification: Used endpoint model without full ray tracing—computationally cheaper but less informative.
Thermocouple Inverse Model (Thermal Gradient):
Thermocouples provide point temperatures at five fixed locations on the plate; object position must be inferred from the spatial temperature gradient.
Temperature Decay Model:
The temperature attributed to a grid cell is modeled as decaying with that cell's distance from each thermocouple, so hot readings raise the occupancy evidence of nearby cells.
Fusion of Multiple Thermocouples: The 5 thermocouple maps are combined using exponential weighting: $$\text{Map}_{thermal}(x,y) = \sum_{i=1}^{5} e^{\text{Map}_i(x,y)}$$
This creates a potential field where high-temperature regions indicate likely object locations. The exponential weighting emphasizes cells close to hot sensors.
Arithmetic Fusion (Used in Lab 3): $$\text{Occupancy}_{final}(x,y) = \text{Occupancy}_{IR}(x,y) + \text{Occupancy}_{thermal}(x,y)$$
- Pros: Simple, computationally fast
- Cons: Biases toward occupancy; not probabilistically rigorous
Bayesian Fusion (Optimal): $$l(m_i)_{fused} = l(m_i)_{IR} + l(m_i)_{thermal} - l_0$$
- Pros: Treats sensors as independent evidence sources; correctly handles conflicts
- Example: If IR sees "Empty" (log-odds -2) but Thermal sees "Hot" (+2), sum is 0 (Unknown)—correctly reflects ambiguity
- Cons: Requires proper log-odds inverse models for both modalities
IR-Only Localization (Figures 2-4):
- Convergence Time: ~3 seconds
- Characteristics: Sharp geometric boundaries, rapid localization
- Limitation: Line-of-sight only; fails with occlusion or low reflectivity
Thermal-Only Localization (Figures 5-7):
- Convergence Time: 25-160 seconds (extreme lag!)
- Cause: Thermal inertia of aluminum plate—heat propagation is diffusive, not instantaneous
- Physical Limit: No algorithm can overcome fundamental thermal conductivity/specific heat constraints
- Characteristics: Diffuse probability distributions, slow but robust to occlusion
Fused System (Figures 8-13): The fused system demonstrated synergistic benefits where the combination outperformed either sensor alone.
Critical Case Study: Heat Source at Bottom Center (Figure 7 vs. 13)
- Problem: Source equidistant between two thermocouples
- Thermal-Only Result: Bimodal distribution (two peaks)—ambiguity unresolved
- Why: Temperature gradient creates "saddle point" between sensors; both peaks equally likely
- Fused Result: IR sensor provided sharp geometric constraint (distance from sensor array)
- How Fusion Helps: IR measurement slices through thermal ambiguity, collapsing distribution to single correct peak
- Conclusion: Empirical proof that heterogeneous fusion resolves ambiguities neither sensor solves alone
Experimental Hardware Results (Figures 14-21): Real hardware showed higher noise and "smearing" compared to simulation:
- IR Artifacts: Noise increased at longer distances (as predicted by Power 2 model variance analysis)
- Thermal Artifacts: Air currents and uneven thermal contact caused drifting "hot spots"
- Fusion Performance: Despite noise, fused map (Figures 19, 21) successfully localized block
- Optimal Geometry: Fusion most effective at Top Left corner where IR beams intersect orthogonally (maximizes information gain)
The three laboratory experiments established the complete theoretical and empirical foundation for multi-sensor data fusion:
From Lab 1 (Sensor Modeling):
- Empirical characterization is essential; theoretical models alone are insufficient
- Power 2 fit provides optimal balance between physics motivation and fit quality
- Noise is approximately Gaussian but exhibits outliers requiring gating
- Measurement noise covariance $R$ directly impacts filter performance
From Lab 2 (State Estimation):
- EKF successfully handles severe non-linearity via local linearization
- $Q$/$R$ ratio controls the fundamental responsiveness vs. smoothness trade-off
- Dual-sensor fusion provides redundancy and outlier rejection
- Phase lag is inherent to causal filtering; cannot be eliminated, only tuned
- Poor initialization is recoverable; filter converges given correct models
From Lab 3 (Spatial Mapping):
- Heterogeneous fusion (geometry + thermal) resolves ambiguities neither sensor solves alone
- Log-odds representation is superior to probabilities for embedded implementation
- Thermal sensors exhibit extreme lag (25-160s) due to physical constraints
- Arithmetic fusion is simple; Bayesian fusion is optimal
- Sensor geometry (intersection angles) critically affects information gain
Key Takeaway for Embedded Port: The MATLAB experiments validated the mathematical models and algorithmic approaches. The STM32 port must preserve this mathematical validity while meeting real-time constraints on resource-limited hardware. This requires careful attention to timing determinism, numerical stability, computational optimization, and fault tolerance.
The transition from MATLAB simulation to embedded firmware is not merely a translation of code—it fundamentally changes the design constraints and requires addressing challenges absent in simulation:
The "Time is Truth" Contract:
The EKF state transition matrix $A = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}$ explicitly assumes a fixed time step. In the STM32 implementation, osDelayUntil enforces a constant 20 ms period so that the $\Delta t$ baked into $A$ matches the actual wall-clock interval between filter updates.
Resource Constraints:
| Resource | MATLAB | STM32F407 | Implication |
|---|---|---|---|
| Memory | Unlimited | 192 KB SRAM | Must use static allocation, minimize heap fragmentation |
| Computation | GHz CPU, SIMD | 168 MHz, single FPU | Must optimize matrix operations, use CMSIS-DSP |
| Precision | Double (64-bit) default | Float (32-bit) hardware | Must validate numerical stability with reduced precision |
| Concurrency | Single-threaded | Multi-threaded RTOS | Must manage inter-task communication, priority inversion |
Key Design Principles for Embedded Port:
- Determinism Over Performance: Predictable timing beats average-case speed
- Static Over Dynamic: All memory allocated at compile-time; no malloc in real-time loops
- Asynchrony Over Synchrony: Sensors run at natural rates; queues decouple producers/consumers
- Robustness Over Optimality: Conservative $R$ values and gating prevent divergence from outliers
- Validation at Every Layer: Test numerical precision, timing jitter, and fault recovery independently
The STM32F407VGT6 microcontroller was selected as the target platform for several compelling reasons that align with the project's requirements for real-time deterministic execution, floating-point computation, and multi-tasking support.
The processor core is an ARM Cortex-M4F, which is specifically designed for digital signal processing and control applications. The "F" designation indicates the presence of a hardware Floating Point Unit capable of single-precision IEEE-754 operations. This FPU is absolutely critical for the Extended Kalman Filter implementation because EKF updates involve matrix multiplication, inversion, and transcendental functions. Without hardware FPU support, these operations would be emulated in software at a severe performance penalty—often 50 to 100 times slower—making real-time execution at 50Hz infeasible.
The device operates at a maximum frequency of 168 MHz, providing ample computational headroom for the filter operations while leaving CPU resources available for sensor acquisition, mapping tasks, and telemetry. The clock configuration uses the internal HSI oscillator at 16 MHz as the source, which is multiplied through the PLL to achieve the 168 MHz system clock. The AHB bus runs at 168 MHz for maximum memory bandwidth, while the APB1 and APB2 peripheral buses run at 42 MHz and 84 MHz respectively to stay within their maximum ratings.
Memory resources include 1 MB of Flash for program storage and 192 KB of SRAM for variables, stack, and heap. The Flash capacity is more than sufficient for the filter code, lookup tables, and RTOS kernel. The SRAM allocation is more constrained and requires careful management: the occupancy grid alone could consume 10 KB for a 100x100 grid at one byte per cell, the FreeRTOS heap is configured at 32 KB to support multiple task stacks, and the EKF state and covariance matrices require approximately 100 bytes.
The device includes a 12-bit ADC for reading analog sensor outputs, configurable up to 2.4 million samples per second though the project uses much lower rates. USART peripherals provide serial communication for telemetry output and command input. The availability of multiple timers allows precise periodic task triggering if needed beyond the RTOS scheduler.
FreeRTOS was selected as the Real-Time Operating System for this project because it provides the deterministic scheduling, inter-process communication primitives, and timing control required to maintain the EKF's mathematical validity. The fundamental problem addressed by the RTOS is the "time is truth" contract: the state transition matrix A in the EKF assumes a fixed time step delta-t, and if firmware execution jitter causes the actual time between updates to vary, the prediction becomes physically invalid and estimation error accumulates.
The RTOS configuration uses the CMSIS-V2 interface, which provides a standardized API abstraction over the underlying FreeRTOS kernel. This abstraction improves code portability across different RTOS implementations and ARM Cortex-M devices. Critical configuration parameters include enabling FPU context switching, which ensures that the floating-point registers s0-s31 are saved and restored during task switches; enabling newlib reentrancy, which makes standard C library functions like sprintf thread-safe; and allocating a 32 KB heap, which provides sufficient memory for task stacks and dynamic objects.
The task decomposition follows a priority-driven architecture where each functional component—sensor acquisition, state estimation, mapping, telemetry—runs as an independent task with priority assigned based on criticality and frequency. The highest priority task is the Fusion Task, which implements the EKF prediction and update steps. This task must run with absolute determinism at exactly 50 Hz, corresponding to a 20 millisecond period. The task implementation uses osDelayUntil, which causes the scheduler to wake the task at precise intervals regardless of execution time variations, preventing drift that would occur with a simple delay.
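The drift that osDelayUntil prevents can be illustrated with a plain-C tick simulation (no RTOS calls; the numbers are illustrative). A relative delay starts counting only after the task body finishes, so every cycle stretches by the execution time; absolute wakeups at k·period do not:

```c
#include <stdint.h>

/* Simulate n cycles of a periodic task whose body takes exec_ms.
 * Relative delay (osDelay-style): next wake = end of body + period,
 * so each cycle lasts (exec + period) and error accumulates. */
static uint32_t final_wake_relative(uint32_t period_ms, uint32_t exec_ms, int n)
{
    uint32_t t = 0;
    for (int i = 0; i < n; i++)
        t += exec_ms + period_ms;   /* delay begins after the body */
    return t;
}

/* Absolute delay (osDelayUntil-style): wake(k) = k * period,
 * independent of how long each body took (as long as it fits). */
static uint32_t final_wake_absolute(uint32_t period_ms, int n)
{
    return (uint32_t)n * period_ms;
}
```

Over one second of 50 Hz filtering with a 3 ms task body, the relative-delay schedule has already slipped by 150 ms, which is 7.5 EKF periods of accumulated model error.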
The IR Acquisition Task runs at the same 50 Hz rate with high priority just below the Fusion Task. This task reads the ADC, converts the raw digital value to voltage, applies the inverse sensor model to obtain distance, and enqueues the result for the Fusion Task. Running at the same rate as the filter ensures measurements are available exactly when needed.
The Velocity Task, if present, runs at a lower 20 Hz rate because velocity information arrives more slowly or is computed from position derivatives. The Thermocouple Task runs at only 5 Hz because thermal dynamics are inherently slow and faster sampling would only capture noise rather than meaningful signal changes.
The Mapping Task runs at just 1 Hz with low priority because occupancy grid updates involve heavy computation—iterating over a 10x10 grid, computing distances from sensor positions, updating log-odds values—but are not safety-critical and can tolerate latency. Decoupling this task from high-frequency state estimation prevents the computationally intensive mapping logic from starving sensor acquisition.
The Telemetry Task has the lowest priority and runs asynchronously, triggered by incoming data rather than a fixed period. This task formats state estimates and diagnostics into ASCII strings and transmits them over UART for visualization. UART I/O is inherently slow—at 115200 baud, transmitting 64 bytes requires over 5 milliseconds—and must be strictly decoupled from real-time-critical paths.
The communication architecture between tasks uses FreeRTOS queues rather than shared memory protected by mutexes, a design decision driven by several important considerations.
Queues provide temporal decoupling: fast sensor tasks can burst data into the queue without waiting for slower consumer tasks to process it, and the queue buffering prevents data loss during transient load spikes. Queues preserve ordering: measurements are processed in the sequence they were received, which is critical for maintaining causal relationships. Queues avoid priority inversion: the mutex-based approach can cause the high-priority EKF task to block waiting for a low-priority task to release a shared resource, violating real-time constraints.
The IR data queue has a depth of 10 elements to handle bursts where multiple samples arrive before the Fusion Task can process them. Each element is an IR_Data_t structure containing voltage readings and validity flags for both long-range and medium-range sensors. The Thermocouple data queue has a depth of 5 elements, which is ample for the 5 Hz sensor rate. The state estimate queue has a depth of 5 for publishing EKF results to consumer tasks.
A global state mutex does exist to protect the shared GlobalState_t structure, which contains the current position estimate, velocity, thermocouple temperatures, and occupancy maps. This mutex is necessary because multiple tasks—Fusion writing position, Velocity writing velocity, Mapping reading both—access this structure concurrently. However, the design minimizes mutex contention by using a snapshot pattern: tasks acquire the mutex, copy the data they need into local variables, and immediately release the mutex, performing heavy computations on the local copy without holding the lock.
Priority inheritance is enabled on all mutexes to prevent unbounded priority inversion. If a low-priority task holds a mutex and a high-priority task blocks waiting for it, the low-priority task temporarily inherits the high priority to expedite lock release.
Hardware Configuration: The ADC1 peripheral is configured to read analog voltages from a single IR sensor connected to pin PA0 (ADC_IN0). The .ioc file confirms only one ADC channel is active:
- ADC1.Channel-0: ADC_CHANNEL_0
- ADC1.NbrOfConversionFlag: 1 (single conversion)
- ADC1.SamplingTime: 3 ADC clock cycles
The ADC operates in 12-bit resolution mode, providing digital values from 0 to 4095 corresponding to the 0V to 3.3V input range. The configuration uses single-conversion mode rather than continuous mode because we want precise control over sampling timing—each sample is triggered explicitly by the IR Acquisition Task at the start of its 20-millisecond period.
Current Implementation Status: The system is currently configured for single-sensor operation. In the IR Acquisition Task (main.c:438-470):
```c
// Read Long Sensor
HAL_ADC_Start(&hadc1);
if (HAL_ADC_PollForConversion(&hadc1, 5) == HAL_OK) {
    uint32_t raw = HAL_ADC_GetValue(&hadc1);
    packet.voltage_long = (float)raw * (3.3f / 4095.0f);
    packet.valid_long = 1;
}
// Medium Sensor (Not configured in hardware)
packet.voltage_med = 0.0f;
packet.valid_med = 0; // Explicitly disabled
```

The code is honest about this limitation with explicit comments: "Simulated for this demo if only 1 channel active. In real dual-sensor setup, configure Rank 2 in CubeMX."
Critical Design Decision: EKF Operates in Voltage Space. Unlike the initial description, which suggested inverting the sensor model first, the actual implementation is theoretically superior: the EKF operates directly in voltage space. In the Fusion Task (main.c:410-418):

```c
ekf_update_scalar(ir_packet.voltage_long, h_long, H_long, R_LONG);
```

The measurement z is the raw voltage, and the measurement function h_long(d) predicts voltage from the current distance estimate. This means:
- The innovation y = V_measured - h(d_estimate) is computed in voltage space
- The Jacobian H linearizes in voltage space
- The Gaussian noise assumption remains valid (noise is additive in voltage, not transformed through inverse functions)
This is the correct theoretical approach and avoids the noise transformation issues that would occur if we inverted the sensor model first. The distance estimate lives in the state ekf.x, and the EKF naturally handles the non-linear voltage-distance relationship through local linearization.
Extension to Dual-Sensor Configuration: Adding a second IR sensor requires:
- Configuring ADC1 Rank 2 in CubeMX for the second channel
- Setting packet.valid_med = 1 in the IR Acquisition Task
- No changes to the EKF logic—the sequential update architecture already supports it
The architecture demonstrates staged development: the single-sensor implementation validates the EKF mathematics, RTOS scheduling, and system integration, with dual-sensor fusion being a straightforward hardware extension rather than a fundamental redesign.
The thermocouple interface in the current implementation is simulated—the BSP_Read_Thermocouple function returns a constant 25.0°C—because the focus is on architectural correctness rather than hardware integration. In a production system, thermocouples would be interfaced through an SPI or I2C amplifier IC with cold-junction compensation, and the read function would perform actual peripheral communication.
The USART2 peripheral is configured for asynchronous serial communication at 115200 baud, which provides a good balance between transmission speed and noise immunity. The Telemetry Task uses HAL_UART_Transmit to send formatted strings containing timestamps, position, velocity, and covariance values. The format is deliberately compact and parseable—comma-separated values on a single line—to minimize transmission time while allowing easy plotting and analysis in Python or MATLAB.
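A sketch of what such a telemetry formatter might look like on the host side (the field order, field names, and precision below are illustrative assumptions, not copied from the firmware):

```c
#include <stdio.h>

/* Hypothetical CSV telemetry line: timestamp_ms, position, velocity,
 * and the two diagonal covariance terms. One line per EKF cycle. */
static int format_telemetry(char *buf, size_t n, unsigned long t_ms,
                            float x, float v, float p00, float p11)
{
    /* Compact, parseable: comma-separated values, CRLF-terminated. */
    return snprintf(buf, n, "%lu,%.3f,%.3f,%.5f,%.5f\r\n",
                    t_ms, (double)x, (double)v, (double)p00, (double)p11);
}
```

The returned length lets the caller pass exactly that many bytes to HAL_UART_Transmit, avoiding a separate strlen pass.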
The EKF implementation in main.c demonstrates several optimization techniques critical for embedded systems while maintaining mathematical correctness.
The state representation uses a structure containing position x, velocity v, and four covariance elements p00, p01, p10, p11. This explicit structure is more cache-friendly than a separate matrix allocation and makes the code self-documenting. The covariance is stored as individual elements rather than a 2x2 array to avoid pointer indirection overhead during frequent access.
The ekf_init function initializes position to 20.0 cm (a reasonable mid-range value), velocity to zero, and covariance to the identity matrix, representing substantial initial uncertainty relative to the measurement noise. In a production system with known initial conditions, the initial covariance would be set smaller to reflect that prior knowledge.
The ekf_predict function implements the discrete-time constant-velocity state propagation. The position update x_new = x + v * DT is straightforward Euler integration. The covariance update expands the matrix equation P = A * P * A^T + Q using the explicit 2x2 structure, avoiding general matrix multiplication routines. The expanded form p00 = p00 + DT*(p10 + p01) + DT^2*p11 + Q_POS implements the exact matrix product but uses only scalar operations. After the update, symmetry is explicitly enforced by averaging the off-diagonal elements—this numerical stability technique prevents roundoff error accumulation from causing the covariance to become non-symmetric, which would violate positive-definiteness and potentially cause filter divergence.
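The expanded predict step just described can be sketched in plain C. The constants mirror values quoted elsewhere in this document, and the struct layout follows the description above, but this is a host-side sketch rather than the firmware source:

```c
#include <assert.h>
#include <math.h>

#define DT     0.02f   /* 50 Hz fusion period */
#define Q_POS  0.002f
#define Q_VEL  0.03f

typedef struct {
    float x, v;                  /* position (cm), velocity (cm/s)   */
    float p00, p01, p10, p11;    /* 2x2 covariance, stored elementwise */
} EKF_t;

/* Constant-velocity predict: x' = x + v*DT, and the expanded form of
 * P = A*P*A^T + Q for A = [1 DT; 0 1], using only scalar operations. */
static void ekf_predict(EKF_t *e)
{
    e->x += e->v * DT;   /* Euler integration of position */

    float p00 = e->p00 + DT * (e->p10 + e->p01) + DT * DT * e->p11 + Q_POS;
    float p01 = e->p01 + DT * e->p11;
    float p10 = e->p10 + DT * e->p11;
    float p11 = e->p11 + Q_VEL;

    e->p00 = p00;
    e->p11 = p11;
    e->p01 = e->p10 = 0.5f * (p01 + p10);   /* enforce symmetry */
}
```

Note how the final line implements the symmetry enforcement described above: off-diagonal elements are averaged so roundoff cannot slowly break symmetry.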
The ekf_update_scalar function processes a single scalar measurement using the Joseph form covariance update, which provides superior numerical stability compared to the naive (I - KH) * P form. The Joseph form P_new = (I - KH) * P * (I - K*H)^T + K * R * K^T is guaranteed to produce a positive-definite covariance even in the presence of roundoff error. The implementation explicitly expands this matrix equation into scalar operations, again avoiding general matrix libraries.
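A minimal sketch of the scalar Joseph-form update follows. In the firmware, ekf_update_scalar takes the measurement plus function pointers for the sensor model and Jacobian; this host-side version takes the precomputed h(x) and H values instead, to stay self-contained:

```c
#include <assert.h>
#include <math.h>

typedef struct {
    float x, v;
    float p00, p01, p10, p11;
} EKF_t;

/* Scalar Joseph-form update for a 2-state filter. z: measurement,
 * h: predicted measurement h(x), H: dh/dx (the full Jacobian row is
 * [H 0], since the sensor observes position only), R: noise variance. */
static void ekf_update_scalar(EKF_t *e, float z, float h, float H, float R)
{
    float y = z - h;                     /* innovation                      */
    float S = H * e->p00 * H + R;        /* scalar variance: no inversion   */
    float k0 = e->p00 * H / S;           /* Kalman gain K = P*H^T / S       */
    float k1 = e->p10 * H / S;

    e->x += k0 * y;
    e->v += k1 * y;

    /* Joseph form P = (I-KH) P (I-KH)^T + K R K^T, expanded to scalars.
     * For K = [k0; k1] and H_row = [H 0], I-KH = [[m00, 0], [m10, 1]]. */
    float m00 = 1.0f - k0 * H;
    float m10 = -k1 * H;

    float p00 = m00 * m00 * e->p00 + k0 * k0 * R;
    float p01 = m00 * (m10 * e->p00 + e->p01) + k0 * k1 * R;
    float p10 = m00 * (m10 * e->p00 + e->p10) + k0 * k1 * R;
    float p11 = m10 * m10 * e->p00 + m10 * (e->p01 + e->p10) + e->p11
              + k1 * k1 * R;

    e->p00 = p00;
    e->p11 = p11;
    e->p01 = e->p10 = 0.5f * (p01 + p10);
}
```

With P = I, H = 1, R = 1 this reproduces the textbook result: the posterior position variance is exactly halved, and the velocity variance is untouched because the measurement carries no velocity information.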
The sensor model functions h_long and h_med implement the forward voltage-distance relationship with clamping to prevent evaluation at distances below 1 cm where the model becomes numerically unstable. The Jacobian functions H_long and H_med compute the derivative analytically—for h(d) = a * d^b + c, the derivative is H(d) = a * b * d^(b-1). These derivatives are clamped to prevent blow-up at small distances.
Sequential Update Strategy: The Fusion Task implements a sequential scalar update architecture (main.c:405-420) that processes sensors independently:
```c
// Step 2a: Update with Long Range Sensor (if valid)
if (ir_packet.valid_long) {
    ekf_update_scalar(ir_packet.voltage_long, h_long, H_long, R_LONG);
}
// Step 2b: Update with Medium Range Sensor (if valid)
if (ir_packet.valid_med) {
    ekf_update_scalar(ir_packet.voltage_med, h_med, H_med, R_MED);
}
```

Current Operational Mode:
In the single-sensor configuration, only valid_long is true, so only one scalar update executes per cycle. This validates the core EKF mathematics without the complexity of multi-sensor fusion.
Dual-Sensor Equivalence Conditions: When the second sensor is enabled, this sequential approach is mathematically equivalent to a joint vector update if and only if:
- Sensors are uncorrelated (independent noise sources) — ✓ satisfied by separate signal paths
- Measurements arrive at the same time — ✓ satisfied by processing within same 20ms task iteration
The advantage over joint updates is avoiding 2x2 matrix inversion (computationally expensive) while achieving identical results when these conditions hold. The scalar innovation covariance S = H*p00*H + R requires only division, not matrix inversion.
The Mapping Task implements the occupancy grid update using a dual-map fusion strategy that combines thermal gradient information with IR geometric data.
The Update_Dual_Map function takes as input the five thermocouple temperatures, an IR occupancy map (which would be populated based on IR sensor history), and produces a fused occupancy map. The algorithm begins by constructing a thermal gradient map: for each grid cell, it computes the distance to each thermocouple sensor and adds the temperature weighted by distance, effectively creating a potential field where cells closer to hot sensors have higher values.
The sensor positions are defined in the THERMO_SENSOR_COORDS array with coordinates in centimeters matching the physical layout—sensors at the corners and center of the 10x10 cm grid. For each cell at position (x, y), the distance to sensor s is computed as sqrt((sensor_x - cell_x)^2 + (sensor_y - cell_y)^2), and the expected temperature at that cell is modeled as the sensor temperature minus a linear drop with distance: temp = sensor_temp - dist * TEMP_DROP_CM.
After accumulating contributions from all five sensors, the thermal gradient map is normalized to the range [0, 1] by finding the minimum and maximum values and linearly scaling. This normalization is essential because the absolute temperature values are less meaningful than the relative differences—a gradient indicates where thermal sources likely exist.
The fusion step combines the normalized thermal map with the IR map. The fusion rule is a weighted sum: fused = norm_thermal + IR_WEIGHT * ir_map, where IR_WEIGHT (0.3 in this implementation) determines how much the IR data modulates the thermal baseline. If the IR map indicates high occupancy probability at a location, the fused value is boosted; if IR indicates free space, the fused value remains primarily thermal.
A final normalization step scales the fused map back to [0, 1] for consistent interpretation as probabilities. This two-stage normalization—first thermal, then fused—ensures numerical stability and prevents any one modality from overwhelming the other.
Important Clarification on Log-Odds Implementation: The current implementation uses arithmetic fusion rather than full Bayesian log-odds fusion. While the document discusses log-odds representation extensively as the theoretically optimal approach, the actual STM32 code stores occupancy values as floating-point probabilities [0, 1] and combines them via weighted addition. The log-odds approach is explained as a planned upgrade that would provide better numerical properties and memory efficiency (int8_t storage), but it is not yet implemented in the current firmware. For interview discussions, you should frame this as: "The current implementation uses normalized probability fusion for simplicity and initial validation. A production enhancement would convert to log-odds storage with proper Bayesian updates, which would reduce memory by 4x and avoid numerical saturation issues."
Memory optimization for the occupancy grid could include sparse representations where only cells with significant evidence (log-odds far from zero) are stored explicitly, using a hashmap or quadtree data structure. This would dramatically reduce memory usage for large environments where most cells are empty.
Critical Achievement: Preserving Mathematical Validity Through Real-Time Guarantees
The most critical aspect of the STM32 port is maintaining the mathematical validity of the EKF under real-time constraints. This is achieved through deterministic timing that preserves the exact time step Δt assumed by the filter's physics model, ensuring the state transition matrix A remains physically correct at every iteration.
The deterministic timing that preserves the EKF's mathematical validity is achieved through careful use of the RTOS delay primitives and priority assignment.
Each periodic task maintains a local variable last_wake_time initialized to osKernelGetTickCount() at task start. At the end of each iteration, the task calls osDelayUntil(last_wake_time + period_ticks), then increments last_wake_time by period_ticks. This pattern implements absolute periodic scheduling: the task wakes at precisely timed intervals based on the original start time, not relative to when it finished execution.
The alternative approach using osDelay(period_ticks) would implement relative delays, causing drift: if the task takes 18 ms to execute and delays for 20 ms, the actual period becomes 38 ms. Worse, if execution time varies, the period becomes non-deterministic. The osDelayUntil approach guarantees that if the task finishes in 18 ms, it delays only 2 ms to maintain the exact 20 ms period.
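The drift arithmetic can be checked with a tiny host-side simulation, no RTOS required (the 18 ms execution time and 20 ms period are the figures from the paragraph above):

```c
#include <assert.h>

enum { PERIOD = 20, EXEC = 18 };   /* ticks: task period, execution time */

/* osDelay-style relative sleep: each cycle costs EXEC + PERIOD ticks,
 * so the actual period stretches to 38 ticks. */
static unsigned nth_wake_relative(unsigned n)
{
    unsigned t = 0;
    for (unsigned i = 0; i < n; i++)
        t += EXEC + PERIOD;        /* finish work, then sleep a full period */
    return t;
}

/* osDelayUntil-style absolute deadline: wake times are exact multiples
 * of PERIOD regardless of how long each iteration took. */
static unsigned nth_wake_absolute(unsigned n)
{
    unsigned wake = 0;
    for (unsigned i = 0; i < n; i++)
        wake += PERIOD;            /* last_wake_time += period_ticks */
    return wake;
}
```

After five iterations the relative scheme has already drifted 90 ticks behind the intended schedule, while the absolute scheme stays exactly on the 20-tick grid.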
The MS_TO_TICKS macro converts milliseconds to RTOS ticks accounting for the configurable tick frequency: MS_TO_TICKS(ms) = (ms * osKernelGetTickFreq()) / 1000. This ensures the code remains portable across different tick configurations—whether the system uses 1 ms ticks, 10 ms ticks, or custom values.
Task priorities are assigned according to Rate Monotonic Scheduling principles: higher-frequency tasks receive higher priority. This is provably optimal for fixed-priority preemptive scheduling under certain conditions. The Fusion Task and IR Acquisition Task both run at 50 Hz and have the highest priorities, with Fusion slightly above IR to ensure filter updates preempt sensor reads. Lower-frequency tasks have progressively lower priorities.
Execution time budgets were estimated through profiling and worst-case analysis. The Fusion Task's EKF update involves approximately 50 floating-point operations (multiplications, additions, divisions), which on the 168 MHz Cortex-M4F requires roughly 500 cycles or 3 microseconds. Even with function call overhead, queue operations, and mutex locking, total execution time remains well under 1 millisecond, leaving 19 milliseconds of slack in the 20-millisecond period.
The Mapping Task's grid update involves 10x10 cells times 5 sensors = 500 distance calculations, each requiring two multiplications and a square root. At approximately 50 cycles per calculation, this totals 25,000 cycles or roughly 150 microseconds, well within the 1-second period.
The system implements several fault tolerance mechanisms to maintain operation under degraded conditions.
Sensor dropout is detected when the sensor data queue remains empty. The Fusion Task checks queue status with a zero timeout—if osMessageQueueGet returns a failure status, no measurement update is performed and the iteration proceeds with prediction only. This prediction-only mode allows the system to continue estimating state based on the physics model, though the covariance P grows to reflect increasing uncertainty.
In a safety-critical system, unbounded covariance growth is unacceptable. The implementation would monitor the trace of P (the sum of diagonal elements p00 + p11) and trigger a failsafe when uncertainty exceeds a threshold. Possible failsafe actions include reducing velocity, requesting sensor diagnostics, switching to backup sensors, or performing an emergency stop.
Queue overflow is handled by the FreeRTOS queue implementation: when a producer attempts to enqueue an item to a full queue with zero timeout, the operation fails immediately and the new sample is dropped (the queue's existing contents are preserved). The Telemetry Task logs queue depth as a diagnostic, allowing detection of scenarios where sensor data is arriving faster than it can be processed—potentially indicating insufficient CPU resources or excessive execution time in the Fusion Task.
Innovation gating could be implemented to reject outlier measurements. The Mahalanobis distance D = (y^T * S^-1 * y) measures how many standard deviations the innovation is from zero. If D exceeds a threshold (typically 9 for 3-sigma rejection), the measurement is discarded as likely faulty rather than incorporated into the state estimate.
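For the scalar updates used here, the gate collapses to a one-line test, since with a scalar innovation y and variance S the squared Mahalanobis distance is simply y²/S. A sketch (names are illustrative; this is not in the current firmware):

```c
#include <assert.h>

#define GATE_THRESHOLD 9.0f   /* 3-sigma rejection on the squared distance */

/* Returns 1 if the measurement should be incorporated, 0 if it should
 * be discarded as a likely outlier. */
static int innovation_accept(float y, float S)
{
    float D = (y * y) / S;    /* squared Mahalanobis distance (1 DOF) */
    return D <= GATE_THRESHOLD;
}
```

In the Fusion Task this check would sit between computing the innovation and applying the Kalman gain; a rejected measurement leaves the state and covariance from the predict step untouched.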
The command interface mentioned in the original design guides allows UART reception of commands like "reset", "pause", "resume", "noise high", "noise low" to dynamically reconfigure the system or inject faults for testing. This would be implemented by the Telemetry Task reading USART2 receive buffer, parsing simple string commands, and setting global flags or calling configuration functions.
Before diving into specific design decisions, it's important to understand what the STM32 port successfully preserved from the MATLAB implementation and where deliberate simplifications were made.
Voltage-Space EKF as a Theoretical Improvement: Unlike the MATLAB implementation—which inverted the IR sensor model to estimate distance before filtering—the STM32 EKF operates directly in voltage space. This is a deliberate theoretical improvement. Measurement noise in the Sharp IR sensors is additive and approximately Gaussian in voltage, not in distance. Inverting the sensor model would distort the noise distribution and violate the EKF's Gaussian assumption. By keeping the measurement function as h(d) = a·d^b + c and computing innovations in voltage space, the embedded filter preserves the statistical assumptions underpinning Kalman filtering. This design choice improves estimator consistency and robustness without increasing computational complexity.
Embedded EKF Stability and Numerical Integrity: The EKF implementation emphasizes numerical stability, not just functional correctness. The covariance update uses the Joseph form rather than the simplified (I - K·H)·P expression, guaranteeing positive semi-definiteness even in the presence of floating-point roundoff. Covariance symmetry is explicitly enforced after each update to prevent slow numerical drift. These precautions are essential for long-running embedded systems where filters may operate indefinitely without reset. While such details are often omitted in academic prototypes, their inclusion here reflects production-grade estimator design rather than a classroom demonstration.
Multi-Rate Fusion Through Asynchronous Architecture: The STM32 system preserves the heterogeneous sensor philosophy validated in Labs 1–3 by allowing each sensor modality to operate at its natural rate. IR sensors update at 50 Hz, thermocouples at 5 Hz, and occupancy mapping at 1 Hz. FreeRTOS queues decouple these producers and consumers, preventing slow tasks from blocking time-critical estimation. This architecture mirrors the conceptual separation between JDL Level 1 (object assessment) and Level 2 (situation assessment) established in the original experiments, while ensuring that real-time constraints do not distort estimator behavior.
Staged Development, Not Partial Implementation: Some features—dual-sensor ADC configuration, hardware thermocouples, and log-odds occupancy grids—are intentionally staged rather than absent. The current firmware validates the EKF core, real-time scheduling, and architectural correctness using a single IR sensor and simulated thermal data. Crucially, the software architecture already supports dual-sensor fusion and Bayesian log-odds updates without structural changes. This reflects an engineering approach focused on incremental validation, where mathematical correctness and timing determinism are proven first before expanding hardware complexity.
Mathematical Validity Preserved (Exceptional Strengths):
- Δt Preservation: The time step is correctly scaled from Lab 2's 0.1s to STM32's 0.02s (DT = 0.02f in main.c:75), maintaining the physical validity of the state transition matrix A throughout the system.
- Voltage-Space EKF (Superior to MATLAB): The implementation operates directly in voltage space, passing raw voltage measurements to the EKF. This preserves the Gaussian noise assumption and is theoretically superior to inverting the sensor model first (which would transform noise characteristics).
- Joseph Form Covariance Update: The numerically stable Joseph form (main.c:229-260) is correctly implemented with explicit symmetry enforcement, ensuring positive-definiteness of the covariance matrix and preventing filter divergence. This is rare in student projects and demonstrates strong understanding.
- Multi-Rate Asynchronous Architecture: IR sensors run at 50 Hz while thermocouples run at 5 Hz. The queue-based RTOS design elegantly handles this asynchrony without forcing artificial synchronization.
- Deterministic Real-Time Scheduling: osDelayUntil ensures strict periodicity, making the EKF time-step assumption physically valid. This is a correctness requirement, not a performance optimization.
- Sensor Model Fidelity: The Power 2 fit models (coefficients, exponents, offsets) are precisely replicated from Lab 1 characterization in the sensor model functions.
Current Implementation Status (Staged Development):
- Single-Sensor Configuration: The .ioc file confirms only one ADC channel (ADC1_IN0 on PA0) is configured. The code explicitly sets valid_med = 0 with the comment: "Simulated for this demo if only 1 channel active."
- Dual-Sensor Architecture in Place: The sequential scalar update logic (main.c:405-420) is fully implemented and ready for the second sensor. Enabling it requires only configuring ADC Rank 2 in CubeMX—no code changes needed.
- Validation Baseline: Hardware results should be compared against single-sensor MATLAB runs from Lab 1/2, not the dual-sensor experiments.
Deliberate Simplifications (Not Errors, But Documented Trade-offs):
- Arithmetic vs. Log-Odds Fusion: The occupancy grid uses normalized probability arithmetic fusion rather than full Bayesian log-odds updates. This simplifies initial validation while the log-odds approach remains a planned enhancement for memory efficiency and numerical robustness.
- Simulated Thermocouples: BSP_Read_Thermocouple returns constant 25.0°C. The thermal fusion logic is architecturally correct but not hardware-validated.
Decision: Asynchronous Multi-Rate Architecture Justification: The IR sensor operates at 50 Hz while the thermocouple operates at 5 Hz. Forcing both to run synchronously would throttle the fast sensor to the slow sensor's rate, wasting bandwidth and increasing latency. The asynchronous design allows each sensor to run at its natural rate, with the EKF consuming IR data as fast as it arrives while the mapping task consumes thermocouple data when available. This maximizes information throughput while allowing independent optimization of each data path.
Decision: Queue-Based Rather Than Mutex-Based IPC Justification: Shared memory protected by mutexes seems simpler but creates several problems. If the high-priority Fusion Task blocks on a mutex held by the low-priority Telemetry Task, priority inversion occurs and the estimator misses its deadline. Queues largely avoid this because producers and consumers never hold an application-level lock across each other's operations—they interact only through the kernel-managed ring buffer, whose internal critical sections are short and bounded. Additionally, queues preserve temporal ordering and provide buffering to smooth transient load spikes.
Decision: Separate Estimation and Mapping Tasks Justification: The EKF answers "where is it now?" and must run with real-time guarantees at high frequency. The occupancy grid answers "what is the environment?" and is computationally heavy but not time-critical. Coupling them would force the high-frequency loop to wait for slow grid updates. Decoupling allows the estimator to maintain deterministic execution while the mapping runs as CPU permits.
Decision: CMSIS-V2 RTOS Interface Justification: Direct FreeRTOS API calls would couple the code to a specific RTOS implementation. CMSIS-V2 provides an abstraction layer that works across FreeRTOS, RTX, Zephyr, and other RTOS systems, improving portability. The abstraction cost is negligible because CMSIS-V2 is a thin wrapper that compiles to nearly identical machine code.
Optimization: Sequential Scalar Updates Instead of Joint Vector Update When both IR sensors provide measurements, the naive approach forms a 2D measurement vector z = [V_long; V_med], requires a 2x2 Jacobian H, and inverts a 2x2 innovation covariance S. This matrix inversion requires 8 multiplications and 1 division. The sequential approach processes each measurement independently with scalar innovation variance S = H * p00 * H + R, requiring only 1 division per sensor. This reduces the update from 16 FLOPs to 10 FLOPs while producing mathematically equivalent results when sensors are uncorrelated.
Optimization: Analytical Jacobian Computation The Jacobian H = dh/dx could be computed numerically using finite differences: H ≈ (h(x + epsilon) - h(x)) / epsilon. This requires evaluating the sensor model twice per update with careful epsilon selection. The analytical approach derives H symbolically—for h(d) = a * d^b + c, we have H = a * b * d^(b-1)—and evaluates it directly. This is faster (1 powf instead of 2), more accurate (no finite difference error), and numerically stable with appropriate clamping.
Optimization: Joseph Form Covariance Update The standard covariance update P = (I - KH) * P is susceptible to numerical error causing the covariance to become non-symmetric or negative-definite due to floating-point roundoff. The Joseph form P = (I - KH) * P * (I - K*H)^T + K * R * K^T explicitly maintains symmetry and positive-definiteness by construction. While requiring more FLOPs, this robustness is essential for long-running filters where roundoff accumulates.
Optimization: Explicit Symmetry Enforcement After every covariance update, the off-diagonal elements are explicitly symmetrized: p01 = p10 = (p01 + p10) / 2. This guards against roundoff-induced asymmetry that would otherwise accumulate and eventually cause matrix decomposition failures.
Optimization: Log-Odds Occupancy Representation Storing probabilities directly requires 4-byte floats and multiplying probabilities causes underflow. Converting to log-odds l = log(p / (1-p)) allows using 1-byte signed integers (int8_t) with range [-127, 127] representing probabilities from ~0 to ~1, reducing memory 4x. Updates become addition instead of multiplication, which is faster and more cache-friendly.
Optimization: Single-Precision Floating-Point Only All floating-point variables use the float type rather than double. On Cortex-M4F, single-precision operations execute in hardware at 1-2 cycles while double-precision operations are emulated in software at 50-100 cycles. The precision of single-precision (7 significant digits) is more than sufficient for sensor measurements accurate to at best 1%.
Optimization: Compiler Floating-Point Settings The compiler flag -mfloat-abi=hard instructs the compiler to pass floating-point arguments in FPU registers rather than general-purpose registers and to use hardware FPU instructions. This seemingly minor flag can improve performance 10x by enabling full hardware acceleration.
Optimization: Static Memory Allocation All data structures—queues, mutexes, state variables, arrays—are allocated statically at compile time rather than dynamically with malloc. This eliminates heap fragmentation, provides deterministic allocation time (zero), and allows the linker to verify at build time that memory constraints are satisfied.
Optimization: Lookup Tables for Non-Linear Functions Though not implemented in the current code, replacing powf(d, -0.873) with a 256-entry lookup table indexed by voltage would reduce execution time from ~50 cycles to ~5 cycles (array access + linear interpolation). The Flash cost is 1 KB per sensor—acceptable on a 1 MB device. This would be essential on Cortex-M0/M3 devices without FPU.
Optimization: Function Inlining Small frequently-called functions like sensor model evaluation are candidates for inline expansion. Modern compilers do this automatically with link-time optimization, but explicit inline hints ensure it occurs. This eliminates function call overhead (4-10 cycles per call) at the cost of code size.
Process Noise Covariance Q Q_pos = 0.002 and Q_vel = 0.03 were selected through experimental tuning on the original project. Increasing Q makes the filter more responsive to rapid changes but noisier; decreasing Q makes it smoother but laggier. The selected values balanced 50-millisecond step response time with sub-millimeter steady-state jitter.
Measurement Noise Covariance R R_long = 0.00063 and R_med = 0.00054 were derived from empirical standard deviations measured during sensor characterization, squared to convert from standard deviation to variance. These values represent the actual sensor noise and should only be tuned if the characterization was incorrect or operating conditions changed.
Grid Resolution and Size A 10x10 grid at 1 cm per cell covers a 10x10 cm area—appropriate for a small tabletop experiment. Increasing resolution to 100x100 improves spatial detail but requires 100x more computation and memory. Decreasing to 5x5 reduces cost but loses detail. The selected resolution balances resolution with resource constraints.
IR Weight in Fusion IR_WEIGHT = 0.3 determines how much the IR modality influences the fused map. Higher weights make the map sharper and more responsive to geometric features but more sensitive to IR noise. Lower weights make it smoother and more thermally dominated but less spatially precise.
Temperature Drop Per Centimeter TEMP_DROP_CM = 0.3 models how temperature decays with distance from a source. This parameter depends on thermal conductivity of the medium and would be calibrated empirically by measuring temperature at known distances from a heat source.
Q: Why use an Extended Kalman Filter instead of a standard Kalman Filter or Unscented Kalman Filter (UKF)?
A: The standard Kalman Filter is mathematically provable to be optimal only for linear systems where measurements are linear functions of the state. Our Sharp IR sensor exhibits a highly non-linear voltage-distance relationship following an inverse power law, V = 33.32 * d^(-0.873) + 0.08654. When a Gaussian probability distribution is passed through a non-linear function, it becomes distorted and is no longer Gaussian, violating the fundamental assumption that allows the Kalman filter recursion to work.
The Extended Kalman Filter addresses this by linearizing the measurement function locally around the current state estimate using the Jacobian matrix, which contains the partial derivatives of the measurement function. Essentially, we approximate the curved non-linear function as a straight line tangent to the curve at the current estimate. This local linear approximation preserves the Gaussian property well enough for the filter to remain consistent as long as the non-linearity is not too severe and the uncertainty is not too large.
Why EKF over UKF? An EKF was selected over an Unscented Kalman Filter due to strict real-time constraints and the availability of accurate analytical Jacobians for the infrared sensor models. The nonlinearity in the measurement function is smooth and well-behaved over the operating range, and the EKF linearization error remains small relative to measurement noise. In contrast, a UKF would increase computational cost (requiring evaluation of multiple sigma points), memory usage (storing sigma point states), and scheduling jitter without providing significant estimation benefits in this regime. The EKF therefore offers the best trade-off between accuracy, predictability, and real-time determinism.
Defense-grade expansion: The Unscented Kalman Filter provides better accuracy for strongly nonlinear systems by propagating sigma points through the nonlinear function, but this comes at a significant computational and scheduling cost. In this system, the measurement nonlinearity is smooth, monotonic, and well-characterized, with analytically derived Jacobians. The EKF linearization error remains small relative to sensor noise, as demonstrated in the MATLAB experiments. Using a UKF would increase CPU load, memory usage, and—critically—execution-time variability, which directly threatens the fixed-Δt assumption of the state transition model. The EKF is therefore not just "good enough"—it is the optimal choice for this application given the real-time constraints.
Q: How did we validate that the sensor noise is actually Gaussian as assumed?
A: Validation involved multiple statistical tests on the residuals, which are the differences between actual measurements and model predictions. First, we plotted histograms of residuals for hundreds of samples at different distances—a Gaussian distribution should produce a symmetric bell curve centered at zero. Second, we generated quantile-quantile plots comparing the empirical distribution to a theoretical Normal distribution—if the noise is Gaussian, these should form a straight line. Third, we performed chi-square goodness-of-fit tests to quantify the deviation from Gaussianity.
The results showed that the medium and long-range sensors exhibited approximately Gaussian noise within one to two standard deviations, with some heavy-tail behavior at the extremes likely due to quantization and outliers. The short-range sensor showed bimodal behavior at certain distances, suggesting it may switch between internal states. For the EKF, approximate Gaussianity within the central region is acceptable because most measurements fall within one standard deviation. Outliers can be handled through innovation gating, where we compute the Mahalanobis distance of the innovation and reject measurements that fall beyond a threshold, typically three standard deviations.
Q: Why include the offset term 'c' in the sensor model, and why does it sometimes come out negative?
A: The offset term represents the sensor's baseline output voltage at infinite distance, accounting for quiescent biases in the analog circuitry. Physically, when no object is present, the sensor still outputs some small voltage due to internal reference voltages and ambient IR. Including this term in the model improves fit accuracy by allowing the power law to match the actual sensor response across the full range.
The offset sometimes emerges negative from the non-linear least squares regression as a mathematical artifact. This occurs because the optimization algorithm is unconstrained and finds the parameter values that minimize residual error over the calibrated range, which might mathematically require a negative offset. However, this is not a physical impossibility because we never actually evaluate the model at infinite distance. In firmware, we clamp both the input distance and the output voltage to physically realizable ranges: distance is limited to the sensor's specified operating range, and voltage is clamped to 0-3.3V. The negative offset simply allows the power law function to better fit the curvature in the valid range.
Q: What does the Jacobian matrix physically represent?
A: The Jacobian matrix H contains the partial derivatives of the measurement function with respect to each state variable, which physically represents the sensor's sensitivity to changes in the state. For our IR sensor, H = [-29.088 * d^(-1.873), 0] indicates that the voltage output depends on position but not directly on velocity, and the magnitude of the position derivative tells us how strongly voltage changes with small distance changes.
At close range, the derivative has large magnitude: a one-centimeter change in distance causes a large voltage change, meaning the sensor is very sensitive and provides high information content. The EKF automatically assigns high weight to these measurements. At long range, the derivative approaches zero: distance changes barely affect the voltage, meaning the sensor provides little information. The EKF down-weights these measurements automatically through the Kalman gain calculation.
This adaptive weighting is one of the key advantages of the EKF over hand-tuned filters. We do not need to manually specify different measurement noise values at different ranges—the linearization through the Jacobian naturally captures the distance-dependent sensor precision.
Q: Where in Lab 2 were the initial Q guess versus optimized Q values documented, and what were they?
A: This comparison appears in the Lab 2 documentation in the 'Linear System Model' and 'Simulation Results' sections. The initial guess was $Q = \begin{bmatrix} 0.2 & 0 \\ 0 & 0.3 \end{bmatrix}$, based on a rough estimation of manual hand tremors and friction during block movement. The optimized values found through simulation (Figure 2) were $Q = \begin{bmatrix} 0.002 & 0 \\ 0 & 0.03 \end{bmatrix}$. The significantly lower values (100x reduction for position, 10x for velocity) indicate that the constant velocity model was actually more trustworthy than the initial conservative guess implied. This demonstrates the importance of empirical tuning—intuitive guesses can be orders of magnitude off, leading to either excessive lag (Q too low) or excessive jitter (Q too high).
Q: Why model the system with constant velocity instead of including acceleration as a state?
A: The system models target motion using a discrete-time constant-velocity process with state x = [position, velocity]ᵀ, consistent with the original MATLAB experiments. This choice reflects the physical setup, where motion is smooth relative to the 20 ms sampling period and acceleration is treated as bounded uncertainty rather than an explicit state.
Why not model acceleration explicitly? Acceleration is treated as unmodeled dynamics captured by process noise Q. Given the short sampling interval and smooth manual motion, adding acceleration as a state would:
- Increase sensitivity to noise: Differentiating noisy measurements twice amplifies noise quadratically
- Reduce observability: The sensors measure position (indirectly via voltage), not acceleration, making it poorly observable
- Complicate tuning: A 3-state filter requires tuning a symmetric 3×3 Q matrix with up to 6 independent parameters, vs. 2 diagonal terms for the current model
The constant-velocity model offers a better bias-variance trade-off in this regime. Process noise is injected independently into position and velocity states, matching the discrete noise assumptions used in the offline experiments rather than a continuous white-acceleration model. This preserves equivalence between the MATLAB and embedded implementations while keeping computation minimal and deterministic.
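The predict step for this model is small enough to write out in full; this is a sketch consistent with the description above, not a verbatim copy of the firmware:

```c
#include <assert.h>
#include <math.h>

typedef struct {
    float x[2];     /* state: position, velocity */
    float P[2][2];  /* error covariance */
} ekf_t;

/* Constant-velocity predict with fixed dt:
 *   x' = A x,  P' = A P A' + Q,  A = [[1, dt], [0, 1]],
 * with Q injected independently into position and velocity. */
static void ekf_predict(ekf_t *f, float dt, float q_pos, float q_vel)
{
    /* State propagation */
    f->x[0] += f->x[1] * dt;

    /* Covariance propagation, expanded by hand for the 2x2 CV model */
    float p00 = f->P[0][0], p01 = f->P[0][1];
    float p10 = f->P[1][0], p11 = f->P[1][1];
    f->P[0][0] = p00 + dt * (p10 + p01) + dt * dt * p11 + q_pos;
    f->P[0][1] = p01 + dt * p11;
    f->P[1][0] = p10 + dt * p11;
    f->P[1][1] = p11 + q_vel;
}
```

Expanding the matrix products by hand avoids a general matrix library and keeps the cycle count deterministic, which matters for the fixed-Δt argument made throughout this document.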
Q: Why use FreeRTOS instead of implementing everything in a super-loop?
A: A super-loop architecture has variable and unpredictable execution time because the time from one iteration to the next depends on which code paths execute. If the IR sensor takes 1 millisecond to read but the thermocouple takes 10 milliseconds, the loop period varies between 1 and 10 milliseconds depending on which sensors are polled. This violates the EKF's fundamental assumption that the time step delta-t is fixed, causing the state transition matrix A to be physically incorrect.
FreeRTOS provides preemptive priority-based scheduling with precise timing control through vTaskDelayUntil, which schedules wake-ups relative to the previous wake time rather than the current time, so period errors do not accumulate. The Fusion Task can wake every 20 ms with tick-accurate, drift-free timing (jitter bounded by the RTOS tick resolution and any higher-priority activity), maintaining the physics model's validity. Additionally, the RTOS automatically handles task switching, stack management, and inter-process communication, reducing the likelihood of subtle bugs compared to manually managing state machines in a super-loop.
The trade-off is increased code complexity and memory overhead—FreeRTOS adds approximately 10 KB of code and requires dedicated RAM for the kernel and task stacks. However, for real-time state estimation where timing correctness is critical, this cost is justified.
Q: How do you prevent priority inversion when multiple tasks access shared state?
A: Priority inversion occurs when a high-priority task blocks waiting for a resource held by a low-priority task. For example, if the low-priority Telemetry Task holds the state mutex and is preempted by a medium-priority task, the high-priority Fusion Task will block waiting for Telemetry to finish, even though it has higher priority. This can cause the Fusion Task to miss its deadline.
We prevent this through two mechanisms. First, we minimize the use of shared mutable state by using message queues for most communication rather than mutexes protecting global variables. Queues are lock-free from the perspective of the task API—producers and consumers never block each other. Second, for the unavoidable global state mutex, we enable priority inheritance: when a high-priority task blocks on a mutex held by a low-priority task, the low-priority task temporarily inherits the high priority until it releases the mutex. This ensures it executes quickly and releases the lock, bounding the inversion duration.
Additionally, we minimize mutex hold time by using a snapshot pattern: tasks acquire the mutex, copy data into local variables, release the mutex immediately, and then perform heavy computations on the local copy without holding the lock.
Q: What happens if the time step Δt has jitter or is inconsistent?
A: This is a critical correctness issue, not just a performance concern. The discrete state transition and covariance propagation become inconsistent with the physical model. Even small timing jitter causes systematic errors in predicted state and uncertainty growth, which the EKF interprets as process noise.
Specifically:
- The state transition matrix A = [[1, Δt], [0, 1]] assumes a fixed Δt. If actual time between updates varies (e.g., 19ms, 21ms, 20ms), the predicted position x' = x + v·Δt becomes physically incorrect.
- The covariance propagation P' = A·P·Aᵀ + Q assumes consistent dynamics. Variable Δt causes the predicted uncertainty to misrepresent actual error growth.
- Over time, this leads to covariance miscalibration (P doesn't reflect true uncertainty) and biased estimates (filter trusts the wrong information sources).
That's why strict periodic execution is enforced using osDelayUntil. The EKF predict–update cycle is executed by a dedicated real-time FreeRTOS task with strict periodicity. Maintaining a fixed time step is not negotiable: the discrete state transition matrix and covariance propagation are only valid under the assumed sampling interval. Task priorities are assigned based on estimation criticality rather than frequency, ensuring that sensor fusion remains deterministic even under system load. If timing jitter were to occur, estimation consistency—not just accuracy—would degrade due to model mismatch.
Q: How do you measure and debug timing violations?
A: Timing violations are detected through multiple mechanisms. First, each periodic task records its wake time using osKernelGetTickCount and compares the actual period to the expected period. Jitter is the absolute difference between actual and expected, which is logged to UART. Sustained jitter exceeding a threshold indicates the task is not completing within its allocated time slice.
Second, FreeRTOS provides stack high-water mark monitoring through uxTaskGetStackHighWaterMark, which reports the minimum remaining stack space. If this approaches zero, the task is on the verge of stack overflow, which could indicate excessive local variables or deep function call chains.
Third, the telemetry stream includes queue depths. If the IR sensor queue depth consistently approaches maximum, it indicates the Fusion Task is not consuming data fast enough. If the queue depth grows unbounded, the system is fundamentally under-resourced and requires either faster hardware or algorithm optimization.
For development, I would use hardware debug outputs: toggling a GPIO at the start and end of the Fusion Task allows measuring execution time with an oscilloscope or logic analyzer at microsecond precision, independent of software instrumentation overhead.
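The jitter bookkeeping can be kept RTOS-agnostic. This sketch abstracts the tick source (osKernelGetTickCount in the actual code) behind plain tick counts and handles counter wraparound via unsigned arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Jitter check for a periodic task: compare the actual wake interval
 * (in ticks) against the expected period.  Unsigned subtraction makes
 * the computation correct across tick-counter wraparound. */
static uint32_t period_jitter_ticks(uint32_t prev_wake, uint32_t now,
                                    uint32_t expected_period)
{
    uint32_t actual = now - prev_wake;   /* wrap-safe on uint32_t */
    return (actual > expected_period) ? actual - expected_period
                                      : expected_period - actual;
}
```

Each periodic task logs this value; a sustained non-zero result is the software-side symptom that GPIO-and-scope measurement then pins down precisely.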
Q: The system has only one IR sensor configured. How does this affect the dual-sensor fusion architecture?
A: The .ioc file confirms only ADC1 Channel 0 is configured, and the code explicitly sets valid_med = 0 in the IR Acquisition Task. This means the system currently operates in single-sensor mode—only the long-range sensor update executes in the Fusion Task.
However, the architecture is fully designed for dual-sensor fusion. The sequential scalar update logic checks both valid_long and valid_med flags and processes whichever sensors have data available. Adding the second sensor requires only:
- Configuring ADC1 Rank 2 in CubeMX for the second channel
- Reading both channels in the IR Acquisition Task
- Setting valid_med = 1 when the second sensor provides valid data
No changes to the EKF logic, task scheduling, or queue architecture are needed. This is an example of staged development: the single-sensor configuration validates the core EKF mathematics, RTOS determinism, and system integration, with dual-sensor fusion being a straightforward hardware extension.
For interview discussions, this should be framed as: "The current hardware validates the EKF core with one sensor. The software architecture already supports dual sensors—it's a CubeMX configuration change, not a redesign."
Q: Why use sequential scalar updates instead of joint vector updates when dual sensors are enabled?
A: When both sensors are active, the sequential approach is mathematically equivalent to a joint vector update if and only if two conditions hold:
- Sensors are uncorrelated (R is diagonal): ✓ Satisfied—separate signal paths, independent noise sources
- Measurements arrive at the same time: ✓ Satisfied—both read within the same 20ms task iteration
Given these conditions, sequential processing avoids the computational cost of forming a 2D measurement vector, constructing a 2x2 Jacobian, and inverting the 2x2 innovation covariance S. The joint update requires approximately 30 FLOPs including matrix inversion, while two sequential scalar updates require about 20 FLOPs using only division.
Additionally, scalar division is numerically more stable than 2x2 matrix inversion, which can become ill-conditioned if the sensors provide nearly redundant information (e.g., both measure the same distance with similar noise).
The sequential architecture also provides graceful degradation: if one sensor fails, the filter continues with the remaining sensor without special handling. A joint update architecture would require separate code paths for single-sensor vs dual-sensor modes.
Defense-ready summary: Infrared measurements are incorporated directly in voltage space using a nonlinear forward sensor model derived from empirical calibration. Each sensor update is applied sequentially using a scalar EKF update, which is mathematically equivalent to a multi-sensor update when measurement noise is uncorrelated and measurements correspond to the same time step. This approach avoids matrix inversion, reduces computational load, and improves numerical robustness.
Q: Why does the EKF operate in voltage space rather than inverting the sensor model first?
A: This is a critical design decision that demonstrates strong understanding of estimation theory. The implementation passes raw voltage measurements directly to the EKF:
ekf_update_scalar(ir_packet.voltage_long, h_long, H_long, R_LONG);

where h_long(d) is the forward model that predicts voltage from distance. This approach is theoretically superior to inverting the sensor model because:
- Preserves Gaussian Noise: Lab 1 characterized noise in voltage space as approximately Gaussian: V_measured = h(d_true) + v, where v ~ N(0, R_voltage). If we inverted the model to get d = h_inv(V), we would transform Gaussian voltage noise through a non-linear function, producing non-Gaussian noise in distance space (skewed and heteroscedastic).
- Correct Linearization: The EKF innovation y = V_measured - h(d_estimate) is computed in the original voltage domain where the noise assumptions are valid. The Jacobian H linearizes the forward model, not the inverse.
- No Information Loss: Working directly with measurements in their native domain avoids numerical issues from inverting near-singular regions of the sensor response curve.
The alternative approach—invert first, then feed distances to the EKF—would require re-characterizing noise in distance space and potentially using an Unscented Kalman Filter to handle the transformed non-Gaussian distributions. The voltage-space approach is both simpler and more rigorous.
Q: Why use the Joseph form covariance update instead of the simpler (I - K·H)·P formula?
A: The Joseph form covariance update is implemented in the ekf_update_scalar function (main.c:229-260) and provides critical numerical stability for long-running embedded filters.
The standard covariance update P_new = (I - K·H)·P is susceptible to numerical error causing the covariance to become non-symmetric or negative-definite due to floating-point roundoff. This can cause catastrophic filter divergence where the covariance matrix becomes singular and the filter "locks up" or produces NaN values.
The Joseph form P_new = (I - K·H)·P·(I - K·H)ᵀ + K·R·Kᵀ guarantees symmetry and positive semi-definiteness of the covariance matrix under finite-precision arithmetic. Even if rounding errors occur, the symmetry enforcement (p01 = p10 = (p01 + p10)/2) and the explicit re-injection of measurement noise (K·R·Kᵀ) prevent the covariance from degrading.
On embedded hardware using single-precision floats (7 significant digits), this is critical for long-term numerical stability, especially with sequential updates where rounding errors can accumulate over thousands of iterations. The Joseph form adds approximately 10 extra FLOPs per update but is non-negotiable for production systems running continuously for hours or days.
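A self-contained sketch of the scalar Joseph-form update consistent with the description above (not a verbatim copy of main.c; the struct layout is assumed):

```c
#include <assert.h>
#include <math.h>

typedef struct {
    float x[2];     /* state: position, velocity */
    float P[2][2];  /* error covariance */
} ekf_t;

/* Scalar EKF measurement update with the Joseph-form covariance update:
 *   P+ = (I - K H) P (I - K H)' + K R K'
 * H is the 1x2 Jacobian row [h0, h1]; z_pred is h(x) from the forward model. */
static void ekf_update_scalar(ekf_t *f, float z, float z_pred,
                              float h0, float h1, float R)
{
    /* Innovation variance S = H P H' + R (a scalar -- no matrix inverse) */
    float PHt0 = f->P[0][0] * h0 + f->P[0][1] * h1;
    float PHt1 = f->P[1][0] * h0 + f->P[1][1] * h1;
    float S = h0 * PHt0 + h1 * PHt1 + R;

    /* Kalman gain K = P H' / S */
    float K0 = PHt0 / S, K1 = PHt1 / S;

    /* State correction with innovation y = z - h(x) */
    float y = z - z_pred;
    f->x[0] += K0 * y;
    f->x[1] += K1 * y;

    /* Joseph form: A = I - K H, then P = A P A' + K R K' */
    float a00 = 1.0f - K0 * h0, a01 = -K0 * h1;
    float a10 = -K1 * h0,       a11 = 1.0f - K1 * h1;
    float p00 = f->P[0][0], p01 = f->P[0][1];
    float p10 = f->P[1][0], p11 = f->P[1][1];
    float b00 = a00 * p00 + a01 * p10, b01 = a00 * p01 + a01 * p11;
    float b10 = a10 * p00 + a11 * p10, b11 = a10 * p01 + a11 * p11;
    f->P[0][0] = b00 * a00 + b01 * a01 + K0 * R * K0;
    f->P[0][1] = b00 * a10 + b01 * a11 + K0 * R * K1;
    f->P[1][0] = b10 * a00 + b11 * a01 + K1 * R * K0;
    f->P[1][1] = b10 * a10 + b11 * a11 + K1 * R * K1;

    /* Enforce symmetry against residual roundoff */
    float s = 0.5f * (f->P[0][1] + f->P[1][0]);
    f->P[0][1] = s;
    f->P[1][0] = s;
}
```

Note that the only division is by the scalar S, and the K·R·Kᵀ term re-injects measurement noise so P cannot collapse below the floor the sensor physics imposes.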
Q: Why is the mapping implementation not fully Bayesian like the MATLAB version?
A: The embedded implementation preserves the structure and intent of probabilistic fusion but relaxes strict Bayesian semantics to meet memory and timing constraints. The focus is on spatial consistency and real-time responsiveness rather than exact posterior probabilities.
Specifically:
- MATLAB Version: Used full Bayesian log-odds updates with inverse sensor models for both IR and thermal modalities, providing theoretically optimal belief propagation
- STM32 Version: Uses normalized probability arithmetic fusion (weighted addition) rather than log-odds, and thermal gradient modeling rather than formal inverse sensor models
Why this simplification?
- Memory Constraints: True log-odds with int8_t storage would reduce memory 4x, but requires careful saturation handling and lookup tables for the log/exp conversions
- Computational Constraints: Full Bayesian updates for a 10×10 grid with 5 thermal sensors require ~500 log/exp operations per update, which would violate the 1Hz mapping task deadline
- Validation Strategy: The arithmetic fusion validates the spatial structure and multi-modality architecture while remaining deterministic and debuggable
The current approach maintains spatial structure, multi-sensor fusion concepts, and produces qualitatively correct occupancy maps. The log-odds upgrade path is explicitly preserved in the architecture and remains a documented enhancement for production deployment.
Defense-ready framing: "The mapping task preserves probabilistic fusion structure from MATLAB while meeting embedded constraints. Converting to full log-odds Bayesian updates is a straightforward enhancement that doesn't require architectural changes."
Q: What happens if the sensor model is wrong?
A: An incorrect sensor model leads to biased innovation, which systematically corrupts the state estimate. If the model predicts higher voltage than reality for a given distance, the innovation y = z_measured - h(x_predicted) will consistently be negative, causing the filter to overestimate distance. Over time, the state estimate drifts away from truth.
The effect manifests in several observable ways. First, the residuals—the differences between measurements and predicted measurements—will have non-zero mean rather than being white noise centered at zero. Second, the innovation consistency check fails: the innovations should have the predicted covariance S, but with a biased model, the empirical innovation variance will differ from S. Third, the estimated covariance P will not accurately reflect actual error—the filter may report high confidence even though the estimate is wrong.
Detection involves monitoring innovation statistics. If the normalized innovation squared (y^T * S^(-1) * y) consistently exceeds its expected value, the model is incorrect. Mitigation requires recalibrating the sensor model using new data or switching to an adaptive estimation technique that adjusts the model online.
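A sketch of that innovation-statistics monitoring; the windowing scheme and threshold are illustrative:

```c
#include <assert.h>

/* Running check on normalized innovation squared (NIS = y*y/S for a
 * scalar measurement).  For a consistent filter the long-run mean of a
 * 1-D NIS is ~1.0; a sustained mean well above that flags model error
 * or sensor bias.  Threshold choice is illustrative. */
typedef struct {
    float sum;
    int   count;
} nis_monitor_t;

static void nis_add(nis_monitor_t *m, float y, float S)
{
    m->sum += (y * y) / S;
    m->count++;
}

static int nis_model_suspect(const nis_monitor_t *m, float mean_threshold)
{
    if (m->count == 0) return 0;
    return (m->sum / (float)m->count) > mean_threshold;
}
```

A production version would use a sliding window rather than a lifetime sum so that a transient fault does not poison the statistic forever.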
Q: What would fail first if this system ran continuously for hours or days?
A: Numerical covariance drift or unbounded uncertainty during sensor dropout. That's why Joseph form updates, symmetry enforcement, and covariance monitoring are included.
Specifically, the failure modes in order of likelihood are:
1. Covariance Degradation (Without Joseph Form): The standard covariance update P = (I - K·H)·P accumulates floating-point roundoff errors over thousands of iterations. Eventually, P becomes non-symmetric or loses positive-definiteness, causing matrix decomposition failures and filter divergence (NaN propagation). The Joseph form prevents this.
2. Sensor Dropout with Unbounded P Growth: If sensors fail, the filter runs in prediction-only mode where P grows with each iteration as Q is added. Without bounds checking, P elements can grow to the point where numerical precision is lost (e.g., p00 = 1e38), making subsequent updates meaningless. Mitigation: Monitor trace(P) and trigger failsafe when uncertainty exceeds threshold.
3. Persistent Model Mismatch: If the sensor calibration drifts due to temperature or aging, the innovation y = z - h(x) develops systematic bias. The filter converges to a biased estimate with artificially low covariance (overconfident wrongness). Detection: Monitor normalized innovation squared (NIS) statistics; consistent values >> expected chi-square indicate model error.
4. Stack Overflow in Low-Priority Tasks: If the Mapping Task or Telemetry Task experiences unexpected execution time increases (e.g., due to queue overflow handling), their stacks could overflow. Mitigation: FreeRTOS stack watermark monitoring (uxTaskGetStackHighWaterMark) detects this before corruption.
In a production system, I would implement:
- Trace(P) monitoring with automatic reset or failsafe trigger
- Innovation consistency checking (chi-square test on NIS)
- Watchdog timer requiring periodic refresh from Fusion Task
- Non-volatile fault logging for post-mortem analysis
The Joseph form and symmetry enforcement specifically address the #1 failure mode, which is why they are non-negotiable for long-running systems.
Q: Why use single-precision float instead of double or fixed-point?
A: The choice depends on the hardware and accuracy requirements. Double-precision (64-bit) floating-point provides about 15 significant digits, far more than necessary for sensors accurate to at best three significant figures. The Cortex-M4F's FPU is single-precision only, so double operations are emulated in software, running roughly 50-100x slower than single-precision hardware operations, which makes them prohibitive for real-time loops.
Single-precision (32-bit) floating-point provides 7 significant digits, which is more than sufficient for representing distances to within 0.01%, and executes in hardware at 1-2 cycles per operation. This is the optimal choice for this platform.
Fixed-point arithmetic using scaled integers (Q15 or Q31 format) would be necessary on Cortex-M0/M3 devices without an FPU. Fixed-point requires manual management of the radix point, is prone to overflow if scaling is incorrect, and complicates transcendental functions like powf. However, it provides deterministic cycle counts and can be faster than software floating-point emulation. For this project on Cortex-M4F, single-precision float offers the best combination of ease of implementation and performance.
Q: What happens if a sensor fails or is disconnected?
A: When a sensor fails, the data queue for that sensor stops receiving new elements. The Fusion Task attempts to read from the queue with zero timeout, and when the read fails, it skips the measurement update for that iteration. The filter continues operating in prediction-only mode, propagating the state forward using the physics model: x_new = x + v * delta_t.
However, without measurement corrections, the error covariance P grows with each prediction step because process noise Q is added. The covariance reflects increasing uncertainty—we trust our estimate less and less as time passes without measurements. If the sensor dropout is brief, the covariance remains bounded and the filter recovers immediately when measurements resume. If the dropout is sustained, the covariance grows unbounded, signaling that we have essentially lost state observability.
In a safety-critical system, we would monitor the trace of P (sum of diagonal elements) and trigger a failsafe when it exceeds a threshold. Failsafe actions might include reducing velocity to a safe crawl, requesting sensor diagnostics, switching to redundant backup sensors if available, or performing a controlled emergency stop.
The asynchronous architecture is inherently robust to sensor failure because no task blocks waiting for sensor data—the system continues running with whatever information is available.
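The trace(P) failsafe check described above is only a few lines; the threshold value is illustrative:

```c
#include <assert.h>

/* Covariance health check: trigger a failsafe when total uncertainty,
 * trace(P), exceeds a threshold during sustained sensor dropout.
 * The (tr != tr) comparison additionally catches NaN corruption. */
static int covariance_failsafe(float P[2][2], float trace_limit)
{
    float tr = P[0][0] + P[1][1];
    return (tr > trace_limit) || (tr != tr);
}
```

Running this once per fusion cycle costs two loads and an add, so there is no timing argument against always having it enabled.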
Q: How would you detect a sensor giving incorrect but plausible values?
A: This is more insidious than outright failure because the filter cannot detect bias from the measurement alone—if a sensor consistently reads 5 cm too high, the filter will converge to an estimate that is 5 cm too high and report high confidence.
Detection requires consistency checks between redundant information sources. If we have two sensors measuring the same quantity, we can compare their innovations: if one sensor consistently disagrees with the prediction while the other agrees, the disagreeing sensor is likely faulty. The chi-square consistency test computes the normalized innovation squared and checks whether it follows the expected distribution—persistent large values indicate model mismatch or sensor bias.
Cross-validation against the physics model also helps: if the sensor data implies physically impossible accelerations or velocities, it is likely faulty. For example, if consecutive position measurements imply an acceleration exceeding 10 g, we know something is wrong.
Mitigation involves adaptive noise estimation, where the filter adjusts R online based on innovation statistics, or using a bank of filters with different sensor models and selecting the most consistent one.
The current STM32 implementation successfully demonstrates the core architectural principles: deterministic real-time scheduling through FreeRTOS, queue-based asynchronous sensor fusion, Extended Kalman Filter state estimation with proper covariance propagation, and dual-modality occupancy mapping. The code compiles, configures the hardware correctly, and implements all six tasks with appropriate priorities and periods.
However, several components are simulated rather than connected to actual hardware. The thermocouple reading returns a constant value rather than performing I2C or SPI communication with a thermocouple amplifier. The IR sensor model assumes a single-channel ADC reading, whereas a production system would have multiple sensors feeding different ADC channels or a multiplexed input. The occupancy grid IR map is initialized to zeros because there is no position history tracking to populate it from sensor readings as the object moves.
The telemetry output streams CSV-formatted data but does not implement the full command interface for runtime reconfiguration and fault injection. The system does not yet include innovation gating for outlier rejection or covariance monitoring for failsafe triggering.
This project represents a complete journey from theoretical foundations in stochastic estimation through empirical sensor characterization to production firmware implementation on resource-constrained embedded hardware. The Extended Kalman Filter successfully fuses heterogeneous sensor modalities to produce state estimates and environmental maps that are more accurate and robust than any individual sensor could provide.
The STM32 port demonstrates that sophisticated probabilistic algorithms can be deployed on microcontrollers when careful attention is paid to deterministic timing, numerical stability, memory efficiency, and computational optimization. The architecture achieves real-time performance through priority-driven scheduling, asynchronous queue-based communication, and embedded-specific optimizations while maintaining mathematical correctness.