Incipient Short-Circuit Fault Detection and Location in Battery Energy Storage Systems: A Data-Driven Eigen-Decomposition Approach

The global energy landscape is undergoing a significant transformation driven by concerns over energy security and environmental pollution. The development and utilization of new energy sources have consequently attracted worldwide attention, and the share of renewable sources such as wind and solar power in the overall electricity mix is steadily increasing. While these sources help mitigate conventional energy shortages and reduce pollution, they are inherently intermittent and volatile, which poses challenges to the stable operation of power grids. To address this issue, improve the utilization efficiency of existing generation systems, and promote the wider adoption of renewables, large-scale Battery Energy Storage Systems (BESS) have emerged as a critical solution.

The safe and efficient operation of a BESS relies heavily on its Battery Management System (BMS). The BMS is responsible for ensuring safety, improving efficiency, and extending the lifespan of the battery pack by continuously monitoring key state variables during operation. Its core tasks include data acquisition from battery monitoring, fault diagnosis, information logging and processing, self-testing, information interaction, and real-time communication. Among these, fault detection is a paramount function. Short-circuit faults represent a common and severe failure mode during the operational life of batteries, often serving as a critical precursor stage leading to thermal runaway. Therefore, developing effective methods for detecting and locating such faults, especially in their incipient stages, is of vital importance for the safety and reliability of any battery energy storage system.

Existing research has proposed various methods for short-circuit fault detection, which can be broadly categorized into model-based, data-driven, and knowledge-based approaches. Model-based and data-driven methods are often termed quantitative analyses. For instance, some studies employ electro-thermal coupling models to simulate internal short circuits, capturing correlations with data like voltage and temperature, transforming detection into a parameter estimation problem. However, such methods often struggle to meet real-time requirements. Other approaches use experimental data to build improved thermal models for estimating the onset of thermal runaway but still lack precision for incipient fault detection. Data-driven strategies often focus on analyzing data features from normal and faulty operations. A common technique involves designing voltage calculation loops to capture correlation coefficients between cells to identify the faulty one. Some advanced methods propose interleaved measurement topologies where each cell’s terminal voltage is associated with two sensors, improving diagnostic capabilities. Nevertheless, many data-driven methods face challenges because the initial fault signal is weak and easily masked by measurement noise and inherent cell inconsistencies, potentially leading to missed detections or false alarms.

To overcome these limitations, this article presents a novel fault detection and location method specifically designed for incipient short-circuit faults within a battery energy storage system. The core of the method lies in a modified interleaved voltage measurement topology combined with a statistical feature decomposition technique to amplify the fault signature. The process begins with normalizing data sampled via a sliding window from the interleaved measurements. A fault detection indicator is then designed based on the eigen-decomposition of the normalized data set’s covariance matrix. Crucially, a theoretical derivation establishes a quantitative relationship between the fault detection threshold and the minimum detectable fault amplitude. Finally, once a fault is detected, its location is pinpointed by evaluating the contribution of abnormal eigenvectors. Simulation and experimental validation on a battery pack demonstrate that the proposed method can accurately detect and locate an incipient short-circuit condition with an equivalent current of 0.5C within 6.4 seconds, verifying its effectiveness and timeliness.

1. Interleaved Topology and Fault Detection Indicator

1.1 Modified Interleaved Voltage Measurement Topology

The foundation of the proposed method is a modified non-redundant interleaved voltage measurement topology. In this configuration, voltage sensors are connected in an interleaved pattern. The i-th voltage sensor connects to the anode of the i-th parallel-connected battery module and the cathode of the (i+1)-th module, measuring the sum of their voltages. A key modification is applied to the last sensor: the n-th sensor connects to the anode of the n-th module and the cathode of the 1st module, forming a circular measurement structure. Because each module voltage then appears in exactly two sensor readings, the design provides a degree of data redundancy for fault diagnosis without adding sensors. For a battery energy storage system comprising \(n\) series-connected modules, the interleaved measurement voltage vector, \( \mathbf{V^s} \), can be expressed in terms of the actual module voltages, \( \mathbf{V^c} \), through a structured connection matrix \(\mathbf{H}\):

$$
\mathbf{V^s} = \begin{bmatrix} V^s_1 \\ V^s_2 \\ \vdots \\ V^s_{n-1} \\ V^s_n \end{bmatrix} = \begin{bmatrix}
1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1 \\
1 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix} \begin{bmatrix} V^c_1 \\ V^c_2 \\ \vdots \\ V^c_{n-1} \\ V^c_n \end{bmatrix} = \mathbf{H} \mathbf{V^c}
$$
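As a minimal sketch of the mapping above, the circular connection matrix can be built by adding a cyclically shifted identity to the identity. The function name `interleaved_matrix` and the module voltages are illustrative assumptions, not values from the experiment.

```python
import numpy as np

def interleaved_matrix(n: int) -> np.ndarray:
    """Circular interleaved connection matrix H (n x n).

    Row i has ones in columns i and i+1 (cyclic), so sensor i reads the
    sum of module i and module i+1, and the last sensor wraps to module 1.
    """
    eye = np.eye(n, dtype=int)
    # np.roll(..., -1, axis=0) moves row i+1 into row i, giving the e_{i+1} term.
    return eye + np.roll(eye, -1, axis=0)

# Hypothetical module voltages in volts for a 4-module string.
H = interleaved_matrix(4)
v_c = np.array([12.00, 12.10, 11.90, 12.05])
v_s = H @ v_c  # interleaved sensor readings V^s = H V^c
```

Each entry of `v_s` is the sum of two neighbouring module voltages, with the last entry combining the last and first modules.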

To mitigate the influence of dynamic operating conditions (e.g., changing load current) on the baseline signals, a differential operation is applied. This generates a differential interleaved voltage vector, \( \Delta\mathbf{V^s} \):

$$
\Delta\mathbf{V^s} = \begin{bmatrix} \Delta V^s_{1,2} \\ \Delta V^s_{2,3} \\ \vdots \\ \Delta V^s_{n-1,n} \\ \Delta V^s_{n,1} \end{bmatrix} = \begin{bmatrix}
1 & 0 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & 0 & 0 & 0 & \cdots & 1 & 0 \\
0 & -1 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix} \begin{bmatrix} V^c_1 \\ V^c_2 \\ \vdots \\ V^c_{n-1} \\ V^c_n \end{bmatrix} = \mathbf{A} \mathbf{V^c}
$$

Here, \( \Delta V^s_{i,i+1} = V^s_i - V^s_{i+1} = V^c_i - V^c_{i+2} \) (indices taken cyclically), and \(\mathbf{A} \in \mathbb{R}^{n \times n}\) is the differential mapping matrix. Because every row of \(\mathbf{A}\) sums to zero, \(\mathbf{A}\) is rank-deficient: its rank is \(n-1\) when \(n\) is odd and \(n-2\) when \(n\) is even. This differential step removes common-mode voltage drifts caused by overall state-of-charge changes, making the signal more sensitive to relative deviations between modules, which is a key indicator of a fault in a single module within the battery energy storage system.
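The differential mapping can be sketched as the product of a cyclic difference operator and the connection matrix. The names `differential_matrix`, `v_c`, and `offset` are hypothetical, and the voltages are made-up numbers chosen only to show that a common offset on all modules cancels.

```python
import numpy as np

def differential_matrix(n: int) -> np.ndarray:
    """Differential mapping matrix A = D @ H for the circular topology."""
    eye = np.eye(n, dtype=int)
    H = eye + np.roll(eye, -1, axis=0)  # sensor i = module i + module i+1
    D = eye - np.roll(eye, -1, axis=0)  # channel i = sensor i - sensor i+1
    return D @ H                        # row i = e_i - e_{i+2} (cyclic)

A = differential_matrix(6)

# Hypothetical module voltages (V) and a common-mode drift applied to all modules.
v_c = np.array([12.00, 12.10, 11.90, 12.05, 11.95, 12.02])
offset = 0.25
dx_plain = A @ v_c
dx_shifted = A @ (v_c + offset)  # identical to dx_plain: the drift cancels

# Channels (0-based rows) in which module 1 (column 0) appears.
channels_for_module_1 = np.flatnonzero(A[:, 0])
```

With this construction a single module voltage enters exactly two differential channels, once with a positive and once with a negative sign, which is the structure the location step later relies on.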

1.2 Eigen-Decomposition Based Fault Detection Indicator

Data is processed in a sliding window manner for real-time monitoring. Let \( \mathbf{x}_k = \Delta\mathbf{V^s}_k \in \mathbb{R}^{n \times 1} \) be the differential voltage vector at time index \(k\). A sliding window matrix \( \mathbf{X}_k \in \mathbb{R}^{w \times n} \) is formed from the past \(w\) samples:

$$
\mathbf{X}_k = \begin{bmatrix}
\mathbf{x}_{k-w+1}^T \\
\mathbf{x}_{k-w+2}^T \\
\vdots \\
\mathbf{x}_{k}^T
\end{bmatrix} = \mathbf{S}_k \mathbf{A}^T
$$
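A sliding window of this kind can be kept in a fixed-length buffer. The sketch below uses a hypothetical `SlidingWindow` class and synthetic samples, and checks the identity \( \mathbf{X}_k = \mathbf{S}_k \mathbf{A}^T \) on them.

```python
from collections import deque
import numpy as np

class SlidingWindow:
    """Fixed-width buffer holding the most recent differential vectors x_k."""

    def __init__(self, width: int):
        # A deque with maxlen drops the oldest row automatically on append.
        self._rows = deque(maxlen=width)

    def push(self, x) -> None:
        self._rows.append(np.asarray(x, dtype=float))

    @property
    def full(self) -> bool:
        return len(self._rows) == self._rows.maxlen

    def matrix(self) -> np.ndarray:
        """Return X_k with rows x_{k-w+1}^T ... x_k^T (oldest first)."""
        return np.vstack(list(self._rows))

# Synthetic check with n = 5 modules and window width w = 3.
n, w = 5, 3
eye = np.eye(n)
A = eye - np.roll(eye, -2, axis=0)  # row i = e_i - e_{i+2} (cyclic)
S_stream = np.arange(4 * n, dtype=float).reshape(4, n) ** 2  # 4 made-up samples
win = SlidingWindow(w)
for s in S_stream:
    win.push(A @ s)  # differential vector for this sample
X_k = win.matrix()
```

After the fourth sample the buffer holds only the last three differential vectors, so `X_k` equals the last three rows of `S_stream` multiplied by the transpose of `A`.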

where \( \mathbf{S}_k \in \mathbb{R}^{w \times n} \) is the matrix of actual module voltages over the window. The window matrix \( \mathbf{X}_k \) is then normalized using the mean \( \boldsymbol{\mu}_0 \) and the diagonal matrix of standard deviations \( \boldsymbol{\xi}_0 = \text{diag}(\xi_1, \xi_2, \dots, \xi_n) \) estimated from historical healthy operation data of the battery energy storage system:

$$
\bar{\mathbf{X}}_k = (\mathbf{X}_k - \mathbf{1}_w \boldsymbol{\mu}_0^T) \boldsymbol{\xi}_0^{-1}
$$

where \( \mathbf{1}_w \) is a column vector of ones with length \(w\). The core of the detection algorithm involves analyzing the covariance matrix \( \mathbf{C}_k \) of this normalized data matrix:

$$
\mathbf{C}_k \approx \frac{1}{w} \bar{\mathbf{X}}_k^T \bar{\mathbf{X}}_k
$$
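The normalization and covariance steps reduce to a few array operations. In this sketch the function name `normalized_covariance` is hypothetical, `xi0` is passed as a vector of per-channel standard deviations rather than a diagonal matrix, and the tiny input matrix is a made-up example.

```python
import numpy as np

def normalized_covariance(X, mu0, xi0) -> np.ndarray:
    """Return C_k = (1/w) * X_bar^T X_bar for a window matrix X (w x n).

    mu0 and xi0 are the per-channel mean and standard deviation estimated
    from healthy data. Broadcasting over rows is equivalent to
    (X - 1_w mu0^T) diag(xi0)^{-1}.
    """
    X = np.asarray(X, dtype=float)
    X_bar = (X - np.asarray(mu0, dtype=float)) / np.asarray(xi0, dtype=float)
    return X_bar.T @ X_bar / X.shape[0]

# Made-up 2-sample, 2-channel window.
X = np.array([[1.0, 2.0],
              [3.0, 6.0]])
C = normalized_covariance(X, mu0=[2.0, 4.0], xi0=[1.0, 2.0])
```

Note that the window is centred with the healthy-data mean rather than its own mean, so a persistent offset caused by a fault shows up in `C` instead of being subtracted away.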

An eigen-decomposition is performed on \( \mathbf{C}_k \):

$$
\mathbf{C}_k = \mathbf{V}_k \boldsymbol{\Lambda}_k \mathbf{V}_k^T
$$

where \( \boldsymbol{\Lambda}_k = \text{diag}(\lambda_{1,k}, \lambda_{2,k}, \dots, \lambda_{n,k}) \) contains the eigenvalues in descending order (\( \lambda_{1,k} \ge \lambda_{2,k} \ge \dots \ge \lambda_{n,k} \)), and \( \mathbf{V}_k \) is the corresponding eigenvector matrix. The fundamental premise is that an incipient fault introduces a specific, abnormal structure into the data, which will predominantly affect the principal components of the covariance matrix, particularly the largest eigenvalue \( \lambda_{1,k} \).

The fault detection indicator \( D_k \) is thus designed as the standardized value of the largest eigenvalue:

$$
D_k = \frac{\lambda_{1,k} - \mu^*_{\lambda,1}}{\xi^*_{\lambda,1}}
$$

Here, \( \mu^*_{\lambda,1} \) and \( \xi^*_{\lambda,1} \) are the mean and standard deviation of \( \lambda_{1,k} \) calculated from an extensive dataset of normal, fault-free operation of the battery energy storage system. Under normal conditions, \( D_k \) will fluctuate around zero. A significant positive deviation signals a potential incipient fault.
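The indicator can be sketched with a symmetric eigen-solver. The function name `detection_indicator` and the numbers in the example are assumptions for illustration. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the largest one is the last entry.

```python
import numpy as np

def detection_indicator(C, mu_lam1: float, xi_lam1: float):
    """Return (D_k, lambda_1, v_1) for a symmetric covariance matrix C.

    mu_lam1 and xi_lam1 are the healthy-data mean and standard deviation
    of the largest eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(C)  # ascending eigenvalues
    lam1 = float(eigvals[-1])             # largest eigenvalue
    v1 = eigvecs[:, -1]                   # its unit-norm eigenvector
    d_k = (lam1 - mu_lam1) / xi_lam1
    return d_k, lam1, v1

# Made-up example: eigenvalues of this matrix are 1 and 3.
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
d_k, lam1, v1 = detection_indicator(C, mu_lam1=1.0, xi_lam1=0.5)
```

The sign of `v1` is arbitrary, which does not matter for the later contribution analysis because only squared elements are used.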

2. Short-Circuit Fault Detection Performance and Threshold Analysis

2.1 Quantitative Determination of the Detection Threshold

A critical step is setting a robust threshold \( \delta_{sc} \) for \( D_k \) to declare a fault. This requires understanding the relationship between the fault magnitude and its impact on \( D_k \). Consider an incipient short-circuit fault of amplitude \( f \) (representing a voltage deviation) occurring in the \( l \)-th module. The measured differential vector can be modeled as:

$$
\mathbf{x}_k = \mathbf{x}^*_k + \gamma f_k
$$

where \( \mathbf{x}^*_k \) is the fault-free component and \( \gamma \in \mathbb{R}^{n \times 1} \) is a fault direction vector determined by the topology (derived from matrix \( \mathbf{A} \)). After sliding window collection and normalization, the covariance matrix \( \mathbf{C}_k \) becomes the sum of three components: the healthy baseline covariance \( \mathbf{C}_{1,k} \), a cross-term \( \mathbf{C}_{2,k} \), and the fault-specific term \( \mathbf{C}_{3,k} \):

$$
\mathbf{C}_k = \mathbf{C}_{1,k} + \mathbf{C}_{2,k} + \mathbf{C}_{3,k}
$$

Through statistical expectation analysis and exploiting properties of the matrix trace, the expected value of the trace of \( \mathbf{C}_{3,k} \) is derived as \( \mathbb{E}\{\text{Tr}(\mathbf{C}_{3,k})\} = (t_f / w) \mathbb{E}\{f^2\} (\bar{\mathbf{a}}_l^T \bar{\mathbf{a}}_l) \), where \( t_f \) is fault duration and \( \bar{\mathbf{a}}_l \) is related to the \( l \)-th column of the normalized system matrix. The trace is related to the sum of eigenvalues. To ensure detection, we require \( \mathbb{E}\{D_k\} \ge \delta_{sc} \). This leads to a theoretical bound linking the detectable fault amplitude to the threshold:

$$
\delta_{sc} \leq \frac{ \frac{t_f}{w} \mathbb{E}\{f^2\} (\bar{\mathbf{a}}_l^T \bar{\mathbf{a}}_l) }{ \sum_{i=1}^{n} \xi_{\lambda,i}^* }
$$

Given that \( D_k \) does not necessarily follow a Gaussian distribution, the practical threshold \( \delta_{HS} \) for the battery energy storage system health status is determined empirically from the healthy historical data at a chosen significance level (e.g., α=0.01). Considering the two possible differential sequences from the interleaved topology, the final system threshold is the more conservative one:

$$
\delta_{HS} = \max(\delta_{sc,1}, \delta_{sc,2})
$$
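One way to set such an empirical threshold is to take the \( (1-\alpha) \) quantile of the indicator over healthy data. This quantile reading of "significance level" is an assumption of the sketch, as are the function names and the synthetic indicator values.

```python
import numpy as np

def empirical_threshold(d_healthy, alpha: float = 0.01) -> float:
    """(1 - alpha) quantile of healthy indicator values (assumed rule)."""
    return float(np.quantile(np.asarray(d_healthy, dtype=float), 1.0 - alpha))

def health_threshold(d1_healthy, d2_healthy, alpha: float = 0.01) -> float:
    """delta_HS = max(delta_sc1, delta_sc2) over the two differential sequences."""
    return max(empirical_threshold(d1_healthy, alpha),
               empirical_threshold(d2_healthy, alpha))

# Synthetic healthy indicator histories for the two sequences.
d1 = np.arange(101, dtype=float)        # 0, 1, ..., 100
d2 = 2.0 * np.arange(101, dtype=float)  # 0, 2, ..., 200
delta_sc1 = empirical_threshold(d1)
delta_hs = health_threshold(d1, d2)
```

A quantile makes no Gaussian assumption about \( D_k \), which matches the motivation given above, but it does require enough healthy data for the tail to be estimated reliably.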

This analysis provides a principled way to understand the sensitivity of the method. It shows that for a given threshold, there is a minimum fault energy (\( t_f \cdot f^2 \)) that can be detected, which is crucial for defining the “incipient” fault detection capability of the system.
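Rearranging the bound for the fault term makes this minimum explicit. The following is only an algebraic restatement of the inequality above, assuming every quantity in it is positive:

$$
t_f \, \mathbb{E}\{f^2\} \ge \frac{ \delta_{sc} \, w \sum_{i=1}^{n} \xi_{\lambda,i}^* }{ \bar{\mathbf{a}}_l^T \bar{\mathbf{a}}_l }
$$

Read this way, a wider window \( w \) or a higher threshold \( \delta_{sc} \) raises the fault energy needed for detection, while a larger \( \bar{\mathbf{a}}_l^T \bar{\mathbf{a}}_l \) lowers it.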

2.2 Fault Location via Eigenvector Contribution Analysis

Once \( D_k > \delta_{HS} \) triggers a fault alarm, the next step is to identify the faulty module. The location information is encoded in the eigenvector \( \mathbf{v}_{1,k} \) associated with the largest (now anomalous) eigenvalue \( \lambda_{1,k} \). The contribution of each differential measurement channel \( i \) to this anomalous eigenvalue can be quantified by the squared elements of the eigenvector:

$$
\eta_i = (v_{1,k}^{(i)})^2, \quad i = 1, \dots, n
$$

where \( v_{1,k}^{(i)} \) is the \( i \)-th element of \( \mathbf{v}_{1,k} \). For a short-circuit fault in a specific module \( j \), a specific pattern emerges in the \( \boldsymbol{\eta} = [\eta_1, \eta_2, \dots, \eta_n]^T \) vector. The fault location logic is as follows: find the two largest, non-adjacent elements \( \{\eta_i, \eta_m\} \). The faulty module index \( j \) is then determined by the specific mapping rule defined by the interleaved topology:

$$
j = \begin{cases}
n, & \text{if } i = 1 \\
i-1, & \text{if } i = 2, \dots, n
\end{cases}
$$

This mapping accounts for the circular nature of the differential measurements. The clear mathematical foundation for location makes this method robust compared to simple anomaly detection.
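The contribution analysis can be sketched on synthetic data. Everything here is an assumption made for illustration: the noise level, the fault size, the helper names, and the reading of "two largest, non-adjacent" as the largest element plus the largest remaining element that is not its circular neighbour. The sketch stops at the channel pair; converting the pair to a module index depends on the channel numbering used by the mapping rule.

```python
import numpy as np

def contributions(C) -> np.ndarray:
    """eta_i = squared elements of the eigenvector of the largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(C)  # eigenvalues ascending, so take last column
    return eigvecs[:, -1] ** 2

def top_two_nonadjacent(eta):
    """Largest entry plus the largest other entry that is not a circular neighbour."""
    n = len(eta)
    order = np.argsort(eta)[::-1]
    first = int(order[0])
    for cand in order[1:]:
        if (int(cand) - first) % n not in (1, n - 1):
            return first, int(cand)
    raise ValueError("no non-adjacent pair exists for this n")

# Synthetic window: n = 6 modules, independent noise, voltage drop on module index 3.
rng = np.random.default_rng(1)
n, w, sigma = 6, 64, 1e-3
eye = np.eye(n)
A = eye - np.roll(eye, -2, axis=0)             # row i = e_i - e_{i+2} (cyclic)
S = 12.0 + sigma * rng.standard_normal((w, n))
S[:, 3] -= 20 * sigma                          # hypothetical fault signature
X_bar = (S @ A.T) / (np.sqrt(2) * sigma)       # healthy mean 0, std sqrt(2)*sigma
C = X_bar.T @ X_bar / w
eta = contributions(C)
pair = top_two_nonadjacent(eta)
```

In this synthetic setup the faulted module enters channels 1 and 3 (0-based) of the differential matrix, and those are the two contributions that dominate `eta`.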

2.3 Integrated Fault Detection and Location Strategy

The complete workflow for incipient short-circuit fault detection and location in a battery energy storage system is as follows:

  1. Initialization: Collect historical voltage data from the interleaved sensors during known healthy operation of the battery energy storage system. Calculate the normalization parameters (\( \boldsymbol{\mu}_0, \boldsymbol{\xi}_0 \)) and the baseline statistics for the largest eigenvalue (\( \mu^*_{\lambda,1}, \xi^*_{\lambda,1} \)). Determine the health status threshold \( \delta_{HS} \).
  2. Real-Time Monitoring: Continuously sample interleaved voltages. Apply the differential operation to each sample and construct the sliding window matrix \( \mathbf{X}_k \) from the resulting differential vectors.
  3. Normalization & Covariance Calculation: Normalize \( \mathbf{X}_k \) to get \( \bar{\mathbf{X}}_k \). Compute the sample covariance matrix \( \mathbf{C}_k \).
  4. Eigen-Decomposition & Indicator Calculation: Perform eigen-decomposition on \( \mathbf{C}_k \). Compute the fault detection indicator \( D_k \) using the largest eigenvalue \( \lambda_{1,k} \).
  5. Fault Detection: Compare \( D_k \) with the threshold \( \delta_{HS} \). If \( D_k \le \delta_{HS} \), the battery energy storage system is considered healthy. Return to Step 2.
  6. Fault Location: If \( D_k > \delta_{HS} \), a fault is declared. Compute the contribution vector \( \boldsymbol{\eta} \) from the principal eigenvector \( \mathbf{v}_{1,k} \). Identify the two largest non-adjacent contributions and apply the mapping rule to locate the faulty battery module.
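The six steps can be strung together as in the sketch below. It is a simplified illustration under stated assumptions, not the authors' implementation: the class name `IncipientFaultMonitor` is hypothetical, only one differential sequence is monitored, the threshold is taken as an empirical quantile, the data are synthetic, and location stops at the pair of dominant channels rather than applying the module mapping rule.

```python
from collections import deque
import numpy as np

def top_two_nonadjacent(eta):
    """Largest entry plus the largest other entry that is not a circular neighbour."""
    n = len(eta)
    order = np.argsort(eta)[::-1]
    first = int(order[0])
    for cand in order[1:]:
        if (int(cand) - first) % n not in (1, n - 1):
            return first, int(cand)
    raise ValueError("no non-adjacent pair exists for this n")

class IncipientFaultMonitor:
    """Hypothetical wrapper around Steps 1-6 for one differential sequence."""

    def __init__(self, width: int, alpha: float = 0.01):
        self.width, self.alpha = width, alpha
        self.window = deque(maxlen=width)

    def _lambda1(self, X):
        # Steps 3-4: normalize, form the covariance, take the top eigenpair.
        X_bar = (X - self.mu0) / self.xi0
        C = X_bar.T @ X_bar / X.shape[0]
        eigvals, eigvecs = np.linalg.eigh(C)
        return eigvals[-1], eigvecs[:, -1]

    def fit(self, X_healthy):
        # Step 1: baseline statistics and threshold from healthy differential data.
        X_healthy = np.asarray(X_healthy, dtype=float)
        self.mu0, self.xi0 = X_healthy.mean(axis=0), X_healthy.std(axis=0)
        lam = np.array([self._lambda1(X_healthy[k - self.width:k])[0]
                        for k in range(self.width, len(X_healthy) + 1)])
        self.mu_lam, self.xi_lam = lam.mean(), lam.std()
        d = (lam - self.mu_lam) / self.xi_lam
        self.threshold = float(np.quantile(d, 1.0 - self.alpha))
        return self

    def step(self, x):
        # Step 2: push the newest differential vector; wait until the window is full.
        self.window.append(np.asarray(x, dtype=float))
        if len(self.window) < self.width:
            return None
        lam1, v1 = self._lambda1(np.vstack(list(self.window)))
        d_k = (lam1 - self.mu_lam) / self.xi_lam
        if d_k <= self.threshold:                       # Step 5: healthy
            return d_k, False, None
        return d_k, True, top_two_nonadjacent(v1 ** 2)  # Step 6: dominant channels

# Synthetic demonstration: 6 modules, 1 mV noise, 20 mV drop on module index 3.
rng = np.random.default_rng(0)
n, w = 6, 50
eye = np.eye(n)
A = eye - np.roll(eye, -2, axis=0)
healthy = 12.0 + 0.001 * rng.standard_normal((2000, n))
monitor = IncipientFaultMonitor(width=w).fit(healthy @ A.T)
faulty = 12.0 + 0.001 * rng.standard_normal((w, n))
faulty[:, 3] -= 0.02
results = [monitor.step(A @ s) for s in faulty]
d_k, alarm, pair = results[-1]
```

In a deployment following the article, the same monitor would be run on both differential sequences, and the baseline statistics would come from recorded healthy operation instead of simulated noise.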

3. Experimental Validation and Discussion

3.1 Data Source and Experimental Setup

Validation was performed using data from a 12S (12 series-connected modules) battery energy storage system. Each module had a nominal voltage of 12 V and a rated capacity of 2.5 Ah at 1C. Two primary experiments were conducted: one under normal healthy operation and another with a simulated incipient short-circuit fault. The load profile for both tests was a 2C discharge rate under the Worldwide harmonized Light vehicles Test Cycle (WLTC) driving profile. For the fault experiment, a low-resistance path (approximately 0.2 Ω) was connected in parallel to a specific module (e.g., module #1) to simulate an incipient short circuit during the time interval [800 s, 1000 s].

Table 1: Key Parameters of the Tested Battery Energy Storage System

| Parameter | Value |
| --- | --- |
| System Configuration | 12S (12 in series) |
| Module Nominal Voltage | 12 V |
| Module Rated Capacity | 2.5 Ah @ 1C |
| Test Load Profile | WLTC @ 2C |
| Simulated Fault Resistance | ~0.2 Ω |
| Fault Injection Window | 800 s – 1000 s |
| Sliding Window Width (w) | Empirically tuned |

3.2 Results Under Healthy System Operation

The voltages of all 12 modules were recorded during a full charge-discharge cycle under healthy conditions, and the corresponding two sets of differential voltage sequences (\( \mathbf{X}_k^1 \) and \( \mathbf{X}_k^2 \)) were calculated. While the differential operation reduced the overall signal drift, some variability due to system non-linearity and state-of-charge effects remained. The health detection indicators \( D_k^1 \) and \( D_k^2 \) for both differential sequences were computed in real time. The results confirmed that during healthy operation, although momentary fluctuations occurred, the indicators did not simultaneously and persistently exceed their respective empirically set thresholds \( \delta_{sc,1} \) and \( \delta_{sc,2} \). Consequently, the overall system health status \( D_k \) remained below the final threshold \( \delta_{HS} \), correctly indicating a healthy battery energy storage system.

3.3 Results Under Short-Circuit Fault Condition

The method’s performance was evaluated with the incipient short-circuit fault simulated on module #1. The equivalent short-circuit current was approximately 16.21 A (0.5C). The calculated minimum detectable fault impedance for this setup was 0.1886 Ω, confirming that the injected fault (0.2 Ω) was within the detectable range.

Table 2: Fault Detection and Location Performance Summary

| Metric | Value / Observation |
| --- | --- |
| Faulty Module | #1 |
| Equivalent Short-Circuit Current | ~0.5C (16.21 A) |
| Detection Time (Sequence 1, \(D_k^1\)) | ~1.5 s after fault inception |
| Detection Time (Sequence 2, \(D_k^2\)) | ~6.4 s after fault inception |
| Final Fault Declaration | At \(t = 806.4\) s |
| Location Accuracy | Correctly identified Module #1 |
| Post-Fault Alarm Persistence | Continued due to accumulated imbalance (can be cleared by balancing system) |

The fault detection indicators \( D_k^1 \) and \( D_k^2 \) both exhibited significant rises shortly after the fault was injected at t=800s. The difference in their response times and peak magnitudes is attributed to how the fault signature projects onto the two different differential sequences and the varying channel noise characteristics. The crucial point is that both exceeded their thresholds, triggering a system fault alarm. Subsequently, the eigenvector contribution analysis was performed. The contribution vector \( \boldsymbol{\eta} \) showed the predicted pattern, with the two largest non-adjacent contributions correctly mapping to module #1. This validated the entire detection and location pipeline. It is noted that after the simulated fault path was removed at t=1000s, the fault indicator remained high for a period because the short circuit had caused a state-of-charge imbalance between modules—an actual persistent effect. In a real battery energy storage system with an active balancing function, this imbalance would be corrected, and the alarm would eventually clear.

4. Conclusion and Future Perspectives

This article has presented a novel, data-driven methodology for the timely detection and precise location of incipient short-circuit faults within a battery energy storage system. By leveraging a modified interleaved measurement topology, the method enhances the observability of fault signatures. The core innovation lies in the design of a fault detection indicator based on the eigen-decomposition of a normalized differential voltage covariance matrix, coupled with a rigorous theoretical analysis that quantifies the relationship between the detection threshold and the minimum detectable fault amplitude. The subsequent fault location is achieved through a principled analysis of eigenvector contributions.

Experimental validation on a 12S battery pack demonstrated the method’s effectiveness, achieving accurate detection and location of a 0.5C equivalent short-circuit fault within 6.4 seconds. This meets the real-time requirements necessary for preventing the development of incipient faults into catastrophic thermal runaway events, thereby significantly enhancing the safety and reliability of battery energy storage systems.

For future work, the proposed real-time health assessment framework can be extended to encompass other common failure modes in a battery energy storage system, such as sensor faults and connection faults, creating a comprehensive diagnostic suite. Furthermore, to maintain high accuracy throughout the system’s lifetime, the method must adapt to aging and performance degradation. This involves dynamically updating the baseline healthy parameters (\( \boldsymbol{\mu}_0, \boldsymbol{\xi}_0, \mu^*_{\lambda,1}, \xi^*_{\lambda,1} \)) using long-term health assessment data, enabling precise and robust real-time monitoring across the entire operational life of the battery energy storage system.
