A Two-Level Diagnostic Framework for Battery Energy Storage System Anomalies Based on Reconfigurable Module Topology

The safe and reliable operation of large-scale battery energy storage systems (BESS) is paramount for integrating renewable energy sources and stabilizing modern power grids. Among the various internal failure modes, the gradual, often uneven, degradation of individual battery cells or modules poses a significant threat. This degradation, if left undetected, can lead to accelerated capacity fade, thermal imbalances, and in extreme cases, trigger cascading thermal runaway events. Therefore, the accurate and timely diagnosis of anomalous battery modules is a critical research frontier for enhancing the operational safety and longevity of battery energy storage systems. Traditional health monitoring approaches often struggle to balance diagnostic accuracy, computational efficiency, and practical applicability within the complex, large-scale architecture of a battery energy storage system.

State-of-Health (SOH) estimation remains the most direct metric for quantifying battery degradation. Existing methodologies can be broadly categorized into model-based and data-driven approaches. Model-based methods, such as equivalent circuit models coupled with filters (e.g., Kalman Filters), rely on precise parameter identification to map electrochemical dynamics to capacity loss. Their accuracy is highly sensitive to model fidelity and operating conditions. Conversely, data-driven methods, particularly those employing deep learning architectures like Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Gated Recurrent Units (GRUs), have shown remarkable prowess in extracting complex, non-linear relationships between measurable operational data (voltage, current, temperature) and the underlying SOH. However, a common limitation of these state-of-the-art algorithms is their design paradigm: they are typically applied to a static, fixed battery pack configuration, treating it as a monolithic entity or a collection of permanently connected cells. This overlooks a significant technological trend—the emergence of reconfigurable battery energy storage system topologies.

Reconfigurable BESS architectures, enabled by power electronics switches, allow for dynamic alteration of the electrical connection between battery modules. This capability is primarily explored for active cell balancing and fault isolation. The core premise is that a faulty or severely degraded module can be physically bypassed or switched into a separate diagnostic circuit without shutting down the entire battery energy storage system. Yet, the diagnostic algorithms themselves have not evolved to fully exploit this hardware flexibility. Most SOH estimation models remain passive observers, rather than active participants in a dynamically reconfigurable framework.

To bridge this gap, this paper proposes a novel, proactive two-level diagnostic methodology specifically designed for and empowered by a reconfigurable battery energy storage system topology. This framework synergizes hardware reconfiguration capability with intelligent software diagnostics. The first level acts as a fast, computationally lightweight screening process using a Least Squares Support Vector Machine (LSSVM) classifier. It continuously monitors easily accessible module-level parameters to flag “suspect” modules exhibiting outlier behavior. Once a module is flagged, the reconfiguration hardware physically isolates it from the main series string and connects it to a dedicated diagnostic circuit. The second level then engages a high-fidelity, data-driven SOH estimation model—a Residual-connected Gated Recurrent Unit (R-GRU) network—to perform a detailed, precise assessment of the isolated module’s health, thereby validating the initial screening. This hierarchical approach leverages the strengths of both simple classifiers and complex estimators while utilizing the hardware’s reconfigurability to enhance safety and diagnostic accuracy. The subsequent sections detail the system architecture, the theoretical foundation of the diagnostic algorithms, and present comprehensive validation results demonstrating the framework’s efficacy.

Reconfigurable Battery Energy Storage System Architecture for Proactive Diagnostics

The proposed diagnostic framework is built upon a purpose-designed reconfigurable battery module topology. Unlike static packs, this system introduces a layer of controllability at the module level, enabling the dual functions of continued operation and targeted diagnostics. The core schematic of this topology is presented conceptually below. The system comprises three main components: the battery module itself, the reconfiguration (bypass) circuit, and the isolation/diagnostic circuit.

In this architecture, n battery modules (B₁, B₂, …, B_n) are connected in series through main contactors or switches (Q_n) to form a high-voltage battery string. In parallel with each battery module, a reconfiguration circuit is installed. This circuit primarily consists of a power N-type MOSFET (M_n) whose Drain and Source terminals are connected across the terminals of the corresponding module B_n. The Gate driver circuit (D_n), often optically isolated for safety, controls the state of this MOSFET. During normal operation, all MOSFETs M_n are in the OFF state (open circuit). The battery modules are solely connected in series via the main path, and the battery energy storage system delivers power as a conventional string.

The diagnostic capability is activated when an anomaly is suspected. Each battery module is also connected to a dedicated diagnostic bus via a pair of switches (Q_n-1, Q_n-2). When the primary diagnostic level flags module B_k as suspect, the following reconfiguration sequence occurs:

1. The controller sends a signal to the driver D_k to turn ON MOSFET M_k. This creates a low-resistance bypass path across module B_k.
2. Almost simultaneously, the main series contactor Q_k is opened to break the high-current path through the module.
3. The isolation switches Q_k-1 and Q_k-2 are closed, connecting the now-bypassed module B_k to the separate, lower-power diagnostic circuit.

This process effectively removes the suspect module from the primary energy delivery circuit while making it available for detailed, offline testing without interrupting the operation of the rest of the battery energy storage system. The diagnostic circuit can apply specialized current profiles, measure impedance, or collect data for the secondary SOH estimation model in a controlled, isolated environment.
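The switching sequence above can be sketched as a small state model. This is only an illustration: the class `ModuleSwitches` and the functions `isolate_module`, `connect_to_diagnostic_bus`, and `is_isolated` are hypothetical names, and the interlock that refuses to connect the diagnostic bus while the main contactor is closed is an added assumption, not something specified in the text. Real hardware would also need timing and dead-time handling that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleSwitches:
    """Switch states for one module B_k (True means closed / conducting)."""
    bypass_on: bool = False    # MOSFET M_k: OFF during normal operation
    main_closed: bool = True   # main contactor Q_k: closed during normal operation
    diag_closed: bool = False  # isolation switches Q_k-1 and Q_k-2: open


def connect_to_diagnostic_bus(sw: ModuleSwitches) -> None:
    """Close Q_k-1/Q_k-2; assumed interlock: only when Q_k is already open."""
    if sw.main_closed:
        raise RuntimeError("main contactor Q_k must be open first")
    sw.diag_closed = True


def isolate_module(sw: ModuleSwitches) -> List[str]:
    """Apply the three steps in the order given in the text; return an action log."""
    log = []
    sw.bypass_on = True            # step 1: driver D_k turns M_k ON
    log.append("M_k on")
    sw.main_closed = False         # step 2: open main contactor Q_k
    log.append("Q_k open")
    connect_to_diagnostic_bus(sw)  # step 3: close Q_k-1 and Q_k-2
    log.append("Q_k-1/Q_k-2 closed")
    return log


def is_isolated(sw: ModuleSwitches) -> bool:
    """True when the module is bypassed, off the string, and on the diagnostic bus."""
    return sw.bypass_on and not sw.main_closed and sw.diag_closed


module_k = ModuleSwitches()
actions = isolate_module(module_k)
```

After `isolate_module` runs, `is_isolated(module_k)` holds, which is the condition under which the secondary diagnostic level would start collecting data.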

The Two-Level Diagnostic Methodology: From Screening to Precise Estimation

The algorithmic core of the framework is a two-level strategy that mirrors the hardware’s capability for isolation and focused analysis. This strategy is fundamentally different from applying a single, complex SOH estimator to all modules simultaneously. It optimizes for both system-wide monitoring efficiency and module-specific diagnostic accuracy.

Level 1: Primary Anomaly Screening via LSSVM Classifier
The objective of the first level is continuous, real-time surveillance of all modules in the battery energy storage system to identify potential outliers. The choice of input features is critical: they must be easily measurable at the module level, sensitive to degradation, and computationally cheap to process. Studies indicate that module terminal voltage and surface temperature distribution are highly effective indicators. A module with accelerated capacity fade or increased internal resistance will exhibit divergent voltage under load/charge and may show abnormal thermal signatures compared to its peers.
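As a rough illustration of how such peer-relative features might be computed, the sketch below derives two values per module. The exact feature definitions are not given in the text, so both are assumptions: the voltage deviation is taken relative to the median module voltage in the string, and the temperature feature is the spread between the hottest and coolest surface sensor on a module. The function name `module_features` is hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def module_features(
    voltages: Sequence[float],
    surface_temps: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Return (voltage_deviation, temperature_spread) for each module.

    voltages[i]      : terminal voltage of module i, in volts
    surface_temps[i] : surface temperature readings of module i, in deg C
    """
    v_ref = median(voltages)  # assumed reference: median of the string
    features = []
    for v, temps in zip(voltages, surface_temps):
        voltage_deviation = v - v_ref
        temperature_spread = max(temps) - min(temps)
        features.append((voltage_deviation, temperature_spread))
    return features


# Illustrative readings for three modules (made-up numbers).
demo_features = module_features(
    voltages=[48.0, 48.2, 47.0],
    surface_temps=[[25.0, 26.0], [25.0, 25.5], [27.0, 31.0]],
)
```

In this made-up example the third module sits 1 V below the median and shows a 4 °C surface spread, the kind of outlier pattern the screening level is meant to flag.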

We employ a Least Squares Support Vector Machine (LSSVM) for this binary classification task (normal vs. suspect). LSSVM modifies the classic SVM formulation by using a least squares cost function and equality constraints, transforming the problem from a quadratic programming (QP) task to solving a set of linear equations. This significantly reduces computational complexity, making it ideal for the fast, repeated execution required for system-wide monitoring.

Given a training dataset of $m$ samples $\{ (x_1, y_1), (x_2, y_2), \dots, (x_m, y_m) \}$ where $x_i \in \mathbb{R}^d$ is the feature vector (e.g., [Voltage_deviation, Temperature_gradient]) and $y_i \in \{-1, +1\}$ is the class label, the LSSVM optimization problem is formulated as:
$$
\min_{w, b, e} J(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{i=1}^{m} e_i^2
$$
subject to the equality constraints:
$$
y_i [w^T \phi(x_i) + b] = 1 - e_i, \quad i = 1, \dots, m
$$
where $w$ is the weight vector, $b$ is the bias term, $e_i$ are error variables, $\gamma$ is a regularization parameter controlling the trade-off between margin maximization and error minimization, and $\phi(\cdot)$ is a nonlinear mapping function to a high-dimensional feature space. Using the Radial Basis Function (RBF) kernel $K(x_i, x_j) = \phi(x_i)^T \phi(x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$, the solution is found in the dual space by solving a linear system. The resulting decision function for classification is:
$$
f(x) = \text{sign}\left( \sum_{i=1}^{m} \alpha_i y_i K(x, x_i) + b \right)
$$
where $\alpha_i$ are the Lagrange multipliers. A model trained on a comprehensive dataset simulating various operational profiles and degradation stages can achieve high accuracy in flagging modules whose voltage/temperature patterns deviate from the normal cluster.
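A minimal NumPy sketch of this procedure is given below. It assumes the standard LSSVM classifier dual system, in which $b$ and $\alpha$ solve $\begin{bmatrix} 0 & y^T \\ y & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}$ with $\Omega_{ij} = y_i y_j K(x_i, x_j)$. The function names and the toy data are illustrative and are not the simulated dataset used in this work.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma2)) for every pair of rows."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LSSVM dual linear system; return (alpha, b)."""
    m = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma2)
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = y                           # first row encodes sum_i alpha_i y_i = 0
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(m) / gamma  # regularized kernel block
    rhs = np.concatenate(([0.0], np.ones(m)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]


def lssvm_predict(X_new, X, y, alpha, b, sigma2):
    """Decision function sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    scores = rbf_kernel(X_new, X, sigma2) @ (alpha * y) + b
    return np.where(scores >= 0, 1, -1)


# Toy data: label -1 = normal cluster, +1 = suspect cluster (made-up values).
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [-0.1, 0.05],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.05]])
y_train = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
alpha, b = lssvm_fit(X_train, y_train, gamma=10.0, sigma2=0.2)
pred = lssvm_predict(np.array([[0.05, 0.0], [1.0, 0.95]]),
                     X_train, y_train, alpha, b, sigma2=0.2)
```

Training reduces to a single call to `np.linalg.solve` on an $(m+1) \times (m+1)$ system, which is the computational advantage over a QP-based SVM noted above.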

Level 2: Secondary SOH Estimation via Residual-Enhanced GRU Network
Once a module is isolated into the diagnostic circuit, the secondary level engages. This level’s goal is to provide a precise, quantitative estimate of the module’s State-of-Health (SOH), typically defined as the ratio of the current maximum capacity to the nominal (initial) capacity ($\mathrm{SOH} = C_{current} / C_{nominal} \times 100\%$). For this task, we use a recurrent neural network to model temporal dependencies in battery cycling data and add residual connections to improve gradient flow and model performance.
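For reference, the SOH definition can be worked through numerically. The sketch below assumes that the current maximum capacity is obtained by integrating the current of a full discharge (coulomb counting). That measurement method is an assumption for illustration and is not prescribed by the framework; the function names are hypothetical.

```python
from typing import Sequence


def discharged_capacity_ah(time_s: Sequence[float], current_a: Sequence[float]) -> float:
    """Trapezoidal integral of discharge current over time, in ampere-hours."""
    total_as = 0.0
    for k in range(1, len(time_s)):
        dt = time_s[k] - time_s[k - 1]
        total_as += 0.5 * (current_a[k] + current_a[k - 1]) * dt
    return total_as / 3600.0  # ampere-seconds to ampere-hours


def soh_percent(c_current_ah: float, c_nominal_ah: float) -> float:
    """SOH = C_current / C_nominal * 100%."""
    return c_current_ah / c_nominal_ah * 100.0


# Made-up example: a constant 2 A discharge lasting 3060 s on a 2.0 Ah module.
c_now = discharged_capacity_ah([0.0, 1530.0, 3060.0], [2.0, 2.0, 2.0])
soh_demo = soh_percent(c_now, c_nominal_ah=2.0)
```

Here the measured capacity is 1.7 Ah, so the module would be rated at 85% SOH.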

The chosen architecture is a stack of Gated Recurrent Unit (GRU) layers with residual skip connections (R-GRU). GRUs are advantageous over LSTMs due to their simpler gating mechanism (combining the forget and input gates into a single “update gate”), leading to fewer parameters and faster training while still effectively capturing long-term dependencies. The residual connections help mitigate the vanishing gradient problem in deeper networks, allowing for more effective training.

For a time step $t$, the GRU calculations are as follows:
Update Gate: $z_t = \sigma(W_z \cdot x_t + U_z \cdot h_{t-1})$
Reset Gate: $r_t = \sigma(W_r \cdot x_t + U_r \cdot h_{t-1})$
Candidate Activation: $\tilde{h}_t = \tanh(W \cdot x_t + U \cdot (r_t \odot h_{t-1}))$
Final Activation: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where $x_t$ is the input vector (e.g., sequences of charge voltage, current, temperature from diagnostic tests), $h_t$ is the hidden state, $W$ and $U$ are weight matrices, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, and $\odot$ denotes element-wise multiplication.
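The four equations translate directly into a single time-step function. The following NumPy sketch mirrors them term by term, with no bias terms, as written above. The parameter dictionary layout, the random initialization, and the dimensions (3 input features, 4 hidden units) are illustrative assumptions.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x_t, h_prev, p):
    """One GRU update following the equations in the text."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)          # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)          # reset gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h_prev))   # candidate activation
    return (1.0 - z) * h_prev + z * h_cand                   # final activation


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights; the scale 0.1 is an arbitrary illustrative choice."""
    rng = np.random.default_rng(seed)
    shapes = {
        "W_z": (hidden_dim, input_dim), "U_z": (hidden_dim, hidden_dim),
        "W_r": (hidden_dim, input_dim), "U_r": (hidden_dim, hidden_dim),
        "W": (hidden_dim, input_dim), "U": (hidden_dim, hidden_dim),
    }
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


# Run the cell over a short made-up sequence of (voltage, current, temperature).
params = init_params(input_dim=3, hidden_dim=4)
sequence = np.random.default_rng(1).standard_normal((5, 3))
h_final = np.zeros(4)
for x_t in sequence:
    h_final = gru_step(x_t, h_final, params)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a $\tanh$ output, every component of the hidden state stays inside $(-1, 1)$ when the initial state is zero.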

A residual block incorporating a GRU layer can be expressed as:
$$
h_t^{l+1} = \mathcal{F}(h_t^l, \Theta^l) + h_t^l
$$
where $h_t^l$ is the input to the $l$-th residual block/GRU layer, $\mathcal{F}$ represents the transformation learned by the GRU layer and its activation, $\Theta^l$ are the layer’s parameters, and $h_t^{l+1}$ is the output. This structure allows the network to learn modifications to the identity mapping, which is often easier than learning the transformation from scratch.

The overall R-GRU model for SOH estimation takes time-series windows of diagnostic charge/discharge data as input, processes them through several residual-GRU blocks, flattens the output, and passes it through fully connected layers to produce a single SOH estimate. This model is trained on historical data from beginning-of-life to end-of-life, learning the complex mapping from operational signatures to capacity fade.
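The data flow described here (residual GRU blocks, flattening, fully connected head) can be sketched as an untrained forward pass in NumPy. Layer sizes, the linear input projection that matches the feature dimension to the hidden width, the ReLU in the dense layer, and the random weights are all assumptions for illustration. The output of this untrained sketch is not a meaningful SOH value.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_layer(seq, p):
    """Run a GRU over seq of shape (T, in_dim); return all hidden states (T, hid)."""
    h = np.zeros(p["U"].shape[0])
    out = []
    for x_t in seq:
        z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h)
        r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h)
        h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r * h))
        h = (1.0 - z) * h + z * h_cand
        out.append(h)
    return np.stack(out)


def init_gru(rng, in_dim, hid):
    p = {}
    for name in ("W_z", "W_r", "W"):
        p[name] = 0.1 * rng.standard_normal((hid, in_dim))
    for name in ("U_z", "U_r", "U"):
        p[name] = 0.1 * rng.standard_normal((hid, hid))
    return p


def init_rgru(window, n_features, hidden=8, n_blocks=2, dense=16, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "proj": 0.1 * rng.standard_normal((n_features, hidden)),  # assumed input projection
        "blocks": [init_gru(rng, hidden, hidden) for _ in range(n_blocks)],
        "W1": 0.1 * rng.standard_normal((window * hidden, dense)),
        "b1": np.zeros(dense),
        "W2": 0.1 * rng.standard_normal(dense),
        "b2": 0.0,
    }


def rgru_forward(window_data, model):
    """window_data: (T, n_features) -> scalar output of the regression head."""
    h = window_data @ model["proj"]            # (T, hidden)
    for p in model["blocks"]:
        h = gru_layer(h, p) + h                # residual block: F(h) + h
    flat = h.reshape(-1)                       # flatten all time steps
    dense_out = np.maximum(flat @ model["W1"] + model["b1"], 0.0)
    return float(dense_out @ model["W2"] + model["b2"])


model = init_rgru(window=20, n_features=3)
window_data = np.random.default_rng(1).standard_normal((20, 3))
soh_hat = rgru_forward(window_data, model)
```

Adding the skip term requires the block input and output to share the same width, which is why the sketch projects the three input features to the hidden size before the first block.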

Experimental Validation and Performance Analysis

The proposed two-level framework was validated using both simulated data for the LSSVM classifier and publicly available benchmark data for the R-GRU SOH estimator.

Level 1 (LSSVM) Validation:
A dataset of 5,000 samples was generated through simulation to represent module-level voltage and surface temperature distributions under various states of health and load conditions within a battery energy storage system. The dataset was split 80/20 for training and testing. The LSSVM model with an RBF kernel was tuned, and the selected hyperparameters were $\gamma = 10$ and $\sigma^2 = 0.2$. The model’s performance in distinguishing normal from anomalous modules is summarized below:

Dataset	Accuracy	Precision	Recall	F1-Score
Training Set (4,000 samples)	98.9%	0.987	0.991	0.989
Test Set (1,000 samples)	97.4%	0.972	0.975	0.973

The high accuracy and F1-score demonstrate the LSSVM classifier’s effectiveness as a reliable and fast primary screening tool for a battery energy storage system.
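For readers who want to reproduce these metrics, the sketch below states the standard definitions in terms of confusion-matrix counts and checks that the reported F1-scores are consistent with the reported precision and recall. The confusion-matrix counts in the example are made up for illustration; the underlying counts for this experiment are not reported here.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not the experiment's confusion matrix).
demo_metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)

# Consistency check against the precision/recall values in the table above.
f1_train = round(f1_score(0.987, 0.991), 3)
f1_test = round(f1_score(0.972, 0.975), 3)
```

The recomputed values `f1_train` and `f1_test` agree with the tabulated F1-scores of 0.989 and 0.973.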

Level 2 (R-GRU) Validation:
The secondary diagnostic model was trained and tested using the NASA Ames Prognostics Center of Excellence lithium-ion battery dataset. Cells B05, B06, B07, and B18 were used, with data from the first 80 cycles used for training and the remaining cycles for testing. Input features were charge voltage, charge current, and cell temperature sequences, normalized prior to training. The proposed R-GRU model was compared against standard GRU, LSTM, and CNN models. The prediction results for cell B05 are illustrated conceptually, showing the R-GRU’s estimations closely tracking the actual SOH degradation curve with minimal fluctuation and outperforming the other models. The quantitative error analysis for cells B06 and B07 is reported below.

Cell ID	Model	MAE	RMSE	MAE reduction of R-GRU vs. listed model*
B06	R-GRU (Proposed)	0.0171	0.0278	–
B06	GRU	0.0257	0.0412	33.4%
B06	LSTM	0.0361	0.0692	52.6%
B06	CNN	0.0345	0.0570	50.4%
B07	R-GRU (Proposed)	0.0179	0.0294	–
B07	GRU	0.0299	0.0564	40.1%
B07	LSTM	0.0397	0.0633	54.9%
B07	CNN	0.0409	0.0648	56.2%

*Each percentage is the reduction in MAE achieved by the R-GRU relative to the model listed in that row (GRU, LSTM, or CNN) for the same cell.

For both cells reported above (B06 and B07), the R-GRU model achieves the lowest Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of the four models. Relative to the base GRU model, the incorporation of residual connections reduced the MAE by 33.4% and 40.1%, respectively, which supports its use for the precise SOH estimation required in the second diagnostic level of the battery energy storage system framework.
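The error metrics and the relative MAE reduction can be computed as sketched below. The MAE values are copied from the table above; the short SOH series used to exercise `mae` and `rmse` is made up for illustration. The recomputed reductions match the tabulated percentages to within 0.1 percentage point, with the small differences attributable to rounding of the tabulated MAE values.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae_reduction_percent(mae_proposed: float, mae_other: float) -> float:
    """Percentage by which the proposed model's MAE is lower than another model's."""
    return (mae_other - mae_proposed) / mae_other * 100.0


# MAE values as tabulated in this section.
reported_mae = {
    "B06": {"R-GRU": 0.0171, "GRU": 0.0257, "LSTM": 0.0361, "CNN": 0.0345},
    "B07": {"R-GRU": 0.0179, "GRU": 0.0299, "LSTM": 0.0397, "CNN": 0.0409},
}
reductions = {
    cell: {
        name: mae_reduction_percent(values["R-GRU"], values[name])
        for name in ("GRU", "LSTM", "CNN")
    }
    for cell, values in reported_mae.items()
}

# Made-up SOH series to show the metric functions in use.
demo_mae = mae([1.00, 0.90, 0.80], [0.98, 0.93, 0.80])
demo_rmse = rmse([1.00, 0.90, 0.80], [0.98, 0.93, 0.80])
```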

Conclusion and Future Outlook

This paper presented an integrated hardware-software framework for advanced anomaly diagnosis in battery energy storage systems. The core innovation lies in the synergistic combination of a reconfigurable module-level topology with a two-level diagnostic algorithm. The architecture of the battery energy storage system is no longer passive but actively supports the diagnostic process through physical reconfiguration. The algorithmic strategy efficiently decomposes the complex problem of system-wide health monitoring: the LSSVM-based primary level provides rapid, system-wide anomaly screening, while the R-GRU-based secondary level delivers precise, module-specific SOH estimation in a safe, isolated environment.

The experimental validation confirms the effectiveness of both levels. The LSSVM classifier achieved over 97% test accuracy in identifying suspect modules, and the proposed R-GRU model outperformed common deep learning benchmarks in SOH estimation accuracy, with MAE reductions ranging from 33.4% to 56.2% on the reported cells. This framework directly addresses key challenges in battery energy storage system management: it enhances operational safety by allowing for the proactive isolation of degrading modules, improves maintenance planning through accurate SOH data, and increases system availability by enabling continued operation during diagnostic procedures.

Future work will focus on three areas. First, the diagnostic triggering policy (thresholds for the LSSVM) and the reconfiguration control logic will be co-optimized to minimize energy impact and switching wear. Second, the diagnostic models will be expanded to predict not just SOH but also Remaining Useful Life (RUL) and specific failure modes. Third, the framework will be implemented and tested on a real-world, grid-scale battery energy storage system prototype to validate its performance under realistic, noisy conditions and diverse aging scenarios. The integration of reconfigurable power electronics with intelligent, hierarchical diagnostics represents a significant step toward more resilient and safer battery energy storage systems for the future grid.