Two-Level Diagnostic Method for Cell Energy Storage Systems Based on Battery Module Reconfiguration

Ensuring the safe and efficient operation of large-scale cell energy storage systems is paramount for modern power grids integrating renewable energy. Lithium-ion batteries, the dominant technology in such systems, are susceptible to performance degradation and internal faults over their operational life. Within a battery pack composed of numerous cells in series and parallel, the failure of even a single unit can cascade, leading to significant performance loss or severe safety incidents such as thermal runaway and fire. Therefore, accurate and timely diagnosis of anomalous battery modules is a critical challenge in managing any cell energy storage system.

The State of Health (SOH) is a key metric for quantifying battery degradation. Traditional SOH estimation methods can be broadly categorized into model-based and data-driven approaches. Model-based methods rely on accurate electrochemical or equivalent circuit models, which can be computationally expensive and sensitive to parameter variations. Data-driven methods, particularly those leveraging neural networks, have shown great promise as they learn complex patterns from operational data like voltage, current, and temperature without requiring deep physical insight. However, many existing data-driven methods focus solely on algorithmic improvements and operate primarily at the cell level, which does not align with the practical trend towards module- and cluster-level management in real-world cell energy storage system deployments.

A promising paradigm shift is the adoption of dynamically reconfigurable battery topologies. These systems allow for the flexible connection, disconnection, or bypass of individual battery modules. While much research on reconfigurable topologies focuses on achieving active cell balancing during charge/discharge, their potential for enhancing diagnostic capabilities remains underexplored. This work bridges that gap by designing a reconfigurable topology and proposing a two-level diagnostic framework that uses this hardware capability to make the safety strategy of a cell energy storage system more reliable and efficient.

Reconfigurable Battery System Architecture for Diagnostics

The proposed hardware foundation for the diagnostic method is a reconfigurable battery string architecture based on module-level bypass. As shown in the topology diagram, a battery string is formed by ‘n’ battery modules (B1…Bn) connected in series. The core innovation lies in the parallel bypass circuit attached to each module. This circuit consists of an N-channel MOSFET (M1…Mn) connected across the module terminals, driven by an isolated gate driver circuit (D1…Dn).

During normal operation, all MOSFETs are in the OFF state (non-conducting), and the battery modules are connected in series through the main path. Each battery module is also connected to a separate diagnostic bus via a pair of switches (e.g., Q1-1, Q1-2). When an anomaly is suspected in a specific module, the diagnostic process initiates: the corresponding MOSFET is turned ON (conducting), creating a low-resistance path that carries the string current around the module and so bypasses it from the main series string. Simultaneously, the module’s connection switches to the diagnostic bus are closed, isolating it for detailed testing without interrupting the overall operation of the cell energy storage system. This hardware capability is fundamental to the proposed two-level diagnostic strategy.
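The switching behaviour described above can be modelled in software as a small state table. The following is a minimal sketch only: the class and method names are hypothetical, modules are indexed from 0, and the rule that only one module may occupy the diagnostic bus at a time is an assumption that the text does not state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ModuleState(Enum):
    IN_STRING = "in_string"      # bypass MOSFET OFF, diagnostic switches open
    ON_DIAG_BUS = "on_diag_bus"  # bypass MOSFET ON, diagnostic switches closed


@dataclass
class ReconfigurableString:
    """Software model of the switch states for n series modules (B1..Bn)."""
    n_modules: int
    bypass_on: List[bool] = field(init=False)    # gate state of M1..Mn
    diag_closed: List[bool] = field(init=False)  # state of each Qk-1/Qk-2 pair

    def __post_init__(self) -> None:
        self.bypass_on = [False] * self.n_modules
        self.diag_closed = [False] * self.n_modules

    def isolate(self, k: int) -> None:
        """Bypass module k and connect it to the diagnostic bus."""
        if any(self.diag_closed):
            # Assumed policy: a single shared diagnostic bus, one module at a time.
            raise RuntimeError("diagnostic bus already occupied")
        self.bypass_on[k] = True
        self.diag_closed[k] = True

    def restore(self, k: int) -> None:
        """Disconnect module k from the diagnostic bus and return it to the string."""
        self.diag_closed[k] = False
        self.bypass_on[k] = False

    def state(self, k: int) -> ModuleState:
        return ModuleState.ON_DIAG_BUS if self.bypass_on[k] else ModuleState.IN_STRING

    def modules_in_string(self) -> List[int]:
        """Indices of modules still carrying the main string current."""
        return [k for k in range(self.n_modules) if not self.bypass_on[k]]
```

A real controller would also have to sequence the gate driver and the bus switches safely; that timing is hardware-specific and is not modelled here.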

Two-Level Diagnostic Strategy

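The diagnostic methodology moves beyond a single, monolithic estimation step. It employs a “screening-confirmation” approach tailored to the reconfigurable hardware: a fast online classifier first flags suspect modules, and only the flagged modules are switched to the diagnostic bus for a precise SOH estimate. The overall diagnostic workflow is illustrated in the flowchart.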
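The screening-confirmation flow can also be sketched in code. This is an illustrative outline under stated assumptions, not the authors' implementation: the function name, the verdict strings, and the `soh_threshold` value of 0.8 are hypothetical, the label +1 is assumed to mean "anomalous", and the two models are passed in as plain callables. The `estimate_soh` callable is assumed to handle switching the module onto the diagnostic bus and back.

```python
from typing import Callable, Dict, Sequence


def two_level_diagnosis(
    features: Dict[int, Sequence[float]],
    screen: Callable[[Sequence[float]], int],
    estimate_soh: Callable[[int], float],
    soh_threshold: float = 0.8,  # hypothetical cut-off, not given in the text
) -> Dict[int, str]:
    """Return 'normal', 'confirmed' or 'false_alarm' for each module id."""
    verdicts: Dict[int, str] = {}
    for module_id, x in features.items():
        # Level 1: online screening on (voltage, temperature).
        if screen(x) != 1:
            verdicts[module_id] = "normal"
            continue
        # Level 2: SOH estimate on the isolated module confirms or rejects the flag.
        soh = estimate_soh(module_id)
        verdicts[module_id] = "confirmed" if soh < soh_threshold else "false_alarm"
    return verdicts


# Toy stand-ins for the trained models, for illustration only.
def toy_screen(x: Sequence[float]) -> int:
    voltage, temperature = x
    return 1 if voltage < 3.0 or temperature > 45.0 else -1


toy_soh = {2: 0.72, 3: 0.91}

verdicts = two_level_diagnosis(
    {1: (3.7, 30.0), 2: (2.9, 31.0), 3: (3.6, 48.0)},
    screen=toy_screen,
    estimate_soh=lambda module_id: toy_soh[module_id],
)
```

In this toy run module 3 is flagged by the screen but passes the SOH check, which is the false-alarm case the second level is meant to catch.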

Level 1: Primary Anomaly Screening (Online Classification)
This stage continuously monitors easily accessible, module-level parameters for the entire string. The goal is to quickly identify modules that exhibit behavior deviating from the norm, flagging them as “suspected anomalies.” An effective screening tool must be fast, accurate, and robust, so we employ a Least Squares Support Vector Machine (LSSVM) classifier for this task. LSSVM simplifies the standard SVM optimization problem by using equality constraints and a least squares cost function, so that training reduces to solving a set of linear equations rather than a quadratic programming problem. This enhances computational efficiency, which is crucial for real-time monitoring in a cell energy storage system.

The optimization problem for LSSVM is formulated as follows:

$$
\min_{w, b, \xi} J(w, \xi) = \frac{1}{2} ||w||^2 + \frac{C}{2}\sum_{i=1}^{n}\xi_i^2
$$

subject to the equality constraints:

$$
y_i[w^T \phi(x_i) + b] = 1 - \xi_i, \quad i = 1, 2, \dots, n
$$

where $w$ is the weight vector, $b$ is the bias term, $C$ is a regularization parameter controlling the trade-off between margin maximization and error tolerance, $\xi_i$ are slack variables, $x_i$ are the input feature vectors, $y_i \in \{-1, +1\}$ are the class labels (normal/anomalous), and $\phi(\cdot)$ is a nonlinear mapping function to a high-dimensional feature space. Using a Radial Basis Function (RBF) kernel, $K(x_i, x_j) = \phi(x_i)^T \phi(x_j) = \exp(-||x_i - x_j||^2 / \sigma^2)$, the classifier decision function becomes:

$$
f(x) = \text{sign}\left[\sum_{i=1}^{n} \alpha_i y_i K(x, x_i) + b\right]
$$

The input features for the LSSVM classifier are the module’s terminal voltage and its surface center temperature. Research indicates that a capacity fade exceeding 2% can manifest as detectable irregularities in these signals. These features are well suited to primary screening because they are non-invasive and simple to measure across all modules in a cell energy storage system. The model was trained on a large simulated dataset representing full lifecycle data under various operating conditions. The parameters, optimized via simulated annealing, were $\gamma = 10$ (the regularization constant, corresponding to $C$ above) and $\sigma^2 = 0.2$ for the RBF kernel. Cross-validation yielded a classification accuracy of 97.4% on the test set, supporting its reliability for the screening task.
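To make the "linear equations instead of quadratic programming" point concrete, the sketch below trains an LSSVM classifier by solving the standard dual system $[0, y^T; y, \Omega + I/\gamma][b; \alpha] = [0; 1]$ with $\Omega_{ij} = y_i y_j K(x_i, x_j)$. It is a dependency-free illustration, not the trained model from the study: the function names and the toy data are hypothetical, features are assumed to be pre-scaled to [0, 1], and +1 is assumed to mean "anomalous".

```python
import math


def rbf(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma^2), as in the text."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma2=0.2):
    """Return (b, alpha) from the LSSVM dual linear system."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            omega = y[i] * y[j] * rbf(X[i], X[j], sigma2)
            row.append(omega + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def lssvm_predict(x, X, y, b, alpha, sigma2=0.2):
    """Decision function sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    s = sum(a * yi * rbf(x, xi, sigma2) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if s >= 0 else -1


# Toy (voltage, temperature) pairs scaled to [0, 1]; illustrative values only.
X_train = [(0.80, 0.30), (0.85, 0.35), (0.75, 0.25), (0.90, 0.30),
           (0.30, 0.80), (0.25, 0.70), (0.35, 0.85), (0.20, 0.75)]
y_train = [-1, -1, -1, -1, 1, 1, 1, 1]
b, alpha = lssvm_fit(X_train, y_train)
```

Every training sample receives a coefficient $\alpha_i$, so the model is not sparse; for a large training set a numerical linear algebra library would replace the hand-written solver.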

Level 2: Secondary SOH Estimation (Isolated Diagnosis)
Once a module is flagged by the LSSVM classifier and physically switched to the diagnostic bus by the reconfigurable circuit, the secondary diagnosis begins. This stage performs a precise, data-driven estimation of the module’s SOH to confirm the initial screening result. For this, we propose a novel neural network model that combines Gated Recurrent Units (GRUs) with Residual Connections (R-GRU). GRUs are excellent at capturing temporal dependencies in sequential data (like charge/discharge cycles) with fewer parameters than LSTMs, making them efficient to train. Residual connections help mitigate vanishing gradient problems in deeper networks, allowing for more effective training and stable learning.

The architecture of the R-GRU model is as follows: Normalized time-series data (charging voltage, current, temperature) from a fixed window of cycles is fed into a stack of three GRU layers. Residual skip connections are added between the output of one GRU layer and the input of the next. The final sequential output is flattened and passed through fully connected layers to produce a single SOH estimate.
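The input preparation can be illustrated with a short sketch. The text does not specify the normalization scheme or how a window is paired with a target, so the choices here are assumptions: min-max scaling per feature, one (voltage, current, temperature) row per cycle, and each window labelled with the SOH of its last cycle. All function names are hypothetical.

```python
def min_max_fit(rows):
    """Per-feature minimum and maximum over the given rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_apply(rows, lo, hi):
    """Scale each feature to [0, 1]; a constant feature maps to 0.0."""
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def sliding_windows(rows, targets, window):
    """Pair each run of `window` consecutive cycles with the SOH of its last cycle."""
    return [
        (rows[i:i + window], targets[i + window - 1])
        for i in range(len(rows) - window + 1)
    ]


# Illustrative per-cycle rows: (voltage, current, temperature) and SOH labels.
cycles = [[4.0, 1.5, 24.0], [3.9, 1.5, 25.0], [3.8, 1.5, 26.0], [3.7, 1.5, 27.0]]
soh = [1.00, 0.98, 0.96, 0.94]

lo, hi = min_max_fit(cycles)
scaled = min_max_apply(cycles, lo, hi)
windows = sliding_windows(scaled, soh, window=2)
```

Fitting `lo` and `hi` on the training cycles only, then reusing them for test cycles, is a common precaution against leaking test statistics into training.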

The mathematical formulation of a GRU cell at time step $t$ is given by:

Update Gate: $$z_t = \sigma(W_z x_t + U_z h_{t-1})$$

Reset Gate: $$r_t = \sigma(W_r x_t + U_r h_{t-1})$$

Candidate Activation: $$\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))$$

Final Activation: $$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is the input, $h_{t-1}$ is the previous hidden state, $z_t$ and $r_t$ are the update and reset gates, $\sigma$ is the sigmoid function (not the RBF kernel width $\sigma$ used in Level 1), $\tanh$ is the hyperbolic tangent, $\odot$ denotes element-wise multiplication, and $W$, $U$ and their gate-specific counterparts ($W_z$, $U_z$, $W_r$, $U_r$) are learnable weight matrices.

The residual connection implements a bypass path, so the output $H(x)$ of a block with residual connection becomes: $$H(x) = x + F(x)$$ where $x$ is the input to the block and $F(x)$ represents the nonlinear transformations (GRU operations) applied to $x$. This structure facilitates the flow of gradients during backpropagation.
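The GRU equations and the residual rule $H(x) = x + F(x)$ can be combined in a small forward-pass sketch. It is written in plain Python to stay dependency-free and uses random, untrained weights. Several details are assumptions rather than facts from the text: the first layer has no skip connection because its input width (3) differs from the hidden width, the hidden width of 4 is arbitrary, and the fully connected output head is omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Bias-free GRU cell matching the four equations in the text."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.Wz, self.Wr, self.W = (rand_matrix(n_hidden, n_in, rng) for _ in range(3))
        self.Uz, self.Ur, self.U = (rand_matrix(n_hidden, n_hidden, rng) for _ in range(3))

    def step(self, x, h_prev):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h_prev))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h_prev))]
        rh = [ri * hi for ri, hi in zip(r, h_prev)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, rh))]
        return [(1 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]

    def run(self, xs):
        """Apply the cell over a sequence, starting from a zero hidden state."""
        h = [0.0] * self.n_hidden
        out = []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return out


def residual_gru_stack(xs, cells):
    """First cell maps inputs to hidden width; each later cell adds H(x) = x + F(x)."""
    seq = cells[0].run(xs)
    for cell in cells[1:]:
        f = cell.run(seq)
        seq = [[a + b for a, b in zip(x_t, f_t)] for x_t, f_t in zip(seq, f)]
    return seq


rng = random.Random(0)
cells = [GRUCell(3, 4, rng), GRUCell(4, 4, rng), GRUCell(4, 4, rng)]
sequence = [[0.1, 0.2, 0.3]] * 5  # five time steps of (voltage, current, temperature)
hidden = residual_gru_stack(sequence, cells)
```

Because the skip path passes `seq` through unchanged, each later layer only has to learn a correction to its input, which is the gradient-flow benefit the text attributes to residual connections.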

The model was trained and tested using the well-known NASA battery aging dataset (cells B05, B06, B07, B18). The first 80 cycles were used for training, and the remaining cycles for testing. The input features (voltage, current, temperature) were normalized. The model’s performance was compared against standard GRU, LSTM, and Convolutional Neural Network (CNN) models using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) as metrics:

$$
\text{MAE} = \frac{1}{m}\sum_{i=1}^{m} |y_i - \hat{y}_i|
$$

$$
\text{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}
$$

where $y_i$ is the true SOH, $\hat{y}_i$ is the estimated SOH, and $m$ is the number of samples.
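Both metrics are simple to compute directly from their definitions. The sketch below uses made-up SOH values purely to illustrate the arithmetic; the function and variable names are not from the original work.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over m samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error over m samples."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only, not taken from the NASA dataset.
soh_true = [1.00, 0.90, 0.80, 0.70]
soh_pred = [0.98, 0.93, 0.80, 0.66]

example_mae = mae(soh_true, soh_pred)
example_rmse = rmse(soh_true, soh_pred)
```

RMSE is never smaller than MAE on the same data and grows faster when a few errors are large, which is why the two are usually reported together.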

Results and Performance Analysis

The primary LSSVM classifier achieved 97% accuracy on a random validation set, in line with the 97.4% cross-validation figure reported above, which supports its use for rapid online screening of a cell energy storage system.

For the secondary SOH estimator, the R-GRU achieved the lowest MAE and RMSE on every test battery. The following table summarizes the MAE and RMSE for all four models across the four test batteries:

Battery ID	Metric	R-GRU	GRU	LSTM	CNN
B05	MAE	0.0182	0.0272	0.0409	0.0383
B05	RMSE	0.0301	0.0456	0.0654	0.0616
B06	MAE	0.0171	0.0257	0.0361	0.0345
B06	RMSE	0.0278	0.0412	0.0692	0.0570
B07	MAE	0.0179	0.0299	0.0397	0.0409
B07	RMSE	0.0294	0.0564	0.0633	0.0648
B18	MAE	0.0155	0.0169	0.0321	0.0303
B18	RMSE	0.0283	0.0321	0.0549	0.0559

The performance improvement is substantial. For instance, on battery B06, the R-GRU model reduced MAE by 33.4%, 52.6%, and 50.4% compared to the GRU, LSTM, and CNN models, respectively. Similarly, RMSE was reduced by 32.5%, 59.8%, and 51.2%. On battery B07, MAE reductions were 40.1%, 54.9%, and 56.2%, with RMSE reductions of 47.9%, 53.6%, and 54.6%. The margin over the plain GRU is smallest on B18 (MAE 0.0155 versus 0.0169), although the R-GRU still leads there. These results indicate that the R-GRU model provides a more accurate and stable SOH estimation, which is critical for the confirmation stage of the diagnostic process in a cell energy storage system.
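The quoted reductions follow from the table as (baseline - R-GRU) / baseline. As a quick check, the sketch below recomputes the B06 MAE figures; the helper name is hypothetical and the inputs are copied from the table.

```python
def reduction_pct(baseline, proposed):
    """Relative error reduction of `proposed` versus `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# B06 MAE values from the results table.
b06_mae = {"R-GRU": 0.0171, "GRU": 0.0257, "LSTM": 0.0361, "CNN": 0.0345}

b06_reductions = {
    name: reduction_pct(value, b06_mae["R-GRU"])
    for name, value in b06_mae.items()
    if name != "R-GRU"
}
```

The recomputed values agree with the quoted 33.4%, 52.6%, and 50.4% to within rounding of the last digit.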

Qualitatively, the prediction curves from the R-GRU model are smoother and adhere more closely to the true SOH trajectory throughout the battery’s lifecycle, especially in the later stages of degradation where accurate estimation is most challenging and safety-critical.

Conclusion

This work addresses the critical need for accurate anomaly diagnosis in lithium-ion-based cell energy storage systems by presenting a combined hardware-software solution. The core contributions are twofold: first, the design of a practical reconfigurable battery topology based on module-level bypass switches; and second, a two-level diagnostic methodology that capitalizes on this hardware’s flexibility.

The two-level strategy uses a fast LSSVM classifier for initial online screening of module anomalies based on voltage and temperature, followed by a precise R-GRU neural network for isolated SOH estimation. Because every flagged module is re-checked before a fault is confirmed, the second level is designed to reduce the false alarm rate and to improve the safety and efficiency of the diagnostic process. Integrating the diagnostic algorithm with the reconfigurable circuit allows suspicious modules to be isolated for detailed analysis without compromising the operation of the overall cell energy storage system.

Validation on a simulated lifecycle dataset (for screening) and on the NASA battery aging dataset (for SOH estimation) supports both diagnostic levels: the LSSVM classifier achieved 97% accuracy in anomaly screening, and the proposed R-GRU model achieved lower MAE and RMSE than common neural network architectures (GRU, LSTM, CNN) on all four test cells. This integrated approach provides a framework for proactive health management and enhanced safety in future large-scale cell energy storage system deployments.