Remaining Useful Life Prediction for Energy Storage Lithium Batteries Using BiLSTM and Attention Mechanism

In recent years, energy storage lithium batteries have become indispensable in various industrial applications due to their high energy density, long cycle life, and low self-discharge rates. These batteries are critical for renewable energy integration, electric vehicles, and grid stabilization. However, the performance degradation of energy storage lithium batteries over time poses significant challenges to their reliability and safety. Accurately predicting the remaining useful life (RUL) of these batteries is essential for optimizing maintenance schedules, preventing failures, and extending operational lifespan. Traditional methods for RUL prediction often struggle with capturing the complex, nonlinear degradation patterns inherent in battery data, leading to reduced accuracy in real-world scenarios.

Existing approaches for battery health monitoring can be broadly categorized into model-based and data-driven methods. Model-based techniques rely on physical or electrochemical models to simulate battery behavior, but they often require precise parameter tuning and may not adapt well to varying operating conditions. In contrast, data-driven methods leverage historical data to learn degradation patterns without explicit physical models. Techniques such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) have shown promise in handling sequential data, but they may overlook bidirectional dependencies in time series. For instance, LSTM networks process data in a forward direction, potentially missing critical information from future time steps. To address this, Bidirectional LSTM (BiLSTM) networks have been introduced, which capture both past and future context, enhancing the model’s ability to understand temporal dynamics.

Despite these advancements, many data-driven models fail to prioritize influential features in the input sequence, leading to suboptimal predictions. Attention mechanisms have emerged as a powerful tool to mitigate this issue by dynamically weighting important time steps. In this study, we propose a novel BiLSTM-Attention model for RUL prediction of energy storage lithium batteries. Our model combines the bidirectional processing capability of BiLSTM with the focus-enhancing properties of attention, enabling it to capture long-term dependencies and highlight critical degradation phases. We validate our approach using publicly available battery datasets and demonstrate superior performance compared to existing methods.

The core of our methodology lies in the integration of BiLSTM and attention mechanisms. The BiLSTM component processes input sequences in both forward and backward directions, allowing the model to learn from historical and future data points simultaneously. This is particularly beneficial for energy storage lithium batteries, where degradation trends may exhibit complex temporal correlations. The output of the BiLSTM layer is then passed to an attention layer, which computes similarity scores between current and previous hidden states. These scores are used to assign weights, emphasizing time steps that significantly impact RUL prediction. The weighted output is finally fed into a fully connected layer to generate the RUL estimate.

To formalize the BiLSTM component, we define the following equations for a standard LSTM unit. The input gate \( i_t \), forget gate \( f_t \), output gate \( o_t \), and memory cell \( c_t \) are computed as follows:

$$ i_t = \sigma \left( W_i \cdot [h_{t-1}, x_t] + b_i \right) $$

$$ f_t = \sigma \left( W_f \cdot [h_{t-1}, x_t] + b_f \right) $$

$$ o_t = \sigma \left( W_o \cdot [h_{t-1}, x_t] + b_o \right) $$

$$ c_t = f_t \odot c_{t-1} + i_t \odot \tanh \left( W_c \cdot [h_{t-1}, x_t] + b_c \right) $$

$$ h_t = o_t \odot \tanh(c_t) $$

where \( \sigma \) denotes the sigmoid activation function, \( \odot \) represents element-wise multiplication, \( h_{t-1} \) is the previous hidden state, \( x_t \) is the current input, and \( W \) and \( b \) are the corresponding weight matrices and bias vectors. In the BiLSTM architecture, we incorporate both forward and backward LSTM layers. The hidden states from these layers are concatenated to form the final output:

$$ H_t = h_{t,f} \oplus h_{t,b} $$

Here, \( h_{t,f} \) and \( h_{t,b} \) represent the hidden states from the forward and backward LSTM layers at time \( t \), and \( \oplus \) denotes concatenation. This bidirectional approach enables the model to capture comprehensive temporal dependencies in energy storage lithium battery data.
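The gate equations and the bidirectional concatenation above can be sketched directly in numpy. The following is a minimal illustration, not the authors' implementation: `lstm_step` stacks the four gate pre-activations into a single weight matrix `W` (an assumed layout), and `bilstm` runs a forward and a backward pass before concatenating the hidden states, mirroring \( H_t = h_{t,f} \oplus h_{t,b} \).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_{t-1}, x_t] to the stacked gate
    pre-activations (i, f, o, candidate), an illustrative layout."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    i = sigmoid(z[0:H])        # input gate  i_t
    f = sigmoid(z[H:2*H])      # forget gate f_t
    o = sigmoid(z[2*H:3*H])    # output gate o_t
    g = np.tanh(z[3*H:4*H])    # candidate memory
    c = f * c_prev + i * g     # memory cell c_t
    h = o * np.tanh(c)         # hidden state h_t
    return h, c

def bilstm(xs, params_f, params_b, H):
    """Forward and backward passes, then concatenate: H_t = h_{t,f} (+) h_{t,b}."""
    hf, cf, fwd = np.zeros(H), np.zeros(H), []
    for x in xs:                       # forward direction
        hf, cf = lstm_step(x, hf, cf, *params_f)
        fwd.append(hf)
    hb, cb, bwd = np.zeros(H), np.zeros(H), []
    for x in reversed(xs):             # backward direction
        hb, cb = lstm_step(x, hb, cb, *params_b)
        bwd.append(hb)
    bwd.reverse()                      # realign backward states with time order
    return [np.concatenate([f_, b_]) for f_, b_ in zip(fwd, bwd)]
```

Each output vector has dimension \( 2H \), twice the hidden size, which is what the attention layer described next consumes.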

The attention mechanism further refines the model by focusing on relevant time steps. Given the BiLSTM output \( H = (H_1, H_2, \ldots, H_t) \), the attention score for each time step is calculated as:

$$ \text{Sim}(H_t, H_i | \omega_\alpha) = V_a^T \tanh(W_a H_t + U_a H_i) $$

where \( W_a \), \( U_a \), and \( V_a \) are learnable weight parameters, collectively denoted by \( \omega_\alpha \). The attention scores are normalized over all time steps using the softmax function:

$$ s(H_t, H_i | \omega_\alpha) = \frac{\exp[\text{Sim}(H_t, H_i | \omega_\alpha)]}{\sum_{k=1}^{t} \exp[\text{Sim}(H_t, H_k | \omega_\alpha)]} $$

The final attention-weighted output is obtained by:

$$ A(H, H_t | \omega_\alpha) = \sum_{i=1}^{t} s(H_t, H_i | \omega_\alpha) \cdot H_i $$

This output is then used for RUL prediction, ensuring that the model prioritizes influential features in the degradation process of energy storage lithium batteries.
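The three attention equations above can be condensed into a short numpy sketch. This is an illustrative additive-attention implementation under assumed parameter shapes, not the authors' code: it scores every hidden state \( H_i \) against the final state \( H_t \), applies a numerically stable softmax, and returns the weighted context vector.

```python
import numpy as np

def additive_attention(Hs, Wa, Ua, Va):
    """Additive attention: Sim(H_t, H_i) = Va^T tanh(Wa H_t + Ua H_i),
    softmax-normalised over i = 1..t, then a weighted sum of the H_i."""
    Ht = Hs[-1]                                   # query: the last hidden state
    scores = np.array([Va @ np.tanh(Wa @ Ht + Ua @ Hi) for Hi in Hs])
    weights = np.exp(scores - scores.max())       # stable softmax
    weights /= weights.sum()
    context = sum(w * Hi for w, Hi in zip(weights, Hs))
    return context, weights
```

The returned `weights` are exactly the \( s(H_t, H_i | \omega_\alpha) \) terms, so inspecting them recovers which cycles the model deems influential, as discussed later in the results.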

We evaluated our BiLSTM-Attention model using the CALCE battery dataset, which includes cycling data for multiple energy storage lithium batteries under controlled conditions. The dataset comprises parameters such as voltage, current, capacity, and internal resistance, recorded at regular intervals during charge-discharge cycles. We focused on four battery samples (CS2-35, CS2-36, CS2-37, CS2-38) with varying cycle lives, ranging from 800 to over 1000 cycles. The State of Health (SOH) was calculated as the ratio of actual capacity to rated capacity:

$$ \text{SOH} = \frac{Q_{\text{real}}}{Q_N} $$

where \( Q_{\text{real}} \) is the measured capacity and \( Q_N \) is the nominal capacity. The RUL is derived from the SOH trajectory, defined as the number of cycles until the battery reaches end-of-life criteria (e.g., SOH below 80%).
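The SOH ratio and the cycle-counting definition of RUL translate into a few lines of numpy. The sketch below is illustrative; the 80% threshold follows the end-of-life criterion stated above, while the function names and the synthetic capacity trace in the usage are assumptions.

```python
import numpy as np

def soh(capacities, rated_capacity):
    """SOH = Q_real / Q_N, per cycle."""
    return np.asarray(capacities, dtype=float) / rated_capacity

def rul(soh_curve, current_cycle, eol_threshold=0.8):
    """Cycles remaining until SOH first drops below the EOL threshold.
    Returns None if EOL is not reached within the recorded trajectory."""
    below = np.nonzero(np.asarray(soh_curve) < eol_threshold)[0]
    if below.size == 0:
        return None
    return max(int(below[0]) - current_cycle, 0)
```

For example, with a synthetic capacity trace fading linearly from 1.1 Ah to 0.8 Ah over 100 cycles (an illustrative rated capacity, not taken from the dataset), `rul(soh(caps, 1.1), 0)` returns the cycle index at which SOH first falls below 0.8.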

Data preprocessing involved normalizing the input features to a [0,1] range and handling missing values through interpolation. The dataset was split into 70% for training and 30% for testing. We used a sliding window approach with a window size of 12 time steps to create input sequences for the model. The model parameters were initialized as follows:

| Parameter        | Value |
|------------------|-------|
| Number of Epochs | 1000  |
| Batch Size       | 32    |
| Hidden Units     | 64    |
| Optimizer        | Adam  |
| Learning Rate    | 0.001 |
| Window Size      | 12    |
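The preprocessing steps described above, min-max normalization to [0, 1] and a sliding window of 12 time steps, can be sketched as follows. This is a minimal illustration with assumed function names, not the authors' pipeline:

```python
import numpy as np

def min_max_normalise(x):
    """Scale each feature column to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def sliding_windows(series, window=12):
    """Turn a 1-D series into (input window, next-step target) pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)
```

With a window size of 12, a series of length \( n \) yields \( n - 12 \) training pairs, each input covering 12 consecutive cycles and the target being the value at the following cycle.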

To assess model performance, we employed several evaluation metrics: R-squared (R²), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). These metrics are defined as:

$$ R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2} $$

$$ \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 $$

$$ \text{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2} $$

$$ \text{MAE} = \frac{1}{m} \sum_{i=1}^{m} |y_i - \hat{y}_i| $$

where \( y_i \) is the actual value, \( \hat{y}_i \) is the predicted value, \( \bar{y} \) is the mean of actual values, and \( m \) is the number of samples. Higher R² values and lower MSE, RMSE, and MAE values indicate better prediction accuracy.
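The four metrics above can be computed together in a few lines of numpy; the sketch below is illustrative (the function name is an assumption) and follows the definitions term by term:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, MSE, RMSE, and MAE, per the definitions above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
    }
```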

We compared our BiLSTM-Attention model against three baseline methods: LSTM, BiLSTM, and VMD-TCN-Attention (VMD-TCN). The training loss convergence for our model is shown in Figure 1, where the loss decreases rapidly in the initial epochs and stabilizes after 100 epochs, indicating effective learning. The prediction results for the four battery samples are summarized in Table 1.

Table 1: Performance Comparison of RUL Prediction Models
| Battery ID | Model            | R² (%) | MSE (×10⁻⁴) | RMSE (×10⁻²) | MAE (×10⁻²) |
|------------|------------------|--------|-------------|--------------|-------------|
| CS2-35     | LSTM             | 94.50  | 7.10        | 8.43         | 1.46        |
| CS2-35     | BiLSTM           | 96.82  | 1.00        | 3.20         | 1.02        |
| CS2-35     | VMD-TCN          | 96.38  | 1.20        | 3.48         | 1.10        |
| CS2-35     | BiLSTM-Attention | 98.77  | 0.40        | 1.51         | 0.69        |
| CS2-36     | LSTM             | 98.27  | 0.80        | 2.88         | 1.78        |
| CS2-36     | BiLSTM           | 98.39  | 0.80        | 2.78         | 1.67        |
| CS2-36     | VMD-TCN          | 99.46  | 0.30        | 1.61         | 1.17        |
| CS2-36     | BiLSTM-Attention | 99.64  | 0.20        | 1.32         | 1.03        |
| CS2-37     | LSTM             | 98.47  | 0.50        | 2.25         | 0.93        |
| CS2-37     | BiLSTM           | 99.13  | 0.30        | 1.71         | 1.00        |
| CS2-37     | VMD-TCN          | 96.47  | 9.10        | 9.54         | 8.27        |
| CS2-37     | BiLSTM-Attention | 99.50  | 0.20        | 1.29         | 0.85        |
| CS2-38     | LSTM             | 96.97  | 1.10        | 3.27         | 2.01        |
| CS2-38     | BiLSTM           | 97.54  | 0.90        | 2.95         | 1.62        |
| CS2-38     | VMD-TCN          | 96.18  | 5.30        | 7.28         | 6.58        |
| CS2-38     | BiLSTM-Attention | 98.47  | 0.50        | 2.33         | 1.53        |

The results demonstrate that our BiLSTM-Attention model achieves the highest R² values across all battery samples, reaching up to 99.64% for CS2-36. Similarly, the MSE, RMSE, and MAE values are consistently lower than those of the baseline methods, highlighting the model’s superior accuracy and stability. For example, in CS2-35, the BiLSTM-Attention model reduces MSE by 60% compared to the standard BiLSTM model. This improvement can be attributed to the attention mechanism’s ability to focus on critical degradation phases, which is particularly important for energy storage lithium batteries exhibiting complex aging patterns.

Furthermore, we analyzed the SOH prediction trends for each battery sample. The BiLSTM-Attention model closely tracks the actual SOH curves, with minimal deviation even at the end-of-life stages. This consistency underscores the model’s robustness in capturing the nonlinear degradation behavior of energy storage lithium batteries. The attention weights revealed that specific cycles, such as those with rapid capacity fade, were assigned higher importance, enabling the model to adaptively prioritize influential data points.

In conclusion, our BiLSTM-Attention model offers a reliable and accurate solution for RUL prediction of energy storage lithium batteries. By leveraging bidirectional temporal dependencies and attention-based feature weighting, the model effectively addresses the limitations of existing methods. Future work will explore the integration of multi-sensor data, such as temperature and impedance, to enhance prediction under diverse operating conditions. Additionally, we plan to investigate transfer learning techniques to generalize the model across different battery chemistries and formats, further advancing the management of energy storage lithium batteries.
