In recent years, the rapid development of electrochemical energy storage systems has highlighted the need for accurate state of health (SOH) assessment of energy storage cells. These cells, particularly lithium-ion systems, are central to applications ranging from grid stabilization to renewable energy integration. However, existing SOH evaluation methods often have limited precision because of the complex electrochemical processes involved. This study proposes a data-driven approach that estimates the SOH of energy storage cells from routine operational data. We extract health factors from voltage profiles during charging and feed them to a long short-term memory (LSTM) neural network whose hyperparameters are tuned with a genetic algorithm (GA). By analyzing the correlation between voltage variations and SOH, we identify charging intervals that serve as reliable indicators of degradation. Cyclic aging experiments on commercial energy storage cells validate the method, which yields substantially lower errors than an unoptimized LSTM and a GA-optimized backpropagation network. The approach makes SOH assessment more reliable and thereby supports the safe and efficient operation of energy storage systems in practice.
Energy storage cells are central to modern power systems. They store excess energy from renewable sources such as solar and wind and provide backup power during peak demand. However, their performance degrades over time because of cycling, temperature variations, and internal chemical changes. Accurately assessing the SOH, defined as the ratio of the present capacity to the nominal capacity, is essential for predicting remaining useful life, preventing failures, and scheduling maintenance. Traditional methods, such as electrochemical or equivalent circuit models, often lack robustness and adapt poorly to varying operating conditions. Data-driven techniques instead use machine learning to model complex relationships without requiring a detailed physical model. This paper uses fragment data, that is, partial segments of charging cycles, to derive health factors and feeds them to a GA-optimized LSTM network for SOH estimation. The results show that this method substantially reduces errors compared with an unoptimized LSTM, making it a useful tool for the energy storage industry.
Background and Motivation
Energy storage cells, especially lithium-ion variants, are widely adopted due to their high energy density, long cycle life, and environmental benefits. Despite these advantages, their degradation over time poses significant challenges for system reliability. The SOH of an energy storage cell reflects its current condition relative to its initial state, typically expressed as a percentage. A lower SOH indicates increased aging and reduced capacity. Accurate SOH assessment is crucial for applications like electric vehicles and grid-scale storage, where unexpected failures can lead to safety hazards and economic losses. Existing approaches include model-based methods, which rely on mathematical representations of cell behavior, and data-driven methods, which use historical data to train predictive models. While model-based techniques can be accurate under controlled conditions, they often lack generalization capabilities. Data-driven methods, such as those using neural networks, offer flexibility but require careful selection of input features to avoid overfitting and ensure accuracy. This study focuses on identifying optimal health factors from routine operational data—specifically, voltage changes during charging—and employing an advanced LSTM network optimized with genetic algorithms to enhance prediction performance.
The degradation of energy storage cells depends on multiple factors, including charge-discharge cycling, operating temperature, and depth of discharge. Over time, these factors cause capacity fade and increased internal resistance, both of which can be monitored through voltage and current profiles. In this work, we analyze charging data to extract features that correlate strongly with SOH. By selecting voltage segments that capture key electrochemical transitions, we obtain health factors that are both informative and easy to measure in practice. We choose an LSTM network because it handles sequential data well, which suits time-series analysis of battery parameters. However, LSTM performance depends strongly on its hyperparameters. To address this, we use a genetic algorithm, a global optimization technique, to tune these parameters, yielding a more robust and accurate model. Together, the feature extraction and model optimization steps form the SOH assessment framework developed in this paper.
Methodology
The proposed methodology for SOH assessment of energy storage cells consists of three main steps: health factor extraction, model development, and optimization. First, we collect operational data from charging cycles of energy storage cells and identify voltage segments that exhibit high correlation with SOH. Second, we design an LSTM neural network to process these health factors and predict SOH. Finally, we apply genetic algorithms to optimize the hyperparameters of the LSTM model, improving its accuracy and stability.
Health Factor Extraction
To derive health factors, we analyze the voltage profiles of energy storage cells during constant-current charging. The charging process has distinct phases in which voltage changes reflect internal electrochemical states. We focus on the voltage plateau between 3.25 V and 3.40 V, because this region corresponds to the primary redox reactions in lithium iron phosphate (LiFePO4) energy storage cells. Within this range, we compute the voltage change over three time windows: the 10 minutes after the voltage reaches 3.25 V, the 20 minutes before it reaches 3.35 V, and the 30 minutes before it reaches 3.40 V. These windows were selected based on their Pearson correlation coefficients with SOH, summarized in Table 1. The Pearson correlation coefficient r measures the linear relationship between two variables and is computed as:
$$ r = \frac{\text{Cov}(X, Y)}{S_X S_Y} $$
where Cov(X, Y) is the covariance between the voltage change (X) and SOH (Y), and S_X and S_Y are the standard deviations of X and Y. A larger absolute value of r indicates a stronger linear relationship. Values close to 1 or -1 indicate strong positive or negative correlation, respectively, and values near 0 indicate little linear correlation.
| Window Length | Window Position | Correlation Coefficient (r) |
|---|---|---|
| 10 minutes | After 3.25 V | 0.6406 |
| 20 minutes | Before 3.35 V | 0.6358 |
| 30 minutes | Before 3.40 V | 0.9028 |
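The correlation screening behind Table 1 can be sketched as follows. This is a minimal NumPy illustration that assumes each candidate health factor is stored as one value per calibrated cycle, aligned with the measured SOH. The function names (`pearson_r`, `rank_candidates`) and the toy numbers are hypothetical and are not the study's data.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson r = Cov(X, Y) / (S_X * S_Y), using sample (n - 1) statistics throughout."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    return float(cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1)))


def rank_candidates(candidates, soh):
    """Rank candidate health factors (name -> per-cycle values) by |r| with SOH, strongest first."""
    scores = {name: pearson_r(values, soh) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy usage with made-up numbers (not measurements from the study).
soh = [100.0, 98.0, 97.0, 95.0, 94.0]
candidates = {
    "window_a": [0.050, 0.049, 0.047, 0.046, 0.044],
    "window_b": [0.030, 0.032, 0.029, 0.031, 0.030],
}
print(rank_candidates(candidates, soh))
```

Ranking by |r| rather than r keeps strongly negative correlations in play. All three windows in Table 1 happen to be positively correlated with SOH.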
The selected health factors, denoted ΔV1, ΔV2, and ΔV3 in the order listed in Table 1, are used as input features for the SOH assessment model. They capture the voltage dynamics that are most sensitive to aging and provide a compact representation of the cell's health status.
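One way to compute ΔV1, ΔV2, and ΔV3 from a single constant-current charging record is sketched below. It makes several assumptions. Time is in seconds and voltage is in volts, sampled on an increasing time axis. Each reference crossing is taken as the first time the voltage reaches that value, interpolated linearly between samples. The voltage at the other end of each window is read by interpolation. These interpretation choices and the function names are illustrative assumptions, not the study's code.

```python
import numpy as np


def first_crossing_time(t, v, v_ref):
    """First time (s) at which voltage reaches v_ref, linearly interpolated between samples."""
    idx = int(np.argmax(v >= v_ref))
    if v[idx] < v_ref:
        raise ValueError(f"charging voltage never reaches {v_ref} V")
    if idx == 0:
        return float(t[0])
    t0, t1, v0, v1 = t[idx - 1], t[idx], v[idx - 1], v[idx]   # samples straddling the crossing
    return float(t0 + (v_ref - v0) * (t1 - t0) / (v1 - v0))


def voltage_at(t, v, t_query):
    """Voltage at an arbitrary time inside the record, by linear interpolation."""
    if not t[0] <= t_query <= t[-1]:
        raise ValueError("window extends beyond the recorded charging segment")
    return float(np.interp(t_query, t, v))


def health_factors(t, v):
    """Return (dV1, dV2, dV3) under the assumed window definitions."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    t_325 = first_crossing_time(t, v, 3.25)
    t_335 = first_crossing_time(t, v, 3.35)
    t_340 = first_crossing_time(t, v, 3.40)
    dv1 = voltage_at(t, v, t_325 + 10 * 60) - voltage_at(t, v, t_325)   # 10 min after 3.25 V
    dv2 = voltage_at(t, v, t_335) - voltage_at(t, v, t_335 - 20 * 60)   # 20 min before 3.35 V
    dv3 = voltage_at(t, v, t_340) - voltage_at(t, v, t_340 - 30 * 60)   # 30 min before 3.40 V
    return dv1, dv2, dv3


# Synthetic linear ramp (0.1 mV/s), used only to check the arithmetic.
t = np.arange(0.0, 7200.0, 10.0)
v = 3.0 + 1e-4 * t
print(health_factors(t, v))
```

On the linear ramp, the three factors come out as about 0.06 V, 0.12 V, and 0.18 V, because each equals the ramp rate times the window length. Real LiFePO4 curves are flat on the plateau, so these differences are much smaller and more sensitive to aging.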
LSTM Neural Network
The long short-term memory (LSTM) network is a type of recurrent neural network (RNN) that handles sequential data through memory cells and gating mechanisms. Unlike standard RNNs, LSTMs mitigate the vanishing gradient problem, which makes them better at learning long-term dependencies. The architecture has three gates that regulate the flow of information: the forget gate, the input gate, and the output gate. The gate equations are:
Forget gate:
$$ f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) $$
Input gate:
$$ i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) $$
$$ \tilde{C}_t = \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) $$
Cell state update:
$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$
Output gate:
$$ o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) $$
$$ h_t = o_t \odot \tanh(C_t) $$
Here, σ is the sigmoid activation function and tanh is the hyperbolic tangent. [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input, and ⊙ denotes element-wise multiplication. W denotes weight matrices and b denotes bias vectors. x_t is the input at step t, h_t is the hidden state, and C_t is the cell state. In this application, each input x_t is the vector of health factors (ΔV1, ΔV2, ΔV3) for one cycle. The LSTM processes these vectors over successive cycles to predict the SOH of the energy storage cell.
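The gate equations above translate directly into a forward pass. The NumPy sketch below implements one LSTM step and runs it over a sequence of per-cycle health-factor vectors. The hidden size, the random initialization, and the linear SOH read-out layer are illustrative assumptions. A practical model would be trained with a deep-learning framework rather than hand-written like this.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. Each W_* has shape (H, H + D) and acts on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])                  # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])             # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])             # input gate
    c_tilde = np.tanh(p["W_C"] @ z + p["b_C"])         # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                 # element-wise cell update
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])             # output gate
    h_t = o_t * np.tanh(c_t)                           # new hidden state
    return h_t, c_t


def init_params(hidden, inputs, rng):
    """Small random weights and zero biases (an illustrative choice)."""
    p = {}
    for gate in ("f", "i", "C", "o"):
        p[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(hidden, hidden + inputs))
        p[f"b_{gate}"] = np.zeros(hidden)
    p["w_out"] = rng.normal(0.0, 0.1, size=hidden)     # hypothetical linear read-out
    p["b_out"] = 0.0
    return p


def predict_soh(sequence, p):
    """Run the LSTM over per-cycle vectors (dV1, dV2, dV3) and map the last h_t to an SOH value."""
    hidden = p["b_f"].shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in sequence:
        h, c = lstm_step(np.asarray(x_t, dtype=float), h, c, p)
    return float(p["w_out"] @ h + p["b_out"]), h


rng = np.random.default_rng(0)
params = init_params(hidden=8, inputs=3, rng=rng)
sequence = rng.normal(size=(10, 3))   # stand-in for 10 cycles of (dV1, dV2, dV3)
soh_estimate, h_last = predict_soh(sequence, params)
```

The hidden state is bounded in (-1, 1) because it is a sigmoid output times a tanh. This is one reason a separate read-out layer is typically used to map h_t onto the SOH scale.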
Genetic Algorithm Optimization
Genetic algorithms (GA) are evolutionary optimization techniques inspired by natural selection. Here, a GA searches for the LSTM hyperparameters, such as the number of hidden units, the learning rate, and the number of epochs. The GA process involves initialization, selection, crossover, and mutation. First, a population of candidate solutions (hyperparameter sets) is generated randomly. Each candidate is evaluated with a fitness function, in this case the mean squared error (MSE) on a validation set. Candidates with higher fitness (lower MSE) are selected for reproduction, and crossover and mutation operations create new offspring from them. This process repeats until a termination criterion is met, such as a maximum number of generations. The best hyperparameters found are then used to train the final LSTM model, referred to as GA-LSTM. Because fitness is measured on validation data rather than training data, the search favors configurations that generalize well. This reduces the risk of overfitting and leads to more accurate SOH estimates for energy storage cells.
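The GA loop described above can be sketched in plain Python. The text does not specify the operators, so the following choices are illustrative assumptions:

- the search-space bounds
- tournament selection
- uniform crossover
- resampling mutation
- elitism

`toy_validation_mse` is a hypothetical stand-in. In the actual procedure, each fitness evaluation would train an LSTM with the candidate hyperparameters and return its validation MSE.

```python
import random

# Hypothetical search space: inclusive (low, high) bounds per hyperparameter.
BOUNDS = {"hidden_units": (8, 128), "log10_lr": (-4.0, -1.0), "epochs": (50, 500)}
INTEGER_KEYS = {"hidden_units", "epochs"}


def sample_gene(key, rng):
    low, high = BOUNDS[key]
    return rng.randint(low, high) if key in INTEGER_KEYS else rng.uniform(low, high)


def random_individual(rng):
    return {key: sample_gene(key, rng) for key in BOUNDS}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with probability 0.5."""
    return {key: a[key] if rng.random() < 0.5 else b[key] for key in a}


def mutate(ind, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {key: sample_gene(key, rng) if rng.random() < rate else value for key, value in ind.items()}


def tournament(pop, scores, rng, k=3):
    """Pick the fittest (lowest MSE) of k random candidates."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[min(range(pop_size), key=lambda i: scores[i])]
        children = [elite]                       # elitism: carry the best candidate forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    scores = [fitness(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


def toy_validation_mse(ind):
    """Hypothetical stand-in for 'train an LSTM, return validation MSE'; minimum at (64, 10**-2.5, 200)."""
    return (((ind["hidden_units"] - 64) / 64) ** 2
            + (ind["log10_lr"] + 2.5) ** 2
            + ((ind["epochs"] - 200) / 200) ** 2)


best, best_mse = genetic_search(toy_validation_mse)
learning_rate = 10 ** best["log10_lr"]
```

Searching the learning rate on a log10 scale is a common design choice, because plausible values span several orders of magnitude. Elitism guarantees that the best fitness never gets worse from one generation to the next.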
Experimental Setup
To validate the proposed method, we conducted cyclic aging experiments on five soft-pack LiFePO4 energy storage cells with a nominal capacity of 20 A·h and a voltage range of 2.5 V to 3.65 V. The cells were cycled repeatedly at a constant temperature of 25°C using a battery testing system. Each cycle consisted of charging at a 0.5C rate to 3.65 V and discharging to 2.5 V. After every 100 cycles, the capacity of each cell was measured in a calibration test to determine its actual SOH using the formula:
$$ \text{SOH} = \frac{Q_0}{Q_N} \times 100\% $$
where Q_0 is the present (measured) capacity and Q_N is the nominal capacity (20 A·h). Voltage, current, and time were recorded during charging for health factor extraction. The dataset comprised over 4,000 cycles across all cells, with 80% used for training and 20% for testing. In addition, one cell was reserved for independent validation to assess model generalization. The tests were run under controlled laboratory conditions. However, the recorded quantities are the same ones a battery management system logs during routine charging, so the health factors can be obtained from field data in the same way.
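The SOH label and the data split can be expressed compactly. The sketch assumes capacities in A·h and reads the split as follows: hold one cell out entirely, then use the first 80% of each remaining cell's cycles for training and the last 20% for testing. The text does not say whether cycles were split by time or at random, so the chronological split is an assumption. The cell ids are hypothetical.

```python
import numpy as np

NOMINAL_CAPACITY_AH = 20.0  # Q_N of the tested cells


def soh_percent(measured_capacity_ah, nominal_capacity_ah=NOMINAL_CAPACITY_AH):
    """SOH (%) = measured capacity / nominal capacity x 100."""
    return 100.0 * np.asarray(measured_capacity_ah, dtype=float) / nominal_capacity_ah


def split_by_cell(samples_by_cell, holdout_cell, train_frac=0.8):
    """Hold one cell out; split each remaining cell's cycles chronologically (assumed)."""
    train, test = [], []
    for cell, samples in samples_by_cell.items():
        if cell == holdout_cell:
            continue
        n_train = int(train_frac * len(samples))
        train.extend(samples[:n_train])   # earlier cycles
        test.extend(samples[n_train:])    # later cycles
    return train, test, list(samples_by_cell[holdout_cell])


# Usage with hypothetical cell ids and placeholder samples.
cells = {"cell_1": list(range(10)), "cell_2": list(range(10, 20)), "cell_3": list(range(20, 30))}
train, test, validation = split_by_cell(cells, holdout_cell="cell_3")
```

For example, a calibrated capacity of 18.1 A·h gives SOH = 18.1 / 20 × 100% = 90.5%. A chronological split tests whether the model extrapolates to later aging. A random split of cycles would let the model see neighbors of every test cycle during training.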

Results and Analysis
The cycling data showed a nearly linear decline in SOH over time, with average capacity retention of 90.5% after 4,000 cycles. The charging curves showed consistent voltage plateaus between 3.25 V and 3.40 V, confirming the relevance of the selected health factors. Incremental capacity analysis (ICA) further showed that the peaks of the dQ/dV curves lie within this voltage window. This links the selected health factors to the electrochemical reactions whose degradation drives capacity fade. The health factors ΔV1, ΔV2, and ΔV3 were computed for each cycle and fed to the GA-LSTM model. For comparison, we also evaluated an unoptimized LSTM model and a GA-optimized backpropagation (GA-BP) neural network. The models were assessed using the mean squared error (MSE) and mean absolute percentage error (MAPE), defined as:
$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
$$ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% $$
where y_i is the actual SOH, ŷ_i is the predicted SOH, and n is the number of samples. The GA-LSTM model achieved the lowest errors, with an MSE of 0.000142 and a MAPE of 1.09%. These correspond to reductions of 74.1% in MSE and 48.3% in MAPE relative to the unoptimized LSTM. The GA-BP model had higher errors than GA-LSTM on both metrics, and a higher MSE than even the unoptimized LSTM. This highlights the advantage of a recurrent architecture for sequential data. Table 2 summarizes the error metrics for each model.
| Model | MSE | MAPE (%) |
|---|---|---|
| Unoptimized LSTM | 0.000548 | 2.11 |
| GA-BP | 0.002820 | 1.33 |
| GA-LSTM | 0.000142 | 1.09 |
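The two error metrics and the relative reductions quoted in the text can be checked with a few lines of NumPy. The `relative_reduction` helper is an illustrative name. The inputs in the usage lines are the Table 2 values for the unoptimized LSTM and GA-LSTM.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted SOH."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def relative_reduction(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline model."""
    return 100.0 * (baseline - improved) / baseline


# Table 2: unoptimized LSTM vs GA-LSTM
mse_cut = relative_reduction(0.000548, 0.000142)
mape_cut = relative_reduction(2.11, 1.09)
print(f"MSE reduced by {mse_cut:.1f}%, MAPE reduced by {mape_cut:.1f}%")
# prints "MSE reduced by 74.1%, MAPE reduced by 48.3%"
```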
The SOH predictions from the GA-LSTM model closely followed the actual values across all cycles, consistent with its 1.09% MAPE. In contrast, the unoptimized LSTM showed larger fluctuations, particularly in later cycles. The GA-BP model, while improved over standard BP, still had higher errors than GA-LSTM, which we attribute to its inability to capture temporal dependencies across cycles. These results show that combining carefully selected health factors with GA optimization markedly improves the accuracy of SOH assessment for energy storage cells. The method's robustness is further supported by its performance on the independent validation cell, where it maintained low errors without retraining.
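The ICA check mentioned above locates the dQ/dV peaks in the charging data. It can be sketched using coulomb counting and voltage binning. This is one common way to estimate an incremental-capacity curve, not necessarily the processing used in the study. The 5 mV bin width and the synthetic curve in the usage lines are assumptions, and real data usually needs smoothing before peak picking.

```python
import numpy as np


def incremental_capacity(t_s, current_a, voltage_v, bin_width_v=0.005):
    """Estimate dQ/dV (A·h/V) from a charging record by coulomb counting and voltage binning."""
    t_s = np.asarray(t_s, dtype=float)
    i_a = np.asarray(current_a, dtype=float)
    v = np.asarray(voltage_v, dtype=float)
    dq_ah = 0.5 * (i_a[1:] + i_a[:-1]) * np.diff(t_s) / 3600.0   # charge added in each step (trapezoid)
    v_mid = 0.5 * (v[1:] + v[:-1])                               # voltage assigned to that step
    edges = np.arange(v.min(), v.max() + bin_width_v, bin_width_v)
    dq_per_bin, _ = np.histogram(v_mid, bins=edges, weights=dq_ah)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, dq_per_bin / bin_width_v


# Synthetic 10 A charge whose voltage flattens around 3.30 V (a stand-in for a LiFePO4 plateau).
t = np.arange(0.0, 3601.0, 1.0)
current = np.full_like(t, 10.0)
voltage = 3.30 + 0.05 * ((t - 1800.0) / 1800.0) ** 3
centers, ic = incremental_capacity(t, current, voltage)
peak_voltage = centers[np.argmax(ic)]
```

Binning the charge by voltage avoids dividing by tiny voltage steps on the plateau. Dividing by those steps is what makes a naive point-by-point dQ/dV extremely noisy.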
Discussion
The findings support the use of voltage-based health factors and GA-LSTM for SOH estimation in energy storage cells. The high correlation coefficients of the selected intervals (e.g., 0.9028 for the 30-minute window before 3.40 V) suggest that these features track critical aging mechanisms, such as solid electrolyte interphase (SEI) growth and loss of lithium inventory. The LSTM's recurrent structure lets it model the progressive decline in SOH from historical data, while GA optimization searches for a well-performing network configuration. Unlike model-based methods, the approach does not require a complex physical model and can adapt to individual cell variations. Moreover, because it uses fragment data from routine charging, it can be implemented in battery management systems (BMS) without additional sensors or disruptive tests.
However, limitations exist, such as the dependency on consistent charging protocols and temperature conditions. Future work could explore adaptive health factors that account for varying operational profiles and integrate real-time data streams. Additionally, extending the method to other chemistries, such as nickel-manganese-cobalt (NMC) energy storage cells, would enhance its generality. Despite these challenges, the proposed framework offers a scalable solution for monitoring the health of energy storage cells in diverse applications, contributing to longer lifetimes and improved safety.
Conclusion
In this study, we developed a data-driven method for assessing the SOH of energy storage cells based on operational data. By extracting voltage changes from specific charging segments as health factors and employing a GA-optimized LSTM network, we achieved accurate and reliable SOH estimates. Experimental results on LiFePO4 cells showed substantial error reductions relative to the unoptimized LSTM, with GA-LSTM reaching an MSE of 0.000142 and a MAPE of 1.09%. The method is practical because it relies on easily obtainable data and captures sequential degradation patterns. This research advances battery health monitoring and provides a foundation for future work on predictive maintenance for energy storage systems. As the demand for reliable energy storage grows, such techniques will play an important role in ensuring the sustainability and efficiency of modern power networks.
