The pursuit of carbon emission peak and carbon neutrality has underscored the critical role of energy storage systems in facilitating the global energy transition. Among various technologies, the lithium-ion battery stands out due to its high energy density, long cycle life, and environmental friendliness, making it a cornerstone for modern energy storage solutions. The safe, efficient, and reliable operation of these systems hinges on the Battery Management System (BMS), whose core functions include the accurate and real-time estimation of two pivotal states: the State of Charge (SOC) and the State of Health (SOH). SOC indicates the available energy remaining in the lithium-ion battery, analogous to a fuel gauge, while SOH reflects its degradation level and remaining useful life. Accurate SOC estimation is essential for preventing overcharge/discharge, optimizing energy dispatch, and ensuring user confidence. Concurrently, precise SOH estimation is vital for predictive maintenance, safety assurance, and determining the residual value of the lithium-ion battery pack. However, these states are not directly measurable and must be inferred from other operational parameters.

Traditional estimation methods can be broadly categorized into model-based and data-driven approaches. Model-based methods, such as those employing Equivalent Circuit Models (ECM) combined with filters like the Kalman Filter, require precise parameter identification and often struggle with the complex, nonlinear aging dynamics of a lithium-ion battery. While data-driven methods, particularly those leveraging deep learning, have shown remarkable capability in modeling complex nonlinearities without explicit physical models, they often treat SOC and SOH estimation as separate tasks. This decoupling leads to a significant drawback: the estimation of SOC typically relies on the nominal capacity of the lithium-ion battery. As the battery degrades (SOH decreases), using the nominal capacity introduces growing errors into the SOC estimate. Therefore, a joint estimation framework that dynamically updates the available capacity used in SOC calculation is paramount for maintaining high accuracy throughout the entire lifespan of the lithium-ion battery.
This article presents a novel, hybrid data-driven framework for the joint estimation of SOC and SOH in lithium-ion batteries. The core innovation lies in the synergistic combination of three powerful algorithms: a Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) network for SOH estimation and a Light Gradient Boosting Machine (LightGBM) model for SOC estimation, with the estimated SOH serving as a critical input feature for the SOC model. The methodology begins with the construction and rigorous selection of health indicators from easily measurable operational data, such as voltage and temperature profiles. A dedicated CNN-BiLSTM model then processes these indicators to provide an accurate SOH estimate. This SOH value, alongside real-time voltage, current, and temperature measurements, is fed into a highly efficient LightGBM model to generate the final SOC estimate. This integrated approach ensures that the SOC estimation is always informed by the current health state of the lithium-ion battery, leading to superior accuracy and robustness, especially during advanced stages of degradation. Validation using real-world battery cycling datasets confirms that the proposed framework achieves high estimation precision while offering significant advantages in computational efficiency.
Health Indicator Construction and Factor Extraction
The performance degradation of a lithium-ion battery is a complex process manifesting through changes in internal electrochemical parameters like capacity fade and internal resistance increase. While direct measurement of these internal states is impractical in most applications, their effects are observable in external operational data. Therefore, constructing effective health indicators (HIs) from measurable parameters like voltage, current, temperature, and time is a foundational step for data-driven SOH estimation. The SOH is commonly defined based on capacity fade:
$$ \text{SOH} = \frac{C_{\text{current}}}{C_{\text{nominal}}} \times 100\% $$
where $C_{\text{current}}$ is the current maximum available capacity and $C_{\text{nominal}}$ is the nominal capacity of the fresh lithium-ion battery.
Based on the analysis of charging profiles, several candidate health indicators are proposed, which sensitively reflect the aging process of a lithium-ion battery:
- HF1: Total Charging Time per Cycle. As internal resistance increases, the constant-current (CC) charging phase may lengthen.
- HF2: Ratio of Charge to Discharge Time. This ratio captures asymmetrical changes in charge/discharge kinetics due to aging.
- HF3: Average Temperature during Charge. Increased internal resistance leads to greater Joule heating, raising the average temperature.
- HF4: Temperature-Time Integral during Charge. This indicator accounts for both the magnitude and duration of temperature rise.
- HF5: Average Voltage during Charge. The shape of the voltage curve shifts with degradation.
- HF6: Voltage-Time Integral during Charge. This provides a consolidated measure of the voltage profile change.
- HF7: Accumulated Cycle Count. A direct but crude indicator of aging progression.
To distill the most informative and relevant features from this set, the Kendall’s Tau (KT-a) rank correlation coefficient is employed. Unlike Pearson’s correlation, Kendall’s Tau is a non-parametric statistic that measures the ordinal association between two variables, making it robust to outliers and non-normal data distributions—common characteristics in lithium-ion battery aging data. The KT-a coefficient between a health indicator $X$ and the SOH $Y$ over $N$ cycles is calculated as:
$$ \tau_a = \frac{2(C – D)}{N(N-1)} $$
where $C$ is the number of concordant pairs (pairs of observations that have the same order in both variables), and $D$ is the number of discordant pairs. The value of $\tau_a$ ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation). A strong monotonic relationship with SOH is indicated by an absolute value close to 1.
Applying this analysis to a public battery degradation dataset yields the following correlation results for the candidate indicators:
| Health Indicator | Description | |Kendall’s Tau| |
|---|---|---|
| HF1 | Total Charging Time | > 0.99 |
| HF2 | Charge/Discharge Time Ratio | ~0.96 |
| HF3 | Average Charge Temperature | ~0.97 |
| HF4 | Temperature-Time Integral | ~0.97 |
| HF5 | Average Charge Voltage | ~0.98 |
| HF6 | Voltage-Time Integral | > 0.99 |
| HF7 | Cycle Count | > 0.99 |
Based on a threshold of |KT| > 0.99, three health factors (HFs) are selected for the subsequent SOH estimation model: HF1 (Total Charging Time), HF6 (Voltage-Time Integral), and HF7 (Cycle Count). These factors provide a robust, multi-dimensional representation of the lithium-ion battery‘s aging state, effectively capturing temporal, electrical, and cumulative degradation effects.
Proposed Hybrid Joint Estimation Framework
The proposed framework is architected to leverage the complementary strengths of different machine learning paradigms. The SOH estimation task, which involves learning long-term temporal degradation trends from sequences of health factors, is assigned to a deep learning model combining CNN and BiLSTM. The SOC estimation task, which requires rapid, precise mapping from high-frequency, multi-dimensional operational data (including the estimated SOH) to a single value, is assigned to the efficient and powerful LightGBM algorithm. The graphical overview of this joint estimation model is presented below, illustrating the flow of information from raw data to final state estimates.
[A conceptual model diagram would be inserted here, showing data flow from Voltage/Current/Temperature & Health Factors to the CNN-BiLSTM SOH estimator, whose output feeds into the LightGBM SOC estimator alongside V/I/T data.]
SOH Estimation via CNN-BiLSTM Network
The SOH estimation model is designed to capture both local patterns and long-range dependencies in the sequence of health factor vectors $ \mathbf{HF}_t = [HF1_t, HF6_t, HF7_t] $ over cycles $t$.
- CNN Module: The Convolutional Neural Network acts as a sophisticated feature extractor. It applies one-dimensional convolutional filters to the input sequence to automatically learn and highlight local correlations and patterns among the health factors across a short window of cycles. This operation can be represented as:
$$ \mathbf{Y}_{\text{conv}} = \sigma(\mathbf{W} * \mathbf{HF} + \mathbf{b}) $$
where $*$ denotes the convolution operation, $\mathbf{W}$ and $\mathbf{b}$ are learnable weights and biases, and $\sigma$ is a non-linear activation function like ReLU. Subsequent pooling layers reduce dimensionality and enhance feature invariance. - BiLSTM Module: The features extracted by the CNN are then fed into a Bidirectional Long Short-Term Memory network. Unlike a standard LSTM that processes sequences only in the forward direction, a BiLSTM consists of two separate LSTM layers: one processing the sequence from past to future, and the other from future to past. This allows the model to learn context from both earlier and later cycles simultaneously, which is crucial for understanding the progressive and often nonlinear degradation trajectory of a lithium-ion battery. The final hidden states from both directions are concatenated to form a comprehensive context vector.
- Output: This context vector is passed through fully connected layers to produce the estimated SOH value for the current cycle. The combined CNN-BiLSTM architecture effectively models the spatial-temporal degradation features of the lithium-ion battery.
SOC Estimation via LightGBM
For SOC estimation, we employ LightGBM, a highly efficient gradient boosting framework. Its superiority lies in its speed and low memory footprint, achieved through techniques like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB), making it ideal for the high-dimensional, rapid inference requirements of a BMS. The model learns a function $F(\mathbf{x})$ that maps input features $\mathbf{x}$ to the target SOC.
The model is built in an additive manner over $M$ weak learners (decision trees):
$$ F_M(\mathbf{x}) = F_0(\mathbf{x}) + \sum_{m=1}^{M} \eta h_m(\mathbf{x}) $$
where $F_0$ is the initial estimate (e.g., mean SOC), $\eta$ is the learning rate, and $h_m$ is a decision tree constructed in the $m$-th iteration to fit the negative gradient (pseudo-residuals) of the loss function $L$:
$$ r_{im} = -\left[\frac{\partial L(\text{SOC}_i, F_{m-1}(\mathbf{x}_i))}{\partial F_{m-1}(\mathbf{x}_i)}\right] $$
The key innovation in our joint framework is the composition of the input feature vector $\mathbf{x}_i$ for the LightGBM model. For a given time step $i$, it includes:
$$ \mathbf{x}_i = [V_i, I_i, T_i, \widehat{\text{SOH}}_i] $$
where $V_i$, $I_i$, and $T_i$ are the instantaneous voltage, current, and temperature measurements, and $\widehat{\text{SOH}}_i$ is the SOH estimate from the CNN-BiLSTM model corresponding to the current cycle. By integrating the estimated health state directly into the SOC estimation process, the model dynamically adjusts its mapping to account for the lithium-ion battery‘s present capacity and internal condition, thereby eliminating a primary source of error in standalone SOC estimators.
Experimental Validation and Results Analysis
The proposed framework is validated using a publicly available dataset comprising multiple commercial lithium-ion battery cells cycled under various conditions until end-of-life. Cells are subjected to repeated charge-discharge cycles, with voltage, current, temperature, and capacity recorded throughout. The dataset is partitioned into training and testing sets to evaluate model performance.
Experimental Setup and Metrics
The CNN-BiLSTM model for SOH estimation is trained on sequences of health factors, with hyperparameters (e.g., number of filters, LSTM units, learning rate) optimized via a validation set. The LightGBM model for SOC estimation is trained on the multivariate time-series data, with its hyperparameters (e.g., number of leaves, learning rate, feature fraction) tuned using Bayesian optimization. All input features are normalized to the [0, 1] range to ensure stable and fast training.
Performance is quantitatively assessed using two standard metrics:
- Root Mean Square Error (RMSE): Measures the standard deviation of the estimation errors.
$$ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{j=1}^{N} (y_j – \hat{y}_j)^2 } $$ - Maximum Absolute Error (MAE): Captures the worst-case estimation error.
$$ \text{MAE} = \max(|y_j – \hat{y}_j|) $$
where $y_j$ is the true state value (SOC or SOH), $\hat{y}_j$ is the estimated value, and $N$ is the number of samples.
Ablation Study: Joint vs. Independent Estimation
To demonstrate the necessity of joint estimation, we first compare the performance of our proposed joint framework against a “standalone” SOC estimation method. The standalone method uses an identical LightGBM model but is trained and tested without the SOH ($\widehat{\text{SOH}}_i$) as an input feature, effectively assuming a fixed nominal capacity. The results for two representative cells are summarized below.
| Battery Cell | Estimation Method | Avg. SOC RMSE (%) | Max. SOC MAE (%) |
|---|---|---|---|
| Cell #3 | Proposed Joint Estimation | 0.51 | 3.63 |
| Standalone SOC Estimation | 4.27 | 16.00 | |
| Cell #6 | Proposed Joint Estimation | 0.61 | 3.83 |
| Standalone SOC Estimation | 4.07 | 17.20 |
The results are unequivocal. The joint estimation framework reduces the average SOC RMSE by nearly an order of magnitude, from over 4% to approximately 0.5-0.6%. More critically, the maximum absolute error is drastically reduced from over 16% to below 4%. This demonstrates that ignoring SOH leads to progressively larger SOC errors as the lithium-ion battery degrades, while the joint framework successfully compensates for this degradation, maintaining high accuracy throughout the battery’s lifespan.
Performance Comparison with State-of-the-Art
We further compare our LightGBM-CNN-BiLSTM framework with a recent, sophisticated deep learning-based joint estimation method from the literature (used as a baseline). The comparison focuses on estimation accuracy and computational efficiency.
| Metric | Proposed Method | Baseline Method [27] |
|---|---|---|
| Avg. SOC RMSE (%) | 0.57 | 0.64 |
| Avg. SOH RMSE (%) | ~1.5 | ~1.8 |
| Model Training Time (s) | 62.16 | 1633.01 |
| Goodness-of-Fit (R²) | 0.9986 | 0.9983 |
The proposed method achieves comparable, and in some cases slightly better, estimation accuracy than the baseline, as evidenced by the lower average RMSE and higher R² value. The most striking advantage is in computational efficiency. Our framework trains in just over a minute, which is 96.19% faster than the baseline method that requires nearly half an hour. This dramatic reduction in training time is attributed to the efficiency of the LightGBM algorithm for SOC regression and the streamlined flow of the hybrid architecture. Fast training is crucial for model development, hyperparameter tuning, and potential online adaptation in real-world BMS applications for lithium-ion battery systems.
Visualization of Estimation Results
The superior performance of the joint framework is visually apparent. The following descriptions contrast the estimation trajectories against the true values for a lithium-ion battery at different health states. [Note: Actual plots would show time-series data.]
- High SOH (~100%): Both the proposed and baseline methods track the true SOC closely during a charge cycle. The standalone method already shows minor deviations.
- Medium SOH (~85%): The proposed method continues to follow the true SOC accurately. The standalone method’s estimate begins to diverge significantly, especially in the mid-to-high SOC range, leading to larger errors.
- Low SOH (~75%): At this advanced stage of degradation, the error of the standalone method becomes very pronounced. In contrast, the proposed joint framework, informed by the accurate SOH estimate from the CNN-BiLSTM model, maintains a reliable SOC trajectory, with errors only slightly elevated compared to the high-SOH case. The SOH estimate itself remains stable and accurate until the very late stages of life, where some error increase is observed but is managed effectively within the joint framework.
Conclusion
Accurate and robust state estimation is a fundamental requirement for unlocking the full potential and ensuring the safe operation of lithium-ion battery energy storage systems. This article has presented a novel, hybrid data-driven framework for the joint estimation of State of Charge (SOC) and State of Health (SOH). The framework intelligently combines a CNN-BiLSTM network for capturing long-term temporal degradation features and a LightGBM model for efficient, high-precision SOC regression. The critical link is the feedforward of the estimated SOH as a dynamic input to the SOC estimator, allowing it to adapt to the battery’s current capacity.
Extensive validation on real-world cycling data leads to two primary conclusions. First, the joint estimation paradigm is essential. Compared to standalone SOC estimation, our framework reduces the average RMSE by approximately an order of magnitude (from >4% to ~0.55%) and cuts the maximum absolute error by more than 75%. This proves that compensating for capacity fade in real-time is crucial for maintaining SOC accuracy throughout the entire service life of the lithium-ion battery. Second, the proposed LightGBM-CNN-BiLSTM architecture offers an excellent balance between accuracy and efficiency. It matches or surpasses the accuracy of more complex deep learning baselines while achieving a staggering 96% reduction in model training time. This computational efficiency makes the framework particularly attractive for practical BMS implementations, where resources are often constrained, and models may need periodic updating.
Future work will focus on enhancing the framework’s robustness and generalizability. This includes testing under more diverse and dynamic load profiles, investigating transfer learning techniques to adapt models across different lithium-ion battery chemistries and formats with minimal data, and exploring the fusion of this data-driven approach with simplified physical models to incorporate fundamental electrochemical knowledge, potentially further improving estimation stability and interpretability.
