Prediction of Lithium-ion Battery Pack SOH Based on Consistency Check

Lithium-ion batteries have become the cornerstone of modern energy storage systems, powering applications from portable electronics to electric vehicles (EVs) and large-scale grid storage. Their high energy density, long cycle life, and decreasing cost make them an ideal choice. However, to meet the stringent energy and power demands of these applications, thousands of individual lithium-ion battery cells are connected in series and parallel to form large-scale battery packs. The performance and longevity of such a pack are not simply the sum of its parts; they are critically dependent on two intertwined factors: the inherent aging of each lithium-ion battery cell and the consistency among them.

The aging of a lithium-ion battery is a complex electrochemical degradation process influenced by usage patterns and environmental conditions. Furthermore, due to inevitable manufacturing tolerances, temperature gradients, and uneven aging during operation, the parameters of individual lithium-ion batteries within a pack gradually diverge. This phenomenon, known as inconsistency, leads to capacity imbalance, in which the weakest cell dictates the usable capacity of the entire pack, a manifestation of the “bucket effect.” Consequently, the pack’s State of Health (SOH), typically defined as the ratio of its current maximum available capacity to its initial capacity, degrades faster than the average SOH of its constituent cells. Accurate SOH prediction for a lithium-ion battery pack is therefore paramount for ensuring safety and reliability and for optimizing maintenance schedules.

Traditional approaches for pack-level SOH estimation often simplify the system by modeling the entire lithium-ion battery pack as an equivalent single cell. While computationally efficient, these methods fail to capture the nuanced effects of cell-to-cell variations, leading to potential inaccuracies. On the other hand, methods that model every single cell in detail become computationally prohibitive and data-intensive for large-scale packs comprising hundreds or thousands of lithium-ion batteries. This creates a critical need for a balanced methodology that achieves high prediction accuracy without the burden of monitoring every cell’s full history.

Our study addresses this challenge by proposing a novel SOH prediction framework for lithium-ion battery packs based on consistency check and outlier cell screening. The core premise is that the pack’s SOH is primarily constrained by a subset of cells that age faster or exhibit significant behavioral deviation from the group. By identifying these “outlier” cells through statistical consistency analysis of operational data, we can focus modeling efforts on them, dramatically reducing data and computational requirements. The method involves three key stages: First, we develop a robust data-driven model to estimate the SOH of a single lithium-ion battery from easily extractable features. Second, we apply a consistency check algorithm to pack cycling data to dynamically identify the most inconsistent cells. Finally, we predict the pack SOH by applying the single-cell model only to this small subset of outlier cells. This approach provides a practical and accurate solution for managing the health of complex lithium-ion battery energy storage systems.

Development of a Single Lithium-ion Battery SOH Model

The foundation of our pack SOH prediction strategy is a reliable model for estimating the health of an individual lithium-ion battery. We adopt a data-driven approach, leveraging features extracted from standard constant-current (CC) charging voltage curves, which are readily available from the Battery Management System (BMS).

Feature Extraction and Selection

As a lithium-ion battery ages, its internal resistance increases and active lithium inventory decreases, causing its charging voltage curve to shift. We systematically extract 15 candidate features from the CC charging phase to capture these changes. The features are categorized into three groups, all focused on the voltage range between 3.4 V and 3.8 V, where the electrochemical behavior is stable and informative.

1. Capacity-Related Features (F1-F5): These represent the time taken to charge through fixed, consecutive voltage intervals (e.g., 3.4-3.6 V, 3.6-3.8 V), which is directly proportional to capacity in that interval.
$$ \text{Feature}_\text{cap} = \Delta t(V_i \to V_{i+1}) $$

2. Resistance-Indicative Features (F6-F10): These capture the voltage change over a fixed time period (a rise, since the voltage increases during CC charging), which is indicative of internal resistance growth.
$$ \text{Feature}_\text{res} = V(t_0 + 1200\,\text{s}) - V(t_0), \quad t_0 \text{ such that } V(t_0) \text{ equals a starting point like } 3.4\,\text{V} $$

3. Curve Shape Features (F11-F15): These are average voltages over a fixed time window beginning at a specific starting voltage, reflecting the overall shape change of the curve.
$$ \text{Feature}_\text{shape} = \frac{1}{N}\sum_{k=1}^{N} V(\tau_k), \quad \tau_k \in [t_0,\ t_0 + 1200\,\text{s}] $$
where the $N$ samples $\tau_k$ span the window and $t_0$ is the time at which $V$ reaches a starting point such as 3.4 V.
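The three feature families can be computed from a single CC charging record with a few lines of NumPy. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a monotonically rising voltage series on an increasing time axis (in seconds), locates the starting voltage by linear interpolation between samples, and evaluates the shape feature on uniformly resampled points. The names `crossing_time`, `f_cap`, `f_res`, `f_shape` and the `n_points` parameter are hypothetical. With `v_start=3.4` and the default 1200 s window, `f_res` and `f_shape` correspond to the way F8 and F13 are described.

```python
import numpy as np

def crossing_time(t, v, v_target):
    """Time at which the (rising) CC charging voltage first reaches v_target,
    linearly interpolated between the two bracketing samples."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    idx = int(np.argmax(v >= v_target))      # first sample at or above the target
    if v[idx] < v_target:
        raise ValueError(f"voltage never reaches {v_target} V")
    if idx == 0:
        return float(t[0])
    t0, t1, v0, v1 = t[idx - 1], t[idx], v[idx - 1], v[idx]
    return float(t0 + (v_target - v0) * (t1 - t0) / (v1 - v0))

def f_cap(t, v, v_lo, v_hi):
    """Capacity-related feature: charging time from v_lo to v_hi."""
    return crossing_time(t, v, v_hi) - crossing_time(t, v, v_lo)

def f_res(t, v, v_start, window=1200.0):
    """Resistance-indicative feature: voltage rise over `window` seconds,
    starting when the voltage reaches v_start."""
    t = np.asarray(t, dtype=float)
    t0 = crossing_time(t, v, v_start)
    if t0 + window > t[-1]:
        raise ValueError("record too short for the requested window")
    return float(np.interp(t0 + window, t, v) - np.interp(t0, t, v))

def f_shape(t, v, v_start, window=1200.0, n_points=121):
    """Curve-shape feature: mean voltage over `window` seconds from v_start,
    evaluated on n_points uniformly spaced, interpolated samples."""
    t = np.asarray(t, dtype=float)
    t0 = crossing_time(t, v, v_start)
    if t0 + window > t[-1]:
        raise ValueError("record too short for the requested window")
    tau = np.linspace(t0, t0 + window, n_points)
    return float(np.mean(np.interp(tau, t, v)))
```

Interpolating the crossing time keeps the features insensitive to the BMS sampling instant; a real pipeline would likely also smooth the voltage before locating crossings.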

To select the most relevant features, we calculate the Pearson correlation coefficient between each feature and the actual SOH for multiple single lithium-ion batteries. The Pearson coefficient $ r $ is defined as:
$$ r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \cdot \sum_{i=1}^{n}(Y_i - \bar{Y})^2}} $$
where $ X_i $ and $ Y_i $ are the feature value and SOH at cycle $ i $, and $ \bar{X} $ and $ \bar{Y} $ are their respective means. Features with $ |r| > 0.99 $ across all tested cells are considered highly correlated. Based on this analysis, three features were shortlisted for the final model selection: F3 (a capacity-related feature), F8 (a resistance-indicative feature starting at 3.4V), and F13 (a shape feature starting at 3.4V).

Feature Code	Description	Avg. |r| (Pearson)
F3	Charging time for interval 3.6 V to 3.8 V	0.992
F8	Voltage change over 1200 s from 3.4 V	0.995
F13	Average voltage over 1200 s from 3.4 V	0.991
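The screening rule (keep a feature only if |r| > 0.99 in every tested cell) follows directly from the Pearson formula. A minimal sketch, assuming the per-cell data are held in plain dictionaries (a hypothetical layout); `pearson_r` and `shortlist_features` are illustrative names. It also returns the average |r| across cells, the quantity reported in the table above.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient, written out as in the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

def shortlist_features(cells, threshold=0.99):
    """Keep features whose |r| with SOH exceeds `threshold` in every cell.

    `cells` is a list with one dict per cell, laid out (hypothetically) as
    {"soh": [...], "features": {name: [...], ...}}.
    Returns {feature name: average |r| across cells} for the kept features.
    """
    kept = {}
    for name in cells[0]["features"]:
        rs = [abs(pearson_r(c["features"][name], c["soh"])) for c in cells]
        if min(rs) > threshold:
            kept[name] = float(np.mean(rs))
    return kept
```

Requiring the threshold in every cell, rather than on average, is what makes a feature robust to cell-to-cell variation.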

Model Building and Comparison

We employ Gaussian Process Regression (GPR) to build the mapping from the selected features to the SOH of a lithium-ion battery. GPR is a non-parametric, probabilistic model that not only provides predictions but also quantifies the uncertainty associated with them, which is valuable for health assessment. The 95% confidence interval for a prediction $ y_* $ is given by:
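$$ \left[\bar{y}_* - 1.96 \sqrt{\operatorname{var}(y_*)},\ \bar{y}_* + 1.96 \sqrt{\operatorname{var}(y_*)}\right] $$
where $\bar{y}_*$ is the predictive mean and $\operatorname{var}(y_*)$ the predictive variance. To determine the optimal feature set, we train and test GPR models using all possible combinations of the three shortlisted features; representative results are listed below. Although the single-feature model GPR-1 achieved marginally lower errors, the combination [F8, F13] (GPR-2) was chosen because it offered an excellent balance between high accuracy (low error) and narrow prediction uncertainty bands.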

Model	Feature Set	R²	MAE (%)	RMSE (%)
GPR-1	[F8]	0.992	0.36	0.46
GPR-2	[F8, F13]	0.991	0.39	0.50
GPR-3	[F3, F8, F13]	0.990	0.45	0.53
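The GPR predictive mean, variance and 95% interval, together with the exhaustive comparison of feature subsets, can be sketched in NumPy. This is not the authors' model: it assumes an RBF kernel with fixed, hand-set hyperparameters (a practical implementation would fit them by maximizing the marginal likelihood) and features standardized beforehand so that a single length scale suits all inputs. `SimpleGPR`, `compare_feature_subsets` and their parameters are hypothetical names. The mean interval width returned next to the RMSE is one possible way to quantify the "narrow uncertainty bands" criterion.

```python
import itertools
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential kernel between the rows of a (n x d) and b (m x d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def as_2d(X):
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X   # a single feature becomes one column

class SimpleGPR:
    """GP regression with an RBF kernel, fixed hyperparameters and a constant
    mean equal to the training-set average (no hyperparameter optimisation)."""

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, X, y):
        self.X = as_2d(X)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()
        K = rbf_kernel(self.X, self.X, self.length_scale, self.signal_var)
        self.L = np.linalg.cholesky(K + self.noise_var * np.eye(len(self.X)))
        # alpha = (K + noise*I)^-1 (y - mean), computed with two solves against L
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y - self.y_mean))
        return self

    def predict(self, X_new):
        """Return predictive mean, variance and the 95% interval bounds."""
        Xs = as_2d(X_new)
        Ks = rbf_kernel(self.X, Xs, self.length_scale, self.signal_var)   # n x m
        mean = self.y_mean + Ks.T @ self.alpha
        v = np.linalg.solve(self.L, Ks)
        var = np.maximum(self.signal_var - np.sum(v**2, axis=0), 0.0) + self.noise_var
        half = 1.96 * np.sqrt(var)
        return mean, var, mean - half, mean + half

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def compare_feature_subsets(X_train, y_train, X_test, y_test, names):
    """Fit one GPR per non-empty feature subset; report (test RMSE, mean 95% width)."""
    results = {}
    for k in range(1, len(names) + 1):
        for cols in itertools.combinations(range(len(names)), k):
            cols = list(cols)
            gp = SimpleGPR().fit(X_train[:, cols], y_train)
            mean, _, lo, hi = gp.predict(X_test[:, cols])
            key = tuple(names[c] for c in cols)
            results[key] = (rmse(y_test, mean), float(np.mean(hi - lo)))
    return results
```

For three shortlisted features this evaluates seven subsets, which is cheap enough to repeat whenever new single-cell aging data become available.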

We further validated the chosen GPR model against other common algorithms. The GPR model consistently outperformed Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) networks in terms of stability and accuracy for single lithium-ion battery SOH estimation, with a Root Mean Square Error (RMSE) of only 0.50%. This robust single-cell model forms the core predictor that will be applied to the pack’s outlier cells.

Consistency Check Method for Outlier Cell Identification

Monitoring every single lithium-ion battery in a large pack is impractical. Our method reduces this burden by identifying only the cells that significantly deviate from the group’s behavior, as these are the most likely candidates to limit pack capacity. We perform this identification through a statistical consistency check on cell voltages during operation.

The process begins by selecting a stable segment of the pack’s CC charging data, specifically the 600-second window after the total pack voltage reaches a set threshold (e.g., 68 V for our test pack, about 3.4 V per series element). For this window, we construct a feature vector for each series-connected element (a parallel cell group in a 2P configuration, or a single cell in a purely series configuration). The feature vector includes both dimensional and dimensionless time-domain characteristics calculated from that element’s voltage series $ \{x_1, x_2, \dots, x_n\} $:

Dimensional Features: Maximum ($x_{\text{max}}$), Minimum ($x_{\text{min}}$), Peak-to-Peak ($x_p = x_{\text{max}} - x_{\text{min}}$), Mean ($\mu$), Root Mean Square ($X_{\text{rms}}$), Variance ($\sigma^2$).
Dimensionless Features: Kurtosis ($K$), Skewness ($S$), Crest Factor ($k_a$).

These are calculated as follows:
$$ \mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad X_{\text{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}, \quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 $$
$$ K = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^4}{\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2\right]^2}, \quad S = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^3}{\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2\right]^{3/2}}, \quad k_a = \frac{x_p}{X_{\text{rms}}} $$
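The window selection and the nine time-domain statistics can be computed per element as follows. A minimal sketch, assuming a time axis in seconds and a matrix of series-element voltages with one column per element; when the pack voltage is not logged separately it is approximated by the sum of the element voltages, which holds for a purely series connection. The function names are hypothetical, and the 68 V trigger and 600 s window are the example values from the text.

```python
import numpy as np

def select_window(t, pack_v, v_trigger=68.0, duration=600.0):
    """Boolean mask of samples within `duration` s after pack_v first reaches v_trigger."""
    t = np.asarray(t, dtype=float)
    pack_v = np.asarray(pack_v, dtype=float)
    hits = np.nonzero(pack_v >= v_trigger)[0]
    if hits.size == 0:
        raise ValueError("trigger voltage never reached")
    t0 = t[hits[0]]
    return (t >= t0) & (t <= t0 + duration)

def time_domain_features(x):
    """[max, min, peak-to-peak, mean, rms, variance, kurtosis, skewness, crest factor]."""
    x = np.asarray(x, dtype=float)
    x_max, x_min = x.max(), x.min()
    x_p = x_max - x_min
    mu = x.mean()
    x_rms = np.sqrt(np.mean(x**2))
    var = np.mean((x - mu) ** 2)               # population variance (1/n), as defined above
    if var == 0:
        raise ValueError("constant series: kurtosis and skewness are undefined")
    kurt = np.mean((x - mu) ** 4) / var**2
    skew = np.mean((x - mu) ** 3) / var**1.5
    crest = x_p / x_rms                        # peak-to-peak over RMS, per the definition above
    return np.array([x_max, x_min, x_p, mu, x_rms, var, kurt, skew, crest])

def feature_matrix(t, cell_v, pack_v=None, **window_kw):
    """cell_v: (n_samples, n_elements) voltages. Returns an (n_elements, 9) array."""
    cell_v = np.asarray(cell_v, dtype=float)
    if pack_v is None:
        pack_v = cell_v.sum(axis=1)            # series pack: sum of element voltages
    mask = select_window(t, pack_v, **window_kw)
    return np.vstack([time_domain_features(cell_v[mask, j]) for j in range(cell_v.shape[1])])
```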

Next, we compute the distance correlation matrix $ \mathbf{Q}_A $ among all the cell feature vectors, whose entry $r_{ij}$ measures the similarity between the feature vectors of cells $i$ and $j$. This matrix quantifies the pairwise behavioral similarity between every pair of lithium-ion batteries in the pack during that cycle. A cell is considered an “outlier” or inconsistent if its behavior is largely dissimilar from that of most other cells in the pack.
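The exact computation behind the distance correlation matrix is not spelled out here. One standard choice is the sample distance correlation of Székely, Rizzo and Bakirov, applied to each pair of cell feature vectors as paired samples; the sketch below takes that route. Two further assumptions go beyond the text: each of the nine features is first z-score standardized across cells, because otherwise the large-magnitude voltage features (around 3.4 V) would dominate and make every pair look alike, and the diagonal is set to 1. The function names are hypothetical.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y (value in [0, 1])."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                        # pairwise distances within x
    b = np.abs(y - y.T)
    # double centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar2 = (A * A).mean() * (B * B).mean()
    if dvar2 <= 0:                             # a constant sample: define dCor as 0
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))

def distance_correlation_matrix(F, standardize=True):
    """Pairwise distance correlation between the rows (cells) of F (n_cells x n_features)."""
    F = np.asarray(F, dtype=float)
    if standardize:                            # assumption: z-score each feature across cells
        sd = F.std(axis=0)
        sd[sd == 0] = 1.0
        F = (F - F.mean(axis=0)) / sd
    n = F.shape[0]
    Q = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            Q[i, j] = Q[j, i] = distance_correlation(F[i], F[j])
    return Q
```

Unlike the Pearson coefficient, distance correlation is zero only under independence and also picks up nonlinear relationships, which may explain why it suits a similarity measure between heterogeneous statistics.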

We investigate two thresholding strategies to declare a lithium-ion battery an outlier:

1. Fixed Correlation Threshold ($T_r$): For each cell $i$, we count how many other cells $j$ have a correlation $r_{ij} > T_r$ (e.g., $T_r = 0.9$). If the number of similar cells is below a certain percentage (e.g., 60%) of the total, cell $i$ is labeled an outlier.

2. Fixed Count Threshold ($T_{cell}$): We sort the cells based on their average correlation with others (or a similar metric) from lowest to highest. A fixed number $T_{cell}$ of the lowest-ranking cells (e.g., 5%, 10%, 20% of the pack) are designated as the outlier set. This method directly controls the computational budget for the subsequent prediction step.
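Both thresholding strategies reduce to a few array operations on the similarity matrix. A sketch, assuming `Q` is symmetric with ones on the diagonal (such as a distance correlation matrix). Two details are interpretations rather than facts confirmed by the text: in strategy 1 the percentage is taken relative to the n - 1 other cells, and in strategy 2 the ranking metric is the mean off-diagonal similarity. The function names are hypothetical.

```python
import numpy as np

def outliers_fixed_correlation(Q, t_r=0.9, min_fraction=0.6):
    """Strategy 1: cell i is an outlier if fewer than min_fraction of the
    other cells have similarity r_ij > t_r."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    similar_counts = np.sum((Q > t_r) & off_diag, axis=1)
    return np.nonzero(similar_counts < min_fraction * (n - 1))[0]

def outliers_fixed_count(Q, fraction=0.2):
    """Strategy 2: the round(fraction * n) cells with the lowest mean similarity
    to the other cells, least similar first."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    mean_sim = (Q.sum(axis=1) - np.diag(Q)) / (n - 1)   # exclude self-similarity
    m = max(1, int(round(fraction * n)))
    return np.argsort(mean_sim, kind="stable")[:m]
```

Strategy 1 lets the outlier count vary with the state of the pack (possibly zero early in life), whereas strategy 2 always returns the same number of cells, which is what fixes the inference budget.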

This consistency check is performed periodically (e.g., every cycle or every N cycles), allowing the set of monitored outlier lithium-ion batteries to update dynamically as the pack ages and inconsistency patterns evolve.

Integrated Framework for Lithium-ion Battery Pack SOH Prediction

The complete framework for predicting the SOH of a lithium-ion battery pack integrates the single-cell model and the consistency check, creating an efficient and accurate estimation pipeline.

Step 1: Model Initialization. The GPR model mapping features [F8, F13] to SOH is trained offline using historical data from single lithium-ion battery aging tests. This model is fixed and used for all subsequent predictions.

Step 2: Online Data Acquisition and Outlier Screening. During the regular operation of the lithium-ion battery pack, voltage data from all cells during a CC charging phase is recorded. The consistency check algorithm, using one of the threshold strategies, is applied to this data to identify the current set of $M$ outlier cells ($M \ll$ total cells).

Step 3: Feature Extraction for Outliers. For each of the $M$ identified outlier lithium-ion batteries, the specific features F8 and F13 are extracted from their individual voltage profiles within the designated analysis window.

Step 4: Pack SOH Prediction. The extracted features for each outlier cell are fed into the pre-trained GPR model, yielding an SOH estimate for each of those $M$ cells. According to the “bucket effect,” the health of the entire lithium-ion battery pack is constrained by its weakest cell. Therefore, the pack SOH is predicted as the minimum of the SOH estimates from the outlier set:
$$ \text{SOH}_{\text{pack}} = \min(\text{SOH}_{\text{outlier}_1}, \text{SOH}_{\text{outlier}_2}, \dots, \text{SOH}_{\text{outlier}_M}) $$
This step requires running the model only $M$ times per estimation cycle.
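Steps 3 and 4 amount to evaluating the single-cell model only on the flagged cells and taking the minimum. A minimal sketch with the feature extractor and the trained model passed in as callables; these interfaces and the name `predict_pack_soh` are hypothetical. It also returns the index of the cell that sets the pack SOH, which can be logged alongside the estimate.

```python
import numpy as np

def predict_pack_soh(outlier_ids, cell_features, soh_model):
    """Pack SOH as the minimum single-cell SOH estimate over the outlier set.

    outlier_ids   -- indices of the M cells flagged by the consistency check
    cell_features -- callable: cell index -> feature vector, e.g. [F8, F13]
    soh_model     -- callable: (M x d) array -> M SOH estimates (e.g. a GPR mean)
    """
    ids = np.asarray(outlier_ids, dtype=int)
    if ids.size == 0:
        raise ValueError("consistency check returned no outlier cells")
    X = np.vstack([cell_features(i) for i in ids])   # only M feature extractions
    soh = np.asarray(soh_model(X), dtype=float)      # only M model evaluations
    k = int(np.argmin(soh))                          # bucket effect: weakest cell limits the pack
    return float(soh[k]), int(ids[k])
```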

Step 5: Validation and Metrics. The predicted pack SOH is compared against the pack SOH computed from the experimentally measured pack capacity. The performance is evaluated using standard metrics:
$$ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}, \quad \text{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|, \quad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} $$
where $y_i$ is the true SOH, $\hat{y}_i$ is the predicted SOH, and $\bar{y}$ is the mean of the true SOH values.
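The three metrics translate directly into code. A sketch with a hypothetical function name; if SOH is stored as a fraction rather than a percentage, multiply RMSE and MAE by 100 to obtain the percentage values used in the tables.

```python
import numpy as np

def soh_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 as defined above; inputs share the same units."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    rmse = float(np.sqrt(np.mean(err**2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err**2) / np.sum((y - y.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "r2": r2}
```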

Results and Discussion

The proposed method was validated using experimental data from a 2-parallel 20-series lithium-ion battery pack cycled to end-of-life. The single-cell GPR model was trained on separate single-cell data.

First, we confirmed the evolution of inconsistency. The consistency check clearly showed that while all lithium-ion batteries were similar at the beginning of life, distinct outliers emerged as cycling progressed. For instance, by the mid-life point, several cells exhibited significantly lower correlation with the group, and by end-of-life, one cell was a pronounced outlier, visually confirming the rationale for our approach.

Second, we evaluated the impact of the outlier selection threshold on pack SOH prediction accuracy. Using the fixed count threshold ($T_{cell}$), we compared results when monitoring 5%, 10%, 20%, and 100% of the cells. Accuracy improves markedly as the monitored set grows from 5% to 20%, but monitoring all cells brings no further gain:

Outlier Set Size	R²	MAE (%)	RMSE (%)
5% (1 cell)	0.901	1.27	1.57
10% (2 cells)	0.978	0.65	0.74
20% (4 cells)	0.984	0.54	0.64
100% (20 cells)	0.981	0.59	0.68

The key finding is that monitoring only 20% of the lithium-ion batteries (4 of the 20 series elements) yielded the best prediction accuracy, with an RMSE of 0.64%. This performance slightly surpassed the scenario where all cells were monitored (RMSE 0.68%). A plausible explanation for this counter-intuitive result is that focusing on the most inconsistent, and therefore most critical, lithium-ion batteries filters out noise from the majority of normally aging cells: because the pack SOH is taken as the minimum over the monitored cells, every additional cell is another chance for an estimation error to produce a spuriously low minimum. The result also suggests that the pack’s SOH trajectory is effectively governed by a small subset of cells.

Furthermore, the efficiency gain is substantial. The 20% monitoring strategy reduces the per-cell feature extraction and model inference workload by 80% compared to the full-monitoring approach (4 instead of 20 GPR evaluations per estimate), while the consistency check itself still uses the voltage data of all cells. This makes the method highly scalable and suitable for implementation in real-world BMSs for large lithium-ion battery packs.

The proposed GPR-based single-cell model also proved superior when applied within this framework. Compared to using SVR or LSTM models on the 20% outlier set, the GPR model provided more stable and accurate predictions across the entire aging cycle of the lithium-ion battery pack, with all relative errors remaining within 1.5%.

Conclusion

In this work, we have presented a novel and practical method for predicting the State of Health of lithium-ion battery packs. The method strategically addresses the dual challenges of cell inconsistency and computational complexity. By developing a robust Gaussian Process Regression model that estimates single lithium-ion battery SOH from two highly correlated charging curve features, and then deploying this model selectively on a small, dynamically identified set of outlier cells, we achieve high-accuracy pack SOH prediction with drastically reduced data processing requirements.

The experimental validation on a 2P20S lithium-ion battery pack demonstrates that the method is highly effective. The optimal configuration, which monitors only the 20% of cells identified via consistency check, achieved a prediction RMSE of 0.64%, slightly outperforming the scenario where all cells are monitored (0.68%). This approach ensures that the management system focuses resources on the most critical lithium-ion batteries in the pack, those most likely to cause premature failure or capacity limitation.

The implications of this research are significant for the management of large-scale lithium-ion battery energy storage systems. It provides a pathway to accurate, real-time health assessment without the prohibitive cost of full individual cell analytics. Future work will focus on adapting the consistency check and feature extraction methods to more complex, real-world cycling profiles and further optimizing the algorithm for embedded implementation in battery management systems.