The widespread adoption of lithium-ion batteries as the primary energy storage solution for electric vehicles (EVs) is underpinned by their superior energy density, long cycle life, and low self-discharge rate. However, safety incidents, including fires and explosions, pose a significant challenge to their reliable deployment. These failures often originate from internal faults within the battery system, which can be broadly categorized into sudden faults and progressive faults. Sudden faults, like severe internal short circuits (ISC), lead to rapid thermal runaway. Progressive faults, however, develop gradually over time, often due to aging mechanisms or minor manufacturing defects that slowly evolve into more serious conditions like incipient ISC. The early-stage voltage or temperature signatures of such progressive faults are exceedingly subtle and can be easily masked by operational noise, making timely diagnosis critically important yet technically difficult.

Traditional fault diagnosis methods for lithium-ion batteries encompass knowledge-based, model-based, and data-driven approaches. Knowledge-based methods rely heavily on expert rules and predefined thresholds, which lack adaptability. Model-based methods require highly accurate mathematical models of the lithium-ion battery, which are complex to derive and may not generalize well across different battery types and operating conditions. In contrast, data-driven methods, leveraging the power of machine learning (ML) and artificial intelligence (AI), have gained prominence. These methods learn the intricate mapping between measurable battery signals (e.g., voltage, current, temperature) and the underlying state of health or fault condition directly from data, offering strong potential for detecting early-stage anomalies. The core challenge lies in extracting robust, informative features from noisy operational data and constructing a highly accurate and generalizable diagnostic model.
This article presents a novel, integrated fault diagnosis methodology for lithium-ion batteries, specifically targeting the early detection of progressive internal short circuits. The proposed framework combines a powerful nonlinear dynamic analysis tool—Distribution Entropy (DE)—with an advanced machine learning model—Random Forest (RF)—whose hyperparameters are meticulously tuned using an improved metaheuristic algorithm. The effectiveness of this approach, termed MCPO-RF, is rigorously validated using both laboratory-simulated fault data and real-world vehicle data.
1. Theoretical Foundation: Distribution Entropy for Feature Extraction
Entropy measures are widely used in signal processing to quantify the complexity, irregularity, or predictability of a time series. For fault diagnosis in lithium-ion batteries, entropy can capture the subtle changes in voltage or temperature profiles induced by an incipient fault. Sample Entropy (SampEn) is a common choice but suffers from a strong dependency on its parameters (embedding dimension and tolerance) and from inconsistency at short data lengths. Distribution Entropy (DE) addresses these limitations by evaluating the distribution of distances between state vectors in the reconstructed phase space, offering greater robustness and consistency.
The calculation of Distribution Entropy for a discrete voltage time series $X = \{x_1, x_2, …, x_N\}$ involves the following steps:
Step 1: Phase Space Reconstruction. For a given embedding dimension $m$, reconstruct the phase space vectors:
$$Y(i) = \{x(i), x(i+1), …, x(i+m-1)\}, \quad i = 1, 2, …, N-m+1$$
Step 2: Distance Matrix Construction. Compute the Chebyshev distance between all pairs of vectors $Y(i)$ and $Y(j)$:
$$d_{ij} = \max_{k=0,1,…,m-1} (|x(i+k) - x(j+k)|), \quad i,j = 1,2,…,N-m+1$$
The collection of all $d_{ij}$ (excluding the diagonal $i=j$ to avoid self-matching) forms the distance matrix $D$.
Step 3: Empirical Probability Density Estimation. The elements of $D$ are discretized using a histogram with $T$ bins. Let the frequency count for bin $t$ be $p(t)$. The empirical probability density $P_d(t)$ for each bin is:
$$P_d(t) = \frac{p(t)}{\sum_{t=1}^{T} p(t)}$$
Step 4: Entropy Calculation. Finally, the Distribution Entropy is computed based on Shannon’s entropy formula:
$$DE(m, T) = -\frac{1}{\log_2(T)} \sum_{t=1}^{T} P_d(t) \log_2(P_d(t))$$
The value of $DE$ is normalized between 0 and 1. A key advantage is its reliance on only two parameters ($m$ and $T$), making it less sensitive to parameter selection compared to SampEn and thus more suitable for characterizing the voltage signals of a lithium-ion battery under varying states.
| Parameter | Description | Typical Setting / Role |
|---|---|---|
| $m$ | Embedding Dimension | Defines the length of the state vector for phase space reconstruction. Usually a small integer (e.g., 2 or 3). |
| $T$ | Number of Histogram Bins | Determines the discretization resolution for the distance distribution. Affects the entropy estimation stability. |
| $N$ | Length of Input Time Series | Window length of the voltage signal segment. Must be sufficiently long to capture dynamics but short for real-time application. |
| $DE$ | Distribution Entropy Value | Output feature. A lower value indicates more regularity; a higher value indicates more complexity/irregularity in the signal. |
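As a concrete illustration, the four steps above can be condensed into a short Python function. This is a minimal sketch rather than a reference implementation: the default bin count `T = 512` and the min-max binning of distances are illustrative choices, and the quadratic pairwise-distance loop is written for clarity rather than speed.

```python
import math

def distribution_entropy(x, m=2, T=512):
    """Distribution Entropy of a 1-D series x, following Steps 1-4 above."""
    n = len(x)
    # Step 1: phase-space reconstruction with embedding dimension m
    vectors = [x[i:i + m] for i in range(n - m + 1)]
    # Step 2: Chebyshev distances between all distinct vector pairs (i != j)
    dists = [
        max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(len(vectors))
        if i != j
    ]
    # Step 3: empirical probability density over T histogram bins
    # (equal-width bins over [0, max distance]; an illustrative binning scheme)
    d_max = max(dists)
    counts = [0] * T
    for d in dists:
        t = min(int(d / d_max * T), T - 1) if d_max > 0 else 0
        counts[t] += 1
    probs = [c / len(dists) for c in counts]
    # Step 4: Shannon entropy, normalized by log2(T) so 0 <= DE <= 1
    return -sum(p * math.log2(p) for p in probs if p > 0) / math.log2(T)
```

A perfectly regular segment (e.g., a constant voltage) concentrates all distances in one bin and yields DE = 0, while an irregular segment spreads the distances across many bins and pushes DE towards 1, matching the interpretation in the table above.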
2. Diagnostic Model: Random Forest and Its Optimization
2.1 Random Forest Classifier
Random Forest is an ensemble learning method that operates by constructing a multitude of decision trees during training. For a classification task—such as labeling a lithium-ion battery data segment as “normal” or “faulty”—each tree in the forest casts a vote, and the class with the majority votes becomes the model’s prediction. This ensemble approach reduces overfitting and enhances generalization compared to a single decision tree. The algorithm introduces randomness in two ways: 1) by training each tree on a bootstrap sample (i.e., a random sample with replacement) of the training data, and 2) by selecting a random subset of features to split on at each node during tree construction.
The mathematical formulation for a Random Forest’s prediction can be summarized as follows: Let $\Theta_k$ represent the random vector (defining the bootstrap sample and feature subset) used to grow the $k$-th tree in an ensemble of $K$ trees. For a given input feature vector $\mathbf{x}$ (e.g., a set of Distribution Entropy values), the prediction of the $k$-th tree is $h(\mathbf{x}, \Theta_k)$. The final forest prediction $H(\mathbf{x})$ for a classification problem is:
$$H(\mathbf{x}) = \text{majority vote} \{ h(\mathbf{x}, \Theta_1), h(\mathbf{x}, \Theta_2), …, h(\mathbf{x}, \Theta_K) \}$$
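The voting rule can be sketched in a few lines of Python. Here each trained tree is represented simply as a callable that maps a feature vector to a class label; growing the trees themselves (bootstrap sampling, feature subsetting) is omitted.

```python
from collections import Counter

def forest_predict(trees, x):
    """Majority vote over the per-tree predictions h(x, Theta_k)."""
    votes = [tree(x) for tree in trees]
    # most_common(1) returns the label with the most votes; for equal
    # counts, Counter orders labels by first encounter
    return Counter(votes).most_common(1)[0][0]
```

For example, with three stub trees voting ("fault", "normal", "fault"), the ensemble prediction is "fault".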
The performance of the RF model is highly sensitive to its hyperparameters, primarily:
- $n_{tree}$: The number of decision trees in the forest.
- $m_{try}$: The number of randomly selected features considered for splitting at each node.
Improper tuning of these parameters can lead to suboptimal accuracy, excessive computational cost, or overfitting. Therefore, an automated and intelligent optimization strategy is crucial.
2.2 Crested Porcupine Optimizer (CPO) and Its Enhancement (MCPO)
The Crested Porcupine Optimizer is a novel population-based metaheuristic algorithm inspired by the four principal defensive mechanisms of the crested porcupine. The algorithm simulates these strategies in a hierarchical manner, transitioning from one to the next to balance exploration and exploitation in the search space. The four defense strategies correspond to specific mathematical update equations for the position of each candidate solution (agent) in the population.
| Defense Phase | Mathematical Formulation | Key Parameters |
|---|---|---|
| Initialization | $X_i = L + r \cdot (U - L), \quad i=1,…,N$ | $L, U$: search bounds; $r$: random vector in [0,1]. |
| Strategy I (Sight/Sound) | $X_i^{k+1} = X_i^{k} + \tau_1 \cdot |2 \cdot \tau_2 \cdot X_{CP}^{k} - Y_i^{k}|$ | $\tau_1, \tau_2$: random numbers; $X_{CP}^k$: best solution; $Y_i^k$: predator position. |
| Strategy II (Odor) | $X_i^{k+1} = (1-U_1) \cdot X_i^{k} + U_1 \cdot [Y + \tau_3 \cdot (X_{r1}^k - X_{r2}^k)]$ | $U_1$: binary switch; $\tau_3$: random in [0,1]; $r1, r2$: random indices. |
| Strategy III (Physical Quill) | $X_i^{k+1} = (1-U_1) \cdot X_i^{k} + U_1 \cdot [X_{r1}^k + S_i^k \cdot (X_{r2}^k - X_{r3}^k) - \tau_3 \cdot \delta \cdot \gamma^k \cdot S_i^k]$ | $S_i^k$: odor factor; $\delta$: direction; $\gamma^k$: defense factor; $r3$: random index. |
| Strategy IV (Advanced Quill) | $X_i^{k+1} = X_{CP}^{k} + [\alpha(1-\tau_4)+\tau_4] \cdot (\delta \cdot X_{CP}^{k} - X_i^{k}) - \tau_5 \cdot \delta \cdot \gamma^k \cdot F_i^k$ | $\alpha$: convergence factor; $\tau_4, \tau_5$: random; $F_i^k$: average predator force. |
While CPO is effective, its convergence speed and solution accuracy can be improved. We propose an enhanced version, the Modified Crested Porcupine Optimizer (MCPO), incorporating two key improvements:
1. Chaotic Map for Population Initialization: Instead of purely random initialization, a logistic chaotic map is employed to generate the initial population. This ensures a more uniform distribution of agents across the search space, improving the initial exploration capability. Starting from a random vector $\vec{X}_0$ in [0,1], the chaotic sequence is generated iteratively:
$$\vec{X}_{n+1} = r_c \cdot \vec{X}_n \cdot (1 - \vec{X}_n)$$
where $r_c = 4$. After $N$ iterations, the final chaotic population $\vec{X}_N$ is mapped to the actual search domain $[L, U]$:
$$X = L + (U - L) \cdot \vec{X}_N$$
2. Adaptive Cosine Weight Factor: An adaptive weight $\omega$ is introduced into the core update equations, particularly Strategy II, to dynamically balance global exploration and local exploitation. The weight decreases non-linearly with iteration count $k$:
$$\omega = \cos\left(\frac{\pi \cdot k}{2 \cdot K_{max}}\right)$$
where $K_{max}$ is the maximum number of iterations. The modified Strategy II update becomes:
$$X_i^{k+1} = \omega \cdot (1-U_1) \cdot X_i^{k} + U_1 \cdot [Y + \tau_3 \cdot (X_{r1}^k - X_{r2}^k)]$$
This weight starts near 1, encouraging broad exploration early on, and decays towards 0, promoting fine-tuned local search (exploitation) in later iterations, thereby enhancing convergence precision.
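Both modifications are simple to state in code. The sketch below shows only the chaotic initialization and the weight schedule, not the full MCPO loop; the number of logistic-map warm-up iterations and the scalar bounds applied uniformly to every dimension are illustrative assumptions.

```python
import math
import random

def chaotic_population(n_agents, dim, lower, upper, r_c=4.0, warmup=100):
    """Logistic-map initialization: iterate the map, then scale to [lower, upper]."""
    pop = []
    for _ in range(n_agents):
        # start in the open interval to avoid the map's fixed points
        v = [random.uniform(0.01, 0.99) for _ in range(dim)]
        for _ in range(warmup):
            v = [r_c * x * (1.0 - x) for x in v]  # X_{n+1} = r_c * X_n * (1 - X_n)
        pop.append([lower + (upper - lower) * x for x in v])
    return pop

def cosine_weight(k, k_max):
    """Adaptive weight: near 1 early (exploration), decaying to 0 (exploitation)."""
    return math.cos(math.pi * k / (2.0 * k_max))
```

At iteration k = 0 the weight is exactly 1, and at k = K_max it reaches 0, reproducing the exploration-to-exploitation schedule described above.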
3. The Proposed MCPO-RF Fault Diagnosis Methodology for Lithium-Ion Batteries
The integrated framework for diagnosing progressive faults in lithium-ion batteries is executed in a systematic pipeline. The core steps are outlined below and form the basis for a reliable diagnostic system.
Step 1: Data Acquisition and Labeling. Voltage data is collected from a series-connected lithium-ion battery pack during operation (e.g., under a driving cycle profile). Faults are induced in a controlled manner in the laboratory using the external resistor method to simulate an internal short circuit in a specific cell. Each data segment (e.g., a voltage curve from a time window) is labeled as “Normal” or “Fault” based on the experimental ground truth.
Step 2: Feature Extraction via Sliding Window and DE. A sliding window of length $W$ and step size $S$ moves across the voltage time series of each individual cell. For each window segment $V_{seg}$, its Distribution Entropy $DE(V_{seg})$ is calculated. This process converts the raw voltage signal into a sequence of entropy values that capture local dynamic complexity. A feature vector for a specific diagnostic instance is constructed by concatenating the DE values from $L$ consecutive windows, typically centered around a point of interest (e.g., the suspected fault onset).
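The windowing in Step 2 can be sketched generically. In the snippet below, `feature_fn` stands in for the Distribution Entropy calculation of Section 1 (any per-segment statistic can be plugged in), and the window and step values are whatever the application dictates.

```python
def sliding_window_features(signal, window, step, feature_fn):
    """Slide a window of length `window` with stride `step` over `signal`
    and apply `feature_fn` to each segment."""
    return [
        feature_fn(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, step)
    ]
```

With a window of 50 points and a step of 25 as in Section 4.1, consecutive windows overlap by half, so each voltage sample contributes to two entropy values and a fault onset is unlikely to fall entirely between windows.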
Step 3: Dataset Construction and Split. The collected feature vectors and their corresponding labels form the complete dataset. It is then randomly partitioned into a training set (e.g., 70-80%) for model development and a testing set (e.g., 20-30%) for final performance evaluation.
Step 4: MCPO-based Hyperparameter Tuning of RF. The MCPO algorithm is configured to optimize the Random Forest hyperparameters, $n_{tree}$ and $m_{try}$. The search space for these parameters is defined. The objective function for MCPO is the maximization of the diagnostic accuracy (or minimization of the error rate) obtained via cross-validation on the *training set* using an RF model with the proposed $(n_{tree}, m_{try})$ combination. MCPO iteratively searches this space, leveraging its chaotic initialization and adaptive weight to find the optimal parameter pair.
Step 5: Model Training and Evaluation. Once the optimal hyperparameters $(n_{tree}^*, m_{try}^*)$ are found, a final Random Forest model is trained on the entire training set using these parameters. This trained MCPO-RF model is the final diagnostic classifier. Its performance is rigorously assessed on the held-out *testing set* using metrics such as Accuracy, Precision, Recall, F1-Score, and Confusion Matrix.
4. Experimental Validation and Results
4.1 Laboratory Data from Simulated Internal Short Circuit
To validate the proposed method, an experiment was conducted using a series-connected pack of eight 18650 lithium-ion batteries. A FUDS (Federal Urban Driving Schedule) profile was applied for discharge. To simulate a progressive internal short circuit fault in one cell (Cell #7), external resistors ($R_{SC}$ = 20 Ω, 10 Ω, 7.5 Ω) were connected in parallel to the cell at a specific cycle. This method creates an additional leakage current path, causing the fault cell’s voltage to gradually deviate from its peers, a hallmark of a progressive fault. The voltage divergence is subtle in the early stages, especially with higher resistance values, mimicking a challenging real-world diagnosis scenario.
Feature extraction was performed with a sliding window of 50 data points and a step of 25. The calculated Distribution Entropy for both normal and fault cells over one operational cycle revealed a critical insight: while the DE values tracked each other closely before the fault, a significant and sustained increase in the DE value of the fault cell was observed immediately after the fault injection. This demonstrates DE’s high sensitivity to the irregular voltage dynamics induced by an incipient fault in the lithium-ion battery.
For model input, a feature vector was created using the DE values from 20 consecutive windows surrounding the fault onset point for fault samples, and corresponding windows for normal samples. A dataset of 128 samples was created and split into training and testing sets. The MCPO was configured with a population size of 20 and 50 iterations to optimize the RF parameters within a defined search space ($n_{tree}$: [10, 300], $m_{try}$: [1, feature_dimension]).
4.2 Diagnostic Performance and Comparative Analysis
The performance of the proposed MCPO-RF model was evaluated using 10-fold cross-validation. The results were compared against a standard Random Forest (RF) with default parameters and a CPO-optimized RF (CPO-RF) to isolate the benefit of the proposed modifications to the optimizer.
The convergence curve of MCPO showed superior behavior, reaching a stable optimum within a few iterations. The final MCPO-RF model achieved an average diagnostic accuracy of 97.69% across the 10 folds, with individual fold accuracies consistently at or above 95%. The confusion matrix indicated excellent capability in identifying both normal and faulty states with minimal false positives and false negatives.
| Performance Metric | Standard RF | CPO-RF | Proposed MCPO-RF | Improvement (MCPO-RF vs. CPO-RF) |
|---|---|---|---|---|
| Accuracy (Ac) % | 85.66 | 93.84 | 97.69 | +3.85 |
| Recall (Re) % | 91.42 | 93.81 | 97.14 | +3.33 |
| Precision (Pr) % | 83.47 | 94.73 | 98.55 | +3.82 |
| False Positive Rate (FPR) % | 21.11 | 6.11 | 1.67 | -4.44 |
| False Negative Rate (FNR) % | 8.58 | 6.19 | 2.86 | -3.33 |
| F1-Score % | 87.26 | 94.27 | 97.84 | +3.57 |
The results in the table above clearly demonstrate the superiority of the MCPO-RF model. It outperforms both the baseline RF and the CPO-RF across all metrics. The significant reduction in both False Positive Rate (FPR) and False Negative Rate (FNR) is particularly important for safety-critical applications involving lithium-ion batteries, as it implies fewer missed alarms for real faults and fewer unnecessary warnings for normal operation.
4.3 Validation with Real-World Electric Vehicle Data
To further demonstrate practical applicability, the method was tested on voltage data from real electric vehicles where a gradual, undetected voltage deviation in a single cell was later confirmed as a progressive fault. The BMS sampling period was 10 seconds. Applying the same sliding window and DE feature extraction process to the voltage data from 96 series-connected cells revealed a distinct and growing discrepancy in the DE value of the faulty cell (Cell #39) compared to the normal cells, even though the absolute voltage difference was very small (~2 mV). This pattern confirmed DE’s effectiveness in amplifying subtle, diagnostically relevant anomalies.
When the MCPO-RF model, trained on laboratory data features, was applied to the processed real-vehicle feature set, it successfully diagnosed the faulty cell with 100% accuracy in the tested segments. This result strongly validates the robustness and generalization potential of the proposed Distribution Entropy and MCPO-RF framework for diagnosing real-world progressive faults in lithium-ion battery packs.
5. Conclusion
This article presented a novel, data-driven fault diagnosis framework for the early detection of progressive faults, such as incipient internal short circuits, in lithium-ion batteries. The method effectively addresses the challenge of extracting meaningful features from subtle voltage anomalies by employing Distribution Entropy, a robust nonlinear dynamic measure sensitive to changes in signal complexity. The diagnostic model is built upon a Random Forest classifier, whose critical hyperparameters are intelligently optimized using an enhanced Crested Porcupine Optimizer (MCPO). The incorporation of chaotic initialization and an adaptive cosine weight factor significantly improved the optimizer’s search efficiency and final solution quality.
Comprehensive validation using both controlled laboratory experiments and real-world electric vehicle data confirmed the framework’s high performance. The MCPO-RF model achieved superior accuracy (97.69%), precision, and recall compared to baseline models, while simultaneously minimizing false alarm rates. The successful application to real vehicle data underscores the method’s practical relevance and generalization capability. Therefore, the integration of Distribution Entropy for feature extraction with metaheuristic-optimized Random Forest offers a powerful, reliable, and promising solution for enhancing the safety management of lithium-ion battery systems through early and accurate fault diagnosis.
