Consistency State Recognition for Distributed Energy Storage Cells Based on YOLOv7

In distributed energy storage systems, the inconsistency among individual energy storage cells due to manufacturing variations, environmental factors, and operational differences can severely impact system performance and lead to cascading failures. Traditional methods relying on single-state features often fail to capture real-time variations, resulting in low recognition accuracy. To address this, I propose a consistency state recognition technique for distributed energy storage cells based on YOLOv7. This approach integrates multi-modal data fusion with advanced deep learning to accurately identify states such as normal operation, local overheating, deformation, and performance degradation. By leveraging YOLOv7’s feature extraction and enhancement capabilities, the method achieves high precision in diagnosing and classifying consistency states, ensuring improved safety and management of energy storage systems.

The core of this technique involves collecting operational and image data from energy storage cells using sensors and high-resolution cameras. Operational data include voltage, current, internal resistance, and charge-discharge cycles, while image data encompass thermal distribution, geometric deformation, and surface degradation. To enhance data quality, I apply filtering techniques to both modalities. For operational data, a sliding window filter is used:

$$X_i = \frac{1}{n} \sum_{e=0}^{n-1} x(t - e)$$

where $X_i$ represents the filtered operational data, $n$ is the filter window size, $x(\cdot)$ denotes the original data, and $t$ is the sampling time. For image data, Gaussian filtering is applied:

$$Y(i,j) = \sum_{a=1}^{A} \sum_{b=1}^{B} w(a,b) I(i+a, j+b)$$

Here, $Y(i,j)$ is the filtered image, $w(a,b)$ is the Gaussian kernel of size $A \times B$, and $I(i+a, j+b)$ is the original pixel value at offset $(a,b)$ from position $(i,j)$. This preprocessing step reduces noise and improves feature clarity.
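
The two filters can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the study: the shortened start-up window, the centered kernel, the replicated image borders, and the default `size` and `sigma` values are all assumptions that the text does not specify.

```python
import math


def sliding_window_filter(samples, n):
    """Trailing moving average: X(t) = (1/n) * sum_{e=0}^{n-1} x(t - e).

    For the first n-1 samples the window is shortened to the data available
    (an assumption; the text does not say how start-up is handled).
    """
    filtered = []
    for t in range(len(samples)):
        window = samples[max(0, t - n + 1): t + 1]
        filtered.append(sum(window) / len(window))
    return filtered


def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian kernel w(a, b); the weights sum to 1."""
    half = size // 2
    raw = [[math.exp(-((a - half) ** 2 + (b - half) ** 2) / (2 * sigma ** 2))
            for b in range(size)] for a in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def gaussian_filter(image, size=3, sigma=1.0):
    """Weighted sum over each pixel's neighborhood with a centered kernel."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(size):
                for b in range(size):
                    # Clamp indices so border pixels reuse the nearest valid pixel.
                    ii = min(max(i + a - half, 0), rows - 1)
                    jj = min(max(j + b - half, 0), cols - 1)
                    acc += kernel[a][b] * image[ii][jj]
            out[i][j] = acc
    return out


# Illustrative values only, not measured data.
voltages = [3.20, 3.22, 3.35, 3.21]
smoothed_voltages = sliding_window_filter(voltages, n=2)
thermal_patch = [[30.0, 31.0, 30.5], [30.2, 45.0, 30.8], [30.1, 30.9, 30.4]]
smoothed_patch = gaussian_filter(thermal_patch)
```

A larger `n` or `sigma` suppresses more noise but also blurs short transients and small hot spots, so both would need tuning against the sampling rate and camera resolution.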

To address the challenges of aligning time-series operational data with spatial image data, I employ weighted fusion. Operational data are assigned a higher weight to preserve critical electrochemical characteristics, while image data contribute spatial information. The fusion process is defined as:

$$G_d = \delta_x X + \delta_y Y(i,j)$$

where $G_d$ is the fused multi-modal data and $\delta_x$ and $\delta_y$ are the weights for the operational and image data, respectively, with $\delta_x > \delta_y$. This fusion enables synchronized feature extraction and enhances the robustness of subsequent analyses.
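
A minimal sketch of the weighted fusion is given below. It assumes that both modalities have first been scaled to $[0, 1]$ and that the scalar operational value is broadcast over every pixel; the weights 0.6 and 0.4 and the voltage range are hypothetical values chosen only to satisfy $\delta_x > \delta_y$.

```python
def min_max_normalize(value, lo, hi):
    """Scale a value to [0, 1] given its expected range [lo, hi]."""
    return (value - lo) / (hi - lo)


def weighted_fusion(x_op, image, delta_x=0.6, delta_y=0.4):
    """G_d(i, j) = delta_x * X + delta_y * Y(i, j), with X broadcast over the image.

    delta_x and delta_y are hypothetical weights, not values from the study.
    """
    return [[delta_x * x_op + delta_y * pixel for pixel in row] for row in image]


# Hypothetical voltage range of 2.5 V to 3.65 V, used only for scaling.
x_norm = min_max_normalize(3.2, 2.5, 3.65)
fused = weighted_fusion(x_norm, [[0.10, 0.80], [0.15, 0.12]])
```

Normalizing first keeps either modality from dominating the sum simply because of its units.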

The fused data are input into the YOLOv7 algorithm for feature extraction. The convolutional layers of YOLOv7 capture multi-scale features relevant to consistency states, such as thermal anomalies and structural deformations. The feature extraction process is expressed as:

$$Y_j = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \sum_{p=0}^{P-1} W_{mnp} G_d + b_j$$

where $Y_j$ is the output of the convolutional layer, $W_{mnp}$ represents the kernel weights, $M$, $N$, and $P$ are the input, output, and kernel dimensions, and $b_j$ is the bias term. Following this, pooling layers and output functions refine the features:

$$T_z = Q(C_h(w_c Y_j + b_c) + b_q)$$

Here, $T_z$ denotes the extracted consistency state features, $C_h(\cdot)$ is the pooling function, $w_c$ and $b_c$ are pooling parameters, $Q(\cdot)$ is the output layer function, and $b_q$ is its bias term. To enhance these features, a feature pyramid network is applied:

$$T_z^* = \sum_{z=1}^{Z} F_a(F_b T_z) \lambda_z$$

where $T_z^*$ is the enhanced feature, $\lambda_z$ is the polarization impedance enhancement coefficient, and $F_a$ and $F_b$ are residual branch functions and parameters, respectively. This step amplifies critical details for accurate state diagnosis.
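
The convolution-plus-bias and pooling operations in the expressions above can be illustrated on a single channel. The sketch below is not YOLOv7 itself, which stacks many such layers with learned weights; the kernel values, the bias, and the 2×2 pooling window are assumptions chosen to keep the example small.

```python
def conv2d_valid(feature, kernel, bias=0.0):
    """Single-channel 'valid' cross-correlation plus bias (one term of Y_j)."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(feature) - kh + 1
    cols = len(feature[0]) - kw + 1
    return [[bias + sum(kernel[a][b] * feature[i + a][j + b]
                        for a in range(kh) for b in range(kw))
             for j in range(cols)] for i in range(rows)]


def max_pool2d(feature, size=2):
    """Non-overlapping max pooling; rows/cols that do not fill a window are dropped."""
    rows, cols = len(feature) // size, len(feature[0]) // size
    return [[max(feature[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)] for i in range(rows)]


# Toy fused input and a hypothetical 2x2 kernel.
fused_patch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
conv_out = conv2d_valid(fused_patch, [[1.0, 0.0], [0.0, 1.0]], bias=1.0)
pooled = max_pool2d(conv_out, size=2)
```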

For consistency state diagnosis, the enhanced features are fed into YOLOv7’s detection head. A loss function is defined to quantify the alignment between detected and actual states:

$$v_s = \frac{4}{\pi} \left( \arctan \frac{a_c}{h_c} - \arctan \frac{a_p}{h_p} \right)$$

where $v_s$ is a correction factor, $a_c$ and $h_c$ are the dimensions of the detection box, and $a_p$ and $h_p$ are the target box dimensions. The loss function combines intersection-over-union (IoU) and center distance metrics:

$$L_s = r_s (1 - \text{IoU}) + \frac{d_s^2}{c_s^2} + \alpha_s v_s$$

Here, $L_s$ is the total loss, $r_s$ is the IoU between detection and ground truth boxes, $d_s$ is the center distance, $c_s$ is the diagonal distance, and $\alpha_s$ is a voltage drop weight coefficient. Based on this loss, the state classification is performed:

$$F_l = \sum_{z=1}^{Z} \beta_z \left[ \phi(d_k) + \left(1 - \phi(d_k)\right) \prod_{g=1}^{r} m_g \right] - \prod_{g=1}^{r} \beta_z \phi(d_k) L_s$$

where $F_l$ is the classification result, $\beta_z$ is a normalization factor, $\phi(\cdot)$ is the state function, $d_k$ are state parameters, $m_g$ are neighbor parameters, and $r$ is the iteration count. If $F_l$ exceeds a threshold, the energy storage cell is deemed consistent; otherwise, it is flagged for further inspection.
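
The box loss $L_s$ and its correction factor $v_s$ can be sketched as follows, following the expressions as written above. Several points are assumptions: boxes are given as (center x, center y, width, height), $c_s$ is taken as the diagonal of the smallest box enclosing both boxes, and $r_s$ and $\alpha_s$ are treated as plain scalar weights with a default of 1.

```python
import math


def _corners(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = _corners(box_a)
    bx1, by1, bx2, by2 = _corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def box_loss(detected, target, r_s=1.0, alpha_s=1.0):
    """L_s = r_s * (1 - IoU) + d_s^2 / c_s^2 + alpha_s * v_s."""
    dx1, dy1, dx2, dy2 = _corners(detected)
    tx1, ty1, tx2, ty2 = _corners(target)
    # Squared distance between the two box centers (d_s^2).
    d_sq = (detected[0] - target[0]) ** 2 + (detected[1] - target[1]) ** 2
    # Squared diagonal of the smallest enclosing box (c_s^2, assumed definition).
    c_sq = ((max(dx2, tx2) - min(dx1, tx1)) ** 2
            + (max(dy2, ty2) - min(dy1, ty1)) ** 2)
    # Aspect-ratio correction factor v_s, kept exactly as written in the text.
    v_s = (4 / math.pi) * (math.atan(detected[2] / detected[3])
                           - math.atan(target[2] / target[3]))
    return r_s * (1 - iou(detected, target)) + d_sq / c_sq + alpha_s * v_s


perfect = box_loss((0, 0, 2, 2), (0, 0, 2, 2))
shifted = box_loss((1, 0, 2, 2), (0, 0, 2, 2))
```

With identical boxes every term vanishes and the loss is zero. Note that $v_s$ in this form is signed, whereas the widely used CIoU loss squares the arctangent difference and scales it by $4/\pi^2$; the sketch deliberately keeps the expression given here.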

To recognize specific consistency states, I compute a state value based on temperature field contrast and operational deviations. The temperature contrast is calculated as:

$$H_z = -\sum_{z=1}^{Z} p(T_z^*) \log p(T_z^*)$$

where $H_z$ is the temperature field contrast value, and $p(T_z^*)$ is the probability of feature occurrence. The consistency state value is then derived as:

$$Z_l = \frac{H_z \gamma_z}{F_l} + P_l$$

Here, $Z_l$ is the state value, $\gamma_z$ is a cell type parameter, and $P_l$ is the consistency state deviation. States are classified as follows: normal for $H_z \in (0, 0.50)$, local overheating for $H_z \in (0.50, 0.70)$, deformation for $H_z \in (0.70, 0.90)$, and performance degradation for $H_z \in (0.90, 1.00)$. This quantitative approach ensures precise identification of multiple consistency states in energy storage cells.
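
The entropy-style contrast value and the threshold mapping can be sketched as below. Two choices are assumptions rather than part of the method as stated: the entropy is divided by $\log Z$ so that $H_z$ falls in $[0, 1]$, and values lying exactly on a threshold are assigned to the upper band, since the open intervals above leave the boundaries unspecified.

```python
import math


def normalized_entropy(probs):
    """H = -sum p * log(p), divided by log(Z) so H lies in [0, 1] (assumed scaling)."""
    z = len(probs)
    if z < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(z)


def state_value(h_z, gamma_z, f_l, p_l):
    """Z_l = H_z * gamma_z / F_l + P_l."""
    return h_z * gamma_z / f_l + p_l


def classify_state(h_z):
    """Map H_z to a state label using the thresholds given in the text."""
    if not 0.0 <= h_z <= 1.0:
        raise ValueError("H_z is expected to lie in [0, 1]")
    if h_z < 0.50:
        return "normal"
    if h_z < 0.70:
        return "local overheating"
    if h_z < 0.90:
        return "deformation"
    return "performance degradation"


# Hypothetical feature probabilities for one cell.
h_example = normalized_entropy([0.7, 0.1, 0.1, 0.1])
label = classify_state(h_example)
```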

In experimental validation, I tested the method on a 900 kW lithium battery energy storage system comprising 10 parallel strings of 100 cells each. Key parameters of individual energy storage cells are summarized in Table 1.

Table 1: Parameters of Individual Energy Storage Cells
Parameter	Value
Cell Voltage	3.2 V
Cell Capacity	280 Ah
Discharge Rate	0.5C
Charge Rate	0.2C
Cycle Life	8000 cycles
Mass Energy Density	150-200 Wh/kg
Volume Energy Density	300-400 Wh/L
Internal Resistance	20 mΩ
Operating Temperature	-20 to 55°C
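
As a quick sanity check on the system size, the nominal figures above can be combined with the stated layout of 10 strings of 100 cells. The calculation below is a rough estimate only: it ignores conversion losses and usable state-of-charge limits, and the variable names are illustrative.

```python
strings = 10
cells_per_string = 100
cell_voltage_v = 3.2       # nominal cell voltage from Table 1
cell_capacity_ah = 280     # nominal cell capacity from Table 1
discharge_rate_c = 0.5     # discharge rate from Table 1

total_cells = strings * cells_per_string
# Nominal stored energy: cells * V * Ah, converted from Wh to kWh.
energy_kwh = total_cells * cell_voltage_v * cell_capacity_ah / 1000
# Nominal discharge power at the stated C-rate.
power_kw = energy_kwh * discharge_rate_c
```

The nominal energy comes out at roughly 896 kWh, which is close to the 900 figure quoted for the system if that figure is read as stored energy, and the corresponding power at 0.5C is roughly 448 kW.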

For the YOLOv7 algorithm, I set parameters as shown in Table 2 to optimize feature extraction and detection.

Table 2: YOLOv7 Algorithm Parameters
Parameter	Value
Input Size	640×640
Convolutional Layers	4
Feature Map Size	160×160×128
Pooling Kernel Size	9×9
Output Channels	3×9
Initial Learning Rate	0.01
Batch Size	32
Training Epochs	300
Confidence Threshold	0.50

During training, the loss decreased from approximately 0.74 initially to below 0.65 after 50 epochs and reached a minimum of 0.28 by epoch 120, indicating stable convergence. The method achieved a recognition rate of 0.98 across the consistency states, with an IoU of 0.97, which is higher than that of the existing techniques it was compared against. I attribute this performance to the effective fusion of multi-modal data and YOLOv7’s robust feature handling, which together enable real-time monitoring and enhanced safety for distributed energy storage systems.

In conclusion, the integration of YOLOv7 with multi-modal data fusion provides a reliable solution for consistency state recognition in energy storage cells. By accurately identifying states such as overheating and deformation, this technique facilitates proactive maintenance and optimizes system performance. Future work will focus on adapting the method to larger-scale systems and incorporating additional data sources for even greater precision in energy storage management.