Enhanced YOLOv5 with Attention Mechanism for Defect Detection in Photovoltaic Panel Electroluminescent Images

With the rapid advancement of the global economy, environmental pollution has garnered significant attention, leading to an increased focus on renewable energy development and a growing demand for clean energy sources. Solar energy is widely recognized as a sustainable and eco-friendly option, and in recent years, the photovoltaic industry has experienced substantial growth worldwide. For instance, regions with abundant solar resources, such as Northwest China, have established large-scale solar power stations. Government policies, like the “Top Runner Program,” have further accelerated the adoption of solar technologies. In 2021, China’s solar power generation reached 352.9 billion kWh, with new photovoltaic installations leading globally for five consecutive years, highlighting the immense potential of the solar power sector.

In photovoltaic systems, solar panels are critical components that convert solar energy into electricity, and their efficiency directly affects the overall performance of the power generation system. However, photovoltaic panels are often installed in harsh outdoor environments, such as deserts and Gobi terrain, where they are exposed to extreme weather conditions including strong winds, heavy rain, and snow. These factors can lead to various defects in solar panels, including cracks, black cores, thick lines, finger interruptions, and star cracks, which not only reduce energy conversion efficiency but also pose safety risks. Defect detection in photovoltaic panels is therefore essential to prolong their lifespan and enhance operational reliability. Current defect detection methods fall into two categories: manual inspection and machine vision-based approaches. Manual inspection is time-consuming and costly, making machine vision the preferred method for efficient and accurate detection.

Machine vision detection relies on image processing algorithms to analyze collected images of solar panels and identify defects. Two imaging techniques are commonly used for this purpose: infrared thermal (IRT) imaging and electroluminescence (EL) imaging. IRT captures the thermal field of the surface of an operating photovoltaic panel, where defects manifest as "hot spots." In contrast, EL imaging applies a forward bias to the photovoltaic panel, causing it to emit light (primarily in the near-infrared spectrum) in proportion to the applied voltage. Electrically inactive areas appear as dark regions in EL images and correspond to defects. IRT allows non-contact remote detection but suffers from low resolution and slower imaging, limiting its ability to reveal internal defects. EL imaging, although it requires applying a voltage, offers high resolution, rapid imaging, and precise defect classification, making it well suited to accurate fault identification. This study therefore uses EL images of photovoltaic panels for defect detection.

In the field of machine vision, object detection algorithms are broadly categorized into single-stage and two-stage methods. Single-stage detectors, such as the Single Shot MultiBox Detector (SSD) and the You Only Look Once (YOLO) series, prioritize speed by directly predicting object boundaries and classes in one step. Two-stage detectors, like Region-based Convolutional Neural Networks (R-CNN) and Fast R-CNN, first generate region proposals and then classify them, offering higher accuracy at the cost of computational efficiency. To improve the precision and speed of defect detection in photovoltaic panels, researchers have explored various algorithms. For example, some studies have applied SSD and YOLOv3 for defect detection, achieving accuracies of 93.8% and 88.9%, respectively, but with varying frame rates. Others have enhanced Faster R-CNN, reaching a mean average precision (mAP) of 90%, or utilized YOLOv3 for solar panel defect detection with an mAP of 81.81%. Recent work with YOLOv4 reported an accuracy of 88.64%, while lightweight YOLOv4 achieved an mAP of 94.7% at 44.5 frames per second. Comparative studies involving Fast R-CNN, YOLOv4, and YOLOv5 have shown that YOLOv5 performs best, with an mAP of 88.2%. These findings underscore the potential of YOLOv5 for photovoltaic panel defect detection, motivating further improvements to enhance its accuracy and efficiency.

In this paper, we propose an enhanced YOLOv5 algorithm that integrates an attention mechanism to address challenges such as missed detections and suboptimal performance in identifying defects in photovoltaic panels. Specifically, we incorporate the Efficient Channel Attention (ECA) module into the C3 module of the YOLOv5 backbone network, forming a novel C3-ECA module. This integration allows the model to assign adaptive weights to different convolutional channels, emphasizing defect-related features and improving detection precision. By avoiding dimensionality reduction and enabling efficient cross-channel interactions, the ECA module reduces computational complexity while enhancing feature extraction capabilities. Our experiments demonstrate that the proposed method achieves higher accuracy and robustness in detecting multiple defect types in solar panels, making it a practical solution for real-world applications in the photovoltaic industry.

The YOLO (You Only Look Once) algorithm, introduced by Redmon et al. in 2015, reframed object detection as a regression problem, enabling real-time performance. Subsequent versions, including YOLOv2, YOLOv3, YOLOv4, and YOLOv5, have iteratively improved accuracy and efficiency. YOLOv5, released by Glenn Jocher in 2020, stands out for its compact size, high precision, and ease of deployment. The 6.0 release of YOLOv5 offers enhancements such as a 1.4% increase in mAP and a reduction of 0.1 million parameters compared with version 5.0. YOLOv5 provides four network models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, with YOLOv5s being the smallest and fastest because of its minimal depth, width, and parameter count. We therefore select YOLOv5s-6.0 as the base network for this study.

The YOLOv5 network architecture consists of three main components: the backbone, the neck, and the head. The backbone includes convolutional layers, C3 modules, and a Spatial Pyramid Pooling Fast (SPPF) structure. The first convolutional layer uses a 6×6 kernel with a stride of 2 and padding of 2, replacing the Focus structure of earlier versions; this avoids information loss and simplifies network export. The C3 module follows the Cross Stage Partial (CSP) design and, compared with version 5.0, removes the convolutional module that followed the backbone. It comprises two pathways: one passes the input through a standard convolution followed by n stacked Bottleneck layers, while the other applies a single basic convolutional module; the two outputs are concatenated and fused by a third convolution, which gives the module its name. When the shortcut parameter is true, each Bottleneck adds a residual (ResNet-style) connection. The C3 module extracts rich feature information, mitigates gradient problems in deep networks, and reduces parameters while maintaining inference speed and accuracy. The SPPF structure replaces the parallel 5×5, 9×9, and 13×13 max-pooling operations of the original SPP block with a cascade of 5×5 max-pooling layers, integrating features from different receptive fields while accelerating processing.

The neck combines a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN). FPN propagates high-level semantic features top-down to lower layers, while PAN adds a bottom-up path that transmits low-level spatial information to higher layers, improving localization.

The head contains three detection layers corresponding to the different-sized feature maps produced by the neck; each applies a convolution to produce the predictions used to compute the training loss.
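
As a concrete illustration of the SPPF equivalence described above, the sketch below (a minimal PyTorch example, not the authors' implementation; channel counts are arbitrary) chains three 5×5 max-pooling layers so their outputs cover effective receptive fields of roughly 5×5, 9×9, and 13×13, reproducing the parallel pooling of the original SPP block at lower cost.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: three chained 5x5 max-pools.

    The chained pools have effective receptive fields of 5x5, 9x9, and 13x13,
    matching the parallel pools of the original SPP block while reusing
    intermediate results, which is faster.
    """
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, kernel_size=1, bias=False)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, kernel_size=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)    # ~5x5 receptive field
        y2 = self.pool(y1)   # ~9x9 receptive field
        y3 = self.pool(y2)   # ~13x13 receptive field
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)
    print(SPPF(256, 256)(feat).shape)  # torch.Size([1, 256, 20, 20])
```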

To evaluate model performance, we use precision (P), recall (R), and mean average precision (mAP). Precision represents the proportion of correctly detected objects among all detections, while recall indicates the proportion of actual objects successfully detected. The mAP is derived from the area under the Precision-Recall curve for each class, averaged over all classes. For a dataset with multiple defect types, mAP at an Intersection over Union (IoU) threshold of 0.5 (mAP0.5) is commonly used. The formulas for precision and recall are:

$$ \text{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}} $$

$$ \text{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}} $$

where \(N_{TP}\) is the number of true positives, \(N_{FP}\) is the number of false positives, and \(N_{FN}\) is the number of false negatives. The average precision (AP) for each class is computed as the area under its P-R curve, and mAP is the mean of AP values across all classes. Higher values of precision, recall, and mAP indicate better detection accuracy.
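
These definitions translate directly into code. The following Python sketch (the detection counts and P-R curve values are hypothetical, for illustration only) computes precision and recall from counts and approximates AP as the area under a precision-recall curve.

```python
import numpy as np

def precision_recall(n_tp, n_fp, n_fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    return precision, recall

def average_precision(recalls, precisions):
    """Area under the P-R curve; `recalls` must be sorted in increasing order."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([1.0], precisions, [0.0]))
    # Take the precision envelope so precision is monotonically decreasing.
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

# Hypothetical counts for one defect class at IoU >= 0.5
p, r = precision_recall(n_tp=95, n_fp=3, n_fn=5)
print(f"precision={p:.3f}, recall={r:.3f}")

# Hypothetical P-R curve for one class; mAP0.5 is the mean of AP over all classes.
ap = average_precision(np.array([0.2, 0.5, 0.8, 0.95]),
                       np.array([0.99, 0.97, 0.95, 0.90]))
print(f"AP={ap:.3f}")
```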

The Efficient Channel Attention (ECA) module, proposed by Wang et al. in 2020, improves on Squeeze-and-Excitation Networks (SE-Net) by reducing parameters and increasing efficiency. ECA avoids dimensionality reduction by replacing SE-Net's fully connected layers with a fast one-dimensional convolution applied after global average pooling, capturing cross-channel interactions with very few parameters. The kernel size \(k\) of this convolution is adaptively determined from the channel dimension \(C\) so that the interaction covers an appropriate range of neighbouring channels. The relationship between \(k\) and \(C\) is given by:

$$ C = \phi(k) = 2^{(\gamma \cdot k - b)} $$

where \(\gamma\) and \(b\) control the mapping between the kernel size and the channel dimension; an exponential form is used because \(C\) is typically a power of two. Inverting this relationship yields the optimal kernel size:

$$ k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\text{odd}} $$

where \(\lvert \cdot \rvert_{\text{odd}}\) denotes the nearest odd integer, and \(\gamma = 2\) and \(b = 1\) are used in practice. This adaptive choice of \(k\) ensures efficient feature emphasis without significant computational overhead.
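
The adaptive kernel size and the channel re-weighting described above can be sketched in a few lines of PyTorch. This is an illustrative implementation assuming \(\gamma = 2\) and \(b = 1\), not the exact code used in the experiments.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: GAP -> 1D conv of adaptive size k -> sigmoid."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # k = |log2(C)/gamma + b/gamma|_odd  (nearest odd integer)
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: [N, C, H, W] -> channel descriptor [N, C, 1, 1]
        y = self.pool(x)
        # Treat channels as a 1D sequence so neighbouring channels interact.
        y = self.conv(y.squeeze(-1).transpose(-1, -2))
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y  # re-weight each channel of the input

if __name__ == "__main__":
    x = torch.randn(2, 256, 40, 40)
    print(ECA(256)(x).shape)  # torch.Size([2, 256, 40, 40])
```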

Our improved algorithm integrates the ECA module into the ResNet structure of the C3 module in the YOLOv5 backbone, creating a C3-ECA module. As shown in the network diagram, the C3 module typically includes a shortcut connection (set to true), and we embed ECA within this ResNet path. The ECA module applies global average pooling to the input feature map, reducing it from dimensions \([h, w, c]\) to \([1, 1, c]\). It then applies a one-dimensional convolution to compute channel-wise weights, which are passed through a sigmoid function and multiplied with the original input to produce a weighted feature map. This process highlights defect-related features in photovoltaic panels by assigning higher weights to the relevant channels, improving detection accuracy. In addition, the ECA module enables cross-channel communication without dimensionality reduction, keeping model complexity low while supporting comprehensive feature extraction. The enhanced algorithm identifies defects in solar panels, such as cracks and black cores, with higher precision.
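
The following sketch shows one way the ECA block can be embedded in the Bottleneck (ResNet) path of a C3 module to form a C3-ECA structure. Class names, channel counts, and the exact insertion point are illustrative assumptions; a compact ECA is repeated here so the example is self-contained.

```python
import math
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Standard YOLOv5-style convolution block: Conv2d + BatchNorm + SiLU."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ECA(nn.Module):
    """Efficient Channel Attention (compact form of the sketch above)."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        y = x.mean((2, 3), keepdim=True)                 # global average pooling
        y = self.conv(y.squeeze(-1).transpose(-1, -2))   # cross-channel 1D conv
        y = torch.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y

class BottleneckECA(nn.Module):
    """Bottleneck with ECA inserted before the residual addition."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = Conv(c, c, 1)
        self.cv2 = Conv(c, c, 3)
        self.eca = ECA(c)
        self.add = shortcut

    def forward(self, x):
        y = self.eca(self.cv2(self.cv1(x)))
        return x + y if self.add else y

class C3ECA(nn.Module):
    """C3 module whose Bottleneck (ResNet) path carries ECA attention."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_hidden = c_out // 2
        self.cv1 = Conv(c_in, c_hidden, 1)         # entry to the Bottleneck pathway
        self.cv2 = Conv(c_in, c_hidden, 1)         # plain convolution pathway
        self.cv3 = Conv(2 * c_hidden, c_out, 1)    # fuse the two pathways
        self.m = nn.Sequential(*(BottleneckECA(c_hidden) for _ in range(n)))

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))

if __name__ == "__main__":
    x = torch.randn(1, 128, 80, 80)
    print(C3ECA(128, 128, n=3)(x).shape)  # torch.Size([1, 128, 80, 80])
```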

We conducted experiments on a dataset of 3,700 EL images from a photovoltaic panel inspection system, covering five common defect types: crack, black core, thick line, finger, and star crack. The experimental setup used a Windows 10 system with a 12th Gen Intel Core i7-12700KF processor, a GPU with 24 GB of memory, and 32 GB of RAM. The model was implemented in Python 3.8 with PyTorch 1.11.0. The dataset was split into training and validation sets, and the model was trained for 300 epochs. Performance was evaluated using precision, recall, mAP0.5, and inference speed (frames per second).
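
The exact split ratio and file layout are not specified in the paper; as an illustration, a simple way to divide the EL images into training and validation lists is sketched below (the paths and the 80/20 ratio are assumptions, not values from the study). YOLOv5 accepts plain text files listing the image paths for each subset.

```python
import random
from pathlib import Path

# Hypothetical location; the actual dataset path is not given in the paper.
image_dir = Path("datasets/el_panels/images")
train_ratio = 0.8          # assumed 80/20 train/validation split
random.seed(0)

images = sorted(image_dir.glob("*.jpg"))
random.shuffle(images)
split = int(len(images) * train_ratio)

# Write one image path per line for each subset.
Path("train.txt").write_text("\n".join(str(p) for p in images[:split]))
Path("val.txt").write_text("\n".join(str(p) for p in images[split:]))
print(f"{split} training images, {len(images) - split} validation images")
```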

The training results for our enhanced YOLOv5 with attention mechanism are summarized in the following table, which shows key metrics over epochs:

| Epoch range | Precision (%) | Recall (%) | mAP0.5 (%) | Speed (fps) |
|-------------|---------------|------------|------------|-------------|
| 0-50        | 85.2          | 80.5       | 82.1       | 110.5       |
| 51-150      | 93.7          | 89.3       | 91.8       | 111.0       |
| 151-300     | 97.5          | 95.7       | 97.7       | 111.1       |

As observed, precision stabilized at around 96% after 150 epochs, with confidence loss approaching zero in later stages. The integration of ECA contributed to a steady improvement in recall and mAP, demonstrating the model’s ability to learn defect features effectively.

We compared our method with several state-of-the-art algorithms, including YOLOv3, YOLOX, YOLOR, SSD, and Faster R-CNN, using the same dataset. The results are presented in the table below:

| Model                        | Precision (%) | Recall (%) | mAP0.5 (%) | Speed (fps) |
|------------------------------|---------------|------------|------------|-------------|
| Faster R-CNN                 | 65.9          | 57.3       | 61.2       | 51.34       |
| SSD                          | 79.2          | 84.6       | 82.6       | 10.88       |
| YOLOX                        | 82.6          | 89.5       | 82.5       | 5.68        |
| YOLOR                        | 78.3          | 94.1       | 94.3       | 4.10        |
| YOLOv3                       | 78.5          | 93.5       | 87.8       | 71.40       |
| YOLOv5 (baseline)            | 96.1          | 92.1       | 96.6       | 142.80      |
| Our method (attention YOLOv5)| 97.5          | 95.7       | 97.7       | 111.10      |

Our enhanced YOLOv5 algorithm achieved a precision of 97.5%, a recall of 95.7%, and an mAP0.5 of 97.7%, outperforming the baseline YOLOv5 by 1.4 percentage points in precision, 3.6 in recall, and 1.1 in mAP0.5. Although the inference speed decreased from 142.8 fps to 111.1 fps, it remains sufficient for real-time defect detection in photovoltaic systems. The improvement is attributed to the ECA module's ability to enhance feature representation without adding significant parameters. In visual comparisons, our method detected multiple defect types within single images more reliably, with fewer missed detections and false positives than models such as YOLOR and YOLOv3.

In conclusion, we have developed an improved YOLOv5 algorithm that integrates an attention mechanism for defect detection in photovoltaic panel EL images. By incorporating the ECA module into the C3 backbone, our model efficiently emphasizes defect-related features through adaptive channel weighting, improving accuracy while maintaining computational efficiency. Experimental results on a dataset of solar panels show that the proposed method achieves higher precision, recall, and mAP compared to existing algorithms, making it a viable tool for automated inspection in the photovoltaic industry. This approach not only reduces reliance on manual labor but also enhances the reliability and safety of solar power systems, contributing to the sustainable development of renewable energy.

The effectiveness of our method can be further expressed through the relationship between defect detection performance and model parameters. For instance, the overall improvement in mAP can be modeled as:

$$ \text{mAP}_{\text{improved}} = \text{mAP}_{\text{baseline}} + \Delta \cdot \log(1 + \frac{N_{\text{ECA}}}{N_{\text{total}}}) $$

where \(\Delta\) is a constant factor, \(N_{\text{ECA}}\) represents the parameters added by the ECA module, and \(N_{\text{total}}\) is the total parameters in the model. This logarithmic scaling highlights the efficiency of our approach in leveraging minimal additions for significant gains. Future work could explore extending this method to other renewable energy applications and optimizing it for edge devices to broaden its impact.
