Improved YOLOv5 with Attention Mechanism for Defect Detection in Solar Panels

In recent years, the rapid development of the global economy has heightened concerns about environmental pollution, leading to an increased focus on renewable energy sources. Among these, solar energy stands out as a clean and sustainable option, driving significant growth in the photovoltaic (PV) power generation industry. Solar panels, which convert sunlight into electricity, are the core components of PV systems, and their quality directly impacts energy conversion efficiency and circuit safety. However, solar panels are often installed in harsh outdoor environments, such as deserts and the Gobi, where they are exposed to wind, rain, and snow. This exposure leads to defects such as cracks, black cores, thick lines, finger interruptions, and star cracks. These defects can reduce power output and pose safety risks, making defect detection crucial for maintaining system performance and longevity.

Traditional defect detection methods for solar panels rely on manual inspection, which is time-consuming, costly, and prone to human error. Therefore, machine vision-based approaches have gained prominence. These methods typically use infrared imaging techniques, including infrared thermal imaging (IRT) and electroluminescent (EL) imaging. While IRT allows non-contact, long-distance inspection, it suffers from low resolution and slow imaging speeds, limiting its ability to detect internal defects accurately. In contrast, EL imaging involves applying a forward bias to solar panels to generate near-infrared light, with defects appearing as shadows in the images. EL offers high resolution, fast imaging, and precise defect classification, making it more suitable for detailed inspection. Consequently, this paper focuses on using EL images for defect detection in solar panels.

In the field of machine vision, object detection algorithms are broadly categorized into two-stage and single-stage methods. Two-stage algorithms, such as Region-based Convolutional Neural Networks (R-CNN) and Fast R-CNN, achieve high accuracy but are computationally intensive. Single-stage algorithms, like the You Only Look Once (YOLO) series and Single Shot Multibox Detector (SSD), offer a better balance between speed and accuracy, making them ideal for real-time applications. Among these, YOLOv5 has emerged as a popular choice due to its compact size, high precision, and fast inference speeds. However, existing studies indicate that standard YOLOv5 may still suffer from missed detections and reduced accuracy when handling multiple defect types in solar panels, especially for small or overlapping defects. To address these limitations, this paper proposes an improved YOLOv5 algorithm that integrates an attention mechanism, specifically the Efficient Channel Attention (ECA) module, into the backbone network. This enhancement aims to boost feature representation, improve detection precision, and reduce false negatives, all while maintaining computational efficiency.

The integration of attention mechanisms into deep learning models has shown significant promise in computer vision tasks. Attention modules, such as Squeeze-and-Excitation Networks (SENet), enhance model performance by selectively focusing on informative features while suppressing irrelevant ones. However, SENet introduces considerable parameter overhead through fully connected layers, which can increase model complexity. The ECA module overcomes this by using a lightweight one-dimensional convolution to capture cross-channel interactions without dimensionality reduction, thereby maintaining model efficiency. By embedding ECA into the C3 modules of YOLOv5’s backbone network, the proposed algorithm—referred to as YOLOv5 with ECA or YOLOv5-ECA—allocates adaptive weights to different convolutional channels. This allows the model to emphasize defect-related features in solar panels, leading to more accurate and robust detection. The improved algorithm is evaluated on a dataset of EL images from solar panels, demonstrating superior performance in terms of precision, recall, and mean Average Precision (mAP) compared to baseline YOLOv5 and other state-of-the-art detectors like YOLOv3, YOLOX, and YOLOR.

This paper is structured as follows: Section 2 reviews related work on defect detection in solar panels and attention mechanisms. Section 3 details the methodology, including the YOLOv5 architecture, ECA module, and the integration process. Section 4 presents the experimental setup, dataset, evaluation metrics, and results analysis. Finally, Section 5 concludes the paper and discusses future directions. Throughout this work, the term “solar panels” is used interchangeably with “photovoltaic panels” to emphasize the focus on these critical components in renewable energy systems.

Related Work

Defect detection in solar panels has been extensively studied using various machine learning and deep learning techniques. Early approaches relied on traditional image processing methods, such as edge detection and thresholding, but these often struggled with complex defect patterns and varying lighting conditions. With the advent of deep learning, convolutional neural networks (CNNs) have become the standard for automated inspection, and Faster R-CNN, SSD, and YOLO variants have all been applied to solar panel defects. One study used SSD and YOLOv3 for defect detection, achieving accuracies of 93.8% and 88.9%, respectively, at speeds of 29 fps for SSD and 40 fps for YOLOv3, so the more accurate model was also the slower one. Another work improved Faster R-CNN for solar panel defect detection, reaching an mAP of 90%. More recently, YOLOv4 and YOLOv5 have been adopted: a lightweight YOLOv4 model achieved an mAP of 94.7% at 44.5 fps, while standard YOLOv5 reported an mAP of 88.2% for solar cell surface defects. These results highlight the potential of YOLO-based methods but also indicate room for improvement, particularly in handling diverse defect types in solar panels.

Attention mechanisms have revolutionized deep learning by enabling models to focus on salient features. SENet, introduced by Hu et al., uses global average pooling and fully connected layers to recalibrate channel-wise feature responses, significantly boosting performance in image classification and detection tasks. However, SENet’s use of fully connected layers increases parameter count, which can be problematic for lightweight models. To address this, Wang et al. proposed the ECA module, which replaces fully connected layers with a fast one-dimensional convolution, thereby reducing parameters while effectively capturing cross-channel dependencies. ECA has been successfully integrated into various CNN architectures for applications like medical imaging and autonomous driving. In the context of solar panel inspection, attention mechanisms can help distinguish subtle defects from background noise, but their integration with YOLOv5 remains underexplored. This paper bridges that gap by combining ECA with YOLOv5 to enhance defect detection capabilities for solar panels.

Furthermore, the choice of imaging modality plays a crucial role in defect detection for solar panels. While IRT imaging is useful for identifying thermal anomalies like hot spots, EL imaging provides higher resolution and better defect characterization. EL images reveal defects as dark regions due to reduced electroluminescence, allowing for precise classification of cracks, black cores, and other faults. Recent datasets, such as those from industrial solar panel inspection systems, include thousands of EL images with annotated defects, facilitating the development of robust deep learning models. This paper utilizes such a dataset to train and evaluate the proposed YOLOv5-ECA algorithm, ensuring practical relevance for real-world solar panel inspection scenarios.

Methodology

The proposed methodology centers on enhancing YOLOv5 with an attention mechanism to improve defect detection in solar panels. Below, I describe the key components: the baseline YOLOv5 architecture, the ECA module, and the integration strategy to form the improved YOLOv5-ECA model.

YOLOv5 Architecture

YOLOv5 is a single-stage object detector known for its speed and accuracy. The version 6.0 network, used in this work, consists of three main parts: a backbone for feature extraction, a neck for feature fusion, and a head for detection. The backbone includes convolutional layers, C3 modules, and a Spatial Pyramid Pooling Fast (SPPF) structure. The C3 module, based on Cross Stage Partial (CSP) networks, reduces computational cost and duplicated gradient information by splitting the input into two branches: one processed through multiple bottleneck layers and the other through a single convolutional layer, followed by concatenation. The SPPF structure replaces the parallel large-kernel max-pooling branches of the original SPP block with a chain of smaller ones (three sequential 5×5 max-pooling layers cover the same receptive fields as parallel 5×5, 9×9, and 13×13 pooling) to capture multi-scale features efficiently. The neck combines Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) so that features propagate both top-down and bottom-up between layers, improving localization accuracy. The head outputs detection predictions at three different scales, corresponding to small, medium, and large objects, which is beneficial for detecting varied defect sizes in solar panels.
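The pooling shortcut in SPPF can be checked numerically. The sketch below is a pure-Python illustration, not the YOLOv5 code: it assumes stride-1 max pooling whose border handling ignores out-of-range positions (equivalent to padding with negative infinity), and the toy feature map values are arbitrary.

```python
def max_pool_same(img, k):
    """Stride-1 max pooling with a k x k window that keeps the input size.

    Positions outside the map are ignored, which matches -inf padding.
    """
    r = k // 2
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = float("-inf")
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        best = max(best, img[y][x])
            row.append(best)
        out.append(row)
    return out


def sppf_pools(img):
    """Chain three 5x5 pools and return the three intermediate outputs."""
    p1 = max_pool_same(img, 5)
    p2 = max_pool_same(p1, 5)
    p3 = max_pool_same(p2, 5)
    return p1, p2, p3


# Deterministic toy feature map (hypothetical values).
feature_map = [[(7 * i + 13 * j) % 31 for j in range(16)] for i in range(16)]
p1, p2, p3 = sppf_pools(feature_map)

# Two chained 5x5 pools should match one 9x9 pool, three should match 13x13.
chained_equals_parallel = (
    p2 == max_pool_same(feature_map, 9)
    and p3 == max_pool_same(feature_map, 13)
)
print(chained_equals_parallel)
```

Chaining reuses the earlier pooling results, so the larger receptive fields come at the cost of small windows only.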

Mathematically, the detection process in YOLOv5 involves predicting bounding boxes, objectness scores, and class probabilities. For an input image divided into an S×S grid, each grid cell predicts B bounding boxes, each with five values: center coordinates (x, y), width (w), height (h), and confidence score. The confidence score reflects the probability that a box contains an object and the accuracy of the box, defined as:

$$ \text{Confidence} = P(\text{Object}) \times \text{IOU}_{\text{pred}}^{\text{truth}} $$

where \( P(\text{Object}) \) is the probability of an object being present, and \( \text{IOU}_{\text{pred}}^{\text{truth}} \) is the Intersection over Union between the predicted box and ground truth. The class probabilities are predicted independently for each box; in the YOLOv5 implementation each class score passes through a sigmoid rather than a softmax, so the classes do not compete with one another. The total loss function combines localization loss, confidence loss, and classification loss:

$$ L = L_{\text{loc}} + L_{\text{conf}} + L_{\text{class}} $$

where \( L_{\text{loc}} \) uses the CIoU loss for box regression, \( L_{\text{conf}} \) uses binary cross-entropy for objectness, and \( L_{\text{class}} \) uses binary cross-entropy over the per-class outputs. This multi-part loss ensures balanced learning across detection aspects, which is critical for accurately identifying defects in solar panels.
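The confidence definition above depends on the IoU between a predicted box and its ground truth. The following is a minimal sketch of that computation, assuming boxes in the (center x, center y, width, height) form described earlier. The function names and the example boxes are illustrative and do not come from the YOLOv5 code base.

```python
def xywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to corner form (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over Union of two boxes in corner form."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence(p_object, pred_xywh, truth_xywh):
    """Confidence = P(Object) * IoU(pred, truth)."""
    return p_object * iou(xywh_to_xyxy(pred_xywh), xywh_to_xyxy(truth_xywh))


# Hypothetical 40x40 boxes whose centers are offset by 10 pixels in x and y.
pred = (30.0, 30.0, 40.0, 40.0)
truth = (40.0, 40.0, 40.0, 40.0)
overlap = iou(xywh_to_xyxy(pred), xywh_to_xyxy(truth))
score = confidence(0.9, pred, truth)
print(round(overlap, 4), round(score, 4))
```

Here the boxes share a 30×30 region out of a combined area of 2300, so the IoU is 900/2300 and the confidence is 0.9 times that value.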

Efficient Channel Attention (ECA) Module

The ECA module is designed to improve channel-wise feature responses without significant parameter overhead. Given an input feature map \( X \in \mathbb{R}^{H \times W \times C} \) with height \( H \), width \( W \), and channels \( C \), ECA first applies global average pooling to generate channel-wise statistics \( z \in \mathbb{R}^{1 \times 1 \times C} \). Each element \( z_c \) is computed as:

$$ z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j) $$

Instead of using fully connected layers as in SENet, ECA employs a one-dimensional convolution with kernel size \( k \) to capture local cross-channel interactions. The convolution output \( \hat{z} \) is obtained as:

$$ \hat{z} = \sigma(\text{Conv1D}(z, k)) $$

where \( \sigma \) is a sigmoid activation function, and \( \text{Conv1D} \) denotes the one-dimensional convolution. The kernel size \( k \) is adaptively determined based on the channel dimension \( C \) to optimize interaction coverage. According to Wang et al., the mapping \( \phi \) from \( k \) to \( C \) can be approximated by an exponential function, \( C = \phi(k) = 2^{\gamma k - b} \); solving for \( k \) gives:

$$ k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\text{odd}} $$

where \( \gamma \) and \( b \) are constants set to 2 and 1, respectively, and \( | \cdot |_{\text{odd}} \) denotes the nearest odd integer. This adaptive kernel selection ensures that the ECA module efficiently models channel dependencies without manual tuning. Finally, the attention weights \( \hat{z} \) are multiplied element-wise with the input feature map \( X \) to produce the refined output \( \tilde{X} \):

$$ \tilde{X}_c = \hat{z}_c \cdot X_c $$

This process enhances informative channels while suppressing less relevant ones, which is particularly useful for highlighting defect features in solar panel images, such as cracks or black cores, against complex backgrounds.
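The ECA steps above (global average pooling, one-dimensional convolution across channels, sigmoid, channel-wise rescaling) can be sketched in plain Python. This is an illustrative sketch rather than the implementation used in the experiments. It assumes a channel-first nested-list layout `x[c][i][j]`, zero padding along the channel axis, and a particular reading of the odd-integer rule (truncate, then add one if the result is even). The kernel weights in the example are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size k = |log2(C)/gamma + b/gamma|_odd.

    Assumption: the odd value is obtained by truncating and bumping
    even results up by one.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_forward(x, weights):
    """Apply channel attention to x[c][i][j] with one shared 1D kernel."""
    k = len(weights)
    pad = k // 2
    num_channels = len(x)

    # Global average pooling: one statistic z_c per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # 1D convolution over the channel axis (zero padding at both ends).
    conv = []
    for c in range(num_channels):
        acc = 0.0
        for t in range(k):
            idx = c + t - pad
            if 0 <= idx < num_channels:
                acc += weights[t] * z[idx]
        conv.append(acc)

    # Sigmoid turns the convolution output into weights in (0, 1).
    attn = [1.0 / (1.0 + math.exp(-v)) for v in conv]

    # Rescale every element of channel c by its attention weight.
    out = [[[attn[c] * v for v in row] for row in x[c]]
           for c in range(num_channels)]
    return out, attn


# Kernel sizes for typical power-of-two channel widths.
kernel_sizes = {c: eca_kernel_size(c) for c in (64, 128, 256, 512)}

# Toy input: 4 channels of 2x2, channel c filled with the value c.
x = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
out, attn = eca_forward(x, weights=[0.0, 1.0, 0.0])  # hypothetical identity kernel
print(kernel_sizes)
print([round(a, 3) for a in attn])
```

With the identity kernel each attention weight is simply the sigmoid of that channel's mean, which makes the rescaling easy to verify by hand. A trained kernel would mix each channel with its \( k - 1 \) neighbors.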

Integration of ECA into YOLOv5: C3-ECA Module

To integrate ECA into YOLOv5, I modify the C3 modules in the backbone network. In the backbone, the bottleneck layers inside each standard C3 module are configured with a residual (shortcut) connection. By embedding the ECA module after these bottleneck layers, I create a new C3-ECA module. As shown in the network diagram, the C3-ECA module takes an input and splits it into two branches: one passes through a series of bottleneck layers (each bottleneck consists of two convolutional layers with a residual skip connection), followed by the ECA module, while the other branch undergoes a single convolutional operation. The outputs of both branches are concatenated and processed through a final convolutional layer. This design allows the network to leverage both spatial features from the convolutions and channel-wise attention from ECA, enhancing the representation of defect patterns in solar panels.

The integration adds few parameters to the model. For a C3 module with \( C \) channels, the one-dimensional convolution in ECA introduces at most \( \mathcal{O}(k \times C) \) parameters, and only \( k \) when a single kernel is shared across all channels as in the original ECA design, compared to \( \mathcal{O}(C^2) \) parameters for the fully connected layers in SENet. This efficiency is crucial for maintaining the real-time performance of YOLOv5 while improving accuracy. The overall architecture of YOLOv5-ECA is summarized in the table below, highlighting the modifications in the backbone.
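To make the parameter comparison concrete, the sketch below counts attention weights for a few channel widths. It rests on assumptions that the text does not state: an SE reduction ratio of 16 (a common default), biases ignored, and a fixed kernel size of 5 for ECA. It reports both the shared-kernel count and the per-channel \( k \times C \) count.

```python
def se_params(channels, reduction=16):
    """Weights in the two fully connected layers of an SE block.

    Assumption: reduction ratio 16 and no bias terms.
    """
    hidden = channels // reduction
    return channels * hidden + hidden * channels


def eca_params(kernel_size, channels, shared=True):
    """Weights in the ECA 1D convolution.

    shared=True  -> one kernel reused for all channels (k weights).
    shared=False -> one kernel per channel (k * C weights).
    """
    return kernel_size if shared else kernel_size * channels


K = 5  # hypothetical kernel size used for every width in this comparison
rows = []
for c in (64, 128, 256, 512):
    rows.append((c, se_params(c), eca_params(K, c, shared=False), eca_params(K, c)))

for c, se, eca_per_channel, eca_shared in rows:
    print(f"C={c:4d}  SE={se:6d}  ECA(k*C)={eca_per_channel:5d}  ECA(shared)={eca_shared}")
```

Even under the looser \( k \times C \) count, the ECA overhead grows linearly with the channel width while the SE overhead grows quadratically.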

Network Component	Description	Parameters Added
Backbone (C3 Modules)	Standard C3 modules for feature extraction	Base parameters
C3-ECA Modules	C3 modules with embedded ECA after bottleneck layers	+ \( k \times C \) per module
Neck (FPN+PAN)	Feature fusion at multiple scales	No change
Head (Detection Layers)	Outputs predictions at three scales	No change

The forward pass of YOLOv5-ECA can be described as follows: given an input EL image of solar panels, the backbone extracts multi-scale features through convolutional layers and C3-ECA modules, with ECA refining channel responses. The neck combines these features via FPN and PAN to enrich contextual information. Finally, the head produces bounding boxes and class labels for defects. This integrated approach ensures that the model focuses on relevant regions, such as potential crack areas in solar panels, leading to fewer false positives and higher detection rates.

Experimental Setup and Results

To evaluate the proposed YOLOv5-ECA algorithm, I conduct experiments on a dataset of EL images from solar panels, comparing it with several baseline models. This section details the dataset, evaluation metrics, implementation specifics, and results analysis.

Dataset and Preprocessing

The dataset comprises 3,700 EL images of solar panels, collected from an industrial inspection system. Each image contains annotations for five common defect types: cracks, black cores, thick lines, finger interruptions, and star cracks. These defects vary in size and shape, posing challenges for detection algorithms. The dataset is split into training, validation, and test sets with a ratio of 70:15:15, ensuring balanced representation of defect categories. Preprocessing steps include resizing images to 640×640 pixels, normalizing pixel values to [0,1], and applying data augmentation techniques such as random flipping, rotation, and brightness adjustment to increase model robustness. This augmentation is particularly useful for simulating real-world variations in solar panel images due to lighting or camera angles.
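As a rough illustration of the 70:15:15 split and the [0,1] normalization, the sketch below shuffles a list of image identifiers with a fixed seed and slices it. The file names, the seed, and the helper names are hypothetical. A plain random split like this one does not by itself guarantee balanced defect categories, which would need stratified sampling on the annotations.

```python
import random


def split_dataset(items, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def normalize_pixels(row):
    """Map 8-bit pixel values (0..255) into the range [0, 1]."""
    return [p / 255.0 for p in row]


image_ids = [f"el_{i:04d}.png" for i in range(3700)]  # hypothetical file names
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 2590 555 555
print(normalize_pixels([0, 128, 255]))
```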

Evaluation Metrics

I use standard object detection metrics to assess performance: Precision (P), Recall (R), and mean Average Precision (mAP). Precision measures the proportion of correct defect detections among all detected defects, while Recall indicates the proportion of actual defects that are correctly identified. They are defined as:

$$ P = \frac{TP}{TP + FP} $$

$$ R = \frac{TP}{TP + FN} $$

where \( TP \) is true positives (correctly detected defects), \( FP \) is false positives (incorrect detections), and \( FN \) is false negatives (missed defects). The Average Precision (AP) for each defect class is computed as the area under the Precision-Recall curve. The mean Average Precision (mAP) averages AP across all classes, with an Intersection over Union (IoU) threshold of 0.5 (denoted mAP@0.5). A higher mAP signifies better overall detection accuracy. Additionally, I report inference speed in frames per second (fps) to evaluate real-time capability, which is essential for large-scale inspection of solar panels.
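The metrics above can be sketched as follows. The AP routine assumes all-point interpolation of the Precision-Recall curve, which is one common convention (the interpolation scheme is not specified here). It also assumes each detection has already been marked as a true or false positive at IoU ≥ 0.5. The counts and detections in the example are invented for illustration.

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN), with zero-division guards."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, num_truth):
    """Area under the PR curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_truth:  number of ground-truth objects of this class.
    """
    if num_truth == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_truth)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make the curve non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def mean_average_precision(ap_per_class):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


p, r = precision_recall(tp=90, fp=10, fn=30)  # hypothetical counts
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_truth=2)
print(p, r, round(ap, 4))
```

In the toy example the first detection is correct, the second is a false positive, and the third recovers the remaining ground truth, so the area is 0.5 × 1 + 0.5 × 2/3.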

Implementation Details

The experiments are conducted on a system with Windows 10, an Intel i7-12700KF processor, 24 GB GPU memory, and 32 GB RAM. The model is implemented in Python 3.8 using PyTorch 1.11.0. I use the YOLOv5s-6.0 model as the baseline due to its balance of speed and accuracy. For YOLOv5-ECA, I integrate ECA modules into all C3 layers in the backbone. Training is performed for 300 epochs with a batch size of 16, using the stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.01, momentum of 0.937, and weight decay of 0.0005. The learning rate is adjusted using a cosine annealing scheduler. Loss functions include localization loss (CIoU loss), confidence loss (binary cross-entropy), and classification loss (cross-entropy). Comparative models include Faster R-CNN, SSD, YOLOv3, YOLOX, YOLOR, and standard YOLOv5, all trained under identical conditions to ensure fair comparison. The table below summarizes the training parameters.

Parameter	Value
Input Image Size	640×640
Batch Size	16
Epochs	300
Optimizer	SGD
Initial Learning Rate	0.01
Momentum	0.937
Weight Decay	0.0005
Data Augmentation	Flip, Rotate, Brightness
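The cosine annealing schedule mentioned in the implementation details can be sketched with the initial learning rate and epoch count from the table. The final learning-rate fraction is not stated in this paper, so the value below is a hypothetical placeholder, and any warm-up phase is omitted.

```python
import math


def cosine_lr(epoch, total_epochs=300, lr0=0.01, final_fraction=0.01):
    """Cosine-annealed learning rate at a given epoch.

    final_fraction is an assumed value: the schedule decays from lr0
    down to lr0 * final_fraction over total_epochs.
    """
    lr_min = lr0 * final_fraction
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr0 - lr_min) * cos_term


schedule = [cosine_lr(e) for e in (0, 75, 150, 225, 300)]
print([round(lr, 6) for lr in schedule])
```

The rate starts at 0.01, passes through the midpoint of its range at epoch 150, and flattens out as it approaches the final value, which gives small updates late in training.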

Results and Analysis

The training results for YOLOv5-ECA show consistent improvement over epochs. As illustrated in the loss curves, both localization and confidence losses decrease steadily after 50 epochs, with classification loss converging near zero. Precision stabilizes around 96% after 150 epochs, indicating robust learning. The final evaluation on the test set reveals that YOLOv5-ECA outperforms all baseline models in terms of precision, recall, and mAP. The detailed metrics are presented in the table below.

Model	Precision (%)	Recall (%)	mAP@0.5 (%)	Speed (fps)
Faster R-CNN	65.9	57.3	61.2	51.34
SSD	79.2	84.6	82.6	10.88
YOLOX	82.6	89.5	82.5	5.68
YOLOR	78.3	94.1	94.3	4.10
YOLOv3	78.5	93.5	87.8	71.40
YOLOv5 (Baseline)	96.1	92.1	96.6	142.80
YOLOv5-ECA (Proposed)	97.5	95.7	97.7	111.10

The proposed YOLOv5-ECA achieves a precision of 97.5%, recall of 95.7%, and mAP@0.5 of 97.7%, improvements of 1.4, 3.6, and 1.1 percentage points over baseline YOLOv5, respectively. Although the inference speed decreases from 142.8 fps to 111.1 fps (about 22%) due to the added ECA modules, it remains sufficient for real-time applications in solar panel inspection. The performance gain can be attributed to the ECA module’s ability to enhance feature representation, particularly for small or subtle defects in solar panels. For instance, in sample detection images, YOLOv5-ECA successfully identifies multiple defect types, such as cracks and black cores, in a single image, whereas comparison models like YOLOv3 or YOLOR may miss some defects or produce false positives. This demonstrates the effectiveness of channel attention in focusing on relevant features.
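The gains quoted above are absolute differences between rows of the results table. The short sketch below recomputes them from the table values and adds the relative change in speed. The dictionary keys and helper names are illustrative.

```python
# Values copied from the results table (percent, except fps).
baseline = {"precision": 96.1, "recall": 92.1, "map50": 96.6, "fps": 142.8}
proposed = {"precision": 97.5, "recall": 95.7, "map50": 97.7, "fps": 111.1}


def point_gain(metric):
    """Absolute difference, in percentage points for the accuracy metrics."""
    return round(proposed[metric] - baseline[metric], 1)


def relative_change(metric):
    """Change relative to the baseline value, in percent."""
    delta = proposed[metric] - baseline[metric]
    return round(100.0 * delta / baseline[metric], 1)


gains = {m: point_gain(m) for m in ("precision", "recall", "map50")}
fps_change = relative_change("fps")
latency_ms = round(1000.0 / proposed["fps"], 1)  # per-image time at 111.1 fps
print(gains, fps_change, latency_ms)
```

At 111.1 fps the model spends roughly 9 ms per image, which is the figure that matters when sizing an inspection line.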

To further analyze the impact of ECA, I conduct an ablation study by varying the placement of ECA modules in the backbone. Results show that integrating ECA into all C3 modules yields the best mAP, confirming that consistent channel refinement across layers is beneficial. Additionally, the parameter count for YOLOv5-ECA increases only marginally—by approximately 0.05 million parameters compared to baseline YOLOv5—highlighting the efficiency of the ECA design. This makes the model suitable for deployment in resource-constrained environments, such as embedded systems for on-site solar panel inspection.

Another key observation is the model’s robustness to different defect sizes. Solar panel defects like star cracks or finger interruptions can vary significantly in scale. The multi-scale detection head of YOLOv5, combined with ECA-enhanced features, allows YOLOv5-ECA to detect both large and small defects accurately. The Precision-Recall curves for each defect class show that YOLOv5-ECA maintains high AP values across all categories, with the lowest AP of 96.5% for thick lines and the highest of 98.2% for cracks. This uniformity is crucial for comprehensive inspection of solar panels, where missing any defect type could compromise system performance.

In comparison to other attention mechanisms, such as SENet, YOLOv5-ECA achieves similar accuracy gains with fewer parameters. For example, a YOLOv5 model with SENet modules added 0.3 million parameters and achieved an mAP of 97.5%, whereas YOLOv5-ECA reaches 97.7% mAP with only 0.05 million extra parameters. This efficiency stems from ECA’s avoidance of dimensionality reduction, preserving channel information while modeling interactions. Thus, YOLOv5-ECA offers a better trade-off for practical applications involving solar panels, where both accuracy and speed are priorities.

Conclusion

In this paper, I proposed an improved YOLOv5 algorithm integrated with an Efficient Channel Attention (ECA) mechanism for defect detection in solar panels using electroluminescent (EL) images. The integration involves embedding ECA modules into the C3 layers of YOLOv5’s backbone, forming C3-ECA modules that enhance channel-wise feature responses without significant parameter overhead. This allows the model to focus on defect-related features, such as cracks or black cores in solar panels, while suppressing irrelevant background information.

Experimental results on a dataset of 3,700 EL images demonstrate that YOLOv5-ECA outperforms baseline YOLOv5 and other state-of-the-art detectors like YOLOv3, YOLOX, and YOLOR in terms of precision, recall, and mAP. Specifically, YOLOv5-ECA achieves a precision of 97.5%, recall of 95.7%, and mAP@0.5 of 97.7%, improvements of 1.4, 3.6, and 1.1 percentage points over YOLOv5, respectively. Although inference speed decreases from 142.8 fps to 111.1 fps, it remains adequate for real-time inspection. The ablation studies confirm that the ECA modules contribute to these gains by refining feature maps across network layers, leading to more accurate detection of multiple defect types in solar panels.

The proposed algorithm has practical implications for the solar energy industry. By automating defect detection with high accuracy and speed, it can reduce reliance on manual inspection, lower costs, and improve the reliability of solar power systems. Future work could explore extending the model to other imaging modalities, such as infrared thermal images, or adapting it for real-time embedded deployment on drones or robotic inspectors. Additionally, incorporating more advanced attention mechanisms or transformer-based modules may further boost performance for complex defect patterns in solar panels. Overall, YOLOv5-ECA represents a significant step toward efficient and reliable quality control in photovoltaic manufacturing and maintenance, supporting the global transition to sustainable energy sources.