In recent years, the rapid expansion of renewable energy systems has positioned solar power as a critical component in achieving global sustainability goals. Solar panels, which convert sunlight into electricity, are fundamental to this transition. However, the efficiency and longevity of solar panels can be compromised by various defects, such as cracks, broken grids, black cores, thick lines, and star-shaped cracks. These imperfections not only reduce energy output but also increase maintenance costs, highlighting the need for effective defect detection methods. Traditional inspection techniques, including visual checks and optical methods, often suffer from high costs, low accuracy, and an inability to classify defect types, making them unsuitable for large-scale applications. With advancements in artificial intelligence, deep learning-based approaches have emerged as promising solutions. Among these, the YOLO (You Only Look Once) series of algorithms has gained popularity due to its balance of speed and accuracy. This paper presents an enhanced YOLOv8 model, termed YOLOv8-EDD, which integrates multiple improvements to address the challenges of solar panel defect detection. By incorporating a multi-scale attention mechanism, deformable convolutions, a lightweight upsampling operator, and an improved bounding-box regression loss, the proposed model achieves superior performance in both detection precision and speed. The following sections detail the methodology, experimental setup, and results, demonstrating the model’s effectiveness in real-world scenarios.
The increasing adoption of solar panels in energy systems underscores the importance of maintaining their operational integrity. Defects in solar panels can arise from manufacturing errors, environmental stress, or physical damage, leading to significant energy losses. For instance, micro-cracks may propagate over time, while black cores indicate potential cell degradation. Early detection of these issues is crucial to prevent further deterioration and ensure optimal performance. Conventional methods, such as electroluminescence imaging or thermal analysis, provide some insights but are often labor-intensive and lack scalability. In contrast, deep learning models, particularly convolutional neural networks (CNNs), offer automated and scalable alternatives. These models can learn complex patterns from image data, enabling accurate classification and localization of defects. The YOLO framework, known for its one-stage detection pipeline, processes images in a single pass, making it efficient for real-time applications. However, standard YOLO models may struggle with irregular defect shapes or small objects, necessitating further optimizations for solar panel inspections.

The YOLOv8 architecture serves as the foundation for our improvements. It consists of three main components: the backbone, neck, and head. The backbone employs convolutional layers, C2f modules, and an SPPF (Spatial Pyramid Pooling Fast) module to extract multi-scale features from input images. The neck utilizes an FPN-PAN (Feature Pyramid Network-Path Aggregation Network) structure to fuse features at different resolutions, enhancing the model’s ability to detect objects of varying sizes. The head adopts an anchor-free approach, directly predicting bounding boxes and class probabilities, which reduces computational overhead. Despite its strengths, YOLOv8 can be enhanced to better handle the specific challenges of solar panel defects, such as varying scales and complex backgrounds. Our modifications focus on four key areas: attention mechanisms, convolutional operations, upsampling techniques, and loss functions. These changes collectively improve the model’s feature extraction capabilities, computational efficiency, and generalization performance.
To enhance feature representation, we integrate the Efficient Multi-scale Attention (EMA) mechanism into the backbone network. EMA operates by dividing input features into multiple groups and processing them through parallel pathways. This allows the model to capture dependencies across different scales and spatial dimensions. The EMA structure can be represented mathematically as follows: an input feature map $X$ of dimensions $C \times H \times W$ is split into $G$ subgroups $X_i$, where $i = 0, 1, \ldots, G-1$. Each subgroup undergoes separate processing through 1×1 and 3×3 convolutional branches. The outputs are combined using global average pooling and sigmoid activation to generate attention weights. The final output $Y$ is computed as:
$$ Y = \text{Sigmoid}\left(\text{Re-weight}\left(\text{Concat}\left(\text{AvgPool}(X_i)\right)\right)\right) $$
This mechanism enables the model to focus on relevant defect regions while suppressing irrelevant background information, which is particularly beneficial for detecting subtle defects like broken grids in solar panels.
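As an illustration of the grouping and re-weighting described above, the dependency-free Python sketch below models only EMA's 1×1 branch, under stated assumptions: feature maps are nested lists indexed `[channel][row][col]`, the learned 1×1 convolution is replaced by a fixed matrix `w` shared by every group, and the 3×3 branch, GroupNorm, and cross-spatial aggregation are left out. The function name `ema_gate` is hypothetical; this is not the authors' implementation.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # indexed [channel][row][col]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def ema_gate(x: Feature, groups: int, w: List[List[float]]) -> Feature:
    """Simplified EMA 1x1 branch (hypothetical sketch, no learned parameters).

    Channels are split into `groups` subgroups; each subgroup is pooled along
    both spatial directions, mixed by the same matrix `w` (standing in for the
    shared 1x1 convolution), and turned into row and column sigmoid gates.
    """
    c, h, wd = len(x), len(x[0]), len(x[0][0])
    if c % groups:
        raise ValueError("channel count must be divisible by groups")
    cg = c // groups
    out = [[[0.0] * wd for _ in range(h)] for _ in range(c)]
    for g in range(groups):
        sub = x[g * cg:(g + 1) * cg]
        # Directional average pooling: one value per row and one per column.
        pool_rows = [[sum(row) / wd for row in ch] for ch in sub]
        pool_cols = [[sum(ch[i][j] for i in range(h)) / h for j in range(wd)] for ch in sub]
        # Concatenate along the spatial axis: cg descriptors of length h + wd.
        desc = [r + cl for r, cl in zip(pool_rows, pool_cols)]
        # Shared 1x1 "convolution": mix channels within the group.
        mixed = [[sum(w[o][i] * desc[i][k] for i in range(cg)) for k in range(h + wd)]
                 for o in range(cg)]
        for o in range(cg):
            row_gate = [sigmoid(v) for v in mixed[o][:h]]
            col_gate = [sigmoid(v) for v in mixed[o][h:]]
            for i in range(h):
                for j in range(wd):
                    out[g * cg + o][i][j] = sub[o][i][j] * row_gate[i] * col_gate[j]
    return out


x = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]  # C=4, H=2, W=3
identity = [[1.0, 0.0], [0.0, 1.0]]                    # 2 channels per group
y = ema_gate(x, groups=2, w=identity)
```

Sharing one $(C/G) \times (C/G)$ mixing matrix across all groups, instead of a full $C \times C$ one, is where the grouping saves parameters; in a real model `w` would be learned.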
Another critical improvement replaces the standard convolutions within the C2f modules with Deformable Convolution v2 (DCNv2). Standard convolutions sample the input on a fixed, regular grid, which adapts poorly to the irregular shapes commonly found in solar panel defects. DCNv2 adds learnable offsets $\Delta p_k$ and modulation scalars $\Delta m_k$ to the sampling locations, allowing the model to adjust its receptive field dynamically. The output feature $y(p)$ at position $p$ is defined as:
$$ y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k $$
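Here, $K$ is the number of sampling points (9 for a 3×3 kernel), $w_k$ is the weight of the $k$-th sampling point, $p_k$ is its fixed position in the regular grid (from $(-1,-1)$ to $(1,1)$ for a 3×3 kernel), $x$ is the input feature map, and $\Delta m_k \in [0,1]$ scales the contribution of each sample. Because $p + p_k + \Delta p_k$ is generally fractional, $x$ is evaluated there by bilinear interpolation. By integrating DCNv2 into the C2f modules at layers 2, 4, 6, and 8 of the backbone, the model gains flexibility in handling diverse defect morphologies, such as star-shaped cracks or irregular black cores in solar panels.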
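To make the formula concrete, the sketch below evaluates $y(p)$ for a single channel and a single output position with a 3×3 kernel, using bilinear interpolation at fractional sampling locations and treating taps outside the map as zero. It is a minimal illustration under these assumptions (plain Python lists, and hand-set weights, offsets, and masks that a real DCNv2 layer would learn or predict); the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Regular 3x3 sampling grid p_k, listed row by row (K = 9).
GRID: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: List[List[float]], py: float, px: float) -> float:
    """Bilinearly interpolate x at a fractional (row, col); out-of-bounds taps count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    dy, dx = py - y0, px - x0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * x[yy][xx]
    return value


def modulated_deform_conv_at(
    x: List[List[float]],
    p: Tuple[int, int],
    weights: Sequence[float],
    offsets: Sequence[Tuple[float, float]],
    masks: Sequence[float],
) -> float:
    """y(p) = sum_k w_k * x(p + p_k + dp_k) * dm_k for one channel and one position."""
    total = 0.0
    for (gy, gx), w_k, (oy, ox), m_k in zip(GRID, weights, offsets, masks):
        if not 0.0 <= m_k <= 1.0:
            raise ValueError("modulation scalars must lie in [0, 1]")
        total += w_k * bilinear(x, p[0] + gy + oy, p[1] + gx + ox) * m_k
    return total


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ones = [1.0] * 9
zero_offsets = [(0.0, 0.0)] * 9

# Zero offsets and unit masks reduce to an ordinary 3x3 convolution: 1 + 2 + ... + 9.
plain = modulated_deform_conv_at(x, (1, 1), ones, zero_offsets, ones)
# Keep only the centre tap and shift it half a pixel left: it reads between 4.0 and 5.0.
centre_only = [0.0] * 4 + [1.0] + [0.0] * 4
shifted = modulated_deform_conv_at(x, (1, 1), centre_only, [(0.0, -0.5)] * 9, ones)
```

With all offsets at zero and all masks at 1, the expression reduces to an ordinary convolution, which is a convenient sanity check for any DCNv2 implementation.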
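To offset the additional computation introduced by DCNv2, we adopt DySample, a lightweight upsampling operator, in the neck. Nearest-neighbor and bilinear interpolation use fixed sampling positions, and kernel-based dynamic upsamplers such as CARAFE must generate a content-aware kernel for every output location; DySample instead formulates upsampling as point sampling, predicting content-aware offsets for the sampling positions and resampling the input, which keeps memory usage and latency low. Given an input feature $\alpha$ of size $C \times H_1 \times W_1$, DySample generates a sampling set $\delta$ of size $2g \times sH_1 \times sW_1$, where $g$ is the number of offset groups, $s$ is the upsampling scale factor, and the factor 2 accounts for the $x$ and $y$ coordinates. The upsampled feature $\alpha'$ of size $C \times H_2 \times W_2$, with $H_2 = sH_1$ and $W_2 = sW_1$, is obtained via a grid sampling function: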
$$ \alpha' = \text{grid\_sampler}(\alpha, \delta) $$
The sampling set $\delta$ is obtained by scaling the predicted offsets with a static range factor of 0.25, which limits overlap between neighboring sampling points and minimizes artifacts near boundaries. In the ablation study, this replacement raised inference speed from 177.1 to 187.3 FPS without compromising detail preservation, which is vital for processing high-resolution images of solar panels.
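The point-sampling idea can be sketched for one channel with a single offset group ($g = 1$): each output pixel starts at its bilinear-interpolation source coordinate, is shifted by a predicted offset multiplied by the static factor 0.25, and is then resampled bilinearly. In DySample the raw offsets come from a learned linear layer; here they are passed in directly, and clamping coordinates to the map edge is our assumption for boundary handling. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def bilinear_border(x: Grid, py: float, px: float) -> float:
    """Bilinear interpolation with coordinates clamped to the map (border padding)."""
    h, w = len(x), len(x[0])
    py = min(max(py, 0.0), h - 1.0)
    px = min(max(px, 0.0), w - 1.0)
    y0, x0 = math.floor(py), math.floor(px)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = py - y0, px - x0
    top = (1.0 - dx) * x[y0][x0] + dx * x[y0][x1]
    bottom = (1.0 - dx) * x[y1][x0] + dx * x[y1][x1]
    return (1.0 - dy) * top + dy * bottom


def point_sample_upsample(x: Grid, offsets: List[List[Tuple[float, float]]],
                          s: int = 2, scope: float = 0.25) -> Grid:
    """Upsample one channel by s with offset-shifted point sampling.

    offsets[i][j] is the raw (dy, dx) for output pixel (i, j), in input-pixel
    units; in DySample a learned layer would predict it.
    """
    h, w = len(x), len(x[0])
    out = []
    for i in range(s * h):
        row = []
        for j in range(s * w):
            oy, ox = offsets[i][j]
            # Initial position = bilinear (align_corners=False) source coordinate,
            # then shifted by the scaled offset.
            py = (i + 0.5) / s - 0.5 + scope * oy
            px = (j + 0.5) / s - 0.5 + scope * ox
            row.append(bilinear_border(x, py, px))
        out.append(row)
    return out


x = [[0.0, 1.0]]                                   # 1 x 2 input
no_offsets = [[(0.0, 0.0)] * 4 for _ in range(2)]  # one offset per 2 x 4 output pixel
up = point_sample_upsample(x, no_offsets)
```

With zero offsets the result coincides with plain bilinear upsampling; nonzero offsets let each output pixel move its sampling point toward more informative content.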
For bounding box regression, we replace the CIoU loss function with WIoUv3 (Wise Intersection over Union version 3). CIoU incorporates overlap area, center distance, and aspect ratio, but it can be sensitive to low-quality samples. WIoUv3 introduces a non-monotonic focusing mechanism that down-weights the influence of outliers. The loss is computed as:
$$ L_{\text{IoU}} = 1 - \text{IoU} = 1 - \frac{W_i H_i}{wh + w_{\text{gt}} h_{\text{gt}} - W_i H_i} $$
$$ L_{\text{WIoUv1}} = R_{\text{WIoU}} L_{\text{IoU}} $$
$$ R_{\text{WIoU}} = \exp\left(\frac{(x - x_{\text{gt}})^2 + (y - y_{\text{gt}})^2}{\left(W_g^2 + H_g^2\right)^*}\right) $$
$$ L_{\text{WIoUv3}} = r L_{\text{WIoUv1}} $$
$$ r = \frac{\beta}{\delta \alpha^{\beta - \delta}} $$
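Here, $W_i$ and $H_i$ are the width and height of the intersection of the predicted box ($w \times h$, centered at $(x, y)$) and the ground-truth box ($w_{\text{gt}} \times h_{\text{gt}}$, centered at $(x_{\text{gt}}, y_{\text{gt}})$); $W_g$ and $H_g$ are the width and height of the smallest box enclosing both, and the superscript $*$ indicates that the term is detached from the gradient computation. $\beta$ is the outlier degree of the anchor box, a quality measure defined below, and $\alpha$ and $\delta$ are hyperparameters; the gain $r$ equals 1 when $\beta = \delta$. This adjustment improves the model’s robustness to imbalanced datasets, which are common in solar panel defect collections where certain defect types may be underrepresented.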
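The following sketch evaluates the formulas above for one pair of axis-aligned boxes given as (center x, center y, width, height). It assumes $\beta$ has already been computed; the defaults $\alpha = 1.9$ and $\delta = 3$ are an assumption (values often seen in public WIoU implementations), since the values used for YOLOv8-EDD are not reported here. The function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def wiou_v1(pred: Box, gt: Box) -> Tuple[float, float]:
    """Return (L_WIoUv1, L_IoU) for one predicted box and its ground truth."""
    x, y, w, h = pred
    x_gt, y_gt, w_gt, h_gt = gt
    # Intersection width W_i and height H_i (zero when the boxes do not overlap).
    wi = max(0.0, min(x + w / 2, x_gt + w_gt / 2) - max(x - w / 2, x_gt - w_gt / 2))
    hi = max(0.0, min(y + h / 2, y_gt + h_gt / 2) - max(y - h / 2, y_gt - h_gt / 2))
    inter = wi * hi
    l_iou = 1.0 - inter / (w * h + w_gt * h_gt - inter)
    # Smallest enclosing box W_g x H_g. In a training framework this denominator
    # would be detached from the graph (the * superscript); plain floats have no gradient.
    enc_w = max(x + w / 2, x_gt + w_gt / 2) - min(x - w / 2, x_gt - w_gt / 2)
    enc_h = max(y + h / 2, y_gt + h_gt / 2) - min(y - h / 2, y_gt - h_gt / 2)
    r_wiou = math.exp(((x - x_gt) ** 2 + (y - y_gt) ** 2) / (enc_w ** 2 + enc_h ** 2))
    return r_wiou * l_iou, l_iou


def focusing_gain(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)); r = 1 at beta = delta."""
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred: Box, gt: Box, beta: float) -> float:
    loss_v1, _ = wiou_v1(pred, gt)
    return focusing_gain(beta) * loss_v1


# Two 2x2 boxes whose centers are one unit apart horizontally.
loss_v1, l_iou = wiou_v1((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 2.0, 2.0))
```

The gain $r$ rescales an already distance-weighted IoU loss per box, so two boxes with the same overlap can receive different gradients depending on their outlier degree.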
We conducted extensive experiments to validate the proposed YOLOv8-EDD model. The dataset comprised 4000 images of solar panels with annotations for five defect types: cracks, broken grids, black cores, thick lines, and star-shaped cracks. The images were split into training (70%), validation (20%), and test (10%) sets. The hardware setup included an NVIDIA Tesla P4 GPU and an Intel Xeon E5-2650 v4 CPU, with software environments like PyTorch 2.0.1 and CUDA 11.8. Training parameters included a batch size of 16, image size of 640×640, and 200 epochs. Evaluation metrics included precision (P), recall (R), mean average precision (mAP), parameters (Params), GFLOPs, and frames per second (FPS).
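Only the proportions of the split are stated. The sketch below shows one reproducible way to obtain them, assuming a seeded shuffle and integer percentages so that the counts come out exact (2800, 800, and 400 of 4000 images). The file names and the `split_dataset` helper are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], percents: Tuple[int, int, int] = (70, 20, 10),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle deterministically and cut into train/val/test by integer percentages."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    # Any rounding remainder goes to the test split.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical file names standing in for the 4000 annotated images.
images = [f"img_{i:04d}.jpg" for i in range(4000)]
train_set, val_set, test_set = split_dataset(images)
```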
Ablation studies were performed to assess the contribution of each component. The baseline YOLOv8 model achieved a precision of 82.4%, recall of 84.9%, mAP of 88.4%, and 178.1 FPS. Adding EMA attention increased precision to 92.3% and mAP to 93.5%, with a slight FPS improvement (179.9). Incorporating DCNv2 further boosted mAP to 95.0%, though FPS decreased marginally to 177.1. Replacing the upsampler with DySample raised FPS to 187.3 and mAP to 96.3%. Finally, using the WIoUv3 loss resulted in the best accuracy: precision of 97.7%, recall of 96.2%, mAP of 98.9%, and 184.6 FPS. Every modification raises mAP, although individual metrics occasionally dip (precision fell slightly from 93.3% to 93.1% when DySample was added).
| Model | EMA | DCNv2 | DySample | WIoUv3 | P (%) | R (%) | mAP (%) | FPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv8 | No | No | No | No | 82.4 | 84.9 | 88.4 | 178.1 |
| +EMA | Yes | No | No | No | 92.3 | 86.8 | 93.5 | 179.9 |
| +DCNv2 | Yes | Yes | No | No | 93.3 | 89.0 | 95.0 | 177.1 |
| +DySample | Yes | Yes | Yes | No | 93.1 | 92.7 | 96.3 | 187.3 |
| YOLOv8-EDD | Yes | Yes | Yes | Yes | 97.7 | 96.2 | 98.9 | 184.6 |
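Because the components were added cumulatively, each row's gain is measured relative to the row above it, so the per-step figures depend on the order in which the modules were added. The snippet below only recomputes these differences from the reported mAP values; it adds no new results.

```python
# mAP (%) per ablation row, copied from the table above.
ablation = [
    ("YOLOv8", 88.4),
    ("+EMA", 93.5),
    ("+DCNv2", 95.0),
    ("+DySample", 96.3),
    ("YOLOv8-EDD (+WIoUv3)", 98.9),
]

# Gain contributed by each step relative to the previous row, in percentage points.
step_gains = {
    name: round(curr - prev, 1)
    for (_, prev), (name, curr) in zip(ablation, ablation[1:])
}
total_gain = round(ablation[-1][1] - ablation[0][1], 1)

for name, gain in step_gains.items():
    print(f"{name:>22}: +{gain} mAP points")
print(f"{'total':>22}: +{total_gain} mAP points")
```

Read this way, the EMA step accounts for roughly half of the 10.5-point total gain.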
Comparative analyses with other attention mechanisms and upsampling operators further validated our choices. EMA outperformed alternatives such as CBAM and SimAM in precision and computational efficiency. Similarly, DySample outperformed CARAFE with fewer parameters. The WIoUv3 loss also gave better results than DIoU, GIoU, and SIoU, particularly in handling low-quality samples. Against other YOLO variants, YOLOv8-EDD achieved both the highest mAP50 and the highest FPS in the comparison below, while adding only 0.17M parameters to YOLOv8 (3.18M versus 3.01M), underscoring its suitability for solar panel defect detection.
| Model | P (%) | R (%) | mAP50 (%) | Params | FPS |
|---|---|---|---|---|---|
| YOLOv3 | 68.3 | 78.1 | 81.8 | 12.13M | 108.6 |
| YOLOv5 | 80.0 | 78.5 | 83.0 | 2.50M | 127.1 |
| YOLOv6 | 84.4 | 83.2 | 85.4 | 4.23M | 121.8 |
| YOLOv7 | 88.3 | 81.7 | 87.5 | 37.21M | 105.2 |
| YOLOv8 | 82.4 | 84.9 | 88.4 | 3.01M | 178.1 |
| YOLOv8-EDD | 97.7 | 96.2 | 98.9 | 3.18M | 184.6 |
To evaluate generalization, we added a new defect category, thick lines, to the dataset. YOLOv8-EDD maintained high performance, with a precision of 86.6%, recall of 89.3%, and mAP of 94.2%, outperforming YOLOv7 and YOLOv8. This indicates the model’s adaptability to newly added defect types in solar panels. Visual comparisons further confirmed these findings: YOLOv8-EDD detected defects that the baseline models missed, such as fine cracks and low-contrast black cores, with higher confidence scores and fewer false positives, which highlights the practical benefits of our approach.
In conclusion, the YOLOv8-EDD model presents a comprehensive solution for solar panel defect detection. By integrating EMA attention, DCNv2 convolutions, DySample upsampling, and the WIoUv3 loss, it raises mAP from 88.4% to 98.9% over the YOLOv8 baseline while also slightly increasing throughput (from 178.1 to 184.6 FPS). The experimental results demonstrate its advantage over the YOLO variants tested, making it a reliable tool for maintaining the efficiency and durability of solar panels. Future work could explore real-time deployment on embedded systems or extension to other renewable energy components. As solar power continues to grow, such advanced detection systems will play a vital role in ensuring sustainable energy production.
The mathematical foundations of the proposed improvements can be further elaborated. For example, the EMA mechanism’s efficiency stems from its parallel structure, which reduces computational complexity. The output of the EMA module for a given input ( X ) is computed as:
$$ Y = \text{GroupNorm}\left(\text{Sigmoid}\left(\text{Matmul}\left(\text{Softmax}\left(\text{AvgPool}(X)\right)\right)\right)\right) $$
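The nested notation above compresses several steps. The sketch below follows the cross-spatial aggregation as it appears, to our understanding, in the publicly released EMA code, which should be treated as an assumption: the pooled descriptor of each branch is softmax-normalized over channels and multiplied with the other branch's per-pixel features, the two products are summed, and a sigmoid map re-weights the grouped input. It operates on one group, takes the branch outputs `b1` (GroupNorm-normalized 1×1 branch) and `b2` (3×3 branch) as given, and uses hypothetical function names.

```python
import math
from typing import List

Feature = List[List[List[float]]]  # one group, indexed [channel][row][col]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def global_avg_pool(b: Feature) -> List[float]:
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in b]


def cross_spatial_attention(group_x: Feature, b1: Feature, b2: Feature) -> Feature:
    """Re-weight one group with a spatial map built from two branch outputs.

    Each branch's softmax-normalized channel descriptor (1 x C/G) is multiplied
    with the other branch's features (C/G x HW); the two products are summed and
    passed through a sigmoid, giving one weight per pixel.
    """
    a1, a2 = softmax(global_avg_pool(b1)), softmax(global_avg_pool(b2))
    cg, h, w = len(b1), len(b1[0]), len(b1[0][0])
    weights = [[sigmoid(sum(a1[c] * b2[c][i][j] for c in range(cg))
                        + sum(a2[c] * b1[c][i][j] for c in range(cg)))
                for j in range(w)] for i in range(h)]
    return [[[ch[i][j] * weights[i][j] for j in range(w)] for i in range(h)]
            for ch in group_x]


ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]  # C/G = 3, H = W = 2
y = cross_spatial_attention(ones, ones, ones)
```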
Similarly, the deformable convolution in DCNv2 enhances feature extraction by adapting to irregular shapes, which is crucial for defects like star-shaped cracks in solar panels. The modulation scalars $\Delta m_k$ let the network suppress samples that fall on irrelevant regions, reducing noise. DySample is also light in parameters compared with kernel-based dynamic upsamplers such as CARAFE: its offsets are produced by a single 1×1 (linear) layer that maps the $C$ input channels to $2gs^2$ offset channels, so its parameter count is proportional to $C \times g \times s^2$ (about $2Cgs^2$ weights), where $g$ is the group factor and $s$ is the scale. The count does not depend on the spatial size $H \times W$ of the input, which contributes to faster computation.
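As a worked count, the sketch below assumes the offset generator is a single 1×1 layer from $C$ to $2gs^2$ channels with a bias; the choices $g = 4$ and $s = 2$ are illustrative, not taken from the experiments. Image height and width never enter the count.

```python
def dysample_offset_params(c: int, g: int, s: int, bias: bool = True) -> int:
    """Parameters of a 1x1 layer mapping C channels to 2*g*s^2 offset channels."""
    out_channels = 2 * g * s * s  # (dx, dy) for each of g groups and s*s sub-pixels
    return c * out_channels + (out_channels if bias else 0)


for c in (64, 128, 256):
    print(c, dysample_offset_params(c, g=4, s=2))
```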
The WIoUv3 loss function’s dynamic focusing mechanism is derived from the quality measure ( \beta ), defined as:
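$$ \beta = \frac{L_{\text{IoU}}^*}{\overline{L_{\text{IoU}}}} \in [0, +\infty) $$
where $L_{\text{IoU}}^*$ is the IoU loss of the current anchor box, detached from the gradient computation (the superscript $*$), and $\overline{L_{\text{IoU}}}$ is a running mean of the IoU loss updated with momentum during training. A small $\beta$ marks a high-quality anchor box and a large $\beta$ marks an outlier. Through the gain $r$, WIoUv3 assigns small gradient gains to both extremes, so training concentrates on ordinary-quality anchor boxes; this limits harmful gradients from low-quality samples and improves convergence and generalization. In the context of solar panels, this is particularly beneficial for datasets with imbalanced defect distributions, such as rare star-shaped cracks versus common cracks.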
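The sketch below tracks the running mean $\overline{L_{\text{IoU}}}$ with a simple momentum update and converts per-box IoU losses into $\beta$ and the gain $r$. The momentum value, the initial mean, the defaults $\alpha = 1.9$ and $\delta = 3$, and the `OutlierDegree` helper are all illustrative assumptions.

```python
from typing import List


class OutlierDegree:
    """Hypothetical helper: momentum-averaged mean IoU loss and per-box beta."""

    def __init__(self, momentum: float = 0.01, init_mean: float = 1.0) -> None:
        self.momentum = momentum
        self.mean = init_mean

    def update(self, batch_l_iou: List[float]) -> None:
        batch_mean = sum(batch_l_iou) / len(batch_l_iou)
        self.mean = (1.0 - self.momentum) * self.mean + self.momentum * batch_mean

    def beta(self, l_iou: float) -> float:
        # l_iou plays the role of the detached L_IoU*; the running mean carries no gradient.
        return l_iou / self.mean


def focusing_gain(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    return beta / (delta * alpha ** (beta - delta))


tracker = OutlierDegree(momentum=0.5, init_mean=1.0)
tracker.update([0.2, 0.4])          # mean becomes 0.5 * 1.0 + 0.5 * 0.3 = 0.65
for l_iou in (0.065, 1.95, 6.5):    # beta ≈ 0.1, 3.0, 10.0
    b = tracker.beta(l_iou)
    print(f"beta={b:.2f}  gain r={focusing_gain(b):.3f}")
```

The gain is small both for $\beta \approx 0.1$ (a high-quality box) and for $\beta \approx 10$ (an outlier), and equals 1 at $\beta = \delta$, which is the non-monotonic behavior described above.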
Overall, the YOLOv8-EDD model represents a significant advancement in automated inspection systems for solar panels. Its ability to handle diverse defect types with high precision and speed makes it suitable for industrial applications. As the demand for renewable energy rises, such technologies will be essential for optimizing the performance and lifespan of solar energy systems. Further research could focus on integrating multimodal data, such as thermal or electroluminescence images, to enhance detection capabilities. Additionally, leveraging transfer learning could adapt the model to different solar panel designs or environmental conditions, ensuring broad applicability.
