With the global strategic shift towards clean energy, photovoltaic power generation, as a key technology, has seen its installed capacity grow rapidly. The “Dual Carbon” goals have further accelerated this trend. As of recent data, China’s cumulative grid-connected photovoltaic capacity has surpassed significant milestones. In this context, ensuring the long-term reliability and efficient power generation of photovoltaic systems is paramount. Solar panels, the core components, are susceptible to various defects during manufacturing, transportation, installation, and operation, such as cracks, snail trails, soldering failures, and cell fragmentation. These defects can severely impair the panel’s performance and longevity. Therefore, developing fast, accurate, and automated defect detection methods is crucial for maintaining the health of photovoltaic power plants and maximizing economic returns.

Traditional defect inspection of solar panels often relies on manual visual checks or electroluminescence (EL) imaging combined with expert analysis. These methods are not only time-consuming, labor-intensive, and costly but are also prone to subjectivity and inconsistency, making them unsuitable for large-scale photovoltaic farms. With the advancement of machine vision and deep learning, automated visual inspection has emerged as a promising solution. Deep learning-based object detection algorithms, particularly one-stage detectors like the YOLO series, have shown great potential due to their excellent balance between speed and accuracy. However, when applied to solar panel defect detection, several challenges persist. Defects like micro-cracks or small soldering points often appear as extremely small objects in the captured images, making them difficult for standard models to detect reliably. Furthermore, the pursuit of high detection accuracy often leads to complex model architectures with a large number of parameters and high computational cost (GFLOPs), hindering their deployment on edge devices or in real-time inspection systems within industrial settings.
To address these issues, this work proposes an enhanced YOLOv8 model specifically optimized for solar panel defect detection. The primary objectives are to improve the detection performance for small defects, reduce the model’s computational footprint, and enhance overall robustness. The contributions of this paper are threefold. First, we design a novel multi-scale convolution module and integrate it into the backbone network to form an improved C2f structure, termed C2f-MS. This module enhances multi-scale feature extraction and fusion capabilities while simultaneously reducing parameters and computations. Second, we combine the Normalized Wasserstein Distance (NWD) with the original CIoU loss function. This hybrid loss function is better suited to small objects, thereby improving the model’s performance in detecting minor defects on solar panels. Finally, we replace the standard Non-Maximum Suppression (NMS) post-processing with Soft-NMS. In solar panel imagery, multiple defects often appear close to each other, and standard NMS can discard valid overlapping predictions for such densely packed or occluded defects. Soft-NMS mitigates this problem, leading to more accurate and complete detection results.
1. Related Work
The application of deep learning in solar panel inspection has gained considerable traction. Early approaches often utilized image processing techniques to extract handcrafted features, which were then fed into classifiers. However, these methods lacked robustness to varying lighting conditions and defect types. The advent of Convolutional Neural Networks (CNNs) revolutionized the field. Many studies have explored using classification networks to judge the health of a solar panel or a cell. While effective for determining the presence of defects, they lack precise localization.
For precise defect localization and classification, object detection frameworks are essential. Two-stage detectors like Faster R-CNN have been applied, offering high accuracy but slower inference, which is a serious drawback for online inspection where speed is critical. Consequently, one-stage detectors, particularly the YOLO family, have become more popular in industrial inspection scenarios. Several researchers have adapted YOLO models for solar panel defect detection. For instance, modified versions of YOLOv3 were used with K-means clustering to optimize anchor boxes for EL images. Subsequent work on YOLOv5 introduced lightweight modules like GhostNet and attention mechanisms to reduce parameters and enhance feature representation. Other improvements involved incorporating transformer modules for global context or designing novel neck structures for better feature fusion. Despite these advances, a common trade-off exists: improving either small object detection or model lightness often comes at the expense of the other. Our work aims to advance both fronts simultaneously: improving small defect detection on solar panels while maintaining a lightweight and efficient model architecture suitable for practical deployment.
2. Methodology
The proposed methodology is built upon the YOLOv8n architecture, chosen for its favorable baseline in terms of speed and accuracy. We introduce three key modifications to address the specific challenges in solar panel defect detection.
2.1 Improved C2f Module with Multi-Scale Convolution (C2f-MS)
The original C2f module in YOLOv8 is designed to provide rich gradient flow. However, its standard convolutions may not optimally capture the multi-scale nature of defects on a solar panel, which range from large cracks to tiny soldering spots. To make multi-scale feature extraction more efficient, we propose the MSConv (Multi-Scale Convolution) block, inspired by the efficiency of depthwise separable convolutions and the multi-head design principle.
The MSConv block processes input features in a grouped and multi-scale manner. Let the input feature tensor be $\mathbf{X} \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels, $H$ is height, and $W$ is width. The block splits the input channels into four distinct groups: $G_1$, $G_2$, $G_3$, and $G_4$, each containing $C/4$ channels. Different convolutional operations are applied to each group:
$$ G_1: \text{Identity (No Op)}, \quad G_2: \text{Depthwise Conv 3×3}, \quad G_3: \text{Depthwise Conv 5×5}, \quad G_4: \text{Identity (No Op)} $$
After processing, the features from all groups are concatenated along the channel dimension: $\mathbf{X}' = \text{Concat}(G_1, G_2, G_3, G_4)$. Finally, a pointwise convolution (1×1 Conv) is applied to fuse information across all channels and adjust the channel count if necessary: $\mathbf{Y} = \text{Conv}_{1\times1}(\mathbf{X}')$.
This design offers two main advantages for solar panel inspection. First, the use of 3×3 and 5×5 kernels allows the model to capture features at different receptive fields simultaneously, which is crucial for defects of varying sizes on a single solar panel. Second, by keeping half of the channels (G1 and G4) untouched and using depthwise convolutions on others, it significantly reduces the computational cost and number of parameters compared to a standard convolution with a large kernel or multiple parallel standard convolutions. The MSConv block is then integrated into the C2f structure, replacing some of the standard convolution layers, to form the new C2f-MS module. We strategically replace the original C2f modules in the backbone where the channel dimension is greater than 512, as these deeper layers handle higher-level features where multi-scale context is most valuable.
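To make the parameter savings concrete, the following sketch counts the weights of an MSConv block as described above and compares them with dense 3×3 and 5×5 convolutions of the same width. It assumes bias-free convolutions and ignores normalization layers; the function names and the channel width of 256 are illustrative choices, not values taken from the experiments.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (no bias)."""
    return c_in * c_out * k * k


def msconv_params(c_in: int, c_out: int) -> int:
    """Weights of an MSConv block as described in the text (no bias).

    Channels are split into four equal groups: two identity groups cost
    nothing, one group uses a depthwise 3x3, one a depthwise 5x5, and a
    pointwise 1x1 convolution fuses the concatenated result.
    """
    if c_in % 4 != 0:
        raise ValueError("c_in must be divisible by 4")
    group = c_in // 4
    depthwise_3x3 = group * 3 * 3  # one 3x3 filter per channel in G2
    depthwise_5x5 = group * 5 * 5  # one 5x5 filter per channel in G3
    pointwise = c_in * c_out       # 1x1 fusion over all channels
    return depthwise_3x3 + depthwise_5x5 + pointwise


if __name__ == "__main__":
    c = 256  # illustrative channel width (assumption)
    print("MSConv      :", msconv_params(c, c))
    print("Standard 3x3:", standard_conv_params(c, c, 3))
    print("Standard 5x5:", standard_conv_params(c, c, 5))
```

Under these assumptions the block needs 67,712 weights against 589,824 for a dense 3×3 convolution, and almost all of the remaining cost sits in the pointwise fusion.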
2.2 Enhanced Loss Function with NWD
Small defect detection is a critical challenge in solar panel inspection. The standard Intersection over Union (IoU) and its variants (CIoU, DIoU) are widely used as regression losses and evaluation metrics. However, for very small objects, IoU has a significant drawback: it is highly sensitive to minor localization errors. A tiny deviation for a small target can cause the IoU score to drop drastically, leading to unstable training and poor convergence for those objects. This is problematic for detecting micro-cracks or small spot defects on a solar panel.
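The sensitivity described above can be illustrated with a small numerical sketch. It applies the same 3-pixel horizontal shift to a 6×6 box and to a 60×60 box and computes the resulting IoU; the box sizes and the shift are arbitrary illustrative values, not measurements from the dataset.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a size x size box and a copy shifted right by `shift` px."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), 0.0, float(size + shift), float(size))
    return iou_xyxy(gt, pred)


if __name__ == "__main__":
    for size in (6, 60):
        print(f"{size}x{size} box, 3 px shift: IoU = {shifted_iou(size, 3):.3f}")
```

The small box drops to an IoU of about 0.333 while the large box stays near 0.905, even though the localization error is identical in pixels.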
To mitigate this, we incorporate the Normalized Wasserstein Distance (NWD) as a complementary loss term. NWD models a bounding box as a 2D Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and measures the similarity between two boxes using the Wasserstein distance between their corresponding distributions. For a bounding box $R = (c_x, c_y, w, h)$, where $(c_x, c_y)$ is the center and $(w, h)$ are the width and height, its Gaussian parameters are set as:
$$ \boldsymbol{\mu} = \begin{bmatrix} c_x \\ c_y \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix} $$
The squared Wasserstein distance between two Gaussians $\mathcal{N}_a$ and $\mathcal{N}_b$ has a closed form:
$$ W_2^2(\mathcal{N}_a, \mathcal{N}_b) = \| \boldsymbol{\mu}_a - \boldsymbol{\mu}_b \|_2^2 + \| \boldsymbol{\Sigma}_a^{1/2} - \boldsymbol{\Sigma}_b^{1/2} \|_F^2 $$
where $\|\cdot\|_F$ is the Frobenius norm. The NWD metric is then defined as:
$$ \text{NWD}(\mathcal{N}_a, \mathcal{N}_b) = \exp \left( -\frac{\sqrt{W_2^2(\mathcal{N}_a, \mathcal{N}_b)}}{C} \right) $$
where $C$ is a constant related to the dataset. NWD provides a smooth measure of similarity that is more tolerant to small deviations for small objects. We construct a composite loss function $\mathcal{L}_{box}$ for bounding box regression:
$$ \mathcal{L}_{box} = \lambda_{ciou} \cdot \mathcal{L}_{CIoU} + \lambda_{nwd} \cdot (1 - \text{NWD}) $$
where $\lambda_{ciou}$ and $\lambda_{nwd}$ are balancing weights. This combined loss leverages the strengths of both metrics: CIoU provides strong supervision for general object sizes and aspect ratios, while NWD offers fine-grained sensitivity crucial for the small defects prevalent on solar panels, leading to more balanced and robust training.
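The formulas above can be sketched in a few lines of plain Python. Because the covariance is diagonal, $\boldsymbol{\Sigma}^{1/2} = \text{diag}(w/2, h/2)$, so the Frobenius term reduces to squared differences of half-widths and half-heights. The value `c=12.8` and the equal weights of 0.5 are placeholders chosen for illustration only; the settings used in the experiments are not stated here, and the CIoU term is assumed to be computed elsewhere.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is modelled as a Gaussian with mean (cx, cy) and covariance
    diag(w^2/4, h^2/4). `c` is the dataset-dependent constant; 12.8 is
    only a placeholder value (assumption).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def box_loss(ciou_loss, nwd_value, lambda_ciou=0.5, lambda_nwd=0.5):
    """Composite loss: lambda_ciou * L_CIoU + lambda_nwd * (1 - NWD).

    The default weights are illustrative, not the paper's settings.
    """
    return lambda_ciou * ciou_loss + lambda_nwd * (1.0 - nwd_value)


if __name__ == "__main__":
    small = nwd((3, 3, 6, 6), (6, 3, 6, 6))        # 6x6 box shifted 3 px
    large = nwd((30, 30, 60, 60), (33, 30, 60, 60))  # 60x60 box shifted 3 px
    print(f"NWD small: {small:.3f}, NWD large: {large:.3f}")
```

In this sketch a 3-pixel center shift yields the same NWD for both box sizes, since the metric depends on absolute center and size differences rather than on overlap area.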
2.3 Soft-NMS for Improved Post-Processing
Non-Maximum Suppression (NMS) is a standard post-processing step that removes duplicate detection boxes. Conventional NMS works by selecting the box with the highest confidence score and suppressing all other boxes whose IoU with it exceeds a threshold $\theta_{NMS}$. This greedy approach has a known flaw: in scenarios where objects are close together or partially overlapping, it can incorrectly suppress valid detections, leading to missed objects (lower recall). On a solar panel, multiple defects like a cluster of micro-cracks or adjacent cell fragments can often be in close proximity.
To address this, we replace the standard NMS with Soft-NMS. Instead of abruptly removing overlapping boxes, Soft-NMS decays their confidence scores in a continuous manner based on their overlap with the higher-scoring box. For a box $b_i$ with score $s_i$, if its IoU with the current highest-score box $M$ is high, its score is reduced. We use the Gaussian weighting function:
$$ s_i = s_i \cdot \exp \left( -\frac{\text{IoU}(M, b_i)^2}{\sigma} \right), \quad \forall b_i \notin \mathcal{D} $$
where $\sigma$ is a parameter controlling the decay rate and $\mathcal{D}$ is the set of boxes already selected as final detections. Boxes with lower overlap are penalized less. This soft suppression strategy ensures that genuinely separate but nearby defects on the solar panel are not prematurely eliminated, thereby increasing the detection recall for challenging cases without introducing excessive false positives.
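A minimal sketch of Gaussian Soft-NMS in plain Python follows. It is a reference-style illustration rather than the implementation used in the experiments; the defaults `sigma=0.5` and `score_threshold=0.001` are assumed values, and boxes are taken to be in `(x1, y1, x2, y2)` format.

```python
import math


def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Return (index, rescored confidence) pairs in selection order."""
    current = list(scores)  # work on a copy so the caller's list is untouched
    remaining = [i for i in range(len(boxes)) if current[i] >= score_threshold]
    kept = []
    while remaining:
        # Select M, the highest-scoring box not yet in the final set.
        m = max(remaining, key=lambda i: current[i])
        remaining.remove(m)
        kept.append((m, current[m]))
        # Decay the remaining scores according to their overlap with M.
        for i in remaining:
            overlap = iou_xyxy(boxes[m], boxes[i])
            current[i] *= math.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose score has decayed below the threshold.
        remaining = [i for i in remaining if current[i] >= score_threshold]
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for index, score in soft_nms_gaussian(boxes, scores):
        print(index, round(score, 4))
```

In the example, the second box overlaps the first heavily and is kept with a reduced score instead of being discarded, while the distant third box keeps its original score.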
3. Experiments and Analysis
3.1 Experimental Setup
Dataset: The experiments utilize a dataset of solar panel defect images. The original set was expanded through data augmentation techniques including flipping and rotation to create a dataset of 2400 images. These images are annotated with three common defect types found in solar panels: Scratch/Crack, Cell Fragment/Grid-line failure, and Dirt. The dataset is divided into a training set (1920 images) and a validation set (480 images).
Implementation Details: The model is implemented using PyTorch. Training is conducted from scratch without pre-trained weights to ensure a fair comparison of architectural changes. We use the AdamW optimizer with an initial learning rate of 0.01 and a momentum of 0.937. The input image size is fixed at 640×640 pixels, and the batch size is set to 32. The models are trained for 300 epochs. The IoU threshold for evaluation is set to 0.7.
Evaluation Metrics: We use standard object detection metrics to evaluate performance: mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) and the average mAP over IoU thresholds from 0.5 to 0.95 with a step of 0.05 (mAP@[0.5:0.95]). The latter provides a stricter measure of localization accuracy. We also report the model’s parameter count (Params in Millions), computational complexity (GFLOPs), and inference speed (Frames Per Second, FPS) to assess efficiency.
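As a rough illustration of how these metrics are assembled, the sketch below computes Average Precision for one class at one IoU threshold from a list of detections already labelled as true or false positives, and lists the ten thresholds averaged in mAP@[0.5:0.95]. It uses all-point interpolation of the precision-recall curve; evaluation toolkits differ in the interpolation scheme they apply, so this is not necessarily the exact procedure behind the reported numbers, and the function names are hypothetical.

```python
def average_precision(scored_hits, num_gt):
    """AP from detections given as (confidence, is_true_positive) pairs.

    Precision is made monotonically non-increasing (the usual envelope)
    and then integrated over recall.
    """
    if num_gt <= 0:
        return 0.0
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Envelope: each point takes the best precision found to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def iou_thresholds():
    """The ten thresholds 0.50, 0.55, ..., 0.95 used for mAP@[0.5:0.95]."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

mAP@0.5 is then the mean of the per-class AP values at the 0.5 threshold, and mAP@[0.5:0.95] additionally averages over the thresholds returned by `iou_thresholds()`, with true positives re-assigned at each threshold.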
3.2 Ablation Study
To validate the contribution of each proposed component, we conduct an ablation study on the solar panel defect dataset. The baseline is the standard YOLOv8n model. The results are systematically presented in the table below.
| C2f-MS | NWD Loss | Soft-NMS | Params (M) | GFLOPs | mAP@0.5 | mAP@[0.5:0.95] | FPS |
|---|---|---|---|---|---|---|---|
| ✗ | ✗ | ✗ | 3.0 | 8.2 | 87.0% | 45.7% | 76.3 |
| ✓ | ✗ | ✗ | 2.7 | 7.7 | 88.7% | 45.3% | 66.7 |
| ✗ | ✓ | ✗ | 3.0 | 8.2 | 88.0% | 45.5% | 76.3 |
| ✗ | ✗ | ✓ | 3.0 | 8.2 | 88.8% | 49.2% | 67.1 |
| ✓ | ✓ | ✗ | 2.7 | 7.7 | 89.0% | 46.1% | 64.9 |
| ✓ | ✓ | ✓ | 2.7 | 7.7 | 89.5% | 49.8% | 59.9 |
Analysis:
- C2f-MS Alone: Integrating the C2f-MS module reduces parameters by 10% and GFLOPs by 6.1%, demonstrating its efficiency. It also improves mAP@0.5 by 1.7 percentage points, showing enhanced feature representation for solar panel defects. The slight drop in mAP@[0.5:0.95] (45.7% to 45.3%) suggests a minor cost in precise localization for some objects, and FPS falls from 76.3 to 66.7, but the overall trade-off is positive.
- NWD Loss Alone: Adding the NWD term improves mAP@0.5 by 1.0 percentage point, while mAP@[0.5:0.95] remains essentially unchanged (45.7% to 45.5%). The gain at the looser threshold is consistent with better supervision for the challenging small defects on solar panels.
- Soft-NMS Alone: This component yields the most significant gain in strict localization accuracy (mAP@[0.5:0.95] +3.5 percentage points), validating its role in recovering valid detections that would be suppressed by standard NMS, a common issue in cluttered solar panel images.
- Combined Model (Proposed): The full integration of all three components achieves the best accuracy. Compared to the baseline, the proposed model reduces parameters and computation while increasing mAP@0.5 by 2.5 percentage points and mAP@[0.5:0.95] by 4.1 percentage points, at the cost of inference speed falling from 76.3 to 59.9 FPS. This indicates that the improvements are complementary and collectively address key challenges in solar panel defect detection.
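The figures quoted in this analysis can be re-derived from the first and last rows of the ablation table. The short sketch below does so, separating relative changes (for parameters, GFLOPs, and FPS) from absolute changes in percentage points (for the mAP metrics); the dictionary keys and function names are illustrative.

```python
# First (baseline) and last (full model) rows of the ablation table.
baseline = {"params": 3.0, "gflops": 8.2, "map50": 87.0, "map5095": 45.7, "fps": 76.3}
proposed = {"params": 2.7, "gflops": 7.7, "map50": 89.5, "map5095": 49.8, "fps": 59.9}


def relative_change(old, new):
    """Change of `new` relative to `old`, in percent."""
    return (new - old) / old * 100.0


def point_change(old, new):
    """Absolute change between two percentages, in percentage points."""
    return new - old


if __name__ == "__main__":
    for key in ("params", "gflops", "fps"):
        print(f"{key}: {relative_change(baseline[key], proposed[key]):+.1f}%")
    for key in ("map50", "map5095"):
        print(f"{key}: {point_change(baseline[key], proposed[key]):+.1f} points")
```

Run on the table values, this gives roughly -10.0% parameters, -6.1% GFLOPs, -21.5% FPS, and gains of 2.5 and 4.1 percentage points in the two mAP metrics.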
3.3 Comparative Study with State-of-the-Art
We compare our proposed YOLOv8-MNS model against several other prominent object detection models on the solar panel defect dataset. The comparison includes both accuracy and efficiency metrics, which are critical for practical industrial inspection of solar panels.
| Model | Params (M) | GFLOPs | mAP@0.5 | mAP@[0.5:0.95] |
|---|---|---|---|---|
| Literature Method [6] (Multi-scale+Attention) | 49.6 | 299.3 | 76.2% | 41.3% |
| YOLOv3 | 103.7 | 283.0 | 89.3% | 47.9% |
| YOLOv3-Tiny | 12.1 | 19.0 | 86.9% | 43.5% |
| YOLOv5s | 9.1 | 24.0 | 86.9% | 45.5% |
| YOLOv7 | 37.2 | 105.1 | 86.5% | 43.6% |
| YOLOv7-Tiny | 6.0 | 13.2 | 80.4% | 38.3% |
| Proposed YOLOv8-MNS | 2.7 | 7.7 | 89.5% | 49.8% |
Analysis: The proposed model demonstrates a superior balance of accuracy and efficiency. It achieves the highest mAP@0.5 (89.5%) and the highest mAP@[0.5:0.95] (49.8%) among all compared models, indicating strong overall detection and precise localization capability for solar panel defects. It attains this accuracy while being the most lightweight model, with only 2.7M parameters and 7.7 GFLOPs. For instance, it has only 2.6% of the parameters of YOLOv3 and 2.6% of the GFLOPs of Literature Method [6]. Compared to other efficient models like YOLOv7-Tiny, our method offers a gain of over 9 percentage points in mAP@0.5 and over 11 percentage points in mAP@[0.5:0.95]. This performance-to-complexity ratio makes the proposed algorithm well suited for real-world deployment in solar panel inspection systems, where computational resources may be limited.
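The ratios in this paragraph follow directly from the comparison table. As a quick check, the sketch below stores the table rows and recomputes which model leads each column and how the proposed model's size compares with the largest baselines; the helper names are illustrative.

```python
# Rows of the comparison table:
# (name, params in M, GFLOPs, mAP@0.5 in %, mAP@[0.5:0.95] in %)
MODELS = [
    ("Literature Method [6]", 49.6, 299.3, 76.2, 41.3),
    ("YOLOv3", 103.7, 283.0, 89.3, 47.9),
    ("YOLOv3-Tiny", 12.1, 19.0, 86.9, 43.5),
    ("YOLOv5s", 9.1, 24.0, 86.9, 45.5),
    ("YOLOv7", 37.2, 105.1, 86.5, 43.6),
    ("YOLOv7-Tiny", 6.0, 13.2, 80.4, 38.3),
    ("Proposed YOLOv8-MNS", 2.7, 7.7, 89.5, 49.8),
]


def best(column, maximize=True):
    """Name of the model with the best value in the given column index."""
    pick = max if maximize else min
    return pick(MODELS, key=lambda row: row[column])[0]


def row(name):
    """Look up a table row by model name."""
    return next(r for r in MODELS if r[0] == name)


if __name__ == "__main__":
    ours, v3, lit = row("Proposed YOLOv8-MNS"), row("YOLOv3"), row("Literature Method [6]")
    print("Highest mAP@0.5:", best(3))
    print("Fewest parameters:", best(1, maximize=False))
    print(f"Params vs YOLOv3: {ours[1] / v3[1] * 100:.1f}%")
    print(f"GFLOPs vs [6]:    {ours[2] / lit[2] * 100:.1f}%")
```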
3.4 Visualization and Discussion
Qualitative results on sample validation images further illustrate the advantages of the proposed method. Compared to the baseline YOLOv8n, our model shows several improvements consistent with the quantitative analysis:
- Reduced False Positives: The enhanced multi-scale feature extraction from the C2f-MS module helps the model better understand contextual information on the solar panel, leading to fewer incorrect detections in complex backgrounds.
- Higher Confidence for True Positives: Defects that are correctly detected often have higher associated confidence scores, indicating more certain predictions.
- Resolved Duplicate Detections: The use of Soft-NMS effectively eliminates the issue of multiple overlapping bounding boxes being predicted for a single defect, resulting in cleaner and more accurate output.
- Better Small Defect Detection: Instances of small scratches or spots are more consistently detected, which can be attributed to the beneficial effect of the NWD loss during training.
These visual observations confirm that the proposed modifications work synergistically to produce a more reliable detector for the solar panel inspection task.
4. Conclusion
This paper presents an improved YOLOv8-based model, YOLOv8-MNS, designed specifically for automated defect detection in solar panels. We address two primary challenges in this domain: the need for high accuracy in detecting small and varied defects, and the requirement for a lightweight model suitable for industrial application. The core of our approach lies in three modifications: 1) The C2f-MS module, which enhances multi-scale feature learning while reducing parameters and computations; 2) A hybrid loss function combining CIoU and NWD to improve the learning signal for small objects; and 3) The replacement of standard NMS with Soft-NMS to better handle clustered defects.
Experiments on a solar panel defect dataset demonstrate the effectiveness of our method. The proposed model achieves the highest detection accuracy among the compared models (89.5% mAP@0.5 and 49.8% mAP@[0.5:0.95]) while maintaining a very efficient architecture with only 2.7M parameters and 7.7 GFLOPs. These results indicate a better balance of accuracy and efficiency than the compared methods for solar panel quality inspection. Future work will focus on expanding the defect categories, testing the model on a broader range of solar panel types and imaging conditions (including EL and infrared images), and further optimizing the architecture for deployment on embedded hardware to facilitate real-time, on-site inspection.
