Abstract
To address the challenge of balancing speed and accuracy in solar panel defect detection, we propose a lightweight network, LPV-YOLO, based on YOLOv5s. This model integrates Ghost modules and Mish activation functions to reduce computational complexity while maintaining high precision. Key innovations include the GhostMConv and C3MGhost modules for backbone lightweighting, a SimAM-enhanced spatial pyramid pooling (MSSPPF) for multi-scale feature fusion, and SE channel attention in the neck network to enhance defect sensitivity. Experimental results demonstrate that LPV-YOLO reduces parameters by 49%, model size by 46%, and computations by 50%, achieving 93.8% mAP at 70.42 FPS. Compared to YOLOv7, SSD300, and RetinaNet, LPV-YOLO offers superior accuracy with minimal resource demands, making it ideal for deployment on mobile devices.

1. Introduction
Solar energy is a critical renewable resource, and solar panels form the core of photovoltaic systems. However, manufacturing defects—such as cracks, hotspots, scratches, black edges, and dead cells—severely impact panel efficiency. Traditional manual inspection is labor-intensive, error-prone, and costly. Deep learning-based methods, particularly YOLOv5s, have shown promise but suffer from high computational costs and parameter redundancy. To address these limitations, we propose LPV-YOLO, a lightweight network optimized for real-time solar panel defect detection.
2. Related Work
Existing approaches for solar panel defect detection include ResNet-based feature fusion models [12], Leaky ReLU-enhanced coordinate attention [13], and deformable convolutions [16]. While these methods improve accuracy, they often compromise speed or require excessive resources. Lightweight networks like MobileNetV2 [15] reduce complexity but lack precision. Our work bridges this gap by integrating lightweight modules with attention mechanisms tailored for solar panel defects.
3. Methodology
3.1 Network Architecture
LPV-YOLO retains YOLOv5s’s four-stage structure (input, backbone, neck, head) but introduces three key improvements:
- Lightweight Backbone: Replaces standard convolutions with GhostMConv and C3MGhost modules.
- Attention Pyramid Pooling (MSSPPF): Combines SimAM attention with spatial pyramid pooling.
- SE Channel Attention: Enhances feature channel interactions in the neck.
3.2 GhostMConv and C3MGhost Modules
The Ghost module splits convolution into two steps: generating intrinsic features via standard convolution and deriving redundant features through cheap linear operations. For an input feature map with $c$ channels, $n$ output channels, an $h' \times w'$ output size, a primary kernel size $k$, and $s$ cheap linear operations per intrinsic channel (with kernel size $d$), the computational cost is

$$\text{Ghost Cost} = \frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k^2 + (s-1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d^2$$

Compared to standard convolution ($n \cdot h' \cdot w' \cdot c \cdot k^2$), this reduces parameters and computation by a factor of approximately $s$. The Mish activation function further enhances gradient flow:

$$f_{\text{Mish}}(x) = x \cdot \tanh\left(\ln(1 + e^x)\right)$$
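As a concrete illustration, the following PyTorch sketch shows one plausible layout for a Ghost convolution with Mish activation in the spirit of GhostMConv. The exact kernel sizes, the use of a depthwise convolution for the cheap branch, and the module interface are assumptions based on the original Ghost module, not the paper's definition.

```python
import torch
import torch.nn as nn

class GhostMConv(nn.Module):
    """Sketch of a Ghost convolution with Mish activation (layout assumed).

    Step 1: a standard convolution produces the n/s intrinsic feature maps.
    Step 2: cheap depthwise (linear) operations derive the remaining
            (s-1)*n/s maps, so the concatenated output has out_ch channels.
    """
    def __init__(self, in_ch, out_ch, k=1, s=2, dw_k=3, stride=1):
        super().__init__()
        assert out_ch % s == 0, "out_ch should be divisible by s"
        intrinsic = out_ch // s                     # n/s intrinsic channels
        cheap = out_ch - intrinsic                  # (s-1)*n/s cheap channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(intrinsic),
            nn.Mish(inplace=True),
        )
        self.cheap = nn.Sequential(                 # depthwise "linear" branch
            nn.Conv2d(intrinsic, cheap, dw_k, 1, dw_k // 2,
                      groups=intrinsic, bias=False),
            nn.BatchNorm2d(cheap),
            nn.Mish(inplace=True),
        )

    def forward(self, x):
        y = self.primary(x)                         # intrinsic features
        return torch.cat([y, self.cheap(y)], 1)     # + cheap redundant features

# Example: with s=2, roughly half of the output channels come from the cheap
# branch, which is where the ~s-fold parameter saving originates.
# y = GhostMConv(64, 128)(torch.randn(1, 64, 40, 40))   # -> (1, 128, 40, 40)
```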
3.3 SimAM-Attention Spatial Pyramid Pooling (MSSPPF)
The MSSPPF module replaces YOLOv5s's SPP with serial pooling layers and SimAM attention. SimAM evaluates the importance of each neuron via an energy function:

$$e_t = \frac{(x_t - \mu)^2}{\sigma^2 + \lambda}$$

where $\mu$ and $\sigma^2$ are the mean and variance of the neuron values in the channel and $\lambda$ is a regularization constant. Lower energy indicates higher importance. The output is weighted using a sigmoid function:

$$\hat{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \otimes X$$

where $E$ collects the energies $e_t$ across the channel and spatial dimensions.
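A minimal PyTorch sketch of this idea is shown below: SimAM weights each neuron by sigmoid(1/E) using per-channel spatial statistics, and an SPPF-style block with serial max pooling applies it to the fused multi-scale features. The exact composition of MSSPPF (pooling kernel size, where the attention is applied) and the default λ = 1e-4 are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention following the energy function above."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):
        mu = x.mean(dim=(2, 3), keepdim=True)                 # per-channel mean
        var = (x - mu).pow(2).mean(dim=(2, 3), keepdim=True)  # per-channel variance
        energy = (x - mu).pow(2) / (var + self.lam)           # e_t for each neuron
        return x * torch.sigmoid(1.0 / (energy + 1e-6))       # X_hat = sigmoid(1/E) * X

class MSSPPF(nn.Module):
    """Assumed MSSPPF layout: SPPF-style serial max pooling + SimAM attention."""
    def __init__(self, in_ch, out_ch, k=5):
        super().__init__()
        hidden = in_ch // 2
        self.cv1 = nn.Conv2d(in_ch, hidden, 1)
        self.cv2 = nn.Conv2d(hidden * 4, out_ch, 1)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)  # serial pooling
        self.attn = SimAM()

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        fused = self.cv2(torch.cat([x, y1, y2, y3], 1))        # multi-scale fusion
        return self.attn(fused)                                # SimAM re-weighting
```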
3.4 SE Channel Attention
Squeeze-and-Excitation (SE) blocks dynamically recalibrate channel-wise features. For a feature map $X \in \mathbb{R}^{C \times H \times W}$, the SE operation computes channel weights $w_c$:

$$w_c = \sigma\left(W_2 \cdot \delta\left(W_1 \cdot \mathrm{GAP}(X)\right)\right)$$

where $\mathrm{GAP}$ is global average pooling, $W_1$ and $W_2$ are fully connected layers, $\delta$ denotes ReLU, and $\sigma$ is the sigmoid function.
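The standard SE block can be written compactly in PyTorch as below; the reduction ratio of 16 is the common default from the SE paper and is an assumption here, since the value used in LPV-YOLO is not stated.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: w_c = sigmoid(W2 . ReLU(W1 . GAP(X)))."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # W1 (squeeze)
        self.fc2 = nn.Linear(channels // reduction, channels)  # W2 (excite)

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                                 # GAP -> (B, C)
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))   # channel weights w_c
        return x * w.view(b, c, 1, 1)                          # recalibrate channels
```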
4. Dataset and Augmentation
We use the PV-Multi-Defect dataset containing 1,107 images (600×600 pixels) annotated with five defect types. To address label inconsistencies and class imbalance, we:
- Relabeled the dataset, increasing the number of annotated defects from 4,235 to 4,631.
- Augmented the dataset to 4,463 images using CycleGAN [23], which employs a cycle-consistency loss (sketched after this list):

$$L_{\text{cycle}} = \mathbb{E}_x\left[\left\| F(G(x)) - x \right\|_1\right] + \mathbb{E}_y\left[\left\| G(F(y)) - y \right\|_1\right]$$
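For reference, the cycle-consistency term alone can be sketched as follows; G and F stand for the two CycleGAN generators (X→Y and Y→X), and the full CycleGAN objective additionally contains adversarial losses that are omitted here.

```python
import torch

def cycle_consistency_loss(G, F, real_x, real_y):
    """L1 cycle loss: ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 (in expectation)."""
    loss_x = torch.mean(torch.abs(F(G(real_x)) - real_x))   # forward cycle X -> Y -> X
    loss_y = torch.mean(torch.abs(G(F(real_y)) - real_y))   # backward cycle Y -> X -> Y
    return loss_x + loss_y
```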
5. Experiments
5.1 Implementation Details
Training was conducted on an NVIDIA RTX 2080Ti with:
- Optimizer: SGD (lr = 0.001, momentum = 0.937); see the sketch after this list.
- Epochs: 300.
- Batch Size: 16.
- Input Resolution: 640×640.
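A minimal sketch of the optimizer setup with these values is shown below; the model object is a placeholder for LPV-YOLO, and any learning-rate schedule or weight decay is unspecified in the paper and therefore omitted.

```python
import torch
import torch.nn as nn

EPOCHS, BATCH_SIZE, IMG_SIZE = 300, 16, 640   # values from Section 5.1

model = nn.Conv2d(3, 16, 3)                    # placeholder for the LPV-YOLO network
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.937)
```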
5.2 Evaluation Metrics
- mAP: Mean average precision at IoU=0.5.
- Parameters/FLOPs: Model complexity.
- FPS: Inference speed.
5.3 Results
Ablation Study
Table 1 shows the impact of each module. Adding GhostMConv (M-Ghost) reduces parameters by 49% but lowers mAP by 2.3 percentage points; MSSPPF and SE attention recover 1.7 points of mAP with minimal computational overhead.
Model | mAP (%) | Params (M) | FLOPs (G) | FPS |
---|---|---|---|---|
YOLOv5s (Baseline) | 94.4 | 7.23 | 16.5 | 91.74 |
+ M-Ghost | 92.1 | 3.70 | 8.2 | 67.11 |
+ M-Ghost + MSSPPF | 93.3 | 3.70 | 8.2 | 65.52 |
LPV-YOLO (Full) | 93.8 | 3.71 | 8.3 | 70.42 |
Comparison with State-of-the-Art
LPV-YOLO outperforms YOLOv7, SSD300, and RetinaNet in accuracy while using fewer resources (Table 2).
Model | mAP (%) | Params (M) | FLOPs (G) | FPS |
---|---|---|---|---|
YOLOv7 | 88.0 | 9.33 | 26.7 | 107.5 |
SSD300 | 77.7 | 34.30 | 51.6 | 71.0 |
RetinaNet | 72.2 | 41.90 | 212.0 | 42.9 |
LPV-YOLO | 93.8 | 3.71 | 8.3 | 70.4 |
Defect-Specific Performance
LPV-YOLO achieves 99.1% mAP on dead cells and 97.1% on black edges, demonstrating robustness across defect types (Table 3).
Defect | Precision (%) | Recall (%) | mAP@0.5 (%) |
---|---|---|---|
Crack | 88.7 | 89.7 | 93.2 |
Hotspot | 87.6 | 87.8 | 92.5 |
Black Edge | 91.1 | 85.5 | 97.1 |
Scratch | 80.1 | 85.7 | 87.0 |
Dead Cell | 96.0 | 96.9 | 99.1 |
6. Conclusion
LPV-YOLO effectively balances speed and accuracy for solar panel defect detection. By integrating lightweight Ghost modules, SimAM-enhanced pooling, and SE attention, the model reduces parameters by 49% while maintaining 93.8% mAP. Future work will explore edge deployment and multi-modal data fusion for industrial applications.