Abstract
This study addresses the computational limitations of embedded devices in unmanned aerial vehicle (UAV)-based solar panel inspection systems by proposing a lightweight deep learning framework. Building upon the Single Shot MultiBox Detector (SSD) architecture, we integrate MobileNetV3 as the backbone network to reduce model complexity while enhancing feature extraction through a coordinate attention (CA) mechanism. A self-constructed dataset of solar panel defects—including bird droppings, dirt accumulation, electrical/physical damage, and snow coverage—is augmented using Mosaic techniques to improve generalization. Experimental results demonstrate a 68.9% reduction in computational load, a 4.3% increase in mean average precision (mAP), and real-time detection at 45.6 FPS. The proposed method effectively balances accuracy and efficiency, making it suitable for UAV deployment in large-scale solar farms.

Keywords: solar panel, object detection, lightweight model, deep learning, attention mechanism
1. Introduction
The global installed capacity of solar photovoltaic (PV) systems has exceeded 1.05 billion kW, with distributed solar farms accounting for a significant share. However, surface contamination—such as bird droppings, dust accumulation, cracks, and snow—reduces solar panel efficiency by 15–35% [1]. Traditional inspection methods like electrical characteristic analysis [2] and morphological image processing [3] face limitations in scalability and accuracy. Recent advances in deep learning enable automated defect detection, but existing models often prioritize accuracy over computational efficiency, hindering UAV-based deployment.
This work introduces a lightweight SSD variant optimized for solar panel inspection. Key innovations include:
- MobileNetV3 backbone for parameter reduction
- Coordinate attention mechanism for enhanced feature localization
- Mosaic data augmentation for improved generalization
2. Methodology
2.1 Network Architecture
The baseline SSD framework is modified as follows:
Backbone Network:
MobileNetV3-Large replaces conventional CNN backbones through depthwise separable convolutions and inverted residual blocks. For an input tensor $X \in \mathbb{R}^{H \times W \times C}$, the bottleneck operation is:

$$
\begin{aligned}
X'  &= \mathrm{Conv}_{1\times1}(X) && \text{(expansion)} \\
X'' &= \mathrm{DepthwiseConv}_{3\times3}(X') \\
Y   &= \mathrm{Conv}_{1\times1}(X'') && \text{(compression)}
\end{aligned}
$$
This structure reduces parameters by 72% compared to ResNet50 while maintaining receptive field coverage.
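The parameter savings of the depthwise separable pattern can be illustrated with simple counting arithmetic. The channel sizes below are illustrative, not taken from the paper:

```python
def standard_conv_params(c_in, c_out, k=3):
    # A dense k x k convolution mixes every input channel into every
    # output channel: k * k * c_in * c_out weights.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

c_in = c_out = 64
dense = standard_conv_params(c_in, c_out)       # 36,864
sep = depthwise_separable_params(c_in, c_out)   # 4,672
print(dense, sep, round(dense / sep, 1))        # roughly 8x fewer weights
```

The MobileNetV3 bottleneck adds a 1×1 expansion stage before the depthwise convolution, so its exact savings depend on the expansion ratio, but the same factorization drives the overall 72% reduction cited above.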
Multi-Scale Detection:
Feature maps at scales {19×19, 10×10, 5×5, 3×3, 2×2, 1×1} enable detection of contaminants ranging from 0.1 m² (bird droppings) to full-panel defects.
2.2 Coordinate Attention Mechanism
The CA module encodes spatial relationships through axis-wise pooling:
- Horizontal/vertical pooling:

$$
z^h(h) = \frac{1}{W}\sum_{0 \le w < W} x(h, w), \qquad
z^w(w) = \frac{1}{H}\sum_{0 \le h < H} x(h, w)
$$

- Concatenation and transformation:

$$
f = \delta\big(F_1([z^h, z^w])\big)
$$

where $\delta$ denotes the ReLU activation and $F_1$ is a $1\times1$ convolution.
- Attention weights:

$$
g^h = \sigma\big(F_h(f^h)\big), \qquad
g^w = \sigma\big(F_w(f^w)\big), \qquad
Y = X \otimes g^h \otimes g^w
$$
This mechanism improves mAP by 5.2% for small contaminants (<50 pixels).
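The axis-wise pooling and gating above can be sketched in a few lines of numpy. This is a minimal single-sample illustration: the matrices `w1`, `wh`, and `ww` stand in for the convolutions $F_1$, $F_h$, $F_w$, and the channel-reduction step of the full CA module is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w1, wh, ww):
    # x: (C, H, W) feature map.
    C, H, W = x.shape
    z_h = x.mean(axis=2)   # (C, H): average-pool along the width axis
    z_w = x.mean(axis=1)   # (C, W): average-pool along the height axis
    # f = ReLU(F1([z^h, z^w])): concatenate along the spatial axis, mix channels.
    f = np.maximum(w1 @ np.concatenate([z_h, z_w], axis=1), 0)
    f_h, f_w = f[:, :H], f[:, H:]
    g_h = sigmoid(wh @ f_h)   # (C, H) attention weights along height
    g_w = sigmoid(ww @ f_w)   # (C, W) attention weights along width
    # Y = X (*) g^h (*) g^w via broadcasting.
    return x * g_h[:, :, None] * g_w[:, None, :]

C, H, W = 8, 5, 5
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C)) * 0.1   # stand-ins for learned 1x1 convs
wh = rng.standard_normal((C, C)) * 0.1
ww = rng.standard_normal((C, C)) * 0.1
y = coordinate_attention(x, w1, wh, ww)
print(y.shape)  # (8, 5, 5): same shape as the input, reweighted per position
```

Because the gates factor into a per-row and a per-column term, the module encodes position along each axis at a cost linear in $H + W$ rather than quadratic in $H \times W$.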
3. Dataset & Augmentation
3.1 Solar Panel Defect Dataset
A custom dataset (6,412 images) contains six categories:
| Class | Training | Validation | Test |
|---|---|---|---|
| Clean Panel | 1,124 | 119 | 135 |
| Bird Droppings | 1,459 | 167 | 180 |
| Dirt Accumulation | 1,703 | 105 | 83 |
| Electrical Damage | 374 | 34 | 35 |
| Physical Damage | 297 | 24 | 33 |
| Snow Coverage | 1,179 | 189 | 172 |
3.2 Mosaic Augmentation
Four-image composites are generated through:
- Random scaling (0.5–1.5×)
- 90° rotation
- HSV color jittering (±20% saturation/value)
- MixUp blending (λ = 0.2)
This increases effective training samples by 4.8× while simulating partial occlusions.
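The four-image compositing step can be sketched as below. This is a simplified illustration that only handles the pixel layout; the full pipeline would also remap bounding-box labels into the composite and apply the scaling, rotation, HSV, and MixUp steps listed above:

```python
import numpy as np

rng = np.random.default_rng(42)

def mosaic(images, out_size=300):
    """Paste four HxWx3 images into the quadrants of one canvas,
    split at a random center point (a minimal Mosaic sketch)."""
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    # Keep the split point away from the borders so every quadrant is non-empty.
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        h, w = y1 - y0, x1 - x0
        canvas[y0:y1, x0:x1] = img[:h, :w]  # crop each source to fit its quadrant
    return canvas

# Four flat-colored 300x300 stand-in images, one per quadrant.
imgs = [np.full((300, 300, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
m = mosaic(imgs)
print(m.shape)  # (300, 300, 3): one composite containing all four sources
```

Because object instances end up cropped at the quadrant boundaries, the composite naturally simulates the partial occlusions mentioned above.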
4. Experiments
4.1 Implementation Details
- Hardware: NVIDIA RTX 3060 GPU
- Training:
- Input size: 300×300
- Batch size: 32
- Optimizer: SGD (momentum=0.9)
- Learning rate: 0.01 (cosine decay)
4.2 Ablation Study
| Configuration | mAP (%) | Accuracy (%) | Params (M) | GFLOPS |
|---|---|---|---|---|
| SSD + ResNet50 | 72.68 | 78.43 | 44.55 | 30.0 |
| SSD + MobileNetV3 | 77.50 | 81.03 | 14.11 | 13.7 |
| + CA Attention | 82.71 | 92.28 | 14.11 | 13.7 |
| + Mosaic Augmentation | 84.01 | 94.20 | 14.11 | 13.7 |
The CA mechanism improves bird dropping detection by 11.25% due to enhanced spatial attention.
4.3 Comparative Analysis
| Model | mAP (%) | FPS | Params (M) | Model Size (MB) |
|---|---|---|---|---|
| Faster R-CNN | 80.39 | 3.2 | 191.39 | 157.5 |
| YOLOv3 | 72.63 | 21.1 | 61.52 | 100.6 |
| Original SSD | 72.68 | 23.9 | 44.55 | 52.4 |
| Ours | 84.01 | 45.6 | 14.11 | 18.3 |
The proposed model achieves 94.2% accuracy on snow coverage detection—critical for winter inspections.
5. Mathematical Formulation
5.1 Loss Function
The multi-task loss combines localization ($L_{loc}$) and confidence ($L_{conf}$) terms:

$$
L = \frac{1}{N}\left(L_{conf} + \alpha L_{loc}\right)
$$

where $\alpha = 1$ balances the two terms and $N$ is the number of matched default boxes.
Localization Loss:
Smooth L1 loss over the predicted offsets $(cx, cy, w, h)$:

$$
L_{loc} = \sum_{i \in pos} \; \sum_{m \in \{cx, cy, w, h\}} \mathrm{smooth}_{L1}\!\left(x_i^m - \hat{x}_i^m\right)
$$
Confidence Loss:
Focal loss [4] addresses class imbalance:

$$
L_{conf} = -\sum_{i \in pos} (1 - p_i)^{\gamma} \log(p_i) - \sum_{i \in neg} p_i^{\gamma} \log(1 - p_i)
$$

with $\gamma = 2$.
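The combined loss can be evaluated directly on toy arrays. The sketch below, in numpy, assumes the localization targets are already encoded as offsets and that hard-negative sampling has already selected the negative boxes:

```python
import numpy as np

def smooth_l1(d):
    # smooth_L1(d) = 0.5 d^2 if |d| < 1, else |d| - 0.5
    d = np.abs(d)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5)

def detection_loss(loc_pred, loc_true, p_pos, p_neg, alpha=1.0, gamma=2.0):
    """L = (1/N)(L_conf + alpha * L_loc) on toy inputs.
    loc_pred/loc_true: (N, 4) offsets for the N matched (positive) boxes;
    p_pos: predicted probability of the true class for each positive;
    p_neg: predicted foreground probability for each sampled negative."""
    n = len(loc_pred)
    l_loc = smooth_l1(loc_pred - loc_true).sum()
    l_conf = (-np.sum((1 - p_pos) ** gamma * np.log(p_pos))
              - np.sum(p_neg ** gamma * np.log(1 - p_neg)))
    return (l_conf + alpha * l_loc) / n

loc_pred = np.array([[0.1, 0.2, 0.0, 0.0]])
loc_true = np.zeros((1, 4))
loss = detection_loss(loc_pred, loc_true,
                      p_pos=np.array([0.9]), p_neg=np.array([0.1]))
print(f"loss = {loss:.4f}")
```

Note how the focal modulating factors $(1-p_i)^\gamma$ and $p_i^\gamma$ shrink the contribution of already well-classified boxes, which is what counteracts the foreground/background imbalance.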
6. Field Deployment Considerations
For UAV integration:
- Model Compression:
- Quantization: FP32 → INT8 (2.1× size reduction)
- Pruning: Remove 40% low-weight channels (1.5× speedup)
- Power Consumption:
- Jetson Xavier NX: 9.8 W at 45 FPS
- Theoretical coverage: 12.5 km²/day (50m altitude)
- Edge Computing:
On-device inference latency: 21.9 ms per frame
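The FP32 → INT8 step above can be illustrated with a generic symmetric per-tensor quantization scheme. This is a sketch of the general technique, not the exact scheme used on the Jetson deployment; weight storage alone shrinks 4×, while the paper's 2.1× figure reflects the whole model file:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)   # stand-in weight tensor
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.nbytes, w.nbytes)            # 1000 vs 4000 bytes: 4x smaller weights
print(f"max abs error: {err:.4f}")   # bounded by half the quantization step
```

Per-channel scales and calibration over representative activations, as provided by deployment toolchains, typically recover most of the accuracy lost to this rounding.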
7. Conclusion
This work presents a lightweight detection framework for solar panel contamination that achieves 84.01% mAP at 45.6 FPS—superior to conventional models in both accuracy and efficiency. The integration of MobileNetV3, coordinate attention, and Mosaic augmentation enables reliable identification of sub-centimeter defects under varying illumination and occlusion. Future work will explore temporal analysis of sequential UAV images for contamination progression tracking.