Abstract
This study addresses the computational limitations of embedded devices in unmanned aerial vehicle (UAV)-based solar panel inspection systems by proposing a lightweight deep learning framework. Building upon the Single Shot MultiBox Detector (SSD) architecture, we integrate MobileNetV3 as the backbone network to reduce model complexity while enhancing feature extraction through a coordinate attention (CA) mechanism. A self-constructed dataset of solar panel defects—including bird droppings, dirt accumulation, electrical/physical damage, and snow coverage—is augmented using Mosaic techniques to improve generalization. Experimental results demonstrate a 68.9% reduction in computational load, a 4.3% increase in mean average precision (mAP), and real-time detection at 45.6 FPS. The proposed method effectively balances accuracy and efficiency, making it suitable for UAV deployment in large-scale solar farms.

Keywords: solar panel, object detection, lightweight model, deep learning, attention mechanism
1. Introduction
The global installed capacity of solar photovoltaic (PV) systems has exceeded 1.05 billion kW, with distributed solar farms accounting for a significant share. However, surface contamination—such as bird droppings, dust accumulation, cracks, and snow—reduces solar panel efficiency by 15–35% [1]. Traditional inspection methods like electrical characteristic analysis [2] and morphological image processing [3] face limitations in scalability and accuracy. Recent advances in deep learning enable automated defect detection, but existing models often prioritize accuracy over computational efficiency, hindering UAV-based deployment.
This work introduces a lightweight SSD variant optimized for solar panel inspection. Key innovations include:
- MobileNetV3 backbone for parameter reduction
- Coordinate attention mechanism for enhanced feature localization
- Mosaic data augmentation for improved generalization
2. Methodology
2.1 Network Architecture
The baseline SSD framework is modified as follows:
Backbone Network:
MobileNetV3-Large replaces conventional CNN backbones through depthwise separable convolutions and inverted residual blocks. For an input tensor $X \in \mathbb{R}^{H \times W \times C}$, the bottleneck operation is:

$$
\begin{aligned}
X'  &= \mathrm{Conv}_{1\times1}(X) && \text{(expansion)} \\
X'' &= \mathrm{DepthwiseConv}_{3\times3}(X') \\
Y   &= \mathrm{Conv}_{1\times1}(X'') && \text{(compression)}
\end{aligned}
$$
This structure reduces parameters by 72% compared to ResNet50 while maintaining receptive field coverage.
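The parameter savings of the depthwise separable pattern can be illustrated with simple counting arithmetic. The channel sizes below are illustrative, not taken from the paper:

```python
def standard_conv_params(c_in, c_out, k=3):
    # A dense k x k convolution mixes every input channel into every
    # output channel: k * k * c_in * c_out weights.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

c_in = c_out = 64
dense = standard_conv_params(c_in, c_out)       # 36,864
sep = depthwise_separable_params(c_in, c_out)   # 4,672
print(dense, sep, round(dense / sep, 1))        # roughly 8x fewer weights
```

The MobileNetV3 bottleneck adds a 1×1 expansion stage before the depthwise convolution, so its exact savings depend on the expansion ratio, but the same factorization drives the overall 72% reduction cited above.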
Multi-Scale Detection:
Feature maps at scales {19×19, 10×10, 5×5, 3×3, 2×2, 1×1} enable detection of contaminants ranging from 0.1 m² (bird droppings) to full-panel defects.
2.2 Coordinate Attention Mechanism
The CA module encodes spatial relationships through axis-wise pooling:
- Horizontal/vertical pooling:

$$
z^h(h) = \frac{1}{W}\sum_{0 \le w < W} x(h, w), \qquad
z^w(w) = \frac{1}{H}\sum_{0 \le h < H} x(h, w)
$$

- Concatenation and transformation:

$$
f = \delta\big(F_1([z^h, z^w])\big)
$$

where $\delta$ denotes the ReLU activation and $F_1$ is a $1\times1$ convolution.
- Attention weights:

$$
g^h = \sigma\big(F_h(f^h)\big), \qquad
g^w = \sigma\big(F_w(f^w)\big), \qquad
Y = X \otimes g^h \otimes g^w
$$
This mechanism improves mAP by 5.2% for small contaminants (<50 pixels).
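The axis-wise pooling and gating above can be sketched in a few lines of numpy. This is a minimal single-sample illustration: the matrices `w1`, `wh`, and `ww` stand in for the convolutions $F_1$, $F_h$, $F_w$, and the channel-reduction step of the full CA module is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w1, wh, ww):
    # x: (C, H, W) feature map.
    C, H, W = x.shape
    z_h = x.mean(axis=2)   # (C, H): average-pool along the width axis
    z_w = x.mean(axis=1)   # (C, W): average-pool along the height axis
    # f = ReLU(F1([z^h, z^w])): concatenate along the spatial axis, mix channels.
    f = np.maximum(w1 @ np.concatenate([z_h, z_w], axis=1), 0)
    f_h, f_w = f[:, :H], f[:, H:]
    g_h = sigmoid(wh @ f_h)   # (C, H) attention weights along height
    g_w = sigmoid(ww @ f_w)   # (C, W) attention weights along width
    # Y = X (*) g^h (*) g^w via broadcasting.
    return x * g_h[:, :, None] * g_w[:, None, :]

C, H, W = 8, 5, 5
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C)) * 0.1   # stand-ins for learned 1x1 convs
wh = rng.standard_normal((C, C)) * 0.1
ww = rng.standard_normal((C, C)) * 0.1
y = coordinate_attention(x, w1, wh, ww)
print(y.shape)  # (8, 5, 5): same shape as the input, reweighted per position
```

Because the gates factor into a per-row and a per-column term, the module encodes position along each axis at a cost linear in $H + W$ rather than quadratic in $H \times W$.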
3. Dataset & Augmentation
3.1 Solar Panel Defect Dataset
A custom dataset (6,412 images) contains six categories:
| Class | Training | Validation | Test |
|---|---|---|---|
| Clean Panel | 1,124 | 119 | 135 |
| Bird Droppings | 1,459 | 167 | 180 |
| Dirt Accumulation | 1,703 | 105 | 83 |
| Electrical Damage | 374 | 34 | 35 |
| Physical Damage | 297 | 24 | 33 |
| Snow Coverage | 1,179 | 189 | 172 |
3.2 Mosaic Augmentation
Four-image composites are generated through:
- Random scaling (0.5–1.5×)
- 90° rotation
- HSV color jittering (±20% saturation/value)
- MixUp blending (λ = 0.2)
This increases effective training samples by 4.8× while simulating partial occlusions.
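The four-image compositing step can be sketched as below. This is a simplified illustration that only handles the pixel layout; the full pipeline would also remap bounding-box labels into the composite and apply the scaling, rotation, HSV, and MixUp steps listed above:

```python
import numpy as np

rng = np.random.default_rng(42)

def mosaic(images, out_size=300):
    """Paste four HxWx3 images into the quadrants of one canvas,
    split at a random center point (a minimal Mosaic sketch)."""
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    # Keep the split point away from the borders so every quadrant is non-empty.
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        h, w = y1 - y0, x1 - x0
        canvas[y0:y1, x0:x1] = img[:h, :w]  # crop each source to fit its quadrant
    return canvas

# Four flat-colored 300x300 stand-in images, one per quadrant.
imgs = [np.full((300, 300, 3), v, dtype=np.uint8) for v in (50, 100, 150, 200)]
m = mosaic(imgs)
print(m.shape)  # (300, 300, 3): one composite containing all four sources
```

Because object instances end up cropped at the quadrant boundaries, the composite naturally simulates the partial occlusions mentioned above.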
4. Experiments
4.1 Implementation Details
- Hardware: NVIDIA RTX 3060 GPU
- Training:
- Input size: 300×300
- Batch size: 32
- Optimizer: SGD (momentum=0.9)
- Learning rate: 0.01 (cosine decay)
4.2 Ablation Study
| Configuration | mAP (%) | Accuracy (%) | Params (M) | GFLOPS |
|---|---|---|---|---|
| SSD + ResNet50 | 72.68 | 78.43 | 44.55 | 30.0 |
| SSD + MobileNetV3 | 77.50 | 81.03 | 14.11 | 13.7 |
| + CA Attention | 82.71 | 92.28 | 14.11 | 13.7 |
| + Mosaic Augmentation | 84.01 | 94.20 | 14.11 | 13.7 |
The CA mechanism improves bird dropping detection by 11.25% due to enhanced spatial attention.
4.3 Comparative Analysis
| Model | mAP (%) | FPS | Params (M) | Model Size (MB) |
|---|---|---|---|---|
| Faster R-CNN | 80.39 | 3.2 | 191.39 | 157.5 |
| YOLOv3 | 72.63 | 21.1 | 61.52 | 100.6 |
| Original SSD | 72.68 | 23.9 | 44.55 | 52.4 |
| Ours | 84.01 | 45.6 | 14.11 | 18.3 |
The proposed model achieves 94.2% accuracy on snow coverage detection—critical for winter inspections.
5. Mathematical Formulation
5.1 Loss Function
The multi-task loss combines localization ($L_{loc}$) and confidence ($L_{conf}$) terms:

$$
L = \frac{1}{N}\left(L_{conf} + \alpha L_{loc}\right)
$$

where $\alpha = 1$ balances the two terms and $N$ is the number of matched default boxes.
Localization Loss:
Smooth L1 loss over the predicted offsets $(cx, cy, w, h)$:

$$
L_{loc} = \sum_{i \in pos} \; \sum_{m \in \{cx, cy, w, h\}} \mathrm{smooth}_{L1}\!\left(x_i^m - \hat{x}_i^m\right)
$$
Confidence Loss:
Focal loss [4] addresses class imbalance:

$$
L_{conf} = -\sum_{i \in pos} (1 - p_i)^{\gamma} \log(p_i) - \sum_{i \in neg} p_i^{\gamma} \log(1 - p_i)
$$

with $\gamma = 2$.
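The combined loss can be evaluated directly on toy arrays. The sketch below, in numpy, assumes the localization targets are already encoded as offsets and that hard-negative sampling has already selected the negative boxes:

```python
import numpy as np

def smooth_l1(d):
    # smooth_L1(d) = 0.5 d^2 if |d| < 1, else |d| - 0.5
    d = np.abs(d)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5)

def detection_loss(loc_pred, loc_true, p_pos, p_neg, alpha=1.0, gamma=2.0):
    """L = (1/N)(L_conf + alpha * L_loc) on toy inputs.
    loc_pred/loc_true: (N, 4) offsets for the N matched (positive) boxes;
    p_pos: predicted probability of the true class for each positive;
    p_neg: predicted foreground probability for each sampled negative."""
    n = len(loc_pred)
    l_loc = smooth_l1(loc_pred - loc_true).sum()
    l_conf = (-np.sum((1 - p_pos) ** gamma * np.log(p_pos))
              - np.sum(p_neg ** gamma * np.log(1 - p_neg)))
    return (l_conf + alpha * l_loc) / n

loc_pred = np.array([[0.1, 0.2, 0.0, 0.0]])
loc_true = np.zeros((1, 4))
loss = detection_loss(loc_pred, loc_true,
                      p_pos=np.array([0.9]), p_neg=np.array([0.1]))
print(f"loss = {loss:.4f}")
```

Note how the focal modulating factors $(1-p_i)^\gamma$ and $p_i^\gamma$ shrink the contribution of already well-classified boxes, which is what counteracts the foreground/background imbalance.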
6. Field Deployment Considerations
For UAV integration:
- Model Compression:
- Quantization: FP32 → INT8 (2.1× size reduction)
- Pruning: Remove 40% low-weight channels (1.5× speedup)
- Power Consumption:
- Jetson Xavier NX: 9.8 W at 45 FPS
- Theoretical coverage: 12.5 km²/day (50m altitude)
- Edge Computing:
On-device inference latency: 21.9 ms per frame
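The FP32 → INT8 step above can be illustrated with a generic symmetric per-tensor quantization scheme. This is a sketch of the general technique, not the exact scheme used on the Jetson deployment; weight storage alone shrinks 4×, while the paper's 2.1× figure reflects the whole model file:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)   # stand-in weight tensor
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.nbytes, w.nbytes)            # 1000 vs 4000 bytes: 4x smaller weights
print(f"max abs error: {err:.4f}")   # bounded by half the quantization step
```

Per-channel scales and calibration over representative activations, as provided by deployment toolchains, typically recover most of the accuracy lost to this rounding.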
7. Conclusion
This work presents a lightweight detection framework for solar panel contamination that achieves 84.01% mAP at 45.6 FPS—superior to conventional models in both accuracy and efficiency. The integration of MobileNetV3, coordinate attention, and Mosaic augmentation enables reliable identification of sub-centimeter defects under varying illumination and occlusion. Future work will explore temporal analysis of sequential UAV images for contamination progression tracking.