Semantic segmentation of infrared images plays a pivotal role in detecting defects and ensuring the operational efficiency of solar panels. Traditional methods often struggle with edge adhesion, background noise, and insufficient feature extraction in complex environments. To address these challenges, this study proposes an enhanced U-Net model tailored for solar panel infrared image segmentation. The model integrates three key innovations: white-edge preprocessing, a VGG16-based encoder, and a Res-CBAM attention mechanism. Experimental results demonstrate a significant improvement in segmentation accuracy, achieving a mean Intersection over Union (mIoU) of 99.73% and an accuracy of 99.87%, outperforming state-of-the-art models such as DeepLabV3+ and HRNetV2.

1. Introduction
The global transition toward renewable energy has accelerated the adoption of solar panels. However, surface defects, such as cracks or hotspots, reduce energy conversion efficiency and are often visible as high-intensity regions in infrared (IR) images. Accurate segmentation of solar panels from IR imagery is critical for automated fault detection systems. Conventional segmentation techniques, including region growing and K-means clustering, lack robustness against noise and complex backgrounds. Deep learning models, particularly U-Net, have shown promise in medical imaging but face limitations in large-scale solar panel applications due to information loss and inadequate edge detection.
This work introduces an optimized U-Net architecture that addresses these limitations through:
- White-edge preprocessing to enhance boundary features.
- VGG16 encoder for hierarchical semantic extraction.
- Res-CBAM attention modules to suppress noise and prioritize critical regions.
2. Methodology
2.1 White-Edge Preprocessing
The irregular shapes and low contrast of solar panels in IR images necessitate robust preprocessing. The white-edge technique highlights panel boundaries by assigning pixel values as follows:
- Solar panel region: Pixel value = 1.
- Background: Pixel value = 0.
- Edges: Pixel value = 255.
Mathematically, the processed image I′ is generated by:
I′ = I × (1 − M) + M × 255
where I is the original image and M is a binary mask derived from annotated contours. This step amplifies edge features, improving the model’s ability to distinguish adjacent solar panels (Figure 1).
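As a rough illustration, the white-edge transform above could be implemented with OpenCV and NumPy as follows. This is a minimal sketch: the function name, the edge thickness, and the use of cv2.drawContours to rasterize the annotated contours into the mask M are assumptions, since the paper does not specify these details.

```python
import cv2
import numpy as np

def white_edge_preprocess(image: np.ndarray, contours) -> np.ndarray:
    """Apply I' = I * (1 - M) + M * 255, where M is a binary mask
    rasterized from the annotated panel contours."""
    # Rasterize the annotated contours into a binary mask M (1 on edges, 0 elsewhere).
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.drawContours(mask, contours, -1, 1, 2)  # thickness of 2 px is an assumption

    # Broadcast the mask across channels for multi-channel (e.g., RGB) inputs.
    if image.ndim == 3:
        mask = mask[..., None]

    # Keep original intensities off the edges, set edge pixels to 255.
    processed = image.astype(np.float32) * (1 - mask) + mask * 255.0
    return processed.astype(np.uint8)
```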
2.2 VGG16 Encoder
The original U-Net encoder is replaced with the first 13 convolutional layers of VGG16. This modification enhances shallow feature extraction while maintaining computational efficiency. For an input image of size 512×512×3, the encoder generates multi-scale feature maps:
- Block 1: 512×512×64
- Block 2: 256×256×128
- Block 3: 128×128×256
- Block 4: 64×64×512
- Block 5: 32×32×512
These features are propagated to the decoder via skip connections, preserving spatial details critical for solar panel segmentation.
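A possible PyTorch/torchvision realization of this encoder is sketched below. The slicing indices follow torchvision's vgg16 layer layout so that each block ends with the feature-map sizes listed above; the class name and the pretrained-weights choice are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

class VGG16Encoder(nn.Module):
    """First 13 convolutional layers of VGG16, split into five blocks that
    yield the 64/128/256/512/512-channel feature maps listed above."""

    def __init__(self, pretrained: bool = True):
        super().__init__()
        weights = models.VGG16_Weights.DEFAULT if pretrained else None
        features = models.vgg16(weights=weights).features
        # Split vgg16.features so each block ends just before the next
        # max-pooling halves the spatial resolution.
        self.block1 = features[:4]     # conv1_x          -> 512x512x64
        self.block2 = features[4:9]    # pool + conv2_x   -> 256x256x128
        self.block3 = features[9:16]   # pool + conv3_x   -> 128x128x256
        self.block4 = features[16:23]  # pool + conv4_x   -> 64x64x512
        self.block5 = features[23:30]  # pool + conv5_x   -> 32x32x512

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        f4 = self.block4(f3)
        f5 = self.block5(f4)
        # The five maps feed the decoder through skip connections.
        return f1, f2, f3, f4, f5
```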
2.3 Res-CBAM Attention Mechanism
The Convolutional Block Attention Module (CBAM) combines channel and spatial attention to refine feature maps. Let F ∈ R^{H×W×C} denote an input feature map.
Channel Attention:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
F′ = M_c(F) ⊗ F
where σ is the sigmoid function and ⊗ denotes element-wise multiplication.
Spatial Attention:
M_s(F′) = σ(f^{7×7}([AvgPool(F′); MaxPool(F′)]))
F″ = M_s(F′) ⊗ F′
where f^{7×7} is a 7×7 convolutional layer.
Res-CBAM integrates a residual connection to mitigate vanishing gradients:
O = F + F″
This module enhances the model’s focus on solar panel edges while suppressing irrelevant background features.
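A compact PyTorch sketch of a Res-CBAM block consistent with the equations above is shown below. The reduction ratio of 16 and the class name are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class ResCBAM(nn.Module):
    """CBAM (channel then spatial attention) with a residual skip:
    O = F + M_s(F') ⊗ F', where F' = M_c(F) ⊗ F."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Shared MLP applied to both the avg- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 convolution over the concatenated avg/max spatial maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f.shape

        # Channel attention: M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        avg = self.mlp(f.mean(dim=(2, 3)))
        mx = self.mlp(f.amax(dim=(2, 3)))
        mc = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        f_prime = mc * f

        # Spatial attention: M_s(F') = sigmoid(f_7x7([AvgPool(F'); MaxPool(F')]))
        avg_map = f_prime.mean(dim=1, keepdim=True)
        max_map = f_prime.amax(dim=1, keepdim=True)
        ms = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        f_double_prime = ms * f_prime

        # Residual connection: O = F + F''
        return f + f_double_prime
```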
3. Experimental Setup
3.1 Dataset and Training
A dataset of 2,000 IR images of solar panels was collected using an HT20 thermal camera mounted on a DJI M300RTK drone. The images, spanning residential and industrial installations, were split into training (80%), validation (10%), and testing (10%) sets.
Training Parameters:
- Framework: PyTorch 2.3.0
- Hardware: NVIDIA RTX 3080 Ti
- Input size: 512×512
- Batch size: 16
- Optimizer: Adam (LR = 1×10⁻⁴)
- Loss: Binary cross-entropy
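A minimal PyTorch sketch of the optimizer and loss configuration listed above, together with a single training step. The function names are illustrative; BCEWithLogitsLoss is assumed here for numerical stability (nn.BCELoss would be used instead if the network already ends in a sigmoid).

```python
from torch import nn, optim

def make_training_objects(model: nn.Module):
    """Optimizer and loss matching the settings above: Adam with LR = 1e-4
    and binary cross-entropy."""
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    # Binary cross-entropy computed on raw logits (sigmoid folded into the loss).
    criterion = nn.BCEWithLogitsLoss()
    return optimizer, criterion

def train_step(model, optimizer, criterion, images, masks):
    """One optimization step on a batch of 512x512 IR images and binary masks."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)           # expected shape (B, 1, 512, 512)
    loss = criterion(logits, masks)  # masks in {0, 1}, same shape as logits
    loss.backward()
    optimizer.step()
    return loss.item()
```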
3.2 Evaluation Metrics
Performance was measured using:
- mIoU:
mIoU = (1/K) Σ_{i=1}^{K} TP_i / (TP_i + FP_i + FN_i)
- Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where K = 2 (solar panel vs. background).
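As a small sketch, both metrics can be computed from integer label maps as follows; the function name is illustrative and not part of the paper.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, target: np.ndarray, num_classes: int = 2):
    """Compute mIoU and pixel accuracy for K classes
    (here K = 2: solar panel vs. background)."""
    ious = []
    for k in range(num_classes):
        tp = np.sum((pred == k) & (target == k))
        fp = np.sum((pred == k) & (target != k))
        fn = np.sum((pred != k) & (target == k))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else np.nan)
    miou = np.nanmean(ious)                  # mean IoU over the K classes
    accuracy = np.mean(pred == target)       # (TP + TN) / (TP + TN + FP + FN)
    return miou, accuracy
```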
4. Results and Analysis
4.1 Comparative Performance
The proposed model outperformed existing architectures (Table 1):
| Model | mIoU | Accuracy |
| --- | --- | --- |
| DeepLabV3+ | 97.32% | 98.17% |
| PSP-Net | 96.28% | 97.65% |
| U-Net | 97.72% | 98.37% |
| Res-U-Net | 97.93% | 97.93% |
| HRNetV2 | 97.58% | 97.96% |
| Proposed | 99.73% | 99.87% |
Key advantages include:
- Edge clarity: Reduced adhesion between adjacent solar panels.
- Noise resilience: Suppressed background interference from buildings or vegetation.
- Detail preservation: Avoided over-smoothing in low-light conditions.
4.2 Ablation Study
Ablation tests confirmed the contribution of each component (Table 2):
| Configuration | mIoU | Accuracy |
| --- | --- | --- |
| Baseline U-Net | 97.72% | 98.37% |
| + White-edge | 99.41% | 99.71% |
| + VGG16 | 99.22% | 99.62% |
| + Res-CBAM | 98.97% | 99.47% |
| Full Model | 99.73% | 99.87% |
The synergistic effect of all modules maximized performance, particularly in complex scenes.
4.3 Attention Mechanism Comparison
Res-CBAM outperformed other attention strategies (Table 3):
| Attention | mIoU | Accuracy |
| --- | --- | --- |
| CA | 98.74% | 99.39% |
| ECA | 98.79% | 99.41% |
| CBAM | 98.72% | 99.37% |
| Res-CBAM | 98.97% | 99.49% |
5. Conclusion
This study presents an enhanced U-Net model for semantic segmentation of solar panel IR images. By integrating white-edge preprocessing, a VGG16 encoder, and Res-CBAM attention, the model achieves state-of-the-art accuracy in detecting panel boundaries and suppressing noise. The improvements are validated through rigorous comparisons and ablation studies, demonstrating superior performance in diverse environmental conditions. Future work will explore real-time deployment and scalability to larger solar farms.