The rapid global transition towards renewable energy has propelled photovoltaic (PV) power generation into a cornerstone of sustainable infrastructure. The proliferation of solar panels, encompassing vast utility-scale plants and countless distributed rooftop installations, necessitates efficient and accurate methods for resource inventory and operational health monitoring. Traditional manual inspection and basic image processing techniques are labor-intensive, suffer from subjective judgment, and struggle with complex backgrounds, variable lighting, and shadow occlusion, making them inadequate for large-scale, intelligent management.
To address these challenges, we propose an integrated, multi-scale framework for solar panel recognition and defect diagnosis, leveraging deep learning semantic segmentation. Our core contribution is an optimized DeepLabV3+ model, enhanced with a ResNet-50 backbone and a Convolutional Block Attention Module (CBAM). This framework is applied to both satellite imagery for macro-scale survey and unmanned aerial vehicle (UAV) imagery for micro-scale inspection, establishing a comprehensive pipeline from resource mapping to maintenance guidance.

Methodology: The Improved DeepLabV3+ Model
The standard DeepLabV3+ architecture is a powerful encoder-decoder network renowned for its Atrous Spatial Pyramid Pooling (ASPP) module, which captures multi-scale contextual information. However, its typical backbone, Xception, while effective, carries significant computational complexity. For improved efficiency and feature extraction tailored to solar panel characteristics, we implement key modifications.
First, we replace the Xception backbone with ResNet-50. The residual learning framework of ResNet-50 eases the training of deeper networks and mitigates the vanishing gradient problem. Its bottleneck structure provides a good balance between representational capacity and computational cost. Formally, a residual block can be defined as:
$$ \mathbf{y} = \mathcal{F}(\mathbf{x}, {W_i}) + \mathbf{x} $$
where $\mathbf{x}$ and $\mathbf{y}$ are the input and output vectors, and $\mathcal{F}(\mathbf{x}, {W_i})$ represents the residual mapping to be learned. This allows the network to focus on learning the finer discrepancies that define solar panel edges and textures against diverse backgrounds.
Second, we integrate the Convolutional Block Attention Module (CBAM) into the feature fusion process. CBAM sequentially infers attention maps along both the channel and spatial dimensions, allowing the model to emphasize informative features and suppress irrelevant ones. Given an intermediate feature map $\mathbf{F} \in \mathbb{R}^{C \times H \times W}$, the channel attention $\mathbf{M_c} \in \mathbb{R}^{C \times 1 \times 1}$ is computed as:
$$ \mathbf{M_c}(\mathbf{F}) = \sigma(\text{MLP}(\text{AvgPool}(\mathbf{F})) + \text{MLP}(\text{MaxPool}(\mathbf{F}))) $$
where $\sigma$ is the sigmoid activation. The spatial attention $\mathbf{M_s} \in \mathbb{R}^{1 \times H \times W}$ is then computed from the channel-refined features:
$$ \mathbf{M_s}(\mathbf{F’}) = \sigma( f^{7 \times 7}( [\text{AvgPool}(\mathbf{F’}); \text{MaxPool}(\mathbf{F’})] ) ) $$
Here, $f^{7 \times 7}$ denotes a convolution with a $7 \times 7$ filter. The final output is $\mathbf{F”} = \mathbf{M_s}(\mathbf{F’}) \otimes \mathbf{F’}$. This mechanism is particularly effective in highlighting the structured, rectangular shapes of solar panel arrays in satellite imagery and pinpointing localized defect hotspots in UAV thermal imagery.
The overall architecture of our improved model is summarized in the following table:
| Module | Component | Description & Purpose |
|---|---|---|
| Encoder | Backbone: ResNet-50 | Extracts hierarchical features. Replaces Xception for more efficient and stable training via residual connections. |
| Context Module | ASPP + CBAM Integration | ASPP captures multi-scale context. CBAM dynamically weights channels and spatial locations to focus on critical solar panel features. |
| Decoder | Low-Level Feature Fusion | Combines high-level semantic features from the encoder with detailed low-level features from the backbone to refine solar panel boundaries. |
Experimental Setup and Data Curation
Our study establishes a two-tiered data pipeline to evaluate the model’s performance at different scales.
1. Satellite Imagery for Macro-Scale Recognition
For large-area solar panel inventory, we utilized multi-source satellite imagery (e.g., GF-6, GF-7) covering diverse terrains including urban, suburban, and mountainous regions. A rigorous preprocessing pipeline involving radiometric correction, geometric registration, and data augmentation was applied. We created a pixel-level annotated dataset with the following categories:
| Category | Sub-classes | Description |
|---|---|---|
| Positive Sample (Solar Panel) | 单体组件 (Monomer), 阵列 (Array) | Individual panels and panel clusters. |
| Negative Sample (Background) | Vegetation, Building, Bare Land, Water, Road | Other land cover types to distinguish from. |
From this, 24,000 image tiles were generated and split into training and testing sets in an 8:2 ratio, ensuring the test set contained challenging scenarios like shadow occlusion and small-scale distributed solar panel installations.
2. UAV Imagery for Micro-Scale Defect Detection
For detailed inspection, we deployed a DJI Matrice 300 RTK UAV equipped with a Zenmuse H20T sensor to capture synchronized high-resolution visible and thermal imagery of selected solar panel arrays. Flight planning ensured consistent lighting and high overlap. The data was pre-processed (denoising, color correction, contrast stretching for thermal data) and annotated for defect diagnosis:
| Category | Defect Types |
|---|---|
| Positive Sample (Healthy) | Intact, functional solar panel. |
| Negative Sample (Defective) | Hot spot, Crack, Black edge, Scratch, No-power (shading). |
A dataset of 18,000 UAV image tiles was constructed, following the same 8:2 train-test split, with the test set containing difficult cases like minor cracks and low-contrast defects.
Evaluation Metrics
To quantitatively assess the model’s performance in both recognition and defect detection tasks, we employ a suite of standard metrics. Let $TP$, $TN$, $FP$, and $FN$ denote True Positives, True Negatives, False Positives, and False Negatives, respectively.
$$
\text{Overall Accuracy (OA)} = \frac{TP + TN}{TP + TN + FP + FN}
$$
$$
\text{Precision (P)} = \frac{TP}{TP + FP}
$$
$$
\text{Recall (R)} = \frac{TP}{TP + FN}
$$
$$
\text{F1 Score (F1)} = \frac{2 \times P \times R}{P + R}
$$
$$
\text{Intersection over Union (IoU)} = \frac{TP}{TP + FP + FN}
$$
Results and Analysis
1. Solar Panel Recognition from Satellite Imagery
We compared our improved DeepLabV3+ model against the original DeepLabV3+, U-Net, and Fast-SCNN models on the satellite imagery test set. The quantitative results clearly demonstrate the superiority of our approach.
| Model | OA (%) | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
|---|---|---|---|---|---|
| Improved DeepLabV3+ (Ours) | 96.91 | 94.88 | 93.21 | 93.85 | 94.47 |
| DeepLabV3+ | 94.26 | 91.33 | 90.85 | 90.11 | 90.37 |
| U-Net | 91.12 | 89.53 | 89.04 | 88.75 | 88.82 |
| Fast-SCNN | 85.42 | 82.33 | 81.68 | 81.15 | 80.76 |
The improved model achieves the highest scores across all metrics. The integration of ResNet-50 provides more robust feature hierarchies, while the CBAM module effectively suppresses background noise (e.g., bare soil, roads with similar spectral responses) and focuses on the distinctive geometric and textural patterns of solar panel arrays. This leads to fewer false positives and more complete segmentation, especially for irregularly shaped or partially obscured solar panel clusters.
2. Solar Panel Defect Detection from UAV Imagery
Applying the improved DeepLabV3+ model to the UAV defect detection dataset yielded strong results, with performance varying by defect type due to their distinct visual and thermal signatures.
| Defect Type | OA (%) | Precision (%) | Recall (%) | F1 Score (%) | IoU (%) |
|---|---|---|---|---|---|
| Hot Spot | 95.77 | 93.68 | 93.33 | 92.89 | 92.52 |
| No-Power | 97.92 | 94.56 | 95.28 | 94.95 | 93.74 |
| Black Edge | 90.33 | 88.07 | 89.35 | 89.04 | 88.74 |
| Crack / Damage | 88.47 | 86.35 | 85.62 | 86.22 | 85.56 |
| Scratch | 84.32 | 81.45 | 81.97 | 81.17 | 80.58 |
| Overall Defects | 91.36 | 88.82 | 89.11 | 88.85 | 88.23 |
Analysis: Defects with strong thermal signatures (Hot Spot, No-Power) were detected with exceptional accuracy (F1 > 92%). The CBAM module proved crucial here, learning to weight the thermal channel heavily to identify abnormal temperature gradients characteristic of a faulty solar panel. Black Edge and Crack detection, relying more on visible-light texture discontinuities, also achieved good performance (F1 ~ 88-89%), though they were sometimes confused with dust accumulation or natural panel seams. As anticipated, Scratch detection was the most challenging, with the lowest metrics (F1 ~ 81%), due to the fine-scale, low-contrast nature of scratches which often resemble faint stains or reflections on the solar panel surface.
The overall defect detection accuracy of over 91% demonstrates the model’s high reliability for automated inspection. The fusion of visible and thermal data through a powerful attention-based network is key to this success, enabling the model to correlate visual artifacts with thermal anomalies for a more confident diagnosis of the solar panel‘s health state.
Conclusion
In this work, we developed and validated an improved DeepLabV3+ deep learning model for comprehensive solar panel management. By incorporating ResNet-50 and the CBAM attention mechanism, the model achieves superior performance in accurately segmenting solar panel arrays from complex satellite imagery, outperforming several benchmark models. Furthermore, we established a practical multi-scale framework that extends this model to the precise detection of operational defects using UAV-based visible and thermal imagery.
The system provides a significant efficiency leap over manual methods, enabling rapid, large-area resource inventory and routine, automated condition monitoring. The high accuracy in detecting critical defects like hot spots and power loss offers actionable intelligence for predictive maintenance, directly supporting the operational reliability and economic viability of PV power plants. Future work will focus on enhancing the detection of subtle defects like micro-scratches, potentially through higher-resolution sensors or specialized data augmentation techniques, and on integrating this pipeline into a real-time monitoring platform for the smart management of solar panel assets.
