Research on a Solar Panel Extraction Method Based on an Improved U-Net

Abstract

Against the backdrop of China’s dual carbon goals, solar photovoltaic panel (PV) technology is developing rapidly. PV installations cover large areas, have irregular layouts, and occur in many different scenes, so an efficient method for obtaining PV data is needed to support real-time detection of their type, location, quantity, and extent. Because the volume of PV imagery is large and training areas selected in one image rarely carry over to others, traditional supervised classification methods often suffer from low accuracy and efficiency. This paper proposes a deep learning model based on convolutional neural networks (CNNs) that identifies and extracts PV panels from high-resolution remote sensing images.

This research primarily involves:

Constructing a feature enhancement fusion network by integrating the Convolutional Block Attention Module (CBAM) and coordinate attention into the U-Net architecture.
Conducting experiments on publicly available PV datasets, demonstrating that the proposed method outperforms other models in terms of mean Intersection over Union (mIoU) and mean F1-score (mF1).
Applying the algorithm to the Panda Power Station in Datong, Shanxi Province, to monitor PV installations over multiple years.

Keywords: Solar Panels, Supervised Classification, U-Net Convolutional Neural Network, Semantic Segmentation, High-Resolution Remote Sensing Images

1. Introduction

In recent years, the Chinese government has prioritized renewable energy development to achieve its dual carbon goals. Solar photovoltaic panels (PV) have emerged as a significant contributor to clean energy production. However, the irregular installation patterns and vast areas covered by PV present challenges for effective monitoring and data acquisition. This study focuses on developing an efficient method for extracting PV information from high-resolution remote sensing images using an improved U-Net CNN model.

1.1 Background and Significance

Solar panels are installed in various scenarios, including rooftops, farmlands, and open areas, making it difficult to apply uniform monitoring methods. Traditional methods such as supervised classification suffer from low accuracy and high costs because training areas chosen for one PV image rarely generalize to others. This study aims to address these issues by leveraging deep learning techniques, specifically an enhanced U-Net model, to achieve high-precision PV extraction.

1.2 Research Objectives

The primary objectives of this research are:

To construct a feature enhancement fusion network capable of precise PV extraction from high-resolution remote sensing images.
To evaluate the performance of the proposed method through comparisons with traditional classification methods and other deep learning models.
To apply the improved model to real-world scenarios, specifically the Panda Power Station, for long-term PV monitoring.

2. Literature Review

2.1 Remote Sensing and PV Extraction

Remote sensing has long been used for land cover classification and monitoring. Traditional methods rely on spectral signatures and spatial relationships to differentiate land covers, including PV installations. However, these methods struggle with complex scenes and irregular PV layouts (Otukei et al., 2009; Abbas et al., 2016).

2.2 Deep Learning for Semantic Segmentation

Recent advancements in deep learning, particularly CNNs, have revolutionized semantic segmentation tasks. Models such as U-Net (Ronneberger et al., 2015), DeepLabV3+ (Chen et al., 2018), and PSPNet (Zhao et al., 2017) have demonstrated remarkable performance in biomedical and natural image segmentation. However, their direct application to PV extraction from remote sensing images has received comparatively little attention.

3. Methodology

3.1 Improved U-Net Architecture

The proposed model builds upon the U-Net architecture by incorporating CBAM and coordinate attention mechanisms. CBAM refines feature representations by applying channel attention followed by spatial attention, while coordinate attention encodes positional information along the horizontal and vertical directions of the feature map.

3.1.1 U-Net Backbone

The original U-Net consists of an encoder-decoder structure with skip connections between corresponding encoder and decoder layers. The skip connections pass high-resolution encoder features directly to the decoder, which helps recover spatial detail lost during downsampling.
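
To make the role of the skip connections concrete, the following minimal sketch traces feature-map shapes through a U-Net-style network. The function name `unet_shapes`, the base width of 64 channels, the depth of 4, and the use of size-preserving convolutions are illustrative assumptions, not details reported in this paper.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, H, W) through a U-Net-style encoder-decoder.

    Assumes size-preserving convolutions, 2x2 pooling, and 2x upsampling.
    """
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    encoder = []
    ch, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((ch, h, w))            # kept for the skip connection
        ch, h, w = ch * 2, h // 2, w // 2     # pooling halves H and W, next stage doubles channels
    bottleneck = (ch, h, w)

    decoder = []
    for skip_ch, skip_h, skip_w in reversed(encoder):
        # Upsampled features are reduced to skip_ch channels, then
        # concatenated with the encoder features of the same resolution.
        concat = (skip_ch * 2, skip_h, skip_w)
        decoder.append({"concat": concat, "out": (skip_ch, skip_h, skip_w)})
    return encoder, bottleneck, decoder


if __name__ == "__main__":
    enc, bott, dec = unet_shapes(256, 256)
    for stage in dec:
        print(stage)
```

Each decoder stage receives twice as many channels as it outputs, because half of its input comes straight from the encoder at the same resolution.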

3.1.2 CBAM Integration

CBAM is integrated into both the encoder and decoder of U-Net, applying channel and spatial attention sequentially. Channel attention recalibrates feature maps across channels, while spatial attention highlights salient regions.
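
For reference, the sequential channel-then-spatial attention described above can be written as follows. This is the commonly used CBAM formulation, given here as a sketch: the symbols F, M_c, and M_s, the shared MLP, and the 7×7 convolution follow that standard formulation and are assumptions with respect to this paper, which does not state its kernel size or reduction ratio.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

Here F is the input feature map, σ is the sigmoid function, ⊗ is element-wise multiplication with broadcasting, the pooling in M_c is over the spatial dimensions, and the pooling in M_s is over the channel dimension.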

3.1.3 Coordinate Attention

Coordinate attention is embedded within residual blocks to encode precise position information. It enhances the model’s ability to distinguish PV panels from similar-looking backgrounds.
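
The position encoding can be sketched with the standard coordinate attention formulation, in which pooling is performed separately along each spatial direction. The notation below (x_c, z, f, g, F_1, F_h, F_w) follows that standard formulation and is an assumption with respect to this paper, which does not give its exact equations.

```latex
\begin{aligned}
z_c^h(h) &= \frac{1}{W}\sum_{0 \le i < W} x_c(h, i), \qquad
z_c^w(w)  = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w), \\
f        &= \delta\big(F_1([z^h, z^w])\big), \\
g^h      &= \sigma\big(F_h(f^h)\big), \qquad
g^w       = \sigma\big(F_w(f^w)\big), \\
y_c(i, j) &= x_c(i, j) \times g_c^h(i) \times g_c^w(j).
\end{aligned}
```

Here x_c is channel c of an H×W input, F_1, F_h, and F_w are 1×1 convolutions, δ is a non-linear activation, and f is split along the spatial dimension into f^h and f^w. Because g^h and g^w index rows and columns separately, each output location is weighted by its own row and column position.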

3.2 Dataset Preparation

Publicly available PV datasets covering installations on water surfaces, grasslands, saline-alkali lands, croplands, and shrubwoods are used for training and validation. Data augmentation techniques such as rotation, noise addition, brightness variation, and Gaussian filtering are applied to expand the dataset.
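
The four augmentations listed above can be sketched in plain Python on a single-band image stored as a list of rows. All function names, the 90-degree rotation, the 3×3 kernel, the noise level, and the brightness range are illustrative assumptions; the paper does not report its augmentation parameters.

```python
import random


def rotate90(grid):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def clip(value):
    """Round and clamp a pixel value to the 8-bit range."""
    return min(255, max(0, round(value)))


def adjust_brightness(image, factor):
    return [[clip(p * factor) for p in row] for row in image]


def add_gaussian_noise(image, sigma, rng):
    return [[clip(p + rng.gauss(0, sigma)) for p in row] for row in image]


def gaussian_blur3(image):
    """3x3 Gaussian filter (1-2-1 kernel, sum 16) with edge replication."""
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(h - 1, max(0, y + dy))
                    xx = min(w - 1, max(0, x + dx))
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            row.append(clip(acc / 16))
        out.append(row)
    return out


def augment(image, mask, rng):
    """Return the original (image, mask) pair plus four augmented pairs."""
    return [
        (image, mask),
        (rotate90(image), rotate90(mask)),  # geometric change: mask must follow
        (adjust_brightness(image, rng.uniform(0.7, 1.3)), mask),
        (add_gaussian_noise(image, 10.0, rng), mask),
        (gaussian_blur3(image), mask),
    ]
```

Only the rotation alters the label mask; the photometric changes (brightness, noise, blur) leave panel locations untouched, so the original mask is reused.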

4. Experiments and Results

4.1 Experimental Setup

The model is trained with the Adam optimizer, a learning rate of 0.001, and a batch size of 16. The loss function is cross-entropy, and performance is evaluated with mIoU, mF1, precision, and recall.
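
The evaluation metrics can be computed from per-class pixel counts, as in the sketch below. It assumes a two-class setting (0 for background, 1 for PV) and averages IoU and F1 over both classes; the function names and the class encoding are illustrative, and the paper does not specify its exact averaging procedure.

```python
def confusion_counts(pred, truth, positive):
    """Count true positives, false positives, and false negatives for one class."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == positive and t == positive:
                tp += 1
            elif p == positive:
                fp += 1
            elif t == positive:
                fn += 1
    return tp, fp, fn


def class_metrics(tp, fp, fn):
    """Precision, recall, F1, and IoU for one class (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def mean_metrics(pred, truth, classes=(0, 1)):
    """Average IoU and F1 over the given classes."""
    per_class = [class_metrics(*confusion_counts(pred, truth, c)) for c in classes]
    return {
        "mIoU": sum(m["iou"] for m in per_class) / len(per_class),
        "mF1": sum(m["f1"] for m in per_class) / len(per_class),
    }


if __name__ == "__main__":
    # Toy 2x2 masks, not data from the paper.
    prediction = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(mean_metrics(prediction, reference))
```

For a single class, F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so the two always rank a class the same way; the class-averaged mF1 and mIoU need not.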

4.2 Ablation Study

An ablation study is conducted to analyze the contribution of each component within the improved U-Net model. Relative to the original U-Net, the final model with CBAM and coordinate attention raises mIoU from 88.50% to 92.67% and mF1 from 91.37% to 94.35%.

Model Configuration	mIoU	mF1	Precision	Recall
Original U-Net	88.50%	91.37%	90.53%	94.71%
U-Net + Residual Blocks	89.01%	91.89%	90.92%	95.21%
U-Net + CBAM	90.62%	92.57%	91.32%	95.74%
Improved U-Net (Final)	92.67%	94.35%	92.47%	96.32%

4.3 Comparative Experiments

The improved U-Net is compared with traditional supervised classification methods and other semantic segmentation models (PSPNet, SegNet, DeepLabV3+, HRNet). On the WaterSurface dataset, the proposed model reaches 92.7% mIoU and 94.4% mF1, ahead of the strongest baseline, HRNet (91.2% and 93.1%).

Model	Dataset	mIoU	mF1
Supervised Classification	WaterSurface	87.7%	90.8%
PSPNet	WaterSurface	89.7%	92.2%
SegNet	WaterSurface	90.8%	91.2%
DeepLabV3+	WaterSurface	90.2%	92.5%
HRNet	WaterSurface	91.2%	93.1%
Improved U-Net	WaterSurface	92.7%	94.4%

(Similar comparisons are conducted for other datasets and summarized in the appendix.)

4.4 Application to Panda Power Station

The improved U-Net model is applied to remote sensing images of the Panda Power Station in Datong, Shanxi Province, for multi-year PV change detection. The results reveal the expansion of PV installations over time, providing valuable insights for power station management.
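
One simple way to quantify such multi-year change is to convert each year's predicted mask into a PV area and difference consecutive years, as sketched below. The function names, the mask encoding (1 for PV), and the `pixel_size_m` parameter are illustrative assumptions; the paper does not describe its change-detection procedure or image resolution.

```python
def pv_area_m2(mask, pixel_size_m):
    """Area covered by PV pixels (value 1), given the ground size of one pixel."""
    pv_pixels = sum(1 for row in mask for value in row if value == 1)
    return pv_pixels * pixel_size_m ** 2


def yearly_change(masks_by_year, pixel_size_m):
    """Return per-year PV area and the change between consecutive years."""
    years = sorted(masks_by_year)
    areas = {year: pv_area_m2(masks_by_year[year], pixel_size_m) for year in years}
    changes = {
        later: areas[later] - areas[earlier]
        for earlier, later in zip(years, years[1:])
    }
    return areas, changes
```

This assumes the yearly images are co-registered and share the same resolution; otherwise the masks would need to be resampled to a common grid first.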

5. Discussion

5.1 Model Performance

The proposed model’s integration of CBAM and coordinate attention mechanisms enhances feature representation and position encoding, leading to improved PV extraction performance. In the ablation study, adding residual blocks raises mIoU by 0.51 percentage points over the original U-Net, adding CBAM raises it by 2.12 points, and the final model raises it by 4.17 points.

5.2 Comparison with Existing Methods

The comparative experiments demonstrate that the improved U-Net outperforms traditional supervised classification and other deep learning models on both mIoU and mF1. On the WaterSurface dataset, its margin is 5.0 percentage points in mIoU over supervised classification and 1.5 points over HRNet. This supports the model’s suitability for complex PV scenes.

5.3 Practical Applications

The successful application to the Panda Power Station showcases the model’s potential for real-world PV monitoring. Long-term tracking of PV installations can inform maintenance schedules, capacity planning, and environmental impact assessments.

6. Conclusion

This study presents an improved U-Net model for precise solar panel extraction from high-resolution remote sensing images. By integrating CBAM and coordinate attention mechanisms, the model achieves the highest mIoU and mF1 among the compared methods, outperforming both traditional supervised classification and other deep learning models. The application to the Panda Power Station illustrates the model’s practical utility for PV monitoring.