Solar Panel Contour Extraction and Localization via Computer Vision

In the context of global energy scarcity and the pursuit of pollution-free alternatives, renewable clean energy has become a focal point of attention. Among clean energy options, photovoltaic power generation stands as a key direction for national development. Solar power plants are typically constructed in open areas with ample sunlight, which are often arid and subject to significant wind-blown dust. Dust and sand accumulating on solar panel surfaces can substantially reduce photovoltaic generation efficiency. To address this, we propose leveraging computer vision techniques to identify and localize solar panels, enabling automated cleaning via robotic arms mounted on vehicles. This approach aims to enhance cleaning efficiency, with the core challenge lying in the accurate localization of solar panel positions.

Traditionally, methods for extracting solar panel contours have relied on conventional image processing techniques, which primarily detect edges based on pixel intensity differences. However, these methods often suffer from robustness issues due to external factors like lighting variations and reflections. More recently, deep learning-based approaches, such as semantic segmentation and object detection, have shown promise but face limitations in terms of speed, accuracy, or generalization. In this work, we introduce an integrated system that combines an improved dust detector, an enhanced contour extractor, and a depth measurement model to precisely locate solar panels and assess their orientation. Our contributions include modifications to neural network architectures to boost performance and a novel depth measurement model based on monocular camera projection principles.

The overall framework begins with a dust detector that classifies whether solar panels require cleaning. For panels identified as dusty, a contour extractor segments the outer boundaries, and a depth measurement model computes the distance and tilt angle relative to the camera. This information can then be transmitted to a robotic arm for automated cleaning. The following sections detail each component, including architectural improvements, experimental validation, and comparative analysis. We emphasize the practical applications in automated production lines, where efficiency and accuracy are paramount for maintaining solar panel performance.

Dust Detector: Enhanced Classification with Attention Mechanisms

To determine whether solar panels need cleaning, we developed a dust detector based on a modified ResNet50 network. The original ResNet50 comprises 49 convolutional layers and one fully connected layer, organized into five stages. Our modifications focus on two aspects: the initial convolutional layer and the incorporation of attention mechanisms. First, we replaced the standard 7×7 convolution in the Conv1 stage with a 3×3 convolution followed by a 5×5 convolution. This change reduces parameters while preserving a comparable receptive field, adding depth and improving feature extraction. The structure can be represented as:

$$ \text{Input} \rightarrow [3\times3\ \text{Conv}] \rightarrow [5\times5\ \text{Conv}] \rightarrow \text{ReLU} \rightarrow \text{MaxPool} $$
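As an illustration, the modified stem might be implemented as follows in PyTorch (the layer names, strides, and padding here are our assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class ModifiedStem(nn.Module):
    """Sketch: the original 7x7 stride-2 stem replaced by a 3x3 + 5x5 pair."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.conv5 = nn.Conv2d(out_ch, out_ch, kernel_size=5, stride=1, padding=2)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.pool(self.relu(self.conv5(self.conv3(x))))

y = ModifiedStem()(torch.randn(1, 3, 224, 224))
print(y.shape)  # same 4x spatial downsampling as the original ResNet50 stem
```

With this configuration the stem keeps the standard ResNet output resolution, so the remaining stages need no changes.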

Second, we integrated a Convolutional Block Attention Module (CBAM), which sequentially applies channel and spatial attention. The channel attention mechanism computes weights by aggregating global average and max pooling, followed by a shared multilayer perceptron. The spatial attention mechanism combines average and max pooling across channels and applies a convolutional layer. Formally, for an input feature map \( F \), the channel attention \( M_c \) and spatial attention \( M_s \) are computed as:

$$ M_c(F) = \sigma(\text{MLP}(\text{AvgPool}(F)) + \text{MLP}(\text{MaxPool}(F))) $$

$$ M_s(F) = \sigma(f^{7\times7}([\text{AvgPool}(F); \text{MaxPool}(F)])) $$

where \( \sigma \) denotes the sigmoid function, and \( f^{7\times7} \) is a 7×7 convolution. The refined feature map is computed in two steps: \( F' = M_c(F) \otimes F \), followed by \( F'' = M_s(F') \otimes F' \), where \( \otimes \) denotes element-wise multiplication (with broadcasting). This attention mechanism allows the model to focus on relevant features, such as dust patterns on solar panels, enhancing classification accuracy.
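A minimal PyTorch sketch of the CBAM equations above (a simplified illustration; the reduction ratio and layer details are assumptions, not the exact configuration used in the paper):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction=16 is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):
        b, c, _, _ = f.shape
        avg = self.mlp(f.mean(dim=(2, 3)))   # MLP(AvgPool(F))
        mx = self.mlp(f.amax(dim=(2, 3)))    # MLP(MaxPool(F))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^{7x7}

    def forward(self, f):
        avg = f.mean(dim=1, keepdim=True)    # channel-wise average pooling
        mx = f.amax(dim=1, keepdim=True)     # channel-wise max pooling
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

def cbam(f, ca, sa):
    f = ca(f) * f      # F' = Mc(F) (x) F
    return sa(f) * f   # F'' = Ms(F') (x) F'

out = cbam(torch.randn(2, 64, 32, 32), ChannelAttention(64), SpatialAttention())
```

The module is shape-preserving, so it can be dropped after any residual stage without altering the rest of the network.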

We trained the dust detector on a dataset of 1000 images, augmented through random scaling, cropping, rotation, and contrast adjustments to simulate real-world conditions. The dataset was split into training, validation, and test sets in an 8:1:1 ratio. Experimental results show that our modifications improved accuracy by 3.68% and inference speed by 42.95% compared to the original ResNet50. Table 1 summarizes the performance comparison with other backbone networks.

Table 1: Performance Comparison of Different Backbone Networks for Dust Detection

Backbone Network        Classification Accuracy (%)   Inference Speed (FPS)
VGG-16                  89.2                          39.43
Original ResNeXt50      93.4                          55.74
ConvNeXt                95.4                          49.52
Our Improved ResNet50   95.7                          79.86

The higher accuracy and speed make our dust detector suitable for real-time applications in solar panel maintenance. By quickly identifying dusty solar panels, the system can prioritize cleaning tasks, optimizing resource allocation.

Contour Extractor: Lightweight Semantic Segmentation with VoVNet and GSConv

For solar panels flagged as dusty, we employ a contour extractor based on an enhanced DeepLabV3+ architecture. Semantic segmentation assigns a label to each pixel, allowing precise extraction of the outer boundaries of solar panels. Our modifications aim to reduce computational complexity while maintaining high accuracy. First, we replace the standard Xception backbone with VoVNet27-slim, a lightweight network that aggregates features through One-Shot Aggregation (OSA) modules. The OSA module concatenates features from multiple convolutional layers with varying receptive fields, capturing rich contextual information. The structure of VoVNet27-slim is detailed in Table 2.

Table 2: Structure of Modified VoVNet27-slim Backbone

Stage           Output Stride   Layers
Stage 1         2               GSConv, GSConv, GSConv
Stage 2 (OSA)   4               GSConv (64 channels) × 5, Concat & 1×1 Conv (128 channels)
Stage 3 (OSA)   8               GSConv (80 channels) × 5, Concat & 1×1 Conv (256 channels)
Stage 4 (OSA)   16              GSConv (96 channels) × 5, Concat & 1×1 Conv (384 channels)

Notably, we substitute all 3×3 convolutions in VoVNet27-slim with GSConv (Group Shuffle Convolution). GSConv combines standard convolution, depthwise convolution, and channel shuffling to reduce parameters. It can be expressed as:

$$ \text{GSConv}(X) = \text{Shuffle}(\text{Conv}_{1\times1}(X) \oplus \text{DWConv}_{5\times5}(\text{Conv}_{1\times1}(X))) $$

where \( \oplus \) denotes channel-wise concatenation, and Shuffle reorganizes channels to promote information exchange between the two branches. This design lowers model complexity without sacrificing performance.
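The GSConv idea can be sketched in PyTorch as follows (the half-and-half channel split, kernel sizes, and two-group shuffle are our assumptions for illustration):

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch: a standard conv produces half the output channels, a depthwise
    conv processes that result, and the halves are concatenated and shuffled."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        half = out_ch // 2
        self.conv = nn.Conv2d(in_ch, half, k, stride, k // 2)
        self.dwconv = nn.Conv2d(half, half, 5, 1, 2, groups=half)  # depthwise 5x5

    def forward(self, x):
        a = self.conv(x)
        b = self.dwconv(a)
        y = torch.cat([a, b], dim=1)          # channel-wise concatenation
        n, c, h, w = y.shape                  # channel shuffle with 2 groups
        return y.view(n, 2, c // 2, h, w).transpose(1, 2).reshape(n, c, h, w)

out = GSConv(32, 64)(torch.randn(1, 32, 16, 16))
```

Because the depthwise branch reuses the standard branch's output, roughly half of the dense-convolution cost is avoided while the shuffle keeps the two halves mixed.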

In the Decoder of DeepLabV3+, we replace the standard 3×3 convolution with an MBConv (Mobile Inverted Bottleneck Convolution) block after feature fusion from the Encoder. MBConv uses depthwise separable convolutions and squeeze-and-excitation layers, further reducing parameters. The process is:

$$ X' = \text{Conv}_{1\times1}(\text{ReLU}(\text{DWConv}(\text{Conv}_{1\times1}(X)))) + X $$

where DWConv denotes depthwise convolution. This modification accelerates inference while preserving accuracy.
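A minimal MBConv sketch consistent with the description above (the expansion factor and the squeeze-and-excitation reduction are assumed values, not the paper's exact settings):

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Sketch of an inverted bottleneck: expand, depthwise conv, SE, project."""
    def __init__(self, ch, expand=4):  # expand=4 is an assumption
        super().__init__()
        mid = ch * expand
        self.expand = nn.Conv2d(ch, mid, 1)
        self.dwconv = nn.Conv2d(mid, mid, 3, 1, 1, groups=mid)
        self.se = nn.Sequential(  # squeeze-and-excitation gating
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(mid, mid // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid // 4, mid, 1), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(mid, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.act(self.dwconv(self.act(self.expand(x))))
        h = h * self.se(h)               # channel re-weighting
        return self.project(h) + x       # residual connection

out = MBConv(48)(torch.randn(1, 48, 32, 32))
```

The residual requires matching input and output channel counts, which holds in the decoder since the block replaces a shape-preserving 3×3 convolution.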

We trained the contour extractor on a dataset of 1000 images, augmented to 6400 samples via transformations mimicking various angles and lighting conditions. Evaluation metrics include mean Intersection over Union (mIoU) and frames per second (FPS). As shown in Table 3, our improved model achieves a 99.2% mIoU with only 2.025 million parameters, outperforming classical segmentation models in speed and efficiency.

Table 3: Comparison of Segmentation Models for Solar Panel Contour Extraction

Model                     Backbone        Inference Speed (FPS)   Parameters (M)   mIoU (%)
FCN-16                    VGG-16          184.50                  18.644           97.1
U-Net                     Custom          152.65                  7.783            95.3
Original DeepLabV3+       Xception        78.94                   37.867           99.1
Our Improved DeepLabV3+   VoVNet27-slim   224.22                  2.025            99.2

The contour extractor robustly handles diverse scenarios, such as partial occlusions or reflections on solar panels. By accurately segmenting boundaries, it provides a foundation for subsequent depth measurement.

Depth Measurement Model: Monocular Camera Projection for Distance and Orientation

Once the contour of a solar panel is extracted, we compute its distance and orientation relative to the camera using a depth measurement model based on monocular camera projection. The model assumes that solar panels are approximately planar and arranged in parallel arrays. We consider only the pitch angle between the camera plane and the solar panel plane, since roll and yaw angles are minimal in typical installations. The geometry is illustrated in Figure 8; we describe it mathematically below.

Let \( O-XYZ \) be the camera coordinate system, and \( O-xy \) be the image coordinate system. The solar panel’s contour in the image is represented by four corner points \( Q_1, Q_2, Q_3, Q_4 \) in pixel coordinates. Using camera intrinsic matrix \( K \), we convert these to normalized image coordinates. The actual solar panel has physical length \( L \) and width \( D \). In camera coordinates, the corresponding points are \( P_1, P_2, P_3, P_4 \).

Based on perspective projection, the distance from the camera center to the upper edge \( Z_{c1} \) and lower edge \( Z_{c2} \) of the solar panel can be derived using similar triangles:

$$ \frac{l_{Q_3,Q_4}}{L} = \frac{f}{Z_{c1}} \Rightarrow Z_{c1} = \frac{fL}{l_{Q_3,Q_4}} $$

$$ \frac{l_{Q_1,Q_2}}{L} = \frac{f}{Z_{c2}} \Rightarrow Z_{c2} = \frac{fL}{l_{Q_1,Q_2}} $$

where \( l_{Q_i,Q_j} \) is the pixel distance between points \( Q_i \) and \( Q_j \), and \( f \) is the focal length expressed in pixel units. The tilt angle \( \theta \) of the solar panel relative to the horizontal plane is then:

$$ \theta = \arccos\left(\frac{Z_{c2} - Z_{c1}}{D}\right) $$
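The two triangulation equations and the tilt formula combine into a small helper; the numeric inputs below are illustrative placeholders, not measured data from our experiments:

```python
import math

def panel_depth_and_tilt(l_top_px, l_bottom_px, f_px, L_mm, D_mm):
    """Return distances to the upper/lower panel edges and tilt in degrees."""
    z_c1 = f_px * L_mm / l_top_px       # Z_c1 = f*L / l(Q3,Q4)
    z_c2 = f_px * L_mm / l_bottom_px    # Z_c2 = f*L / l(Q1,Q2)
    theta = math.acos((z_c2 - z_c1) / D_mm)
    return z_c1, z_c2, math.degrees(theta)

# Illustrative values: focal length 1000 px, panel 2000 mm long, 1000 mm wide,
# upper edge spanning 550 px and lower edge 500 px in the image.
z1, z2, tilt = panel_depth_and_tilt(550, 500, 1000, 2000, 1000)
```

Note that the farther edge subtends fewer pixels, so the difference \( Z_{c2} - Z_{c1} \) is positive when the lower edge is more distant, keeping the arccos argument within its valid range for realistic tilts.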

Prior to these calculations, we preprocess the segmented contour by detecting the largest connected component and performing corner detection to identify the four vertices. This ensures robustness against noise or false positives.
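A self-contained sketch of this preprocessing, using the extreme points of the largest connected component as corner estimates (a simplification of the corner detection step; a real pipeline would refine these, e.g. with sub-pixel corner detection):

```python
import numpy as np
from collections import deque

def largest_component(mask):
    """Label 4-connected components of a boolean mask; keep the largest."""
    labels = np.zeros(mask.shape, dtype=int)
    sizes, nxt = {}, 0
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                nxt += 1
                q, n = deque([(i, j)]), 0
                labels[i, j] = nxt
                while q:
                    y, x = q.popleft()
                    n += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w and mask[yy, xx] and labels[yy, xx] == 0:
                            labels[yy, xx] = nxt
                            q.append((yy, xx))
                sizes[nxt] = n
    return labels == max(sizes, key=sizes.get)

def corner_estimates(comp):
    """Estimate the four vertices from coordinate-sum/difference extremes."""
    ys, xs = np.nonzero(comp)
    pts = np.stack([xs, ys], axis=1)          # (x, y) pairs
    s, d = pts.sum(1), pts[:, 0] - pts[:, 1]
    # top-left, top-right, bottom-right, bottom-left
    return pts[s.argmin()], pts[d.argmax()], pts[s.argmax()], pts[d.argmin()]

mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 3:17] = True    # main panel region
mask[0:2, 0:2] = True      # small false-positive blob, discarded below
comp = largest_component(mask)
tl, tr, br, bl = corner_estimates(comp)
```

Discarding all but the largest component removes small false-positive segments before the corner step, matching the robustness goal described above.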

We validated the depth measurement model using a monocular camera (DxK33GX264e) in a controlled environment. Solar panels were placed at known distances ranging from 3500 mm to 4100 mm. Table 4 compares our triangulation method with an area mapping method from literature.

Table 4: Error Analysis of Depth Measurement Methods for Solar Panels

                                   Triangulation Method           Area Mapping Method
Test Case   Actual Distance (mm)   Measured (mm)   Error (mm)     Measured (mm)   Error (mm)
1           3543                   3555            12             3572            29
2           3845                   3806            39             3876            31
3           4047                   4002            45             3981            66
4           3585                   3541            44             3524            61
5           3501                   3487            14             3514            13
6           3775                   3764            11             3784            9

Our triangulation method achieves a maximum error of 45 mm and a mean error of 27.5 mm, compared with a maximum of 66 mm and a mean of roughly 34.8 mm for the area mapping method, demonstrating higher overall accuracy for solar panel localization. This precision is crucial for robotic arms to approach and clean solar panels effectively.

Experimental Setup and Comprehensive Results

We conducted extensive experiments to evaluate the entire pipeline. The platform comprised an AMD EPYC 7543 CPU, an NVIDIA RTX 3090 GPU, 100 GB RAM, CUDA 11.3, and PyTorch 1.10.0. Datasets for both dust detection and contour extraction were curated to include variations in dust accumulation, lighting, and solar panel orientation. Augmentation techniques such as random scaling, cropping, rotation, and contrast adjustment were applied to enhance generalization.

For training, we used a batch size of 8 over 50 epochs, with an initial learning rate of \(10^{-3}\) annealed to \(10^{-5}\) via cosine annealing; the optimizer was SGD. The dust detector achieved 95.7% accuracy at 79.86 FPS, while the contour extractor reached 99.2% mIoU at 224.22 FPS. The depth measurement model yielded an average distance error of less than 30 mm across multiple tests.
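This training schedule can be sketched with standard PyTorch utilities (the momentum value and the stand-in model are assumptions; the real training loop iterates over image batches):

```python
import torch
import torch.nn as nn

# Sketch of the schedule described above: SGD with cosine annealing
# from 1e-3 down to 1e-5 over 50 epochs. The linear layer is a placeholder.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)

lrs = []
for epoch in range(50):
    # ... forward / backward passes over batches of size 8 would go here ...
    optimizer.step()   # placeholder update so the scheduler ordering is valid
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```

With `T_max` equal to the epoch count, the learning rate decreases monotonically and reaches `eta_min` exactly at the final epoch.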

To further analyze performance, we compared our contour extractor with other state-of-the-art segmentation networks on solar panel imagery. As shown in Table 5, our model balances speed and accuracy, making it ideal for real-time applications.

Table 5: Detailed Comparison of Segmentation Models on Solar Panel Datasets

Model                   Backbone        Precision (%)   Recall (%)   F1-Score (%)   Inference Time (ms)
FCN-16                  VGG-16          96.5            97.0         96.7           5.42
U-Net                   Custom          94.8            95.1         94.9           6.55
DeepLabV3+ (Original)   Xception        98.9            99.0         98.9           12.67
Our Model               VoVNet27-slim   99.1            99.3         99.2           4.46

The high F1-score and low inference time underscore the effectiveness of our architectural choices. Additionally, we tested the system under varying environmental conditions, such as cloudy days or partial shadows, and observed consistent performance for solar panel detection and contour extraction. This robustness is essential for deployment in real-world solar farms.

Discussion and Future Work

Our integrated approach demonstrates significant improvements in accuracy and speed for solar panel contour extraction and localization. The dust detector, with its attention mechanisms, reliably identifies panels needing cleaning. The contour extractor, leveraging lightweight convolutions, precisely segments boundaries even in complex scenes. The depth measurement model, based on projective geometry, provides accurate distance and orientation data. Together, these components enable automated robotic cleaning, reducing labor costs and increasing efficiency for solar panel maintenance.

However, some limitations remain. The depth measurement model assumes planar solar panels and minimal rotational angles; in practice, panels may have curved surfaces or significant tilts beyond pitch. Future work could incorporate more sophisticated geometric models or use depth cameras for enhanced accuracy. Additionally, the system’s performance under extreme weather conditions, such as heavy rain or snow, requires further testing. We plan to expand the dataset to include such scenarios and explore adaptive algorithms that adjust to environmental changes.

Another direction is to integrate real-time tracking for moving robotic arms, ensuring continuous alignment with solar panels during cleaning. This could involve combining visual odometry with our contour extraction for dynamic pose estimation. Moreover, the principles developed here could be extended to other renewable energy infrastructures, such as wind turbine blade inspection or solar thermal collector maintenance.

Conclusion

In this work, we presented a comprehensive computer vision system for solar panel contour extraction and localization. By enhancing neural network architectures with attention mechanisms, lightweight convolutions, and optimized decoders, we achieved high performance in dust detection and semantic segmentation. Our depth measurement model, based on monocular camera projection, accurately computes distances and tilt angles, facilitating automated robotic cleaning. Experimental results validate the superiority of our methods over existing approaches, highlighting their practical value in automated production and maintenance of solar energy systems. As the demand for clean energy grows, such technologies will play a pivotal role in ensuring the efficiency and longevity of solar power installations.
