Identification and Detection of Solar Panel Occlusion Problems Based on Convolutional Neural Network

Abstract: With the increasing demand for renewable energy, solar photovoltaic power generation has become an important energy source. However, the occlusion problems of solar panels, such as dust, bird droppings, and shadows, seriously affect the power generation efficiency. This paper focuses on the identification and detection of these occlusion problems. Firstly, it reviews the research progress of solar panel occlusion problems and convolutional neural networks. Then, it proposes improved models for dust source identification, dust accumulation degree recognition, and bird droppings and shadow detection. Experimental results show that these models have good performance in accuracy, parameter quantity, and computational complexity. Finally, the paper summarizes the research work and looks forward to future research directions.

1. Introduction

1.1 Research Background and Significance

In the context of “carbon peaking and carbon neutrality”, power systems based on new energy will play a more prominent role in the future. Solar photovoltaic energy, as an important part of renewable energy, can effectively alleviate the shortage of traditional energy and protect the ecological environment. However, the power generation efficiency of solar panels is easily affected by many factors, such as solar radiation intensity, panel tilt angle, and surface occlusions like dust, bird droppings, and shadows.

The shape, particle size, and composition of dust particles on the surface of solar panels vary in different regions. Research shows that these dust particles can directly affect the absorption of solar radiation by the panels, reducing their performance and causing energy losses in power stations. Therefore, studying the source of dust on the panel surface is of great significance for optimizing the location of power stations and resource allocation.

In northwestern regions of China, a large number of solar power stations have been built due to rich solar and land resources. However, the dry and windy climate there leads to severe dust accumulation on panel surfaces. Excessive dust accumulation significantly impacts the photoelectric conversion efficiency. Currently, the judgment of dust accumulation degree mainly relies on the experience of operation and maintenance personnel, which has problems such as low accuracy and poor real-time performance. Therefore, intelligent assessment of dust accumulation degree is crucial for maximizing solar energy utilization, reducing costs, and improving power generation efficiency.

In addition to dust, bird droppings and shadows are also common occlusions on the panel surface. If not cleaned in time, they will not only reduce the power generation efficiency but also cause hot spot effects and other problems. At present, the detection of these occlusions mainly depends on power station workers, which has poor real-time performance and cannot meet the needs of intelligent operation and maintenance. Therefore, intelligent detection and timely cleaning of these occlusions are of great practical significance for improving power generation and ensuring the long-term stable operation of power stations.

Deep-learning-based image recognition technology provides a solution for the identification and detection of common occlusions on solar panel surfaces. Compared with traditional image processing technology, it offers higher efficiency, lower cost, and higher accuracy. Therefore, this paper applies deep learning to study the occlusion problems of solar panels.

1.2 Research Progress at Home and Abroad

1.2.1 Research Progress on Solar Panel Occlusion Problems

In 2016, Kazem et al. studied the impact of dust accumulation on the performance of solar panels in six cities in northern Oman and provided suggestions for power station location selection based on dust characteristics. In 2017, Wu et al. proposed a detection method for uneven dust accumulation on panel surfaces using grey prediction theory and a maximum power point tracking method combined with a genetic algorithm. In 2019, Zhao et al. used the ResNet50 network model to identify the dust accumulation status of solar panels with an accuracy of 81%. In 2021, Fan et al. established a theoretical model relating dust accumulation to power generation efficiency. Wu et al. proposed a mathematical model to predict the impact of dust particle shape on panel transmittance. In 2022, Lange et al. found that the type of dust particles had a significant impact on the wear of anti-reflective coatings. Sun et al. improved the ResNeXt50 network model with a supervised contrastive loss function and a coordinate attention mechanism for dust accumulation degree recognition. Li et al. proposed a bird droppings coverage detection method based on transfer learning and Mask R-CNN. Wu et al. improved the RetinaNet network model for shadow detection. In 2023, Wei et al. improved the YOLOv5s network model to detect hot spots and other occlusions.

1.2.2 Research Progress of Convolutional Neural Network

In the research progress of convolutional neural networks for classification, in 2012, the AlexNet network model was proposed and achieved excellent results. In 2015, the VGGNet network model was proposed, which deepened the network layers. In 2016, the ResNet network model was proposed to solve the problems of gradient disappearance and model degradation. In 2017, the DenseNet network model was proposed to strengthen feature propagation and reduce parameters. In 2018, the SENet network model was proposed to improve the expression ability. The ShuffleNet network model was also proposed in 2018 to reduce computational cost. In 2018, the ShuffleNetV2 network model was proposed with practical design guidelines. In 2020, the GhostNet network model was proposed to generate more feature maps with fewer parameters.

In the research progress of convolutional neural networks for detection, in 2014, the R-CNN network model was proposed. In 2015, the Fast R-CNN network model was proposed for improvement. In 2015, the Faster R-CNN network model was proposed to integrate multiple functions. In 2016, the YOLO network model was proposed for real-time detection. In 2017, the YOLOv2 network model was proposed with improved accuracy and speed. In 2018, the YOLOv3 network model was proposed with further performance improvement. In 2020, the YOLOv4 network model was proposed with high accuracy and speed. In 2021, the TPH-YOLOv5 network model was proposed for improvement. In 2022, the YOLOv6 network model was proposed with enhanced performance. In 2022, the YOLOv7 network model was proposed with improved detection accuracy. In 2023, the DC-YOLOv8 network model was proposed to improve detection accuracy.

1.3 Main Research Contents of This Paper

This paper focuses on the following aspects:

For the identification of the source of dust on the solar panel surface, an improved ShuffleNetV2 network model is proposed. The Mish activation function is introduced to improve the expression ability of the model. The coordinate attention mechanism module is integrated to enhance feature extraction and reduce parameters. Mixed depth convolution is used to capture features of different scales.
For the recognition of the dust accumulation degree on the solar panel surface, an improved DenseNet169 network model is proposed. An improved attention mechanism module is integrated to improve feature extraction. Asymmetric convolution and grouped convolution are used to reconstruct the feature extraction module to reduce parameters and improve accuracy. Transfer learning is introduced to enhance generalization ability.
For the detection of bird droppings and shadows on the solar panel surface, an improved YOLOv8s network model is proposed. The GhostConv and C2fGhost modules are used to reconstruct the backbone network to reduce parameters and computation. A small target detection head is added to improve the detection accuracy of small targets. The GE attention mechanism module is integrated into the neck feature fusion part to enhance feature extraction.

1.4 Organization of This Paper

This paper is organized as follows:
Chapter 1 introduces the research background, significance, and related research progress of solar panel occlusion problems and convolutional neural networks.
Chapter 2 briefly describes the basic knowledge of convolutional neural networks, including convolution layer, activation layer, pooling layer, and fully connected layer. It also introduces lightweight convolution and attention mechanism, as well as related network models such as ShuffleNetV2, DenseNet169, and YOLOv8s.
Chapter 3 presents the improved ShuffleNetV2 network model for dust source identification on the solar panel surface, including the introduction of the Mish activation function, CA attention mechanism, and mixed depth convolution. Experimental results and analysis are provided to verify the effectiveness of the proposed model.
Chapter 4 proposes the improved DenseNet169 network model for dust accumulation degree recognition on the solar panel surface. The improved attention mechanism, asymmetric convolution module, and transfer learning are introduced. Experimental results and analysis are carried out to evaluate the performance of the model.
Chapter 5 describes the improved YOLOv8s network model for the detection of bird droppings and shadows on the solar panel surface. The improved backbone network, small target detection head, and GE attention mechanism are introduced. Experimental results and analysis are presented to demonstrate the superiority of the proposed model.
Chapter 6 summarizes the research work of this paper and looks forward to future research directions.

2. Preliminary Knowledge

2.1 Convolutional Neural Network

A convolutional neural network is a deep feedforward neural network with a convolutional structure. It can reduce the memory and parameters of deep networks and alleviate overfitting. It has advantages such as local connection, weight sharing, and translation invariance.

2.1.1 Convolution Layer

The convolution layer can retain the spatial features of the input image. Shallow convolution layers have small receptive fields and can only capture local features, while deep layers can capture global features. The input image is convolved with learnable convolution kernels to extract features. For an input of size W, a kernel of size K, padding P, and stride S, the output size along each spatial dimension is ⌊(W − K + 2P)/S⌋ + 1.
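
As a quick check of the output-size calculation for a convolution, the following minimal Python sketch evaluates floor((W + 2P − K) / S) + 1 along one spatial dimension. The function and parameter names are illustrative and do not come from the paper.

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution along one dimension: floor((W + 2P - K) / S) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1


# A 224-pixel input through a 3x3 convolution with stride 2 and padding 1.
print(conv_output_size(224, kernel=3, stride=2, padding=1))  # 112
# The same kernel with stride 1 and padding 1 preserves the size.
print(conv_output_size(224, kernel=3, stride=1, padding=1))  # 224
```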

2.1.2 Activation Layer

The activation layer is composed of activation functions, which introduce nonlinearity to help the network learn complex patterns. Common activation functions include Tanh, ReLU, and Sigmoid.

2.1.3 Pooling Layer

The pooling layer is a method for downsampling images. Common pooling methods include max pooling and average pooling. It can reduce the size of the feature map and retain important information.
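
To make the downsampling concrete, here is a small pure-Python sketch of 2×2 max pooling with stride 2 on a toy 4×4 feature map. The values are made up for illustration, and even height and width are assumed.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width assumed)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 2, 4, 6]]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[4, 5], [3, 7]]
```

The 4×4 map shrinks to 2×2 while the strongest response in each window is kept.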

2.1.4 Fully Connected Layer

The fully connected layer is located at the top of the convolutional neural network and acts as a “classifier”. It maps the input feature vector to the final output value through dot product and activation function.

2.2 Lightweight Convolution

2.2.1 Standard Convolution

For standard convolution with a K×K kernel, C_in input channels, and C_out output channels, the number of parameters is K × K × C_in × C_out (plus C_out bias terms, if used). The amount of computation additionally scales with the size of the output feature map.

2.2.2 Grouped Convolution

Grouped convolution can significantly reduce the parameters and computation of the convolutional neural network model. It divides the input feature map channels into g groups and convolves each group with its own kernels, so the parameters and computation fall to 1/g of those of the corresponding standard convolution.

2.2.3 Depthwise Separable Convolution

Depthwise separable convolution consists of a depthwise convolution followed by a pointwise convolution. It first applies a separate spatial filter to each channel (depthwise convolution) and then applies a 1×1 pointwise convolution to combine the channels. This greatly reduces the number of parameters while largely maintaining the performance of the model.
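
The parameter savings of the three convolution types in this section can be compared with a short calculation. The sketch below ignores bias terms, and the layer sizes (64 input channels, 128 output channels, 3×3 kernel, 4 groups) are illustrative assumptions rather than values from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """k x k kernel over all input channels for every output channel."""
    return k * k * c_in * c_out


def grouped_conv_params(c_in, c_out, k, groups):
    """Each group maps c_in/groups input channels to c_out/groups output channels."""
    return k * k * (c_in // groups) * (c_out // groups) * groups


def depthwise_separable_params(c_in, c_out, k):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 64, 128, 3
print(standard_conv_params(c_in, c_out, k))           # 73728
print(grouped_conv_params(c_in, c_out, k, groups=4))  # 18432
print(depthwise_separable_params(c_in, c_out, k))     # 8768
```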

2.3 Attention Mechanism

The attention mechanism is a commonly used method in deep learning to focus on relevant parts of the data. It can improve the performance and generalization ability of the model.

2.3.1 Channel Attention

Channel attention calculates the importance of each channel and enhances or suppresses different channels to improve the model performance. The SE module is a classic channel attention mechanism that learns the correlation between channels.

2.3.2 Spatial Attention

Spatial attention enables the model to learn the weights of different regions in the image and focus on important regions. The STN is a typical spatial attention mechanism that can transform and capture important region features.

2.4 Related Method Introduction

2.4.1 ShuffleNetV2 Network

ShuffleNetV2 is a lightweight convolutional neural network model designed for devices with limited computing power. Its basic unit uses channel split, depthwise convolution, and a channel shuffle operation to reduce parameters and computation while maintaining accuracy.
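
The channel shuffle operation can be illustrated on a plain list of channel indices: the channels are arranged as a (groups × channels-per-group) grid, transposed, and flattened, so that each group's channels are interleaved with the others. This is a minimal sketch of the idea, not the paper's implementation.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    per_group = len(channels) // groups
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]


print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, the next convolution on either half of the channels sees information from both original groups.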

2.4.2 DenseNet169 Network

DenseNet is a densely connected network model that strengthens feature propagation and reduces the number of parameters. The DenseNet169 network model is composed of alternating feature extraction modules and transition layers.

2.4.3 YOLOv8s Network

YOLOv8s is a single-stage object detection model that predicts object classes and locations in one forward pass. It consists of a backbone network, a neck network, and a detection head, and it is suitable for real-time object detection.

3. Identification of Dust Source on Solar Panel Surface Based on Improved ShuffleNetV2 Network

3.1 Improved ShuffleNetV2 Network

3.1.1 Mish Activation Function

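The Mish activation function is introduced to improve the expression ability of the network model. Unlike the ReLU function, which sets all negative inputs to zero, Mish is smooth and non-monotonic and lets small negative values pass through, which helps preserve feature information as it propagates through the network.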
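
Mish is defined as Mish(x) = x · tanh(softplus(x)), with softplus(x) = ln(1 + e^x). The following pure-Python sketch compares it with ReLU at a few sample points; the helper names are illustrative.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    return max(x, 0.0)


for x in (-2.0, -0.5, 0.0, 1.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

For negative inputs ReLU returns exactly zero, whereas Mish returns a small negative value, so a gradient still flows through those units.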

3.1.2 CA Attention Mechanism

The CA attention mechanism module is integrated into the model to consider both channel and spatial information. It can improve the recognition accuracy of the model while reducing the number of parameters and computational complexity.

3.1.3 Mixed Depth Convolution

Mixed depth convolution is used to capture different scale features. By using convolution kernels of different sizes, the model can better adapt to the diversity of dust particles on the solar panel surface.
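
In the MixConv-style formulation, the channels of a depthwise convolution are split into groups and each group is given a different kernel size. The sketch below shows such a split and the resulting depthwise parameter count. The even split, the handling of leftover channels, and the 96-channel layer are assumptions made for illustration and are not specified in the paper.

```python
def mixconv_depthwise_params(channels, kernel_sizes):
    """Channel split and depthwise parameter count for mixed kernel sizes."""
    n = len(kernel_sizes)
    base, extra = divmod(channels, n)
    # Assumption: leftover channels are assigned to the first groups.
    splits = [base + (1 if i < extra else 0) for i in range(n)]
    params = sum(k * k * c for k, c in zip(kernel_sizes, splits))
    return splits, params


for kernels in ((3,), (3, 5), (3, 5, 7)):
    print(kernels, mixconv_depthwise_params(96, kernels))
```

Larger kernels add parameters only to the channels that use them, which is consistent with the small parameter growth reported for the (3,5) and (3,5,7) variants.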

3.1.4 Network Model Structure

The structure of the ShuffleNetV2 network model is improved by replacing the ReLU function with the Mish function, integrating the CA attention mechanism, and using mixed depth convolution. The improved model can capture different scale features and improve the recognition performance.

3.2 Experimental Materials and Methods

3.2.1 Data Collection and Expansion

Data samples are collected from four different regions, and 239 dust particle accumulation images are obtained. To avoid overfitting, data augmentation methods such as brightness enhancement, horizontal flipping, rotation, and translation are used to expand the dataset to 718 images, including 574 training images and 144 testing images. The dust samples from different regions show certain differences, which provides a basis for studying the source of dust.

Region	Dust Particle Characteristics
Yijinhuoluoqi	Some are flocculent-like particles
Yulin	Some are spherical-like particles
Weinan	Varying shapes and sizes
Other Region	Different morphological features

3.2.2 Experimental Environment and Parameter Settings

The experiment is conducted under the Windows 10 operating system, with an Intel(R) Core(TM) i9-10900F CPU and an RTX 3060 GPU with 12GB of video memory. Python 3.8 and the PyTorch 1.8.0 deep learning framework are used. The input image size is unified to 224×224. The loss function is the cross-entropy loss function. The Adam optimization algorithm is used for parameter update, with an initial learning rate of 0.001. The training and testing datasets are divided in an 8:2 ratio, the batch size is 24, and the maximum number of iterations is 100. Transfer learning is applied based on the pre-trained ShuffleNetV2 model.

3.2.3 Evaluation Metrics

The classification accuracy (Accuracy), number of parameters (Params), and floating-point operations (FLOPs) are used as evaluation metrics. Accuracy is calculated as the proportion of correctly predicted samples in the test set. The number of parameters measures the size of the model, and FLOPs measure the computational complexity of the model.

3.3 Experimental Results and Analysis

3.3.1 Comparison Experiment of Incorporating Attention Mechanism Module

After incorporating the CA attention mechanism module into the ShuffleNetV2 network model, the recognition accuracy of the model improves by 1.54 percentage points. The number of parameters falls from 2.28M to 1.99M, and the FLOPs fall from 0.15G to 0.11G. This shows that the CA attention mechanism can effectively enhance the feature extraction ability of the model and improve its performance.

Model	Accuracy (%)	Params (M)	FLOPs (G)
ShuffleNetV2	87.44	2.28	0.15
ShuffleNetV2 + CA	88.98	1.99	0.11

3.3.2 Comparison Experiment of Different Mixed Depth Convolution Kernels

Three groups of different-scale convolution kernels are embedded in the core module of the ShuffleNetV2 network model for experimentation. The results show that using a combination of 3×3 and 5×5 convolution kernels can improve the recognition accuracy of the model by 1.61 percentage points with a small increase in parameters and FLOPs.

Algorithm	Accuracy (%)	Params (M)	FLOPs (G)
ShuffleNetV2	87.44	2.28	0.15
ShuffleNetV2(3,5)	89.05	2.30	0.15
ShuffleNetV2(3,5,7)	88.90	2.32	0.16
ShuffleNetV2(3,5,7,9)	88.35	2.35	0.16

3.3.3 Ablation Experiment of Different Improvement Modules

Ablation experiments are conducted by adding different improvement strategies to the ShuffleNetV2 network model. The results indicate that the proposed method can effectively improve the recognition accuracy of the model. The Mish activation function alone improves the accuracy by 0.65 percentage points. The CA attention mechanism alone improves the accuracy by 1.54 percentage points and reduces the parameters and FLOPs, and combining Mish with CA raises the accuracy to 90.12%. With both in place, the mixed depth convolution using 3×3, 5×5, and 7×7 kernels gives the highest recognition accuracy. The final improved model reaches an accuracy of 92.25% with 2.03M parameters and 0.12G FLOPs; compared with the original ShuffleNetV2 network model, the accuracy rises by 4.81 percentage points while the parameters fall by 0.25M and the FLOPs by 0.03G.

Algorithm	Accuracy (%)	Params (M)	FLOPs (G)
ShuffleNetV2	87.44	2.28	0.15
ShuffleNetV2 + Mish	88.09	2.28	0.15
ShuffleNetV2 + CA	88.98	1.99	0.11
ShuffleNetV2 + Mish + CA	90.12	1.99	0.11
ShuffleNetV2(3,5) + Mish + CA	91.28	2.01	0.11
ShuffleNetV2(3,5,7) + Mish + CA	92.25	2.03	0.12
ShuffleNetV2(3,5,7,9) + Mish + CA	91.19	2.06	0.12

3.3.4 Comparison Experiment of Different Network Models

The improved ShuffleNetV2 network model is compared with existing classic network models such as MobileNetV2, ResNet34, and GhostNet. The results show that the improved model outperforms these models in terms of accuracy. It has an accuracy improvement of 1.11 percentage points compared to MobileNetV2, 4.51 percentage points compared to GhostNet, and 0.65 percentage points compared to ResNet34. This validates the effectiveness of the proposed model.

Algorithm	Accuracy (%)	Params (M)	FLOPs (G)
ShuffleNetV2	87.44	2.28	0.15
MobileNetV2	91.14	3.51	0.32
ResNet34	91.60	11.69	1.82
GhostNet	87.74	5.18	0.15
Improved ShuffleNetV2	92.25	2.03	0.12

3.4 Chapter Summary

In this chapter, an improved ShuffleNetV2 network model is proposed for identifying the source of dust on the solar panel surface. The Mish activation function, CA attention mechanism, and mixed depth convolution are incorporated to improve the model’s performance. Experimental results show that the improved model has higher recognition accuracy and lower computational complexity compared to the original model, providing an effective solution for dust source identification on solar panels.

4. Recognition of Dust Accumulation Degree on Solar Panel Surface Based on Improved DenseNet169 Network

4.1 Improved DenseNet169 Network

4.1.1 Improved Attention Mechanism

An improved attention mechanism module, ESM, is proposed. It combines global average pooling and global standard deviation pooling for dimensionality reduction and uses a one-dimensional convolution to capture channel-wise information. The ESM module can enhance the model’s ability to focus on important features and improve the recognition accuracy.
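
The data flow described for ESM (two global pooling statistics per channel, a one-dimensional convolution across channels, then a gate) can be sketched in pure Python. This is only a rough illustration: how the mean and standard deviation are combined (summed here), the fixed kernel weights, and the sigmoid gate are all assumptions, since the paper does not give these details.

```python
import math


def esm_channel_weights(feature_maps, kernel=(0.25, 0.5, 0.25)):
    """Channel gates from mean + std pooling followed by a 1-D convolution.

    feature_maps: list of channels, each a flat list of activations.
    kernel: hypothetical fixed weights; in a real module they would be learned.
    """
    descriptors = []
    for channel in feature_maps:
        mean = sum(channel) / len(channel)
        std = math.sqrt(sum((v - mean) ** 2 for v in channel) / len(channel))
        descriptors.append(mean + std)  # assumption: the two statistics are summed
    pad = len(kernel) // 2
    padded = [0.0] * pad + descriptors + [0.0] * pad
    mixed = [sum(w * padded[i + j] for j, w in enumerate(kernel))
             for i in range(len(descriptors))]
    return [1.0 / (1.0 + math.exp(-m)) for m in mixed]  # sigmoid gate per channel


weights = esm_channel_weights([[1.0, 1.0, 1.0, 1.0],
                               [0.0, 2.0, 0.0, 2.0],
                               [0.0, 0.0, 0.0, 0.0]])
print([round(w, 3) for w in weights])
```

Each returned weight would multiply its channel; the one-dimensional convolution lets neighbouring channels influence each gate without the fully connected layers used by SE.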

4.1.2 Improved Asymmetric Convolution Module

The asymmetric convolution module is introduced to improve the feature extraction ability of the model. It uses 3×3, 3×1, and 1×3 convolutions in parallel during training and fuses them into a 3×3 convolution during inference. Additionally, a new convolution module, ACGBlock, is proposed, which uses grouped convolution and conditional convolution to further reduce parameters and improve the model’s capacity.
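
The fusion of the three parallel branches into a single 3×3 kernel relies on the additivity of convolution: the 3×1 kernel is added to the centre column and the 1×3 kernel to the centre row. The pure-Python sketch below checks this on one 3×3 patch. Batch normalization and bias terms are left out for simplicity, and the kernel values are made up.

```python
def fuse_asymmetric(k3x3, k3x1, k1x3):
    """Fold a 3x1 (vertical) and a 1x3 (horizontal) kernel into a 3x3 kernel."""
    fused = [row[:] for row in k3x3]
    for i in range(3):
        fused[i][1] += k3x1[i]  # 3x1 kernel aligns with the centre column
        fused[1][i] += k1x3[i]  # 1x3 kernel aligns with the centre row
    return fused


def apply_kernel(patch, kernel):
    """Response of a 3x3 kernel on a 3x3 patch (cross-correlation, no bias)."""
    return sum(patch[r][c] * kernel[r][c] for r in range(3) for c in range(3))


k3x3 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k3x1 = [1, 2, 3]
k1x3 = [4, 5, 6]
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Training-time view: three parallel branches whose outputs are summed.
branch_sum = (apply_kernel(patch, k3x3)
              + sum(patch[r][1] * k3x1[r] for r in range(3))
              + sum(patch[1][c] * k1x3[c] for c in range(3)))
# Inference-time view: a single fused 3x3 kernel.
fused_out = apply_kernel(patch, fuse_asymmetric(k3x3, k3x1, k1x3))
print(branch_sum, fused_out)  # 105 105
```

Because the two results match, the extra branches add no cost at inference time.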

4.1.3 Transfer Learning

Transfer learning is employed to enhance the generalization ability and convergence speed of the model. The improved DenseNet169 network model is pre-trained on the Mini-ImageNet dataset, and the pre-trained weights are used to initialize the model parameters. Then, fine-tuning is performed on the solar panel dust accumulation dataset.

4.1.4 Network Model Structure

The DenseNet169 network model is improved by integrating the ESM attention mechanism module, replacing 1×1 convolutions with grouped convolutions and channel shuffle operation, and using the ACGBlock module instead of 3×3 convolutions. These improvements enhance the model’s performance while reducing computational resources.

4.2 Experimental Materials and Methods

4.2.1 Data Collection and Expansion

Data samples are collected by manually photographing solar panels in a power station at noon. 437 dust accumulation images are obtained and divided into three categories: mild, moderate, and severe dust accumulation. Data augmentation methods such as random flipping, rotation, affine transformation, translation, and Gaussian blur are used to expand the dataset to 1897 images, including 1517 training images and 380 testing images.

Dust Accumulation Degree	Image Characteristics
Mild	Slight dust coverage, less impact on panel color and texture
Moderate	Moderate dust accumulation, visible change in panel appearance
Severe	Heavy dust layer, significant reduction in panel transparency

4.2.2 Experimental Environment and Parameter Settings

The experiment is carried out under the Ubuntu 18.04 operating system, with an Intel(R) Xeon(R) Platinum 8255C CPU and an RTX 2080 Ti GPU with 11GB of video memory. Python 3.8 and the PyTorch 1.8.1 deep learning framework are used. The input image size is set to 224×224. The cross-entropy loss function is adopted. The Ranger optimization algorithm is used for parameter update, with an initial learning rate of 0.001. The training and testing datasets are divided in an 8:2 ratio, the batch size is 32, and the maximum number of iterations is 100.

4.2.3 Evaluation Metrics

The classification accuracy, number of parameters, and FLOPs are used as evaluation metrics to comprehensively evaluate the performance of the network model on the dataset.

4.3 Experimental Results and Analysis

4.3.1 Comparison Experiment of Different Optimization Algorithms

Six optimization algorithms (SGD, Adam, AdamW, RAdam, RMSprop, and Ranger) are applied to the original DenseNet169 network model with four different initial learning rates. The results show that the Ranger optimization algorithm achieves the best recognition effect when the learning rate is 0.001, with an accuracy of 84.12%.

Optimization Algorithm	Accuracy (%) at each initial learning rate
	0.0001	0.005	0.001	0.01
SGD	69.60	79.43	77.10	79.26
Adam	82.81	75.25	78.28	75.15
AdamW	83.01	73.57	78.63	73.82
RAdam	79.77	76.34	78.65	73.53
RMSprop	80.96	73.11	77.12	71.52
Ranger	79.23	82.01	84.12	81.78

4.3.2 Comparison Experiment of Different Attention Mechanism Modules

Different attention mechanism modules, including SE, ECA, SRM, SimAM, and ESM, are integrated into the DenseNet169 network model for comparison. The results show that the ESM module improves the accuracy by 2.67 percentage points over the original model without increasing the parameters or FLOPs. The ESM module can effectively capture global information and improve the feature extraction ability of the model.

Algorithm	Accuracy (%)	Params (M)	FLOPs (G)
DenseNet169	84.12	12.49	3.42
DenseNet169 + SE	84.93	12.51	3.42
DenseNet169 + ECA	86.17	12.49	3.42
DenseNet169 + SRM	87.05	12.50	3.42
DenseNet169 + SimAM	83.91	12.49	3.42
DenseNet169 + ESM	86.79	12.49	3.42

The ESM module achieves a better balance between accuracy improvement and parameter efficiency. It outperforms the SE module, which increases the parameter count slightly due to the use of fully connected layers for channel dimension transformation. The SRM module also shows a relatively high accuracy improvement but with a similar parameter increase. In contrast, the SimAM module has a lower accuracy than the original DenseNet169 network model.

4.3.3 Ablation Experiment of Different Improvement Modules

Ablation experiments are conducted by adding different improvement strategies to the DenseNet169 network model. The results show that each improvement strategy has a positive impact on the model performance.

Replacing the 1×1 convolutions in the Dense Block with grouped convolutions and channel shuffle operation significantly reduces the model parameters and FLOPs while maintaining similar accuracy. The model parameters decrease by 6.23M, and the FLOPs decrease by 1.16G.

Substituting the 3×3 convolutions in the Dense Block with the proposed ACGBlock module improves the recognition accuracy by 1.95 percentage points and reduces the parameters and FLOPs. The model parameters decrease by 2.46M, and the FLOPs decrease by 1.1G.

Integrating the ESM attention mechanism module into the feature extraction module increases the accuracy by 2.67 percentage points with almost no change in parameters and FLOPs.

Finally, introducing transfer learning further improves the final recognition accuracy to 88.52%.

The combination of these improvement strategies results in a more efficient and accurate model for recognizing the dust accumulation degree on solar panels. The improved DenseNet169 network model shows better performance in both accuracy and computational complexity compared to the original model.

GConv	ACGBlock	ESM	Transfer Learning	Accuracy (%)	Params (M)	GFLOPs (G)
				84.12	12.49	3.42
			√	83.88	6.26	2.26
		√		86.07	10.03	2.32
		√	√	86.79	12.49	3.42
	√			86.50	12.49	3.42
	√		√	85.47	3.80	1.16
	√	√		87.52	10.03	2.32
	√	√	√	85.95	6.26	2.26
√				87.58	3.80	1.16
√			√	88.52	3.80	1.16

4.3.4 Comparison Experiment of Different Network Models

The improved DenseNet169 network model is compared with eight mainstream network models. The results show that the improved DenseNet169 network model has the highest accuracy among these models, and it requires fewer parameters and FLOPs than the larger networks (ResNet50, ResNeXt50, and RegNet).

Compared to the original DenseNet169 network model, the improved model has an accuracy improvement of 4.40 percentage points, a parameter reduction of 8.69M, and a FLOPs reduction of 2.26G. The lightweight models (MobileNetV2, ShuffleNetV2, GhostNet, EfficientNet, and MobileViT) require fewer FLOPs, and MobileNetV2 and ShuffleNetV2 also have fewer parameters, but their accuracy is at least 7 percentage points lower. The proposed model therefore offers the best overall balance, providing a more effective solution for recognizing the dust accumulation degree on solar panels.

Algorithm	Accuracy (%)	Params (M)	FLOPs (G)
DenseNet169	84.12	12.49	3.42
ResNet50	80.29	25.56	4.12
ResNeXt50	79.41	25.03	4.27
MobileNetV2	75.13	3.51	0.32
ShuffleNetV2	76.46	2.28	0.15
GhostNet	75.01	5.18	0.15
EfficientNet	78.86	5.29	0.39
RegNet	82.38	20.65	4.01
MobileViT	81.43	5.50	0.70
Improved DenseNet169	88.52	3.80	1.16

4.4 Chapter Summary

In this chapter, an improved DenseNet169 network model is proposed for recognizing the dust accumulation degree on solar panels. The ESM attention mechanism, asymmetric convolution module, and transfer learning are incorporated to enhance the model’s performance. Experimental results show that the improved model has higher recognition accuracy and lower computational complexity compared to the original model and higher accuracy than other mainstream models, providing a valuable reference for low-cost recognition of solar panel dust accumulation degree.

5. Detection of Bird Droppings and Shadows on Solar Panel Surface Based on Improved YOLOv8s Network

5.1 Improved YOLOv8s Network

5.1.1 Improved Backbone Feature Extraction Network

The GhostConv and C2fGhost modules are used to replace the standard convolution and C2f module in the original YOLOv8s network model’s backbone network. The GhostConv module produces part of the output feature maps with an ordinary convolution, derives the remaining feature maps from them with cheap linear operations, and concatenates the two parts, which reduces parameters and computation. The C2fGhost module replaces the Bottleneck module in the C2f module with GhostBottleneck modules to further reduce the model’s computational complexity.
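
The saving from a Ghost-style convolution can be estimated with a short parameter count. In the sketch below, half of the output channels come from an ordinary convolution and the other half from a depthwise operation applied to them. The half/half split, the 5×5 depthwise kernel, and the layer sizes are assumptions for illustration and are not taken from the paper; bias and normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    return k * k * c_in * c_out


def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Primary convolution makes half the channels; a depthwise operation makes the rest.

    cheap_k is an assumed depthwise kernel size, not a value given in the paper.
    """
    primary_channels = c_out // 2
    primary = k * k * c_in * primary_channels
    cheap = cheap_k * cheap_k * primary_channels  # depthwise: one filter per channel
    return primary + cheap


print(standard_conv_params(128, 256, 3))  # 294912
print(ghost_conv_params(128, 256, 3))     # 150656
```

Under these assumptions the layer needs roughly half the parameters of the standard convolution it replaces.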

5.1.2 Small Target Detection Head

A 160×160 small target detection head is added to the original YOLOv8s network model to improve the detection accuracy of small target objects such as bird droppings and small shadows. This additional detection head can capture more detailed features of small targets and enhance the model’s ability to detect them.
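
The 160×160 head corresponds to a finer feature-map stride. The sketch below computes the grid size seen by each head for the 640×640 input used in this chapter, assuming the original heads operate at strides 8, 16, and 32 and the added head at stride 4 (the stride values are inferred, not stated in the paper).

```python
def head_grid_sizes(input_size, strides):
    """Feature-map side length seen by each detection head."""
    return {stride: input_size // stride for stride in strides}


# Assumption: original heads use strides 8, 16, 32; the added head uses stride 4.
grids = head_grid_sizes(640, (4, 8, 16, 32))
print(grids)  # {4: 160, 8: 80, 16: 40, 32: 20}
```

At stride 4 each grid cell covers a 4×4 pixel region, so a small bird dropping spans more cells than it would on the 80×80 grid.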

5.1.3 GE Attention Mechanism

The GE attention mechanism module is integrated into the neck feature fusion part of the YOLOv8s network model. The GE module consists of Gather and Excite parts, which can extract features from local spatial positions and rescale them back to the original size. This helps the model focus on important features and improves the detection performance.

5.1.4 Network Model Structure

The improved YOLOv8s network model combines the improved backbone network, small target detection head, and GE attention mechanism. This structure reduces the model’s parameters and computation while improving the detection accuracy of small target occlusions on solar panels.

5.2 Experimental Materials and Methods

5.2.1 Data Collection and Expansion

Data samples are collected by photographing solar panels in a power station in July and January from multiple angles. 726 images containing bird droppings and shadows are obtained and expanded to 2904 images using data augmentation methods such as contrast enhancement, noise addition, and horizontal mirror flipping. The dataset is divided into training, validation, and testing sets with 1742, 580, and 582 images respectively.

Data Source	Image Quantity	Characteristics
July	622	Different lighting conditions, various occlusion sizes and positions
January	104	Different weather conditions, unique occlusion patterns

5.2.2 Experimental Environment and Parameter Settings

The experiment is conducted under the Ubuntu 18.04 operating system, with an Intel(R) Xeon(R) Gold 6330 CPU and an RTX 3090 GPU with 24GB of video memory. Python 3.8 and the PyTorch 1.8.1 deep learning framework are used. The input image size is set to 640×640. The cross-entropy loss function is adopted. The SGD optimization algorithm is used for parameter update, with an initial learning rate of 0.01, a weight decay coefficient of 0.0005, and a momentum parameter of 0.937. The training, validation, and testing datasets are divided in a 6:2:2 ratio, the batch size is 32, the number of threads is 16, and the maximum number of iterations is 200.
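
The subset sizes reported in Section 5.2.1 are consistent with flooring each share of the ratio and giving the remainder to the last subset. That rule is an inference from the numbers, not something the paper states. A minimal sketch:

```python
def split_counts(total, ratios):
    """Floor each share of the ratio; the last subset takes the remainder."""
    weight = sum(ratios)
    counts = [total * r // weight for r in ratios[:-1]]
    counts.append(total - sum(counts))
    return counts


print(split_counts(2904, (6, 2, 2)))  # [1742, 580, 582]
print(split_counts(718, (8, 2)))      # [574, 144]
print(split_counts(1897, (8, 2)))     # [1517, 380]
```

The same rule also reproduces the 8:2 splits reported for the dust source and dust accumulation datasets in Chapters 3 and 4.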

5.2.3 Evaluation Metrics

Precision (P), recall (R), mean average precision (mAP), number of parameters (Params), and FLOPs are used as evaluation metrics. Precision is the proportion of predicted positive samples that are correct, recall is the proportion of actual positive samples that are correctly detected, and mAP is the mean over all classes of the average precision (AP), where AP is the area under the precision-recall curve. These metrics comprehensively evaluate the performance of the network model in detecting bird droppings and shadows on solar panels.
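
For detection, a prediction counts as a true positive when its intersection over union (IoU) with a ground-truth box reaches a threshold, such as 0.5 for mAP@0.5. The sketch below shows the IoU computation and the precision and recall formulas; the boxes and counts are made-up illustrative values.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # one third: overlap 50, union 150
print(precision_recall(tp=90, fp=10, fn=30))  # (0.9, 0.75)
```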

5.3 Experimental Results and Analysis

5.3.1 Comparison Experiment of Different Backbone Networks

Three different lightweight convolutions are used to design the YOLOv8s-DS, YOLOv8s-GS, and YOLOv8s-Ghost network models for comparison. The results show that all three improved backbone networks reduce the parameters and computation of the YOLOv8s network model. The YOLOv8s-Ghost network model offers the best trade-off: it has the largest parameter reduction (2.86M) and a FLOPs reduction of 7.80G while maintaining relatively high detection accuracy.

Algorithm	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)
YOLOv8s	93.02	88.05	94.61	65.12	11.14	28.62
YOLOv8s-DS	93.91	83.72	92.14	61.31	10.44	19.82
YOLOv8s-GS	91.16	86.81	93.92	63.46	10.01	23.41
YOLOv8s-Ghost	93.21	87.13	93.81	63.24	8.28	20.82
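
The trade-off between the three lightweight backbones is easier to see as differences from the YOLOv8s baseline. The sketch below only rearranges the numbers in the table above; the dictionary layout and variable names are illustrative.

```python
# (mAP@0.5 %, mAP@0.5:0.95 %, Params M, FLOPs G), copied from the table above
baseline = (94.61, 65.12, 11.14, 28.62)
variants = {
    "YOLOv8s-DS": (92.14, 61.31, 10.44, 19.82),
    "YOLOv8s-GS": (93.92, 63.46, 10.01, 23.41),
    "YOLOv8s-Ghost": (93.81, 63.24, 8.28, 20.82),
}

# Difference of every column from the baseline, rounded to two decimals.
deltas = {
    name: tuple(round(v - b, 2) for v, b in zip(values, baseline))
    for name, values in variants.items()
}

for name, d in deltas.items():
    print(f"{name}: mAP@0.5 {d[0]:+.2f}, mAP@0.5:0.95 {d[1]:+.2f}, "
          f"Params {d[2]:+.2f}M, FLOPs {d[3]:+.2f}G")
```

Read this way, YOLOv8s-DS saves the most computation (8.80G) but loses 2.47 points of mAP@0.5, whereas YOLOv8s-Ghost removes the most parameters (2.86M) for a 0.80-point loss.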

5.3.2 Comparison Experiment of Different Attention Mechanism Modules

Eight different attention mechanism modules (GC, CA, EMA, SimAM, TA, PSA, SA, and GE) are integrated into the YOLOv8s network model for comparison. The results show that the modules affect model performance differently. The GE attention mechanism module yields the largest gains in R and mAP@0.5, while the SimAM attention mechanism module yields the largest gain in mAP@0.5:0.95 without increasing parameters or computation. Considering all evaluation metrics, the GE attention mechanism module is selected because it improves the detection accuracy without significantly increasing the computational cost.

Algorithm	Attention	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)
YOLOv8s	None	93.02	88.05	94.61	65.12	11.14	28.62
YOLOv8s	GC	94.41	88.52	94.82	65.14	11.22	28.71
YOLOv8s	CA	94.43	89.34	95.15	65.72	11.16	28.72
YOLOv8s	EMA	92.52	88.81	95.06	65.21	11.14	28.74
YOLOv8s	SimAM	93.42	87.82	94.82	66.23	11.14	28.61
YOLOv8s	TA	92.35	89.52	95.13	65.71	11.14	28.93
YOLOv8s	PSA	94.81	87.61	95.42	65.16	11.83	29.73
YOLOv8s	SA	95.13	86.57	94.81	65.32	11.14	28.72
YOLOv8s	GE	93.31	89.92	95.87	65.52	11.18	28.71

Visualization results using a randomly selected image from the test set show that different attention mechanism modules focus on different regions of the image. The SA attention mechanism module highlights the boundaries of the occlusions more clearly, while the GE attention mechanism module emphasizes the regions with higher feature importance. These visualizations provide insights into how different attention mechanisms affect the model’s detection performance.

5.3.3 Ablation Experiment of Different Improvement Modules

Ablation experiments are conducted by adding different improvement strategies to the YOLOv8s network model. The results show that each improvement strategy has a significant impact on the model performance.

Replacing the standard convolution and C2f module in the backbone network with the GhostConv and C2fGhost modules reduces the model parameters and computation. The model parameters are reduced by 2.86M, and the FLOPs are reduced by 7.80G.
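
The source of the savings can be illustrated with a parameter count for a single layer. The sketch assumes the usual Ghost module layout, in which a primary convolution produces half of the output channels and a cheap depthwise convolution generates the other half from them. The 5×5 depthwise kernel and the example channel sizes are assumptions, and biases and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in, c_out, k, cheap_k=5):
    """Weights of a Ghost-style convolution: a primary k x k convolution to
    c_out // 2 channels plus a depthwise cheap_k x cheap_k convolution that
    produces the remaining half of the output channels."""
    hidden = c_out // 2
    primary = conv_params(c_in, hidden, k)
    cheap = hidden * cheap_k * cheap_k  # depthwise: one filter per channel
    return primary + cheap


standard = conv_params(128, 256, 3)      # hypothetical layer: 128 -> 256 channels
ghost = ghost_conv_params(128, 256, 3)
print(standard, ghost, round(ghost / standard, 3))
```

Under these assumptions a single layer needs roughly half the weights of its standard counterpart, because the depthwise branch adds only a small number of parameters.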

Adding a small target detection head improves the detection accuracy of small targets. Relative to the original YOLOv8s network model, the P, R, mAP@0.5, and mAP@0.5:0.95 metrics increase by 4.10, 6.78, 3.10, and 4.90 percentage points respectively.

Integrating the GE attention mechanism module into the neck feature fusion part further improves the detection performance. Relative to the model with the small target detection head, the P, R, mAP@0.5, and mAP@0.5:0.95 metrics increase by a further 0.19, 0.69, 0.70, and 1.60 percentage points respectively.

The combination of all three improvement strategies gives the best balance between detection accuracy and model size. Compared with the original model, the improved YOLOv8s network model raises every accuracy metric while reducing the parameters by 3.32M. The model can effectively detect bird droppings and shadows on solar panels, and it particularly improves the detection of small target bird droppings.

Ghost	P2	GE	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)
			93.02	88.05	94.61	65.12	11.14	28.62
√			93.21	87.13	93.81	63.24	8.28	20.82
	√		97.12	94.83	97.71	70.02	10.63	37.03
		√	93.31	89.92	95.87	65.52	11.18	28.71
√	√		97.02	94.31	97.42	70.41	7.78	29.23
√		√	94.42	86.93	94.05	62.73	8.32	20.89
	√	√	97.31	95.52	98.41	71.62	10.68	37.11
√	√	√	97.52	95.31	97.93	70.53	7.82	29.14
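
The gains quoted for the small target detection head and the GE module can be reproduced from the table. In the sketch below, rows are identified by their parameter counts (10.63M for the model with the small target detection head, 10.68M for the same model with GE added); this pairing is an inference from the reported figures rather than something the text states explicitly.

```python
# (P, R, mAP@0.5, mAP@0.5:0.95) in percent, copied from the ablation table
baseline = (93.02, 88.05, 94.61, 65.12)      # original YOLOv8s, 11.14M parameters
with_head = (97.12, 94.83, 97.71, 70.02)     # row with 10.63M parameters
with_head_ge = (97.31, 95.52, 98.41, 71.62)  # row with 10.68M parameters


def gains(after, before):
    """Percentage-point differences between two rows, rounded to two decimals."""
    return tuple(round(a - b, 2) for a, b in zip(after, before))


print(gains(with_head, baseline))      # (4.1, 6.78, 3.1, 4.9)
print(gains(with_head_ge, with_head))  # (0.19, 0.69, 0.7, 1.6)
```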

The detailed detection results for bird droppings and shadows show that the improved model has higher precision and recall for both types of occlusion. The mAP@0.5 and mAP@0.5:0.95 metrics also improve for both types, most notably for bird droppings, whose recall rises from 84.92% to 96.06% and whose mAP@0.5:0.95 rises from 53.42% to 62.83%.

Algorithm	Type	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)
YOLOv8s	Bird Droppings	92.71	84.92	92.75	53.42
	Shadows	93.36	91.01	96.42	76.62
	All	93.02	88.05	94.61	65.12
Improved YOLOv8s	Bird Droppings	96.53	96.06	97.71	62.83
	Shadows	98.51	94.72	98.35	78.27
	All	97.52	95.31	97.93	70.53

5.3.4 Comparison Experiment of Different Network Models

The improved YOLOv8s network model is compared with other target detection models: YOLOv3-tiny, YOLOv6s, YOLOv8s-Rep, YOLOv8s, and YOLOv8m. The results show that the improved YOLOv8s network model achieves the highest mAP@0.5 and mAP@0.5:0.95 among them.

Compared with the original YOLOv8s network model, the improved model raises mAP@0.5 by 3.32 percentage points and mAP@0.5:0.95 by 5.41 percentage points. It also has the lowest parameter count of the compared models (7.82M), while its FLOPs (29.14G) are slightly higher than those of the original YOLOv8s (28.62G). Visualization results of randomly selected images from the test set show that the improved model detects bird droppings and shadows more accurately, especially small target bird droppings.

Algorithm	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	Params (M)	FLOPs (G)
YOLOv3-tiny	95.82	92.51	95.36	60.21	8.67	13.07
YOLOv6s	93.51	84.28	91.32	61.84	16.36	44.21
YOLOv8s-Rep	95.15	88.42	95.81	65.08	13.46	35.52
YOLOv8s	93.02	88.05	94.61	65.12	11.14	28.62
YOLOv8m	94.41	88.86	95.53	66.61	25.95	79.03
Improved YOLOv8s	97.52	95.31	97.93	70.53	7.82	29.14

5.4 Chapter Summary

In this chapter, an improved YOLOv8s network model is proposed for detecting bird droppings and shadows on solar panels. The GhostConv and C2fGhost modules, a small target detection head, and the GE attention mechanism are incorporated to improve the model’s performance. Experimental results show that the improved model achieves higher detection accuracy with fewer parameters than the original model and the other target detection models compared, providing an effective solution for detecting occlusions on solar panels.

6. Conclusion

This paper focuses on the identification and detection of common occlusion problems on solar panel surfaces, including dust source identification, dust accumulation degree recognition, and bird droppings and shadow detection. The following main research results are achieved:

An improved YOLOv8s network model is proposed for detecting bird droppings and shadows. The GhostConv and C2fGhost modules, a small target detection head, and the GE attention mechanism are used to improve the model’s detection accuracy and reduce its parameter count. The improved model achieves higher mAP@0.5 and mAP@0.5:0.95 than the other target detection models in the comparison.

An improved ShuffleNetV2 network model is proposed for dust source identification. The Mish activation function, CA attention mechanism, and mixed depth convolution are used to improve the model’s recognition accuracy and reduce computational complexity. Experimental results show that the improved model has better performance compared to the original model.

An improved DenseNet169 network model is proposed for dust accumulation degree recognition. The ESM attention mechanism, asymmetric convolution module, and transfer learning are incorporated to enhance the model’s performance. The improved model has higher recognition accuracy and lower computational complexity, providing a valuable reference for dust accumulation degree recognition.