Enhanced Detection of Minor Defects on Solar Panel Surfaces Using an Improved Dn-YOLOv7 Algorithm

The escalating global energy crisis and environmental degradation have intensified the demand for renewable energy sources, among which solar power generation is one of the most prominent. By early 2024, the installed capacity of solar photovoltaic (PV) power generation in China had reached approximately 610 GW, accounting for a significant portion of the total power generation capacity. The operational efficiency and stability of PV power plants are critically dependent on the health of individual solar panels. Defects such as micro-cracks, finger interruptions (broken grids), and snail trails (spots) can severely degrade performance and lead to potential safety hazards. Consequently, with the rapid expansion of the solar power industry, the development of efficient and reliable methods for inspecting solar panel surfaces has become indispensable.

Object detection in aerial imagery, a common method for large-scale solar farm inspection, is notoriously susceptible to image quality degradation. In many scenarios, noise and complex backgrounds can severely interfere with detection efficacy, sometimes rendering it entirely ineffective. Aerial images are particularly prone to corruption by various noise types, primarily Gaussian noise and impulse (salt-and-pepper) noise, introduced during long-duration drone flights, data transmission, image cropping, and scaling operations. When the target objects are very small and exhibit a low signal-to-noise ratio (SNR), detection networks often struggle to distinguish between genuine defect features and noise artifacts, leading to increased rates of false positives and missed detections.

Traditional approaches often address noise as a separate pre-processing step using filters (e.g., median, Gaussian) or advanced deep learning-based denoising networks. While effective for specific noise types, these methods can blur image details—a critical drawback for tiny defect detection—and often incur significant computational overhead, hindering real-time application. Conversely, most state-of-the-art object detectors, including the YOLO (You Only Look Once) family, focus architectural improvements on enhancing feature extraction and fusion capabilities but are not explicitly designed to be robust against the noisy conditions prevalent in aerial imagery. This creates a gap where detection performance degrades sharply in practical, noisy environments.

To bridge this gap, this work proposes an Improved Dn-YOLOv7 (De-noising YOLOv7) algorithm specifically designed for robust small-target defect detection on solar panel surfaces under noisy conditions. The core innovations involve integrating a denoising mechanism directly into the detection backbone, employing a loss function better suited for tiny objects, and enhancing spatial awareness in the network. The primary contributions are threefold:

De-noising Block (DnBlock): Inspired by DnCNN (De-noising Convolutional Neural Network), a novel DnBlock module is proposed and integrated into the YOLOv7 backbone. This module learns a residual noise map from the input and uses a noise-tolerant loss function (Mean Absolute Error) to improve robustness.
Normalized Gaussian Wasserstein Distance (NWD): The traditional Complete Intersection over Union (CIoU) loss is replaced with the NWD metric for bounding box regression. NWD models bounding boxes as 2D Gaussian distributions and measures their similarity via the Wasserstein distance, which varies smoothly under the small positional deviations that cause IoU-based measures to drop sharply for tiny objects.
CoordConv Integration: Coordinate Convolution (CoordConv) layers are strategically employed within the detection head and the DnBlock. By explicitly adding coordinate information channels, CoordConv helps the network better localize features and handle the sparse, non-uniform noise patterns resulting from the denoising process, preserving critical spatial information for small solar panel defects.

The proposed method aims to enhance the model’s ability to extract clean features from noisy inputs directly, thereby improving detection accuracy for tiny solar panel defects without a substantial sacrifice in inference speed, making it suitable for deployment on edge devices used in aerial inspection.

1 Methodology: The Improved Dn-YOLOv7 Architecture

The baseline for our work is YOLOv7, a state-of-the-art single-stage detector known for its speed and accuracy. However, its standard architecture is not optimized for small targets in noisy environments. Our Improved Dn-YOLOv7 introduces key modifications across its three primary components: the Backbone, Neck, and Head. The overall architecture is designed to process noisy solar panel images and reliably output the locations and classes of surface defects.

1.1 Backbone Network with Integrated DnBlock

The backbone is responsible for initial feature extraction. We augment the standard YOLOv7 backbone (composed of CBS, ELAN, and MPConv modules) by inserting our proposed DnBlock after the initial downsampling stages. The DnBlock is designed as a residual network that learns to estimate and subtract noise, following the principle $\hat{Y} = X - f(X)$, where $X$ is the noisy input feature map, $f(X)$ is the estimated noise residual learned by the network, and $\hat{Y}$ is the cleaner output feature map.

The internal structure of the DnBlock is detailed in the accompanying schematic. It begins by transforming the input features. A key design is the use of a split and concatenation pathway: one branch carries forward a portion of the initial features directly to preserve information, while the other branch processes the features through multiple CBS (Conv-BN-SiLU) layers to extract the noise characteristics. The noise residual is then subtracted via a skip connection. The final step involves a convolution to adjust dimensions, followed by a concatenation with the preserved features from the first branch, resulting in a denoised, higher-dimensional feature map ready for subsequent processing.
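The dataflow described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the 1×1 convolutions (written as per-pixel channel mixing), the omission of batch normalization, the half-and-half channel split, the random weights, and the class name `DnBlockSketch` with its parameters `hidden`, `depth`, and `c_out` are all assumptions chosen only to make the transform, split, noise-estimation, subtraction, projection, and concatenation steps concrete.

```python
import numpy as np


def silu(x):
    # SiLU activation used in CBS blocks: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def conv1x1(x, weight):
    # x: (C_in, H, W); weight: (C_out, C_in). A 1x1 convolution mixes channels per pixel.
    return np.einsum("oc,chw->ohw", weight, x)


def cbs(x, weight):
    # Simplified CBS: 1x1 conv + SiLU (batch normalization omitted in this sketch)
    return silu(conv1x1(x, weight))


class DnBlockSketch:
    """Hypothetical DnBlock dataflow: transform, split, estimate noise, subtract, project, concat."""

    def __init__(self, c_in, hidden=16, depth=3, c_out=32, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            # Random weights standing in for learned parameters
            return rng.normal(0.0, 0.1, size=(rows, cols))

        self.hidden = hidden
        self.w_in = init(2 * hidden, c_in)                           # initial transform
        self.w_noise = [init(hidden, hidden) for _ in range(depth)]  # noise branch f(X)
        self.w_out = init(c_out, hidden)                             # dimension adjustment

    def __call__(self, x):
        x0 = cbs(x, self.w_in)
        keep, work = x0[: self.hidden], x0[self.hidden:]  # split into two branches
        noise = work
        for w in self.w_noise:                             # stacked CBS layers estimate the noise
            noise = cbs(noise, w)
        denoised = work - noise                            # residual subtraction: X - f(X)
        projected = conv1x1(denoised, self.w_out)          # adjust the channel count
        return np.concatenate([keep, projected], axis=0)   # fuse with the preserved branch


x = np.random.default_rng(1).normal(size=(8, 20, 20))  # (channels, height, width)
block = DnBlockSketch(c_in=8)
print(block(x).shape)  # (48, 20, 20): 16 preserved + 32 projected channels
```

Setting the noise-branch weights to zero makes the subtracted term vanish, so the block then simply projects the working branch; the intent of training is for the subtracted term to approximate the noise component instead.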

A critical aspect of the DnBlock is the loss function used to train the denoising task. We replace the commonly used Mean Squared Error (MSE) loss with Mean Absolute Error (MAE). The rationale stems from the theory of noise tolerance for symmetric loss functions. For a classification or regression task with noisy labels or data, a loss function $L$ is considered noise-tolerant if, under certain conditions, the global minimizer of the risk under the noisy data distribution is the same as that under the clean data distribution.

Consider a scenario with $k$ classes. Let $\eta$ be the noise rate (probability a label is corrupted), $y$ be the true label, and $\hat{y}$ be the potentially noisy observed label. For uniform label noise, the risk $R_{\eta}^{L}$ under the noisy distribution is:
$$R_{\eta}^{L}(f) = E_{x,\hat{y}}[L(f(x), \hat{y})] = (1-\eta)E_{x,y}[L(f(x), y)] + \frac{\eta}{k-1} \sum_{i \neq y} E_{x}[L(f(x), i)]$$
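If $L$ is symmetric, i.e. $\sum_{i=1}^{k} L(f(x), i) = C$ for every input $x$ and model $f$, where $C$ is a constant, then $\sum_{i \neq y} L(f(x), i) = C - L(f(x), y)$, and the noisy risk simplifies to:
$$R_{\eta}^{L}(f) = \frac{C\eta}{k-1} + \left[1 - \frac{k\eta}{k-1}\right] R_{L}(f)$$
where $R_L(f) = E_{x,y}[L(f(x), y)]$ is the risk under clean labels. For two different models $f^*$ and $f$, the difference in risk is:
$$R_{\eta}^{L}(f^*) - R_{\eta}^{L}(f) = \left[1 - \frac{k\eta}{k-1}\right] (R_{L}(f^*) - R_{L}(f))$$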

Under the condition $\eta < \frac{k-1}{k}$, the factor $\left[1 - \frac{k\eta}{k-1}\right] > 0$, so the ordering of models by their clean risk $R_L$ is preserved under the noisy risk $R_{\eta}^{L}$. Consequently, for a symmetric loss, the global minimizer of the noisy risk coincides with that of the clean risk. MAE satisfies the symmetry condition, whereas MSE does not, because its sum over classes depends on the prediction. Although this result is stated for uniform label noise in classification, it motivates MAE as the more noise-tolerant choice for training the denoising task within the DnBlock.
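The symmetry condition and the noisy-risk identity above can be checked numerically. The sketch below stays in the derivation's classification setting: predictions are probability vectors and each loss is measured against a one-hot label, which is an illustrative assumption rather than the DnBlock's feature-map regression. The toy predictions, labels, $k = 3$, and $\eta = 0.3$ are made up, and all function names are hypothetical.

```python
def one_hot(label, k):
    return [1.0 if i == label else 0.0 for i in range(k)]


def mae(p, label):
    # L1 distance between probability vector p and the one-hot target
    return sum(abs(pi - ti) for pi, ti in zip(p, one_hot(label, len(p))))


def mse(p, label):
    # Squared L2 distance between p and the one-hot target
    return sum((pi - ti) ** 2 for pi, ti in zip(p, one_hot(label, len(p))))


def class_sum(loss, p):
    # Symmetry requires sum_i L(p, i) to be the same constant C for every p
    return sum(loss(p, i) for i in range(len(p)))


def clean_risk(loss, preds, labels):
    # Empirical R_L(f): average loss against the true labels
    return sum(loss(p, y) for p, y in zip(preds, labels)) / len(preds)


def noisy_risk(loss, preds, labels, eta):
    # Empirical R_eta^L(f) under uniform label noise: keep y with prob. 1 - eta,
    # otherwise flip to each of the other k - 1 classes with prob. eta / (k - 1)
    k = len(preds[0])
    total = 0.0
    for p, y in zip(preds, labels):
        flipped = sum(loss(p, i) for i in range(k) if i != y)
        total += (1 - eta) * loss(p, y) + eta / (k - 1) * flipped
    return total / len(preds)


def predicted_noisy_risk(c, eta, k, r_clean):
    # Simplified form: C*eta/(k-1) + [1 - k*eta/(k-1)] * R_L
    return c * eta / (k - 1) + (1 - k * eta / (k - 1)) * r_clean


preds = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 1]
k, eta = 3, 0.3

# For MAE with probability vectors, each class-wise sum is 2k - 2
print([round(class_sum(mae, p), 6) for p in preds])
print([round(class_sum(mse, p), 6) for p in preds])
print(noisy_risk(mae, preds, labels, eta),
      predicted_noisy_risk(2 * k - 2, eta, k, clean_risk(mae, preds, labels)))
```

For MAE every class-wise sum equals $2k - 2 = 4$ because the prediction vector sums to one, so the directly computed noisy risk matches the simplified formula; for MSE the sum changes with the prediction, so no single constant $C$ exists.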

1.2 Neck and Head with CoordConv and NWD Loss

The neck network, featuring modules such as SPPCSPC and upsampling layers, performs multi-scale feature aggregation. We keep this structure unchanged; it now aggregates the denoised features produced by the backbone.

The head network is responsible for the final detection predictions. Here, we introduce two major changes. First, we replace standard convolutional layers in the detection head with CoordConv layers. A standard convolution is translation-equivariant: it applies the same filter at every position and has no direct access to where a feature lies, which can be a disadvantage for tasks requiring precise spatial localization, especially when dealing with residual noise patterns. CoordConv addresses this by concatenating two extra channels, holding the normalized row and column coordinates $i$ and $j$, to the input feature map before the convolution operation. This gives the network explicit spatial information, helping it to better correlate denoised features with their precise location on the solar panel surface. The operation can be formulated as follows for an input feature map $F$ of size $H \times W \times C$:
$$F_{\text{coord}} = \text{Concat}(F, \mathbf{I}, \mathbf{J})$$
where $\mathbf{I}$ and $\mathbf{J}$ are $H \times W$ coordinate matrices such that $\mathbf{I}_{i,j}=i/H$ and $\mathbf{J}_{i,j}=j/W$, so that $F_{\text{coord}}$ has size $H \times W \times (C+2)$. The convolution is then performed on $F_{\text{coord}}$.
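A minimal NumPy sketch of the coordinate concatenation above, followed by a convolution on $F_{\text{coord}}$. The channel-last layout, zero-based indices, and the use of a 1×1 kernel (written as a matrix product over channels) instead of the head's actual kernel size are assumptions; the function names `add_coords` and `coord_conv1x1` are hypothetical.

```python
import numpy as np


def add_coords(feature):
    """Concatenate coordinate channels I and J to an (H, W, C) feature map.

    Uses I[i, j] = i / H and J[i, j] = j / W with zero-based indices (assumption).
    """
    h, w, _ = feature.shape
    i_map, j_map = np.meshgrid(np.arange(h) / h, np.arange(w) / w, indexing="ij")
    return np.concatenate([feature, i_map[..., None], j_map[..., None]], axis=-1)


def coord_conv1x1(feature, weight):
    """1x1 convolution applied to F_coord; weight has shape (C + 2, C_out)."""
    return np.einsum("hwc,co->hwo", add_coords(feature), weight)


f = np.zeros((4, 8, 3))                 # H=4, W=8, C=3, all-zero features
f_coord = add_coords(f)
print(f_coord.shape)                    # (4, 8, 5)

w = np.zeros((5, 1))
w[3, 0] = 1.0                           # a filter that reads only the I channel
out = coord_conv1x1(f, w)               # output varies by row despite constant input
```

With all-zero input features, an ordinary convolution would produce the same value everywhere; here the output still depends on the row, which is the position awareness CoordConv is meant to add.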

Second, and most crucially for small solar panel defects, we replace the CIoU loss for bounding box regression with the Normalized Gaussian Wasserstein Distance (NWD) loss. IoU-based metrics have inherent shortcomings for tiny objects: a positional deviation of only a few pixels can cause a drastic drop in IoU, the same pixel offset penalizes a small box far more heavily than a large one, and IoU itself is zero for any pair of non-overlapping boxes, however far apart they are. NWD offers a similarity measure that varies smoothly with such deviations.

The core idea of NWD is to model a bounding box $B=(c_x, c_y, w, h)$ as a 2D Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where the center pixel is considered the mean, and the spread is modeled by the width and height:
$$
\boldsymbol{\mu} = \begin{bmatrix} c_x \\ c_y \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \frac{w^2}{4} & 0 \\ 0 & \frac{h^2}{4} \end{bmatrix}.
$$
For two bounding boxes $B_p$ and $B_t$ (predicted and target), modeled as Gaussians $\mathcal{N}_p(\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p)$ and $\mathcal{N}_t(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t)$, their similarity can be measured by the Wasserstein distance $W$ between the two distributions. For Gaussians whose covariance matrices commute, as the diagonal matrices here do, the $2^{nd}$-order Wasserstein distance has the closed form:
$$
W_2^2(\mathcal{N}_p, \mathcal{N}_t) = \|\boldsymbol{\mu}_p - \boldsymbol{\mu}_t\|_2^2 + \|\boldsymbol{\Sigma}_p^{1/2} - \boldsymbol{\Sigma}_t^{1/2}\|_F^2.
$$
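Because both covariance matrices are diagonal, their square roots are taken entry-wise, $\boldsymbol{\Sigma}^{1/2} = \operatorname{diag}(w/2, h/2)$, and the closed form reduces to a squared Euclidean distance between the vectors $(c_x, c_y, w/2, h/2)$ of the two boxes (the second subscript $p$ or $t$ marks the predicted or target box):
$$
W_2^2(\mathcal{N}_p, \mathcal{N}_t) = (c_{x,p} - c_{x,t})^2 + (c_{y,p} - c_{y,t})^2 + \frac{(w_p - w_t)^2 + (h_p - h_t)^2}{4}.
$$
For example, two boxes of identical size whose centers differ by 2 pixels horizontally give $W_2^2 = 4$ whatever their size, whereas their IoU, $(w-2)/(w+2)$, is only 0.6 for $w = 8$ but about 0.96 for $w = 100$.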
This distance is then normalized into a similarity measure that ranges between 0 and 1 using an exponential form:
$$
\text{NWD}(\mathcal{N}_p, \mathcal{N}_t) = \exp\left(-\frac{\sqrt{W_2^2(\mathcal{N}_p, \mathcal{N}_t)}}{C}\right).
$$
Here, $C$ is a constant related to the dataset's average object scale, which we set empirically. The NWD loss for training is then defined as $\mathcal{L}_{\text{NWD}} = 1 - \text{NWD}$. Because NWD decays smoothly with center and size offsets, this loss provides a smoother gradient for small objects and is less sensitive than IoU to minor scale and location variations, which makes it better suited to detecting tiny cracks and spots on solar panels.
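The following pure-Python sketch evaluates NWD and the NWD loss for boxes given as $(c_x, c_y, w, h)$, alongside plain IoU for comparison. It uses the diagonal-covariance closed form, in which $W_2^2$ is the squared distance between $(c_x, c_y, w/2, h/2)$ vectors. The value $C = 12.8$ and the example boxes are placeholders, not the constant used in this work, and a training implementation would compute the same quantities with a differentiable tensor library.

```python
import math


def corners(box):
    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a, b):
    ax1, ay1, ax2, ay2 = corners(a)
    bx1, by1, bx2, by2 = corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def wasserstein2_sq(a, b):
    # Closed form for diagonal covariances: squared distance between (cx, cy, w/2, h/2)
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
            + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)


def nwd(a, b, c):
    # exp(-sqrt(W_2^2) / C); c is the dataset-dependent constant C
    return math.exp(-math.sqrt(wasserstein2_sq(a, b)) / c)


def nwd_loss(pred, target, c):
    return 1.0 - nwd(pred, target, c)


C = 12.8  # placeholder value, not the constant used in this work
small, small_shift = (50, 50, 6, 6), (53, 50, 6, 6)
large, large_shift = (50, 50, 60, 60), (53, 50, 60, 60)
far = (58, 50, 6, 6)  # does not overlap `small`

for name, t, p in [("small", small, small_shift), ("large", large, large_shift), ("far", small, far)]:
    print(name, round(iou(t, p), 3), round(nwd(t, p, C), 3), round(nwd_loss(p, t, C), 3))
```

A 3-pixel offset drops the IoU of the 6×6 box to 1/3 but leaves the 60×60 box at 57/63 ≈ 0.905, whereas NWD depends only on the center and size differences and gives both pairs the same value; for the non-overlapping pair IoU is 0 while NWD stays positive.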

2 Experimental Setup and Results

2.1 Dataset and Implementation Details

We utilized a publicly available dataset of aerial solar panel defect images. The dataset contains 2,700 images annotated with three defect classes: Crack, Broken Grid, and Spot. The data was split into 1,920 images for training, 480 for testing, and 300 for validation. To rigorously evaluate robustness, we created corrupted versions of the test set by adding different types and levels of noise: Gaussian noise with standard deviation $\sigma = 0.12$ and $\sigma = 0.24$ (on a [0,1] pixel intensity scale), and impulse noise with corruption ratios of 15% and 30%.
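The corrupted test sets can be produced along the following lines. This NumPy sketch is an assumption about the procedure rather than the authors' script: it clips Gaussian-noised images back to $[0, 1]$, interprets the impulse corruption ratio as the fraction of pixels replaced, and splits corrupted pixels evenly between salt (1) and pepper (0). The function names are hypothetical.

```python
import numpy as np


def add_gaussian_noise(img, sigma, rng):
    """Zero-mean Gaussian noise with standard deviation sigma on a [0, 1] image."""
    noisy = img + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)  # clipping back to [0, 1] is an assumption


def add_impulse_noise(img, ratio, rng):
    """Salt-and-pepper noise: a fraction `ratio` of pixels is replaced by 0 or 1."""
    noisy = img.copy()
    h, w = img.shape[:2]
    corrupt = rng.random((h, w)) < ratio  # pixels to corrupt (all channels together)
    salt = rng.random((h, w)) < 0.5       # 50/50 salt vs. pepper split is an assumption
    noisy[corrupt & salt] = 1.0
    noisy[corrupt & ~salt] = 0.0
    return noisy


rng = np.random.default_rng(0)
img = np.full((640, 640, 3), 0.5)  # stand-in for a normalized aerial image
test_variants = {
    "gauss_0.12": add_gaussian_noise(img, 0.12, rng),
    "gauss_0.24": add_gaussian_noise(img, 0.24, rng),
    "impulse_15": add_impulse_noise(img, 0.15, rng),
    "impulse_30": add_impulse_noise(img, 0.30, rng),
}
```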

The model was trained for 200 epochs with an input image size of $640 \times 640$ and a batch size of 64. The Adam optimizer was used with an initial learning rate of 0.01, a momentum term ($\beta_1$) of 0.9, and a weight decay of 0.005. All experiments were conducted on a system with an NVIDIA RTX 3070 GPU.

The primary evaluation metric is the mean Average Precision (mAP) at an IoU threshold of 0.5 (mAP@0.5). To quantify robustness against noise, we also calculate a Precision Shift metric $R$, defined as the average change in mAP across all noisy test conditions relative to the clean test baseline:
$$R = \frac{1}{N-1} \sum_{i=2}^{N} (P_i - P_1)$$
where $P_1$ is the mAP on the clean test set, $P_i$ for $i \geq 2$ is the mAP on the $i$-th test set (a noisy variant), and $N$ is the total number of test conditions (clean + noisy variants). Because noise typically lowers mAP, $R$ is usually negative; a value closer to zero (smaller in magnitude) indicates better noise robustness.
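As a check on the definition, the sketch below computes $R$ from a row of mAP values ordered as clean, Gaussian $\sigma = 0.12$, Gaussian $\sigma = 0.24$, impulse 15%, impulse 30%, using the baseline (Exp. 1) and full-model (Exp. 8) rows of the ablation table; the function name `precision_shift` is hypothetical.

```python
def precision_shift(maps):
    """Precision Shift R = mean of (P_i - P_1) over the N - 1 noisy test sets.

    maps[0] is the clean-test mAP P_1; maps[1:] are the noisy-test mAPs P_2..P_N.
    """
    clean, noisy = maps[0], maps[1:]
    return sum(p - clean for p in noisy) / len(noisy)


# Order: clean, Gauss sigma=0.12, Gauss sigma=0.24, impulse 15%, impulse 30%
exp1_baseline = [93.9, 92.3, 82.5, 88.6, 78.6]
exp8_full = [96.6, 94.9, 91.4, 91.2, 85.4]
print(precision_shift(exp1_baseline), precision_shift(exp8_full))
```

For Exp. 1 the four changes are -1.6, -11.4, -5.3, and -15.3 points, whose mean is -8.40; for Exp. 8 the mean is -5.875, which the table reports rounded to -5.88.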

2.2 Ablation Study

We conducted an extensive ablation study on the YOLOv7-small (YOLOv7s) baseline to validate the contribution of each proposed component. The configurations and their corresponding results are summarized in the table below.

Exp. #	YOLOv7s	DnBlock	NWD Loss	CoordConv	mAP@0.5 (%)					Speed (FPS)	Precision Shift (R)
					Clean	Gauss (σ=0.12)	Gauss (σ=0.24)	Impulse (15%)	Impulse (30%)
1	✓				93.9	92.3	82.5	88.6	78.6	88.0	-8.40
2	✓	✓			94.1	93.6	86.5	88.7	80.7	81.0	-6.73
3	✓		✓		95.3	94.2	88.9	89.9	80.8	85.0	-6.85
4	✓			✓	95.0	93.3	88.1	88.3	78.9	80.0	-7.85
5	✓	✓	✓		96.3	94.2	90.3	90.0	82.8	87.0	-6.98
6	✓	✓		✓	95.9	94.4	90.1	89.7	82.5	79.0	-6.73
7	✓		✓	✓	96.1	93.6	87.4	90.4	80.2	82.0	-8.20
8 (Ours)	✓	✓	✓	✓	96.6	94.9	91.4	91.2	85.4	78.0	-5.88

The analysis of the ablation table reveals clear insights. The baseline model (Exp. 1) achieves 93.9% mAP on clean images but suffers significant degradation under strong noise (82.5% under heavy Gaussian noise, 78.6% under 30% impulse noise), with a large average precision shift of -8.40.

Adding the DnBlock alone (Exp. 2) already provides a substantial robustness boost, especially against Gaussian noise (mAP rises from 82.5% to 86.5% for σ=0.24), and improves the shift to -6.73. This supports the module's effectiveness in suppressing noise features. Employing the NWD loss alone (Exp. 3) yields the highest clean mAP among the single-component variants (95.3%) and the best Gaussian-noise results among them (94.2% and 88.9%), highlighting its strength for localizing small solar panel defects. Using CoordConv alone (Exp. 4) also helps, mainly on clean images (95.0%) and under Gaussian noise (88.1% at σ=0.24), whereas its impulse-noise results remain close to the baseline (88.3% and 78.9%).

The combination of DnBlock and NWD (Exp. 5) achieves strong results (96.3% clean, 90.3% under heavy Gaussian noise). The full integration of all three components in our proposed model (Exp. 8) delivers the best overall performance. It attains the highest mAP on the clean test set (96.6%) and, most importantly, the highest mAP under every noisy condition (91.4% and 85.4% under the strongest Gaussian and impulse noise, respectively). Its precision shift of -5.88 is the closest to zero among all configurations, indicating the best robustness. The added modules reduce inference speed from 88 FPS for the baseline to 78 FPS, which remains suitable for real-time aerial inspection tasks.

2.3 Comparison with State-of-the-Art Detectors

We further benchmarked our Improved Dn-YOLOv7 against several popular object detectors, including two-stage (Fast R-CNN), single-stage (SSD), and other YOLO-family models (YOLOv5, YOLOv6, YOLOv8). The results are consolidated in the following table.

Model	mAP@0.5 (%)					Speed (FPS)	Precision Shift (R)
	Clean	Gauss (σ=0.12)	Gauss (σ=0.24)	Impulse (15%)	Impulse (30%)
Fast R-CNN	73.9	69.4	62.2	68.6	62.1	30.0	-8.3
SSD	71.4	68.2	62.5	65.1	59.2	32.0	-7.7
YOLOv5	88.4	84.6	76.3	81.2	68.5	91.0	-10.8
YOLOv6	93.4	85.6	75.8	81.3	78.3	98.0	-13.2
YOLOv8	96.7	93.6	90.8	90.9	83.4	118.0	-7.0
Improved Dn-YOLOv7 (Ours)	96.6	94.9	91.4	91.2	85.4	78.0	-5.9

The comparative analysis underscores the effectiveness of our approach. YOLOv8 achieves the highest clean mAP (96.7%, 0.1 points above ours) and the fastest inference speed (118 FPS), but its mAP is lower than our model's under every noise condition, most markedly under strong impulse noise (83.4% vs. 85.4%). Our Improved Dn-YOLOv7 model achieves the highest mAP in all four noisy test conditions and the best noise robustness, with the precision shift closest to zero (R = -5.9). This reflects a deliberate trade-off: relative to YOLOv8, we give up roughly one third of the frame rate (78 vs. 118 FPS) in exchange for gains of 0.3 to 2.0 points mAP@0.5 under noise, in favor of more reliable solar panel defect detection in challenging, noisy aerial environments. The older two-stage and single-stage models perform substantially worse, staying below 74% mAP@0.5 even on clean images.

Visual comparisons of detection results on noisy images clearly show that our model produces fewer false positives (erroneously detecting noise as a defect) and false negatives (missing actual defects) compared to other models. The bounding boxes for small spots and thin cracks are more accurately and confidently predicted, even when the image is heavily corrupted.

3 Conclusion and Future Work

In this work, we presented an Improved Dn-YOLOv7 model specifically designed for detecting small surface defects on solar panels in noisy aerial imagery. The model integrates a dedicated DnBlock for in-network feature denoising, utilizes the NWD loss for improved small-object localization, and employs CoordConv layers to enhance spatial coordinate awareness. Experiments on a solar panel defect dataset under various synthetic noise corruptions show that the proposed model outperforms the baseline and the other evaluated detectors under every noise condition tested. It achieves an mAP of 96.6% on clean images and maintains at least 85.4% mAP even under the strongest Gaussian and impulse noise, with a real-time inference speed of 78 FPS. This balance of accuracy, robustness, and speed makes it a practical solution for automated aerial inspection systems for PV farms.

Despite its strengths, the model has some limitations. The integration of additional modules has led to a decrease in inference speed compared to the most optimized vanilla models. Future work will focus on further architectural optimization and pruning to reduce computational complexity without compromising denoising and detection performance. Additionally, while tested on common synthetic noise types, evaluation on real-world noise from different drone sensors and under varying weather conditions (haze, dust) is necessary. Extending the framework’s capability to handle mixed or unknown noise types through adaptive or blind denoising techniques within the DnBlock represents another promising research direction to enhance the practicality of solar panel inspection systems.