Advanced Defect Classification and Localization for Solar Panels Using Deep Learning and Image Processing

The increasing global deployment of photovoltaic (PV) systems necessitates efficient and reliable inspection methodologies to ensure optimal performance and longevity. Traditional manual inspection of solar panels is labor-intensive, time-consuming, and prone to human error, especially for large-scale solar farms. Consequently, computer vision-based techniques, particularly those utilizing aerial imagery from unmanned aerial vehicles (UAVs), have garnered significant attention. These methods typically analyze thermal (infrared) characteristics of solar panels to identify anomalies such as hot spots, cracks, or faulty bypass diodes. While offering advantages like non-contact assessment and terrain independence, these approaches face challenges related to the limited field of view of single aerial images and the need for robust algorithms capable of handling variable outdoor conditions and providing precise defect localization.

This paper presents a comprehensive framework for the automated classification and precise localization of defects in solar panels. The proposed method overcomes the perspective limitation of single-image analysis by constructing a high-resolution panoramic map of the PV installation. Subsequently, image processing techniques segment individual PV modules, which are then classified using a state-of-the-art deep learning model. This integrated pipeline enables rapid, accurate, and comprehensive assessment of PV plant health.

The core of our methodology is a multi-stage algorithm encompassing image stitching, module segmentation, and defect classification. The first stage addresses the creation of a complete site overview. Instead of relying on common feature detectors like SIFT or ORB, we employ the AKAZE (Accelerated-KAZE) algorithm for its superior performance in nonlinear scale spaces. AKAZE constructs its scale space using nonlinear diffusion filtering, which preserves edges better than linear Gaussian smoothing. The diffusion process is described by:
$$ \frac{\partial L}{\partial t} = \text{div} \left( c(x, y, t) \cdot \nabla L \right) $$
where $L$ is the luminance of the image, and $c(x, y, t)$ is the conduction function, defined as:
$$ c(x, y, t) = g\left( | \nabla L_{\sigma}(x, y, t) | \right) $$
with $g$ typically chosen as:
$$ g = \frac{1}{1 + \frac{|| \nabla L_{\sigma} ||^2}{\lambda^2}} $$
Here, $\nabla L_{\sigma}$ is the gradient of a Gaussian-smoothed version of the image, and $\lambda$ is a contrast factor. This approach allows AKAZE to generate stable and distinctive features even in images with noise or blur. Feature points are detected by finding the extrema of the Hessian determinant across the nonlinear scale space:
$$ L_{\text{Hessian}} = \sigma^2 \left( L_{xx}L_{yy} - L_{xy}^2 \right) $$
where $L_{xx}$, $L_{yy}$, and $L_{xy}$ are second-order derivatives and $\sigma$ is the scale parameter.
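To make the two quantities above concrete, the conduction function $g$ and the scale-normalized Hessian response can be evaluated numerically. The sketch below is illustrative only (it is not the AKAZE implementation; the contrast factor $\lambda$ and the sample derivative values are assumed):

```python
import numpy as np

def conduction_g(grad_mag, lam):
    """Conduction function g = 1 / (1 + |grad L|^2 / lambda^2).
    Values near 1 permit diffusion (flat regions); values near 0
    block diffusion across strong edges, preserving them."""
    return 1.0 / (1.0 + (grad_mag ** 2) / (lam ** 2))

def hessian_response(Lxx, Lyy, Lxy, sigma):
    """Scale-normalized Hessian determinant:
    L_Hessian = sigma^2 * (Lxx*Lyy - Lxy^2)."""
    return sigma ** 2 * (Lxx * Lyy - Lxy ** 2)

# Flat region diffuses freely; a strong edge suppresses diffusion.
print(conduction_g(0.0, lam=10.0))    # 1.0
print(conduction_g(100.0, lam=10.0))  # 1/101, ~0.0099
print(hessian_response(2.0, 3.0, 1.0, sigma=2.0))  # 4*(6-1) = 20.0
```

In a full implementation these responses are computed at every pixel of every nonlinear scale level, and keypoints are taken at the local extrema.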

Following feature detection, descriptors are generated and matched between overlapping aerial images using the k-nearest neighbor algorithm with a distance ratio test:
$$ \frac{d_m}{d_n} < T $$
where $d_m$ and $d_n$ are the distances to the closest and second-closest neighbor, respectively, and $T$ is a threshold (e.g., 0.8). Outliers are removed using the RANSAC algorithm to estimate a robust homography matrix $H$. This $3 \times 3$ matrix defines the projective transformation needed to align one image with another:
$$
\begin{bmatrix}
x' \\
y' \\
z'
\end{bmatrix} = H \begin{bmatrix}
u \\
v \\
1
\end{bmatrix} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix} \begin{bmatrix}
u \\
v \\
1
\end{bmatrix}
$$
The final coordinates in the target image are obtained as $x = x'/z'$ and $y = y'/z'$. This process is applied sequentially to stitch multiple visible-light and infrared images into two separate, aligned panoramas of the PV plant.
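The ratio test and the projective mapping described above can be sketched in a few lines of NumPy. The sample homography (a pure translation), the match tuples, and the threshold value are illustrative assumptions:

```python
import numpy as np

def apply_homography(H, u, v):
    """Map a source pixel (u, v) through the 3x3 homography H,
    then dehomogenize: x = x'/z', y = y'/z'."""
    xp, yp, zp = H @ np.array([u, v, 1.0])
    return xp / zp, yp / zp

def ratio_test(matches, T=0.8):
    """Keep a match only if d_m / d_n < T, where d_m and d_n are the
    distances to the closest and second-closest descriptor."""
    return [(i, j) for (i, j, d_m, d_n) in matches if d_m / d_n < T]

# A translation-only homography shifts every point by (5, 3).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
print(apply_homography(H, 10.0, 20.0))  # (15.0, 23.0)

# (query_idx, train_idx, d_m, d_n): only the first pair survives.
print(ratio_test([(0, 1, 0.5, 1.0), (2, 3, 0.9, 1.0)]))  # [(0, 1)]
```

In practice the homography itself is estimated from the surviving matches with RANSAC (e.g., OpenCV's `findHomography`), which this sketch does not reproduce.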

The second stage involves segmenting individual solar panels from the stitched panorama. Leveraging the distinct color difference between the solar panels (typically dark blue or black) and the background (ground, grass, racking), we convert the visible-light panorama to the HSV color space. A carefully defined color threshold isolates the PV array regions, generating a binary mask. This mask is refined using morphological operations (closing and opening) to remove noise and fill small gaps. The refined mask is then applied to the co-registered infrared panorama via a bitwise AND operation, extracting the thermal signature of only the PV modules. Finally, contour detection is performed on the mask to find the bounding rectangle of each module, and a perspective transformation is applied to correct for any skew, resulting in a standardized, rectangular image of each solar panel for classification.
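A minimal sketch of the mask-and-extract step follows, using NumPy stand-ins for the OpenCV range-threshold, bitwise-AND, and contour operations described above (the HSV threshold values and the toy panorama are assumptions, not the paper's calibrated parameters):

```python
import numpy as np

def panel_mask(hsv, lo=(100, 80, 40), hi=(130, 255, 255)):
    """Binary mask of pixels whose HSV values fall inside the
    (assumed) dark-blue panel range."""
    lo, hi = np.array(lo), np.array(hi)
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1)

def extract_thermal(ir, mask):
    """Bitwise AND: keep infrared values only where the mask is set."""
    return np.where(mask, ir, 0)

def bounding_box(mask):
    """Axis-aligned bounding rectangle of the masked region."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

# Toy 4x4 "panorama" containing a 2x2 panel-colored block.
hsv = np.zeros((4, 4, 3), dtype=int)
hsv[1:3, 1:3] = (110, 200, 100)
mask = panel_mask(hsv)
print(bounding_box(mask))  # (1, 1, 2, 2)
```

The morphological refinement (closing then opening) and the per-module perspective correction are omitted here; OpenCV's `morphologyEx`, `findContours`, and `warpPerspective` perform those steps on real panoramas.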

The third and crucial stage is defect classification. We employ the EfficientNet-B0 architecture, a convolutional neural network optimized through compound scaling of network depth, width, and input resolution. Its core building block is the MBConv module with squeeze-and-excitation (SE) attention. The MBConv module first expands the channel dimension using a $1 \times 1$ convolution, then applies a depthwise separable convolution for spatial feature extraction, followed by an SE block that recalibrates channel-wise feature responses. The SE operation involves global average pooling to squeeze global spatial information, followed by two fully connected layers that learn a channel-specific weighting vector:
$$ z_c = \mathbf{F}_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) $$
$$ s = \mathbf{F}_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \delta(W_1 z)) $$
where $u_c$ is the input feature map for channel $c$, $z_c$ is the squeezed scalar, $\delta$ is the ReLU activation, $\sigma$ is the sigmoid function, and $W_1$ and $W_2$ are learned weights. The final output is obtained by scaling the original features with the learned activations $s$. This mechanism allows the model to focus on more informative features relevant to defects in solar panels.
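The squeeze, excitation, and rescaling equations above can be traced end-to-end in a few lines of NumPy. The channel count, reduction ratio, and random weights below are illustrative assumptions, not trained EfficientNet parameters:

```python
import numpy as np

def se_block(U, W1, W2):
    """Squeeze-and-excitation on a feature map U of shape (C, H, W).
    Squeeze: global average pool each channel to a scalar z_c.
    Excite:  s = sigmoid(W2 @ relu(W1 @ z)).
    Scale:   multiply each channel of U by its weight s_c in (0, 1)."""
    z = U.mean(axis=(1, 2))                    # squeeze -> (C,)
    hidden = np.maximum(W1 @ z, 0.0)           # ReLU (delta)
    s = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))   # sigmoid (sigma) -> (C,)
    return U * s[:, None, None]                # channel-wise rescale

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
U = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C))          # reduction FC layer
W2 = rng.standard_normal((C, C // r))          # expansion FC layer
out = se_block(U, W1, W2)
print(out.shape)  # (8, 4, 4)
```

Because the sigmoid bounds every $s_c$ in $(0, 1)$, each channel is attenuated rather than amplified, which is how the block suppresses uninformative channels.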

The model is trained using the cross-entropy loss function, which measures the discrepancy between the predicted class probabilities and the true labels:
$$ \mathcal{L} = - \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{i,c} \log(p_{i,c}) $$
where $N$ is the batch size, $M$ is the number of defect classes, $y_{i,c}$ is the ground truth indicator (1 if sample $i$ belongs to class $c$, else 0), and $p_{i,c}$ is the predicted probability.
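Since each $y_{i,c}$ is a one-hot indicator, the double sum reduces to averaging the negative log-probability assigned to each sample's true class. A NumPy sketch with illustrative probabilities:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy over a batch.
    probs:  (N, M) predicted class probabilities p_{i,c}
    labels: (N,)   integer true classes (one-hot y_{i,c} implied)."""
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

# Two samples, four defect classes.
p = np.array([[0.70, 0.10, 0.10, 0.10],     # true class 0, confident
              [0.25, 0.25, 0.25, 0.25]])    # true class 2, uniform
y = np.array([0, 2])
print(cross_entropy(p, y))  # (-ln 0.7 - ln 0.25) / 2, ~0.8715
```

The confident prediction contributes a small loss term and the uniform one a large term, which is exactly the gradient signal that pushes the network toward sharper class probabilities.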

To evaluate the performance of our classification model, we utilize four standard metrics: Accuracy, Precision, Recall, and F1-Score. Their calculations based on True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) are as follows:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$

$$ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
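All four metrics follow directly from the confusion counts; the sketch below uses illustrative counts for a single defect class, not the paper's results:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts: 110 true defects, 90 found, 10 false alarms.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
# 0.97 0.9 0.8182 0.8571
```

Note that accuracy alone can be misleading here: the many true negatives dominate it, while precision and recall expose the missed defects, which is why all four metrics are reported.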

We conducted extensive experiments to validate the proposed framework. The dataset consisted of 7,000 infrared images of individual solar panels, resized to 24×40 pixels, categorized into four classes: Normal, Hot Spot, Crack, and Faulty Bypass Diode. The dataset was split into training, validation, and test sets as shown in the table below.

Dataset Split     Class                  Number of Images
Training Set      Normal                 1,571
                  Hot Spot               1,469
                  Faulty Bypass Diode    1,172
                  Crack                    688
Validation Set    Normal                   228
                  Hot Spot                 217
                  Faulty Bypass Diode      171
                  Crack                     84
Test Set          Normal                   461
                  Hot Spot                 440
                  Faulty Bypass Diode      331
                  Crack                    168

Data augmentation techniques including horizontal/vertical flipping and random rotation were applied during training to improve model robustness. The EfficientNet-B0 model was trained for 200 epochs using the Adam optimizer with an initial learning rate of 0.001, a batch size of 16, and gradient clipping.
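The stated optimizer settings (Adam at learning rate 0.001 with gradient clipping) can be illustrated by a single parameter-update step. The NumPy sketch below implements the standard Adam rule with global-norm clipping; the clip threshold and all hyperparameters beyond those stated in the text are Adam's usual defaults, assumed here:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8, clip_norm=1.0):
    """One Adam update with global-norm gradient clipping."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:                        # gradient clipping
        grad = grad * (clip_norm / norm)
    m = beta1 * m + (1 - beta1) * grad          # 1st-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # 2nd-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
# A large gradient on w[0] is clipped to unit norm before the update.
w, m, v = adam_step(w, grad=np.array([10.0, 0.0]), m=m, v=v, t=1)
print(w)  # w[0] moves by ~lr toward lower loss; w[1] is unchanged
```

In practice this update is applied per mini-batch by a framework optimizer (e.g., PyTorch's `torch.optim.Adam` together with `clip_grad_norm_`); the sketch only makes the arithmetic of one step explicit.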

We compared the performance of our chosen EfficientNet-B0 model against other prominent architectures on the same test set. The results, summarized in the table below, demonstrate the superior performance of EfficientNet-B0 for the task of defect classification in solar panels.

Network Model      Accuracy   Precision   Recall   F1-Score
ResNet101          0.8957     0.8951      0.8925   0.8936
MobileNet_V3       0.9214     0.9169      0.9134   0.9148
RegNet             0.9085     0.9029      0.9034   0.9028
EfficientNet_B0    0.9371     0.9313      0.9320   0.9311

The final output of the complete pipeline is a visually annotated panorama where each detected solar panel is enclosed by a bounding box color-coded according to its classified condition (e.g., green for normal, red for crack, blue for hot spot, cyan for faulty bypass diode). This provides maintenance teams with an intuitive and precise map for targeted intervention.

In conclusion, this paper presents an effective and integrated deep learning-based framework for the automated inspection of solar panels. The method combines robust image stitching using the AKAZE algorithm, precise module segmentation via color-based processing, and high-accuracy defect classification with the EfficientNet-B0 model. The system successfully addresses the limitations of single-image analysis by providing a holistic view of the PV installation and precisely locating defective modules. Experimental results confirm that the approach achieves high performance metrics, with the classification model attaining an accuracy of 93.71%, precision of 93.13%, recall of 93.20%, and an F1-score of 93.11%. This non-contact, efficient, and scalable solution is highly suitable for the routine monitoring and maintenance of large-scale photovoltaic power plants, ensuring their reliable and efficient operation.
