A Deep Learning-Based Approach for Classification and Localization of Defects in Solar Panels

In recent years, the adoption of solar energy has surged, with photovoltaic systems becoming a cornerstone of renewable energy infrastructure. However, the efficiency and longevity of these systems heavily depend on the condition of individual solar panels. Defects such as hotspots, cracks, and diode failures can significantly reduce power output and lead to system failures. Traditional inspection methods, which often involve manual checks, are time-consuming, labor-intensive, and prone to human error. To address these challenges, we propose a comprehensive approach that leverages deep learning and computer vision techniques for automated defect classification and localization in photovoltaic panels. This method integrates image stitching, segmentation, and classification to provide a scalable solution for large-scale solar farms.

The core of our approach involves processing infrared and visible light images captured by unmanned aerial vehicles (UAVs). These images are first stitched together to form a high-resolution panoramic view of the photovoltaic array, overcoming the limitations of single-image analysis. Subsequently, image segmentation techniques isolate individual solar panels, and a deep learning model classifies defects based on infrared characteristics. By combining these steps, our system achieves high accuracy in identifying and locating faults, enabling proactive maintenance and minimizing energy losses. This paper details each component of the methodology, presents experimental results, and discusses the implications for real-world applications.

Image stitching is a critical first step in our pipeline, as it allows us to create a cohesive view of the entire photovoltaic installation from multiple overlapping images. We employ the AKAZE (Accelerated-KAZE) algorithm for feature detection and matching due to its robustness to scale variations, noise, and blurring effects commonly encountered in outdoor environments. Unlike linear scale-space methods such as SIFT, AKAZE utilizes nonlinear diffusion filtering to preserve edge details and improve feature stability. The nonlinear diffusion process is governed by the equation:

$$ \frac{\partial L}{\partial t} = \text{div} \left( c(x, y, t) \cdot \nabla L \right) $$

where \( L \) represents the image luminance, \( \text{div} \) and \( \nabla \) denote the divergence and gradient operators, respectively, \( t \) is the time parameter, and \( c(x, y, t) \) is the conduction function that controls diffusion. The conduction function is defined as:

$$ c(x, y, t) = g(|\nabla L_\sigma(x, y, t)|) $$

with

$$ g = \frac{1}{1 + \frac{|\nabla L_\sigma|^2}{\lambda^2}} $$

Here, \( \nabla L_\sigma \) is the gradient of the Gaussian-smoothed image, and \( \lambda \) is a diffusion factor. AKAZE solves this equation using the Fast Explicit Diffusion (FED) algorithm, which iteratively updates the image as follows:

$$ L_{i+1} = [I + \tau A(L_i)] L_i $$

where \( I \) is the identity matrix, \( A(L_i) \) is the conduction matrix, and \( \tau \) is the time step. This process generates a nonlinear scale space in which feature points are detected by computing the determinant of the scale-normalized Hessian at each pixel across scales. This determinant response, which captures local curvature information, is given by:

$$ L_{\text{Hessian}} = \sigma^2 \left( L_{xx} L_{yy} - L_{xy}^2 \right) $$

In this equation, \( \sigma \) is the scale parameter, and \( L_{xx}, L_{yy}, \) and \( L_{xy} \) are second-order derivatives. Feature points are identified as local extrema in this scale space, and descriptors are generated for matching. We use k-nearest neighbor matching with a distance ratio threshold to find correspondences between images:

$$ \frac{d_m}{d_n} < T $$

where \( d_m \) and \( d_n \) are the distances to the nearest and second-nearest neighbors, respectively, and \( T \) is a threshold value. Outliers are removed using the RANSAC algorithm, and a homography matrix is estimated to align images through perspective transformation. The transformation is expressed as:

$$ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} $$

where \( (u, v) \) are the coordinates in the source image and \( (x', y') = (x/z,\, y/z) \) are the transformed coordinates after homogeneous normalization. Bilinear interpolation is applied when resampling the warped images, which are then blended to produce a seamless panoramic view that facilitates comprehensive analysis of the photovoltaic array.
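
To make this pipeline concrete, the following is a minimal OpenCV sketch of the stitching step. The function name `stitch_pair`, the ratio threshold of 0.7, and the naive overwrite in place of true seam blending are our own illustrative choices, not values prescribed above.

```python
import cv2
import numpy as np

def stitch_pair(img_src, img_dst, ratio=0.7):
    """Align img_src onto img_dst via AKAZE features and a RANSAC homography."""
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img_src, None)
    kp2, des2 = akaze.detectAndCompute(img_dst, None)

    # k-nearest-neighbor matching (k=2) with the ratio test d_m / d_n < T;
    # Hamming distance suits AKAZE's binary MLDB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]

    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier correspondences while estimating the homography.
    H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    # Warp with bilinear interpolation onto a shared canvas.
    h, w = img_dst.shape[:2]
    pano = cv2.warpPerspective(img_src, H, (w * 2, h), flags=cv2.INTER_LINEAR)
    pano[0:h, 0:w] = img_dst  # naive overwrite; real blending smooths the seam
    return pano
```

In practice the canvas size would be computed from the warped image corners rather than fixed at twice the width, but the skeleton above captures the detect-match-RANSAC-warp sequence described in this section.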

Following image stitching, we proceed to segment individual solar panels from the panoramic image. This step is crucial for isolating regions of interest and enabling precise defect localization. Our segmentation pipeline leverages the distinct color characteristics of photovoltaic panels in visible light images. First, we convert the image to the HSV color space to separate luminance from chrominance, which enhances the contrast between panels and the background. Thresholding operations are applied to create a binary mask that highlights panel regions. Morphological operations, such as erosion and dilation, are then used to refine the mask by removing noise and filling gaps. The refined mask is applied to the infrared image to extract thermal data corresponding to each panel. Contour detection algorithms identify the boundaries of individual panels, and perspective transformation corrects for any geometric distortions, ensuring that each panel is represented in a standardized format for classification.
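
A minimal sketch of this segmentation pipeline with OpenCV follows. The HSV bounds are placeholders that would need tuning to the actual panel appearance, the 40×24 target size interprets the 24×40-pixel panel format used later as height×width, and the code assumes the visible and infrared images are already registered; the text does not specify these details.

```python
import cv2
import numpy as np

def segment_panels(visible_img, ir_img):
    """Isolate panel regions in the visible image and extract matching IR crops."""
    hsv = cv2.cvtColor(visible_img, cv2.COLOR_BGR2HSV)

    # Threshold on hue/saturation/value; these bounds are illustrative only
    # and must be tuned to the bluish-dark appearance of real PV modules.
    mask = cv2.inRange(hsv, (90, 40, 40), (140, 255, 255))

    # Morphological opening removes speckle noise; closing fills small gaps.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Contours give per-panel boundaries; warp each to a standard rectangle.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    panels = []
    for cnt in contours:
        rect = cv2.minAreaRect(cnt)
        box = cv2.boxPoints(rect).astype(np.float32)
        # Note: box corner order may need sorting to match dst in practice.
        dst = np.float32([[0, 0], [40, 0], [40, 24], [0, 24]])
        M = cv2.getPerspectiveTransform(box, dst)
        panels.append(cv2.warpPerspective(ir_img, M, (40, 24)))
    return panels
```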

For defect classification, we employ the EfficientNet_B0 architecture, which optimizes model performance by balancing network depth, width, and resolution. This approach ensures efficient computation while maintaining high accuracy. The core building block of EfficientNet is the MBConv module, which incorporates depthwise separable convolutions to reduce parameter count and computational cost. The MBConv module includes a Squeeze-and-Excitation (SE) attention mechanism that enhances feature representation by adaptively weighting channel-wise features. The SE operation involves global average pooling to capture global information, followed by fully connected layers that learn channel-specific weights. These weights are used to recalibrate the feature maps, emphasizing informative channels and suppressing irrelevant ones. The overall structure of the MBConv module can be summarized as a sequence of operations: expansion convolution, depthwise convolution, SE attention, and projection convolution, with residual connections to mitigate gradient vanishing.
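
To make the SE mechanism concrete, here is a minimal PyTorch sketch of a squeeze-and-excitation block as used inside MBConv. The reduction ratio of 4 follows the common EfficientNet convention and is an assumption, not a value stated above.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention: squeeze (global pool) then excite (two FC layers)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.SiLU(),                        # EfficientNet uses SiLU/Swish
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                     # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # recalibrate the feature maps
```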

We train the classification model using a dataset of 7,000 infrared images of solar panels, each resized to 24×40 pixels. The images are categorized into four classes: normal, hotspot defects, diode failures, and cracks. The dataset is split into training, validation, and test sets in a 7:2:1 ratio, as shown in the table below:

| Dataset | Class | Number of Images |
| --- | --- | --- |
| Training Set | Normal | 1,571 |
| Training Set | Hotspot Defect | 1,469 |
| Training Set | Diode Failure | 1,172 |
| Training Set | Crack Defect | 688 |
| Validation Set | Normal | 461 |
| Validation Set | Hotspot Defect | 440 |
| Validation Set | Diode Failure | 331 |
| Validation Set | Crack Defect | 168 |
| Test Set | Normal | 228 |
| Test Set | Hotspot Defect | 217 |
| Test Set | Diode Failure | 171 |
| Test Set | Crack Defect | 84 |
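
Assuming the images are organized in class-named folders (an assumption about the dataset layout; the path below is hypothetical), the 7:2:1 split can be reproduced along these lines:

```python
import torch
from torchvision import datasets, transforms

# Resize to the 24x40 input described above.
tfm = transforms.Compose([
    transforms.Resize((24, 40)),
    transforms.ToTensor(),
])

full = datasets.ImageFolder("ir_panels/", transform=tfm)  # hypothetical path

# 7:2:1 split of the 7,000 images.
n = len(full)
n_train, n_val = int(0.7 * n), int(0.2 * n)
n_test = n - n_train - n_val
train_set, val_set, test_set = torch.utils.data.random_split(
    full, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0),  # reproducible split
)
```

A stratified split would match the per-class counts in the table more closely than `random_split`; the sketch above only reproduces the overall 7:2:1 proportion.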

To enhance model generalization, we apply data augmentation techniques including horizontal and vertical flipping, random rotation, and brightness adjustments. The model is trained using the Adam optimizer with an initial learning rate of 0.001 and a batch size of 16. Cross-entropy loss is used as the objective function:

$$ L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \log(p_{ic}) $$

where \( N \) is the number of samples, \( M \) is the number of classes, \( y_{ic} \) is a binary indicator for class membership, and \( p_{ic} \) is the predicted probability. Training is conducted for 200 epochs with gradient clipping to prevent exploding gradients, and the best model is selected based on validation performance.
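
A condensed PyTorch sketch of this training configuration follows, reusing `train_set` from the split sketch above. The clipping norm of 1.0 is our assumption, since the text specifies clipping but not a threshold.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

model = models.efficientnet_b0(num_classes=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()  # the cross-entropy objective above

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

for epoch in range(200):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        # Gradient clipping guards against exploding gradients.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    # ... evaluate on the validation set and keep the best checkpoint ...
```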

We evaluate the classification performance using accuracy, precision, recall, and F1 score, defined as follows:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

$$ \text{Precision} = \frac{TP}{TP + FP} $$

$$ \text{Recall} = \frac{TP}{TP + FN} $$

$$ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

Here, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. The following table compares the performance of EfficientNet_B0 with other models on the test set:

| Model | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| ResNet101 | 0.8957 | 0.8951 | 0.8925 | 0.8936 |
| MobileNet_V3 | 0.9214 | 0.9169 | 0.9134 | 0.9148 |
| RegNet | 0.9085 | 0.9029 | 0.9034 | 0.9028 |
| EfficientNet_B0 | 0.9371 | 0.9313 | 0.9320 | 0.9311 |
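
For reference, these aggregate scores can be computed from model predictions along the following lines; macro averaging across the four classes is our assumption, as the text does not state how per-class scores are combined, and the label arrays shown are dummy placeholders.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true, y_pred: integer class labels over the test set (dummy values shown).
y_true = [0, 1, 2, 3, 0, 1]
y_pred = [0, 1, 2, 2, 0, 1]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```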

The results demonstrate that EfficientNet_B0 outperforms other architectures across all metrics, achieving an accuracy of 93.71%. This highlights its suitability for defect classification in photovoltaic systems. Additionally, we analyze the training process by plotting accuracy curves over epochs, which show consistent improvement without signs of overfitting, validating the robustness of our approach.

In the localization phase, defects are mapped back onto the panoramic image using the segmentation masks. Each detected panel is annotated with a bounding box color-coded by defect type: green for normal, red for cracks, blue for hotspots, and cyan for diode failures. This visual representation allows maintenance teams to quickly identify and address faulty panels. The integration of stitching, segmentation, and classification creates an end-to-end system that efficiently handles large-scale photovoltaic inspections.
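
The annotation step can be sketched with OpenCV as follows. The BGR color tuples mirror the coding above, and `panel_boxes` is a hypothetical list of `(x, y, w, h, class_id)` tuples produced by the segmentation and classification stages.

```python
import cv2

# BGR colors keyed by class id: 0 normal, 1 crack, 2 hotspot, 3 diode failure.
COLORS = {0: (0, 255, 0), 1: (0, 0, 255), 2: (255, 0, 0), 3: (255, 255, 0)}

def annotate(panorama, panel_boxes):
    """Draw a color-coded bounding box on the panorama for each panel."""
    for x, y, w, h, cls in panel_boxes:
        cv2.rectangle(panorama, (x, y), (x + w, y + h), COLORS[cls], 2)
    return panorama
```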

Our experiments confirm the effectiveness of the proposed method in real-world scenarios. The AKAZE-based stitching algorithm successfully generates seamless panoramas even under varying lighting conditions, while the segmentation pipeline accurately isolates panels despite complex backgrounds. The classification model not only achieves high performance but also generalizes well to unseen data, reducing false positives and negatives. This comprehensive approach addresses key challenges in photovoltaic maintenance, such as scalability and accuracy, paving the way for automated inspection systems.

In conclusion, we have developed a deep learning-based framework for defect classification and localization in solar panels that combines advanced image processing techniques with state-of-the-art neural networks. By leveraging AKAZE for image stitching, HSV-based segmentation, and EfficientNet_B0 for classification, our method achieves high precision and recall in identifying defects. The system’s non-contact nature and automation make it ideal for large photovoltaic farms, where manual inspections are impractical. Future work will focus on extending the approach to video streams and incorporating temporal analysis for dynamic defect detection. Overall, this research contributes to the sustainable operation of photovoltaic systems by enabling rapid, accurate, and cost-effective maintenance.
