A Comparative Study on Hot Spot Detection Methods for Solar Panels Based on Thermal Infrared Imaging

The pursuit of “carbon peak” and “carbon neutrality” targets has positioned photovoltaic (PV) power generation as a pivotal branch of the new energy sector. Solar panels, being exposed to harsh outdoor environments for extended periods, are highly susceptible to coverage by dust, bird droppings, and other debris. This coverage can lead to the shading of local cell groups, causing abnormal temperature rises and the formation of hot spots. These hot spots significantly degrade the power generation efficiency of the PV system and pose potential safety risks. Therefore, the timely and accurate detection of hot spots on solar panels is a critical task for ensuring the safe and efficient operation of PV power plants. This research delves into hot spot detection methodologies, comparing traditional image processing techniques with modern machine learning-based object detection algorithms, and proposes an improved framework for enhanced performance.

Currently, methods for hot spot identification can be broadly categorized into two groups: those based on traditional image processing algorithms and those leveraging machine learning, particularly deep learning-based object detection. Traditional methods often rely on manually designed features like color, gradient, or texture, making them highly dependent on image quality. Their performance typically deteriorates in the presence of noise or when the hot spot features are subtle. In contrast, data-driven deep learning methods can autonomously learn robust and multi-dimensional feature representations from data, offering superior generalization and accuracy in complex scenarios. This study investigates both pathways, analyzing the limitations of conventional approaches and subsequently proposing an improved You Only Look Once version 4 (YOLOv4) model tailored for hot spot detection in thermal infrared images of solar panels.

Hot Spot Detection Based on Traditional Image Processing

Traditional image processing techniques for target detection apply pixel-level transformations to an image to highlight key characteristics such as color or edges. For hot spot detection, this often translates to segmenting high-temperature regions or identifying their boundaries based on pixel intensity variations.

Region Segmentation-Based Methods

These methods aim to partition an image into regions, separating the hot spot (foreground) from the rest of the panel (background).

1. Threshold Segmentation: This is a fundamental technique where an image is first converted to grayscale. A threshold value is then selected, either empirically or through adaptive algorithms. Pixels with intensity values above (or below) this threshold are classified as the target (hot spot), while others are treated as background.
$$ g(x,y) = \begin{cases} 255 & \text{if } f(x,y) \geq T \\ 0 & \text{otherwise} \end{cases} $$
where $f(x,y)$ is the original grayscale pixel value, $g(x,y)$ is the output binary pixel value, and $T$ is the threshold. The primary challenge lies in selecting an appropriate $T$, especially when the background exhibits grayscale values similar to those of the hot spot, leading to significant false positives or missed detections.
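The thresholding rule can be illustrated with a minimal sketch in plain Python. The image is represented as nested lists of 0-255 intensities; the function name `threshold_segment`, the sample patch, and the threshold of 128 are illustrative assumptions, not values from this study.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def threshold_segment(image: Image, t: int) -> Image:
    """Return a binary mask: 255 where f(x, y) >= t, otherwise 0."""
    return [[255 if pixel >= t else 0 for pixel in row] for row in image]


# Hypothetical 3x4 thermal patch with two warm pixels in the middle row.
patch = [
    [40, 42, 45, 41],
    [43, 210, 220, 44],
    [41, 43, 46, 42],
]
mask = threshold_segment(patch, t=128)  # t chosen by hand for this example
```

With a well-separated patch like this one, any threshold between the background and hot spot intensities works; the difficulty described above arises when those two ranges overlap.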

2. Histogram Equalization: This method enhances image contrast by redistributing the intensity values. It stretches the dynamic range of the grayscale levels, making brighter areas (like hot spots) more distinct. However, it can also amplify noise and the contrast of non-hot-spot areas with similar initial brightness, complicating the subsequent segmentation task.
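As a rough illustration of how equalization stretches the dynamic range, the sketch below applies the standard cumulative-histogram mapping to a small grayscale patch. The function name `equalize_histogram` and the sample values are assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Remap intensities so that their cumulative distribution is spread
    over the full [0, levels - 1] range."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution function of the intensity histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    denom = len(flat) - cdf_min
    if denom == 0:  # constant image: nothing to stretch
        return [row[:] for row in image]

    lut = [round(max(c - cdf_min, 0) * (levels - 1) / denom) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Hypothetical low-contrast patch: values span only 100..130.
low_contrast = [[100, 100, 100], [110, 120, 130]]
equalized = equalize_histogram(low_contrast)
```

The narrow 100 to 130 range is stretched across 0 to 255, which also shows why small noise-induced differences in the background are amplified by the same mapping.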

3. HSV Color Extraction: Thermal images are sometimes pseudo-colored to represent temperature gradients. Converting such an image to the Hue, Saturation, and Value (HSV) color space can facilitate segmentation based on color. The hot spot area, often mapped to a specific color (e.g., white or red in a “hot iron” colormap), can be extracted by defining a range in the HSV space. This method provides an intuitive way to isolate regions but is heavily dependent on the consistency of the color mapping and can fail if the background contains pixels with similar HSV values.
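A minimal sketch of range-based extraction in HSV space follows, using only the standard-library `colorsys` module. The function name `hsv_mask`, the sample pixels, and the hue, saturation, and value bounds are hypothetical; a real colormap would need its own calibrated range.

```python
import colorsys
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]  # rows of (R, G, B), each 0-255


def hsv_mask(image: RGBImage, h_range: Tuple[float, float],
             s_min: float, v_min: float) -> List[List[int]]:
    """Mark pixels (255) whose hue lies in h_range and whose saturation and
    value are at least s_min and v_min. All HSV quantities are in [0, 1]."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hit = h_range[0] <= h <= h_range[1] and s >= s_min and v >= v_min
            mask_row.append(255 if hit else 0)
        mask.append(mask_row)
    return mask


# Hypothetical pseudo-colored patch: cool blue/purple background and
# two bright red-orange "hot" pixels in the right-hand column.
patch = [
    [(0, 0, 128), (255, 40, 0)],
    [(60, 0, 90), (250, 30, 10)],
]
# Note: pure red sits at the hue wrap-around point (h = 0, equivalently 1),
# so a full red range may need two intervals; this sketch uses a single one.
hot = hsv_mask(patch, h_range=(0.0, 0.08), s_min=0.5, v_min=0.8)
```

Any background pixel that happens to fall inside the same HSV box would be marked as well, which is the color-based false positive discussed above.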

Method	Core Principle	Key Advantage	Primary Limitation for Hot Spots
Threshold Segmentation	Intensity-based pixel classification	Simple and computationally fast	Highly sensitive to threshold choice; fails with similar foreground/background intensity.
Histogram Equalization	Global contrast enhancement	Can improve visibility of hot spots	Amplifies noise and irrelevant details, increasing segmentation difficulty.
HSV Color Extraction	Segmentation in color space	Intuitive for pseudo-colored thermal images	Depends on colormap consistency; susceptible to color-based false positives.

Edge Detection-Based Methods

Instead of segmenting regions, these methods identify the boundaries of hot spots by detecting significant changes in intensity (edges).

1. Sobel Operator: The Sobel operator is a discrete differentiation operator that computes an approximation of the gradient of the image intensity function. It uses two 3×3 kernels (for horizontal and vertical derivatives) convolved with the original image.
$$ G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * I $$
The gradient magnitude is calculated as $G = \sqrt{G_x^2 + G_y^2}$. Pixels with high gradient magnitude are considered edge points. While effective for clear boundaries, it is highly sensitive to noise, which is common in thermal imagery, resulting in cluttered edge maps with many spurious edges from background texture.
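The gradient computation can be sketched directly from the two kernels. This is a plain-Python illustration that leaves border pixels at zero; the function name `sobel_magnitude` and the step-edge test image are assumptions made for this example.

```python
import math
from typing import List

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative kernel


def sobel_magnitude(image: List[List[int]]) -> List[List[float]]:
    """Return G = sqrt(Gx^2 + Gy^2) for interior pixels; borders stay 0.

    The kernels are applied without flipping (cross-correlation). A true
    convolution only changes the sign of Gx and Gy, not the magnitude.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = image[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * p
                    gy += KY[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# Hypothetical patch with a vertical step edge between columns 1 and 2.
step = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
grad = sobel_magnitude(step)
```

Both interior pixels adjacent to the step respond strongly, which reflects the thick edges noted for this operator.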

2. Canny Edge Detector: This is a multi-stage algorithm considered optimal for edge detection. It involves Gaussian filtering to reduce noise, gradient calculation (similar to Sobel), non-maximum suppression to thin edges, and double thresholding with hysteresis tracking to finalize detected edges. The need to set high and low thresholds ($T_{high}$, $T_{low}$) makes its performance parameter-dependent. Inconsistent thermal backgrounds can lead to either broken hot spot contours or edges leaking into the background.
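The final Canny stage, double thresholding with hysteresis tracking, can be sketched on its own. The code below assumes a gradient-magnitude map is already available and uses 8-connectivity; the function name `hysteresis` and the threshold values are illustrative, and the earlier stages (Gaussian filtering, non-maximum suppression) are deliberately omitted.

```python
from collections import deque
from typing import List


def hysteresis(magnitude: List[List[float]], t_low: float,
               t_high: float) -> List[List[int]]:
    """Keep strong pixels (>= t_high) and any weak pixels (>= t_low) that
    are connected to a strong pixel through 8-neighbour links."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque()

    # Seed the search with all strong pixels.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= t_high:
                edges[y][x] = 255
                queue.append((y, x))

    # Grow edges outward through weak pixels.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and edges[ny][nx] == 0
                        and magnitude[ny][nx] >= t_low):
                    edges[ny][nx] = 255
                    queue.append((ny, nx))
    return edges


# Hypothetical one-row gradient map: one strong pixel, two connected weak
# pixels, a gap, and an isolated weak pixel.
mag = [[90, 60, 55, 10, 50]]
edge_map = hysteresis(mag, t_low=40, t_high=80)
```

The isolated weak pixel is discarded while the connected ones survive. Raising `t_low` would break the contour and lowering it would let edges leak into the background, which is the parameter sensitivity described above.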

Method	Core Principle	Key Advantage	Primary Limitation for Hot Spots
Sobel Operator	Gradient approximation	Simple and efficient	Extremely sensitive to noise; produces thick, noisy edges.
Canny Detector	Multi-stage optimal edge detection	Provides thin, well-localized edges	Performance heavily relies on threshold selection; struggles with weak or noisy boundaries.

Analysis of Traditional Image Processing Methods

The experimental application of the aforementioned traditional methods to thermal images of solar panels reveals a fundamental challenge: they demand a high degree of differentiation between the target (hot spot) and the background. Region-based methods fail when the hot spot’s grayscale or color signature is similar to that of normal panel areas or dirt. Edge-based methods are profoundly affected by the inherent noise and texture present in thermal imagery captured by standard infrared cameras, leading to cluttered outputs with numerous false edges. These methods lack robustness and generalizability because they rely on hand-crafted features that may not capture the complex appearance of hot spots under varying conditions. Consequently, there is a clear need for more adaptive and robust detection techniques that are less dependent on pristine image quality, paving the way for data-driven, deep learning approaches.

Proposed Hot Spot Detection Method Based on Improved YOLOv4

Deep learning-based object detection methods, driven by data rather than manual feature design, excel at learning hierarchical representations. They can discern hot spots even when the local grayscale contrast is low, by drawing on contextual understanding of the panel structure and anomaly patterns. This research proposes an enhanced version of the YOLOv4 object detector specifically optimized for the hot spot detection task on solar panels.

Dataset Construction

A fundamental prerequisite for effective deep learning is a comprehensive and reliable dataset. Acquiring a large number of real-world thermal images of faulty solar panels with hot spots is challenging due to operational constraints. Therefore, a hybrid data collection strategy was employed:

Real Hot Spots: A subset of images was obtained through field surveys of operational PV plants.
Simulated Hot Spots: Inspired by laboratory experiments (e.g., using materials like toothpaste or soil to partially cover cells and induce the hot spot effect), simulated hot spots were created on panels under controlled conditions to augment the dataset.

The collected raw images underwent a rigorous processing pipeline:

Data Screening: Initial filtering to remove blurry or irrelevant images.
Data Augmentation: Techniques such as random rotation (±15°), horizontal/vertical flipping, scaling, and cropping were applied to increase dataset size and variability, improving model generalization. Color jittering was avoided to preserve the temperature information encoded in the pseudo-color.
Secondary Screening: Post-augmentation filtering to ensure quality.
Annotation: Each hot spot in every image was manually labeled by drawing bounding boxes using annotation tools (e.g., LabelImg), generating the ground truth data required for supervised learning.
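The flipping step of the augmentation stage can be sketched as follows. The function names are hypothetical. The box helper is only relevant if boxes were drawn before augmentation; the pipeline above lists annotation as the last step, in which case only the image flips apply.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in image]


def flip_vertical(image: Image) -> Image:
    """Mirror the image top-to-bottom."""
    return [row[:] for row in image[::-1]]


def flip_box_horizontal(box: Box, width: int) -> Box:
    """Mirror a bounding box to match a horizontally flipped image
    (assumes box edges are measured from the left image border)."""
    x_min, y_min, x_max, y_max = box
    return (width - x_max, y_min, width - x_min, y_max)


tiny = [[1, 2, 3], [4, 5, 6]]
mirrored = flip_horizontal(tiny)
upside_down = flip_vertical(tiny)
# Hypothetical box on a 560-pixel-wide image (the dataset's stated width).
flipped_box = flip_box_horizontal((100, 50, 180, 120), width=560)
```

Neither flip alters pixel values, so the pseudo-color temperature encoding is preserved, in line with the decision to avoid color jittering.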

The final constructed dataset comprised 1,410 thermal infrared images with a resolution of 560×350 pixels, each containing annotated hot spot bounding boxes, split into a training set (80%) and a testing set (20%).
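For concreteness, an 80/20 split of 1,410 images gives 1,128 training and 282 testing images. The sketch below shows one way such a split could be produced; the random shuffle, the seed, and the file names are assumptions, since the splitting procedure is not described here.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then cut into training and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 1,410 thermal images.
images = [f"img_{i:04d}.jpg" for i in range(1410)]
train_set, test_set = split_dataset(images)
```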

Improved YOLOv4 Network Architecture

YOLOv4 is a state-of-the-art, single-stage object detector known for its excellent balance between speed and accuracy. The standard YOLOv4 architecture consists of: a Backbone (CSPDarknet53) for feature extraction, a Neck (SPP module and PANet) for multi-scale feature aggregation, and a Head (YOLO Head) for final detection prediction. To better suit the hot spot detection task, particularly for potential deployment on edge devices, the backbone network is optimized.

Proposed Improvement: MobileNetV3 Backbone. The original CSPDarknet53, while powerful, is computationally heavy. We replace it with MobileNetV3, a lightweight architecture designed for mobile vision applications. MobileNetV3 combines the depthwise separable convolutions from MobileNetV1, the linear bottleneck and inverted residual structures from MobileNetV2, and incorporates the squeeze-and-excitation (SE) channel attention mechanism. Furthermore, it uses the h-swish activation function, defined as:
$$ \text{h-swish}(x) = x \frac{\text{ReLU6}(x + 3)}{6} $$
which provides a good trade-off between accuracy and computational cost compared to ReLU6. The SE mechanism allows the model to adaptively recalibrate channel-wise feature responses, enhancing informative features (like those corresponding to hot spots) and suppressing less useful ones. To further reduce parameters, large kernel convolutions (e.g., 5×5, 7×7) in the original MobileNetV3 design are decomposed into stacks of 3×3 kernels. This modified backbone significantly reduces the number of parameters and floating-point operations (FLOPs), leading to faster inference while maintaining, and even improving, feature extraction capability for our specific task.
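The h-swish definition and the weight saving from kernel decomposition can both be checked with a few lines of Python. This is a scalar sketch with hypothetical function names. The weight count is per input-output channel pair and assumes a k×k kernel is replaced by (k − 1)/2 stacked 3×3 kernels, which cover the same receptive field.

```python
def relu6(x: float) -> float:
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """h-swish(x) = x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def stacked_3x3_weights(k: int) -> int:
    """Weights in the stack of 3x3 kernels that matches the receptive
    field of one k x k kernel (k odd, k >= 3)."""
    return 9 * ((k - 1) // 2)


samples = [h_swish(x) for x in (-4.0, -1.0, 0.0, 1.0, 3.0)]
saving_5x5 = (5 * 5, stacked_3x3_weights(5))  # weights: single vs stacked
saving_7x7 = (7 * 7, stacked_3x3_weights(7))
```

The function is zero for inputs at or below −3, equals the identity for inputs at or above 3, and is smooth in shape between those points while using only cheap clipping operations.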

The overall architecture of our improved YOLOv4 is: Input → MobileNetV3 Backbone → SPP Module → PANet Neck → YOLO Head → Output. The SPP module captures multi-scale contextual information, and PANet effectively fuses features from different levels, which is crucial for detecting hot spots of various sizes.

Experimental Validation and Comparative Analysis

The proposed model was implemented using the Keras framework with a TensorFlow backend. Training leveraged transfer learning; the MobileNetV3 backbone was initialized with weights pre-trained on ImageNet. The model was trained on our hot spot dataset with an initial learning rate of 0.001, using a dynamic learning rate scheduler (decay factor of 0.9).
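One plausible reading of the schedule is an exponential decay in which the learning rate is multiplied by 0.9 at each decay step. The sketch below assumes the decay is applied once per epoch; the decay interval is not stated above, so that choice and the function name are assumptions.

```python
def learning_rate(epoch: int, initial_lr: float = 0.001,
                  decay: float = 0.9) -> float:
    """Exponentially decayed learning rate: initial_lr * decay ** epoch.

    Assumes one decay step per epoch (not specified in the text)."""
    return initial_lr * decay ** epoch


schedule = [learning_rate(e) for e in range(0, 21, 5)]  # epochs 0, 5, ..., 20
```

Under this assumption the rate falls to roughly a third of its initial value after 10 epochs.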

The model’s learning progress and performance were evaluated using standard metrics. The Precision-Recall (PR) curve and the F1-Score curve (where $F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$) on the test set demonstrate robust performance. Precision measures the accuracy of positive predictions, while Recall measures the ability to find all positive instances. The F1-score is their harmonic mean.
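These metrics follow directly from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses hypothetical counts for illustration; the function names are assumptions.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
p, r = precision_recall(tp=90, fp=10, fn=30)
f1 = f1_score(p, r)
```

Because the harmonic mean is pulled toward the smaller value, a detector cannot reach a high F1-score by trading away either precision or recall.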

The trained model was then deployed for detection. It successfully localized hot spots in unseen thermal images with high confidence scores, accurately drawing bounding boxes around single and multiple hot spots without significant false positives or misses. Quantitative evaluation on the entire test set yielded the following results for our improved model: an Average Precision (AP) of 93.42%, an Intersection over Union (IoU) of 92.31%, a Precision of 94.36%, and a Recall of 92.27%. IoU is calculated as:
$$ \text{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} $$
between the predicted bounding box and the ground truth box.
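For axis-aligned boxes, the overlap and union areas reduce to simple coordinate arithmetic. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max); the function name and the sample boxes are illustrative.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap rectangle (zero if boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and ground truth boxes, offset by half their size.
predicted = (0.0, 0.0, 10.0, 10.0)
ground_truth = (5.0, 5.0, 15.0, 15.0)
overlap = iou(predicted, ground_truth)  # 25 / 175
```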

To rigorously validate our approach, we compared it against three other prominent object detection models—SSD, Faster R-CNN, and the original YOLOv4—under identical experimental conditions (same hardware, training strategy, and dataset).

Model	Average Precision (AP) (%)	IoU (%)	Precision (%)	Recall (%)
SSD	87.29	89.69	87.54	85.39
Faster R-CNN	92.17	93.61	91.26	90.68
Original YOLOv4	90.03	91.81	91.28	90.12
Improved YOLOv4 (Ours)	93.42	92.31	94.36	92.27

Qualitative and Quantitative Analysis: Visually, while all models could detect obvious hot spots, SSD often missed detections in complex multi-hot-spot scenarios. Faster R-CNN and the original YOLOv4 showed occasional misses or false alarms on ambiguous targets. Our improved YOLOv4 model demonstrated the most consistent and accurate detection across the test suite. Quantitatively, as shown in the table, our model achieved the highest AP, Precision, and Recall, while its IoU (92.31%) was second to that of Faster R-CNN (93.61%). The improvement in Precision (94.36%) is notable, being 6.82, 3.10, and 3.08 percentage points higher than SSD, Faster R-CNN, and the original YOLOv4, respectively. The integration of the MobileNetV3 backbone with its SE attention mechanism is a key factor, enabling the model to focus more effectively on discriminative features associated with hot spots on solar panels, thereby enhancing detection accuracy and reliability.

Conclusion

This study conducted a comprehensive investigation into hot spot detection for solar panels using thermal infrared imagery. The analysis of traditional image processing methods, including region segmentation and edge detection techniques, revealed their inherent limitations: a strong dependency on high image quality and susceptibility to noise and background clutter, leading to frequent false positives and missed detections. These shortcomings highlight the need for more robust solutions.

To address these challenges, the research pivoted to deep learning-based object detection. An improved YOLOv4 model was proposed, featuring a lightweight and effective MobileNetV3 backbone equipped with a channel attention mechanism. This architectural modification optimized the network for the specific task, reducing computational complexity while enhancing feature discriminability. A dedicated dataset was constructed using a combination of real and simulated hot spots to facilitate effective training.

Experimental results demonstrate the superiority of the proposed approach. The improved YOLOv4 model achieved an AP of 93.42% and a detection precision of 94.36%, outperforming the original YOLOv4, SSD, and Faster R-CNN on AP, precision, and recall in the comparative study. The model exhibits consistent and accurate detection capabilities even in challenging scenarios with multiple or subtle hot spots. This research confirms the significant potential of optimized deep learning models for automated, reliable, and efficient health monitoring of photovoltaic systems, offering substantial engineering application value for the maintenance of solar panels.