In the field of renewable energy, solar panels are critical components for harnessing solar power. However, their efficiency is significantly reduced by dust and debris accumulating on their surfaces and by shading. Automated cleaning robots have been developed to address this, but their effectiveness hinges on accurate target recognition, and traditional methods often struggle with lighting variations and environmental interference. In this article, we propose a novel target recognition method for solar panel cleaning robots that integrates a binocular vision system with Gabor filters. The approach aims to improve the precision and robustness of identifying cleaning targets, such as dirt and obstructions, on solar panels. By combining three-dimensional spatial information with texture feature extraction, we seek to overcome the limitations of existing techniques and advance intelligent maintenance for solar energy systems.
The core of our method lies in three key steps: hand-eye calibration of the robot, texture feature extraction using binocular vision and Gabor filters, and target identification via a support vector machine (SVM) classifier. We begin by establishing a mathematical model of the robot’s motion and performing hand-eye calibration to align the coordinate systems of the solar panels and the robot, ensuring that the robot can accurately perceive and interact with its environment. Next, we employ binocular vision to capture images of the solar panels, construct a 3D model through stereo matching and disparity computation, and then apply Gabor filters to extract multi-scale, multi-directional texture features from the panel surfaces. These features serve as inputs to an SVM-based classifier, which outputs the centroid coordinates of cleaning targets, enabling precise localization and recognition. Our experiments demonstrate the method’s efficacy in realistic operating scenarios.

Hand-eye calibration is a fundamental step in robotics vision, particularly for solar panel cleaning robots, as it bridges the gap between the robot’s coordinate system and the solar panel’s coordinate system. This alignment is crucial for accurate target recognition and subsequent cleaning operations. We model the robot’s pose states using homogeneous transformation matrices. Let $C$ represent the transformation matrix derived from the robot’s encoder outputs at different poses. It is expressed as:
$$ C = \begin{bmatrix} a_{11}/j_0 & a_{12}/j_0 & \cdots & a_{1\nu}/j_0 \\ a_{21}/j_0 & a_{22}/j_0 & \cdots & a_{2\nu}/j_0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{\sigma 1}/j_0 & a_{\sigma 2}/j_0 & \cdots & a_{\sigma\nu}/j_0 \end{bmatrix} $$
where $a_{\sigma\nu}$ denotes the coordinate solution value in row $\sigma$, column $\nu$ of the encoder output matrix, and $j_0$ is a scaling factor. For our system, we adopt an eye-in-hand configuration, meaning the camera is mounted on the robot’s end-effector. The mathematical model for this setup is given by:
$$ X_n = \|C\| \times \exp\left(-\frac{\omega_s \cdot dt}{\|\phi_c / L_0\|^2}\right) $$
Here, $\omega_s$ represents the confidence value of calibration, $dt$ is the cross-entropy loss function, $\phi_c$ is an intermediate calibration variable, and $L_0$ is a resolution coefficient. The inverse transformation matrix $T'$, which relates the camera to the robot’s end-effector, is computed as:
$$ T' = X_n \sum_{i=1}^{N} \frac{G_0 \|\mu_c\|}{y_s} $$
In this equation, $N$ is the number of robot pose types, $G_0$ is the hyperbolic tangent function, $\mu_c$ is the fixed transformation matrix, and $y_s$ is the minimum depth distance of the visual camera. To refine the calibration, we introduce error correction using least squares minimization. The attribute constraint $E_r$ and calibration error $F_t$ are defined as:
$$ E_r = \|T'\| \cdot \beta_0 \cdot p_0 $$
$$ F_t = \frac{E_r}{\|\zeta\| / h_m} $$
where $\beta_0$ is a regularization coefficient, $p_0$ is a gradient operator, $\zeta$ is the sensing matrix within the camera’s field of view, and $h_m$ is the rotation angle of the end-effector. By incorporating slack variables, we correct the calibration error and obtain the coordinates of the solar panel in the robot’s workspace:
$$ O_z = F_t \times \zeta_f \times b_j $$
Here, $\zeta_f$ is a scale-variable Gaussian function, and $b_j$ is a slack variable. This calibration process ensures that the solar panel’s position is accurately mapped, facilitating reliable target recognition. The parameters involved in hand-eye calibration are summarized in Table 1, which outlines key variables and their descriptions for clarity.
| Parameter | Symbol | Description |
|---|---|---|
| Transformation Matrix | $C$ | Homogeneous matrix from encoder outputs |
| Scaling Factor | $j_0$ | Normalizes coordinate values |
| Confidence Value | $\omega_s$ | Indicates calibration reliability |
| Cross-Entropy Loss | $dt$ | Measures calibration error |
| Intermediate Variable | $\phi_c$ | Used in calibration model |
| Resolution Coefficient | $L_0$ | Adjusts model sensitivity |
| Inverse Transformation | $T'$ | Relates camera to end-effector |
| Fixed Matrix | $\mu_c$ | Constant transformation component |
| Minimum Depth | $y_s$ | Camera’s closest perceptible distance |
| Attribute Constraint | $E_r$ | Derived from calibration model |
| Calibration Error | $F_t$ | Final error after correction |
| Workspace Coordinates | $O_z$ | Solar panel position in robot frame |
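As a concrete illustration of the eye-in-hand setup, hand-eye calibration is conventionally reduced to the constraint $AX = XB$, where $A$ is a gripper motion between two robot poses, $B$ is the corresponding motion observed by the camera, and $X$ is the camera-to-end-effector transform being solved for. The following numpy sketch uses illustrative transform values (not calibrated values from our system) to verify this constraint on synthetic data:

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis as a 3x3 matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def hom(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical ground-truth camera -> end-effector transform X (illustrative values).
X = hom(rot_z(0.3), [0.05, 0.02, 0.10])

# A gripper motion A between two robot poses, as read from the encoders.
A = hom(rot_z(0.7), [0.20, 0.00, 0.05])

# The camera observes the corresponding motion B = X^{-1} A X of the calibration target.
B = np.linalg.inv(X) @ A @ X

# Any valid hand-eye solution X must satisfy the constraint A X = X B.
assert np.allclose(A @ X, X @ B)
```

In practice, $X$ is recovered by collecting many $(A_i, B_i)$ pairs from distinct robot poses and solving the resulting system in a least-squares sense, which is the role the error-correction step above plays.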
Following calibration, we focus on texture feature extraction from the solar panels using binocular vision and Gabor filters. Binocular vision mimics human stereoscopic perception, enabling the capture of depth information from solar panel surfaces. We simultaneously acquire images from two cameras and perform stereo matching to align corresponding pixels. The matching factor $X_S$ is calculated as:
$$ X_S = O_z \times \lg\left(\frac{\tau_t}{v_b \times w_0}\right) $$
where $\tau_t$ is a pixel comparison function, $v_b$ denotes the matching direction of the reference pixel, and $w_0$ is the number of pixels in the reference neighborhood. Under epipolar constraints, matching proceeds from right to left until the distance from the reference pixel to the image edge reaches a preset threshold. The disparity $A_u$ is then computed as:
$$ A_u = \frac{X_S}{\chi_h (1 + f_r)} $$
Here, $\chi_h$ represents the matching window size, and $f_r$ is the disparity search range. Using this disparity, we construct a 3D model of the solar panels, which provides a spatial representation for further analysis. To extract texture features, we employ a bank of Gabor filters, defined by a modulation function. The filter output $g_y$ is given by:
$$ g_y = \delta_e \times \omega_k, \quad \delta_e = A_u \cdot \theta_s \cdot s_e $$
where $\omega_k$ is the number of frequencies in the filter, $\delta_e$ is a discrete coefficient, $\theta_s$ is the frequency of the modulation function, and $s_e$ is the rotation angle of the filter. By convolving the grayscale image of the solar panels with these filters at multiple scales and orientations, we obtain feature response values $D_y$:
$$ D_y = \frac{g_y}{\|\gamma_c\|} $$
In this formula, $\gamma_c$ denotes the spatial aspect ratio of the filter bank. The texture features $W_q$ of the solar panels are derived by aggregating these responses:
$$ W_q = \sum_{t=1}^{Q} D_y \times \zeta_t $$
where $Q$ is the total number of feature responses, and $\zeta_t$ is the wavelength of the $t$-th filter. This process captures intricate details on the solar panel surfaces, such as dust patterns or shadows, which are essential for distinguishing cleaning targets. The parameters for binocular vision and Gabor filtering are detailed in Table 2, highlighting the configuration used in our method.
| Parameter | Value | Description |
|---|---|---|
| Image Resolution | 1920 px × 1080 px | Dimensions of captured solar panel images |
| Frame Rate | 30 fps | Speed of image acquisition |
| Focal Length | 8 mm | Camera lens focal length |
| Stereo Baseline | 120 mm | Distance between cameras for depth perception |
| Gabor Frequency Range | 0.05–0.5 cycles/px | Spatial-frequency range for multi-scale texture analysis |
| Gabor Orientation Angles | 0°–170° | Directions for filtering solar panel textures |
| Matching Window Size | $\chi_h$ = 5 px | Size of window for stereo matching |
| Disparity Search Range | $f_r$ = 50 px | Range for computing depth from solar panels |
| Filter Spatial Aspect Ratio | $\gamma_c$ = 0.5 | Aspect ratio of Gabor filters |
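To make the two stages concrete, the following Python sketch implements a simplified version of each: SAD block matching along a single rectified scanline in place of the full stereo matcher, and the real part of a Gabor kernel with spatial frequencies in cycles/px. The function names, window sizes, and synthetic test images are illustrative assumptions, not the exact configuration of Table 2:

```python
import numpy as np

def disparity_row(left, right, row, patch=2, max_disp=16):
    """SAD block matching along one rectified scanline (left image as reference)."""
    h, w = left.shape
    disp = np.zeros(w, dtype=int)
    for x in range(patch + max_disp, w - patch):
        ref = left[row - patch:row + patch + 1, x - patch:x + patch + 1]
        costs = [np.abs(ref - right[row - patch:row + patch + 1,
                                    x - d - patch:x - d + patch + 1]).sum()
                 for d in range(max_disp)]
        disp[x] = int(np.argmin(costs))  # disparity minimising the SAD cost
    return disp

def gabor_kernel(freq, theta, sigma=3.0, gamma=0.5, size=15):
    """Real part of a Gabor kernel: Gaussian envelope times cosine carrier.
    freq is in cycles/px; gamma is the spatial aspect ratio (gamma_c above)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def gabor_features(img, freqs, thetas):
    """Mean absolute filter response per (frequency, orientation) pair."""
    feats = []
    for f in freqs:
        for th in thetas:
            k = gabor_kernel(f, th)
            # FFT-based circular convolution of the image with the padded kernel
            resp = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape)).real
            feats.append(np.abs(resp).mean())
    return np.array(feats)

# Synthetic check 1: the right image is the left shifted by a known disparity.
rng = np.random.default_rng(0)
left = rng.random((40, 80))
right = np.roll(left, -4, axis=1)          # true disparity of 4 px
d = disparity_row(left, right, row=20)
assert np.all(d[18:78] == 4)               # recovered over the valid range

# Synthetic check 2: vertical stripes respond most strongly to the matching
# frequency and orientation, and weakly to the orthogonal orientation.
xs = np.arange(64)
stripes = np.tile(np.cos(2 * np.pi * 0.1 * xs), (64, 1))
f = gabor_features(stripes, freqs=[0.1], thetas=[0.0, np.pi / 2])
assert f[0] > f[1]
```

The per-pair mean responses correspond to the aggregated texture features $W_q$; a production implementation would use a subpixel-refined matcher and the full filter bank of Table 2.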
With the texture features extracted, we proceed to target recognition on the solar panels. We use a sliding window approach to detect edge points on the solar panel surfaces, defined as:
$$ F_e = \frac{W_q}{\rho_g \times \varphi_d} $$
where $\rho_g$ is a linear kernel function, and $\varphi_d$ is the dimension of the transformed space. An SVM classifier is then constructed to differentiate between clean solar panel areas and regions requiring cleaning. The decision function of the SVM is formulated as:
$$ a(F_e) = F_e \times \exp\left(-\frac{\iota_c}{d_r}\right) $$
Here, $\iota_c$ is an affine transformation matrix, and $d_r$ is the depth of spatial points. This function generates a classifier $F_L$ for identifying targets:
$$ F_L = a(F_e) \times \varepsilon \times \psi_t $$
where $\varepsilon$ is an empirical constant, and $\psi_t$ is the width of the sliding window. Within the detected edge regions, we apply a tracking algorithm to determine pixel values in the target area:
$$ E'' = R_k \times P_q $$
In this equation, $R_k$ is a simulation parameter, and $P_q$ is an orthogonal unit rotation matrix. By inputting $E''$ into the classifier, we obtain the centroid coordinates $(x_h, y_h)$ of the cleaning target on the solar panels:
$$ x_h(X) = E'' \times \eta_j, \quad y_h(X) = \frac{E''}{d} $$
where $\eta_j$ is a hyperplane parameter, and $d$ is a binarization coefficient. This process enables precise localization of dirt, dust, or other obstructions on the solar panels, guiding the robot’s cleaning actions. To illustrate the workflow, the key equations involved in target recognition are summarized in Table 3, which provides a concise reference for the mathematical framework.
| Step | Equation | Purpose |
|---|---|---|
| Edge Detection | $F_e = \frac{W_q}{\rho_g \times \varphi_d}$ | Identifies boundaries on solar panel surfaces |
| SVM Decision Function | $a(F_e) = F_e \times \exp\left(-\frac{\iota_c}{d_r}\right)$ | Classifies texture features from solar panels |
| Classifier Generation | $F_L = a(F_e) \times \varepsilon \times \psi_t$ | Produces target recognition model for solar panels |
| Target Tracking | $E'' = R_k \times P_q$ | Computes pixel values in target regions on solar panels |
| Centroid Localization | $x_h(X) = E'' \times \eta_j, \quad y_h(X) = \frac{E''}{d}$ | Outputs coordinates of cleaning targets on solar panels |
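The final localization step can be sketched in a few lines. Here `score_map` is a hypothetical stand-in for the per-pixel SVM decision values produced by the sliding-window classifier; the sketch thresholds it and returns the centroid $(x_h, y_h)$ of the positive region:

```python
import numpy as np

def locate_target(score_map, threshold=0.5):
    """Centroid (x_h, y_h) of pixels whose classifier score exceeds a threshold.

    score_map stands in for the per-pixel SVM decision values described
    above; threshold is an illustrative cut-off, not a tuned value.
    """
    ys, xs = np.nonzero(score_map > threshold)
    if len(xs) == 0:
        return None                      # no cleaning target detected
    return xs.mean(), ys.mean()

# Synthetic score map with one high-scoring blob (a hypothetical dirt patch).
scores = np.zeros((50, 50))
scores[10:20, 30:40] = 0.9               # rows 10..19, columns 30..39
x_h, y_h = locate_target(scores)
assert x_h == 34.5 and y_h == 14.5       # centroid of the blob
```

For multiple disjoint soiled regions, the same idea would be applied per connected component so that each patch receives its own centroid.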
To validate our method, we conducted extensive experiments using a GZCR-2000 solar panel cleaning robot. This robot is designed for autonomous operation on solar panels and is capable of removing sand, dust, and snow. Its dimensions are 33 cm × 48 cm × 24 cm, with a weight of 40 kg, making it suitable for various solar panel configurations. We configured a binocular vision system with the parameters listed in Table 2 to capture images of the solar panels. The system included cameras, a calibration board, UWB base stations, adjustable LED lights, motor drivers, and sensors. The LED lights allowed us to simulate different lighting conditions, such as bright sunlight or low-light environments, to test the robustness of target recognition on the solar panels. During calibration, we used a checkerboard pattern with a minimum grid size of 60 px × 60 px, and we performed stereo calibration to obtain the cameras’ intrinsic parameters. The Gabor filters were applied with frequency steps of 0.05 cycles/px and orientation increments of 10°, ensuring comprehensive texture analysis of the solar panels. For the SVM classifier, we selected an RBF kernel and optimized hyperparameters through grid-search cross-validation, setting the penalty coefficient $C = 0.1$ and the RBF parameter $\gamma = 0.1$. The dataset consisted of solar panel images, split 80% for training and 20% for testing, to evaluate recognition performance.
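The hyperparameter search can be illustrated with scikit-learn (the library choice and the synthetic feature vectors are assumptions for the sketch; the article's exact toolkit and data are not reproduced here). The grids include the reported $C = 0.1$ and $\gamma = 0.1$, and the split is 80/20:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-ins for Gabor texture feature vectors: class 0 = clean
# surface, class 1 = soiled region (features and labels are illustrative).
rng = np.random.default_rng(42)
clean = rng.normal(0.0, 1.0, size=(200, 8))
dirty = rng.normal(2.5, 1.0, size=(200, 8))
X = np.vstack([clean, dirty])
y = np.array([0] * 200 + [1] * 200)

# 80/20 train/test split, mirroring the experimental setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Grid search over the RBF-SVM hyperparameters; the grids include the
# C = 0.1 and gamma = 0.1 values reported in the text.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
                    cv=5)
grid.fit(X_tr, y_tr)
accuracy = grid.score(X_te, y_te)
assert accuracy > 0.9   # the two synthetic classes are well separated
```

On real panel imagery, the selected hyperparameters depend on the feature distribution, which is why cross-validated search is preferable to fixing $C$ and $\gamma$ by hand.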
The experimental results demonstrated the effectiveness of our method. We applied our algorithm to recognize targets on solar panels, such as dust accumulations and shadows. The recognition outcomes showed high accuracy, with misjudgment rates consistently below 0.25% across multiple trials. For comparison, we evaluated two alternative methods: a Kalman filter-based approach and a deep learning-based technique. As shown in Table 4, our method outperformed both in terms of misjudgment rates, highlighting its superiority for solar panel cleaning tasks. The table summarizes the misjudgment percentages from eight experimental runs, illustrating the consistency of our approach.
| Experiment Number | Kalman Filter Method (%) | Deep Learning Method (%) | Our Method (%) |
|---|---|---|---|
| 1 | 0.47 | 0.36 | 0.14 |
| 2 | 0.66 | 0.41 | 0.10 |
| 3 | 0.57 | 0.29 | 0.06 |
| 4 | 0.49 | 0.33 | 0.02 |
| 5 | 0.55 | 0.47 | 0.15 |
| 6 | 0.60 | 0.52 | 0.11 |
| 7 | 0.41 | 0.56 | 0.20 |
| 8 | 0.39 | 0.58 | 0.22 |
In addition to accuracy, we assessed the recognition time, a critical factor for real-time operation on solar panels. Table 5 presents the time consumption in milliseconds for each method across the same experimental runs. Our method exhibited lower latency, with times ranging from 2.1 ms to 2.8 ms, compared to the other methods that often exceeded 3 ms. This efficiency stems from the streamlined hand-eye calibration and feature extraction processes, which reduce computational overhead when processing images of solar panels.
| Experiment Number | Kalman Filter Method (ms) | Deep Learning Method (ms) | Our Method (ms) |
|---|---|---|---|
| 1 | 3.8 | 3.3 | 2.6 |
| 2 | 3.3 | 3.4 | 2.5 |
| 3 | 3.4 | 3.7 | 2.1 |
| 4 | 3.7 | 3.5 | 2.8 |
| 5 | 3.5 | 3.3 | 2.3 |
| 6 | 3.6 | 3.4 | 2.4 |
| 7 | 3.5 | 3.1 | 2.7 |
| 8 | 3.1 | 3.6 | 2.5 |
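The headline figures quoted above can be checked directly against the tabulated values; the following short script over the data of Tables 4 and 5 confirms that every misjudgment rate of our method stays below 0.25% and that its recognition times span 2.1–2.8 ms:

```python
import numpy as np

# Misjudgment rates (%) of our method from Table 4.
ours_misjudge = np.array([0.14, 0.10, 0.06, 0.02, 0.15, 0.11, 0.20, 0.22])

# Recognition times (ms) from Table 5 for all three methods.
kalman_ms = np.array([3.8, 3.3, 3.4, 3.7, 3.5, 3.6, 3.5, 3.1])
deep_ms   = np.array([3.3, 3.4, 3.7, 3.5, 3.3, 3.4, 3.1, 3.6])
ours_ms   = np.array([2.6, 2.5, 2.1, 2.8, 2.3, 2.4, 2.7, 2.5])

assert ours_misjudge.max() < 0.25                 # below 0.25% in every run
assert ours_ms.min() == 2.1 and ours_ms.max() == 2.8
# Our mean latency is below that of both baselines.
assert ours_ms.mean() < kalman_ms.mean() and ours_ms.mean() < deep_ms.mean()
```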
To further quantify performance, we computed AUC (area under the ROC curve) values, which measure the trade-off between true positive and false positive rates. For solar panel cleaning robots, an AUC above 0.7 is typically required for reliable target recognition. Our method achieved AUC values exceeding 0.88, surpassing both the threshold and the comparative methods. This indicates a high level of discriminative power in identifying cleaning targets on solar panels, even under challenging conditions like variable illumination. The integration of binocular vision provides depth cues that enhance spatial understanding, while Gabor filters capture textural nuances that are relatively robust to lighting changes. This combination proves particularly effective for solar panels, where surface reflections and shadows can obscure targets.
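The AUC computation itself can be sketched with the rank-sum (Mann-Whitney U) identity, which equals the area under the empirical ROC curve when scores are untied. This is a minimal illustration, not the evaluation code used in our experiments:

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the rank-sum identity: the probability that a randomly chosen
    positive outranks a randomly chosen negative (assumes untied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # ranks start at 1
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Toy example: 3 of the 4 positive-negative pairs are ranked correctly.
labels = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
assert auc_score(labels, scores) == 0.75
```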
In conclusion, our proposed method for target recognition in solar panel cleaning robots, based on binocular vision and Gabor filters, offers significant improvements in accuracy and efficiency. By meticulously calibrating the robot’s vision system, extracting robust texture features from solar panels, and employing an SVM classifier, we achieve precise localization of cleaning targets with minimal error. The experimental validation confirms that our approach reduces misjudgment rates to below 0.25% and maintains low recognition times, making it suitable for real-world deployment on solar panels. However, we acknowledge that the computational cost of the Gabor filter bank, which requires convolutions at many scales and orientations, may pose challenges for real-time processing in resource-constrained environments. Future work could explore hardware acceleration techniques, such as FPGA implementations, to optimize filter computations and further enhance the speed of target recognition on solar panels. Additionally, extending the method to diverse solar panel configurations and environmental scenarios will be crucial for broader applicability. Overall, this research contributes to the advancement of intelligent cleaning solutions for solar energy systems, ensuring that solar panels operate at peak efficiency through automated and accurate maintenance.
