In the context of global efforts toward carbon peaking and carbon neutrality, solar power generation systems have gained unprecedented development opportunities as a crucial component of clean energy. Solar inverters, being the core equipment in these systems, have seen continuous technological advancements. Among various multilevel inverter (MLI) topologies, the neutral-point-clamped (NPC) three-level inverter is widely adopted in solar power applications due to its cost-effectiveness, low switching losses, and high output voltage. However, the reliability of solar inverters is paramount, and switch device failures, particularly insulated gate bipolar transistor (IGBT) open-circuit faults, can severely impact system performance. When an IGBT fails, the solar inverter may continue operating temporarily, but prolonged neglect can lead to increased total harmonic distortion (THD) in the output current, affecting grid integration and potentially causing secondary faults. Since short-circuit faults are often rapidly converted to open-circuit faults by protection mechanisms, this study focuses exclusively on IGBT open-circuit faults in solar inverters.
Existing fault diagnosis methods for solar inverters can be broadly classified into non-contact and contact approaches. Non-contact methods, such as infrared thermography, acoustic monitoring, and electromagnetic radiation sensing, utilize external devices to monitor inverter status without physical intrusion. While these methods avoid disrupting normal operation, they face challenges in diagnostic accuracy and susceptibility to external interference. Contact methods, on the other hand, involve direct measurement of electrical parameters at the solar inverter output or physical inspection of components. Although these methods offer improved precision, they often entail increased time and cost. Recent years have witnessed the growing application of machine learning in solar inverter fault diagnosis, particularly in combination with contact-based methods. Techniques such as feature matrix-based joint approximate diagonalization-independent component analysis (JADE-ICA) with neural networks (NN), ICEEMDAN-FE with support vector machines (SVM), current path tracing with voltage vector residual evaluation, and Markov transition field-convolutional neural networks (MTF-CNN) have demonstrated high accuracy. However, these methods often involve extensive data analysis, which can reduce diagnostic efficiency. To address this, we propose a novel fault signal combination integrated with machine learning for efficient and accurate diagnosis of IGBT open-circuit faults in solar inverters.
Our approach involves designing a unique set of fault signals, including three-phase currents at the solar inverter output, Concordia-transformed currents, active power, and reactive power. These signals are processed using machine learning algorithms for fault classification and identification. We begin by constructing a simulation model in MATLAB/Simulink, incorporating a custom fault module. Data for the fault signals are collected from the solar inverter output under various conditions. Machine learning models are then trained and tested on this dataset. Simulation results indicate that the random forest (RF) classifier outperforms decision tree (DT), k-nearest neighbors (KNN), and LightGBM classifiers in terms of accuracy, stability, and robustness for solar inverter fault diagnosis.

The solar power grid-connected system comprises several key components: a photovoltaic (PV) array, a Boost DC-DC converter for voltage step-up, a bridge balance circuit for current regulation, an NPC three-level solar inverter, an LCL filter for harmonic suppression, a three-phase voltage source, and a dual-loop control system. The PV array generates direct current (DC), which is boosted by the Boost converter to an appropriate level for the solar inverter. The NPC solar inverter then converts DC to alternating current (AC). The bridge balance circuit mitigates current asymmetry and ripple, enhancing device safety and lifespan. The LCL filter reduces high-frequency harmonics before the AC is fed into the grid. Maximum power point tracking (MPPT) is employed to optimize the PV array’s power output under varying irradiance conditions, thereby improving energy efficiency and grid power quality in solar inverter systems.
The NPC three-level solar inverter operates in three states per phase: high level (P), zero level (O), and low level (N). Taking phase A as an example, in the P state, IGBTs Q1 and Q2 are turned on (state “1”), while Q3 and Q4 are off (state “0”). The output voltage is \( U_{dc}/2 \), where \( U_{dc} \) is the input voltage, regardless of current direction. In the O state, Q2 and Q3 are on, and the output voltage is 0. In the N state, Q3 and Q4 are on, yielding an output voltage of \( -U_{dc}/2 \). The output voltage level does not depend on the current direction, and the gate signals are complementary in pairs: Q1 with Q3, and Q2 with Q4, so that exactly two adjacent switches are on in every state. This operational principle is fundamental to understanding fault behaviors in solar inverters.
IGBT open-circuit faults in the NPC solar inverter can be categorized into same-phase and cross-phase double-switch faults. For same-phase faults in phase A, if Q1 and Q2 fail open, the P and O states are disrupted, and the current waveform loses its positive half-cycle. If Q2 and Q3 fail, all three states are affected, and the output current approaches zero. If Q3 and Q4 fail, the O and N states are disrupted and only the P state remains functional, so the waveform retains only its positive half-cycle. Cross-phase faults involve IGBTs in different phases, and their analysis relies on simulation because the phases of the solar inverter system are coupled. The impact of these faults on solar inverter output underscores the need for reliable diagnostic methods.
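The switching states described above can be written down as a small lookup table. The following is a minimal sketch, not the authors' Simulink logic: the names `SWITCH_STATES`, `output_voltage`, and `unaffected_states` are illustrative, and the fault check is deliberately simplified (it ignores current direction and the clamping-diode paths, and only asks whether a state needs a switch that has failed open).

```python
# Gate signals (Q1, Q2, Q3, Q4) for each phase-leg state, as given in the text.
SWITCH_STATES = {
    "P": (1, 1, 0, 0),
    "O": (0, 1, 1, 0),
    "N": (0, 0, 1, 1),
}

# Output voltage of each state as a fraction of the DC input voltage U_dc.
LEVEL = {"P": 0.5, "O": 0.0, "N": -0.5}


def output_voltage(state, u_dc):
    """Phase output voltage for a switching state and DC voltage u_dc."""
    return LEVEL[state] * u_dc


def unaffected_states(open_switches):
    """States that do not gate any of the open-circuited switches.

    open_switches holds switch numbers 1..4 of one phase leg.
    Simplification: current direction and diode conduction are ignored.
    """
    return [
        state
        for state, gates in SWITCH_STATES.items()
        if not any(gates[q - 1] for q in open_switches)
    ]


if __name__ == "__main__":
    print(output_voltage("P", 750.0))   # half of the 750 V DC side
    print(unaffected_states({1, 2}))    # Q1 and Q2 open
    print(unaffected_states({2, 3}))    # Q2 and Q3 open
```

With Q1 and Q2 open, only the N state is left untouched, which matches the loss of the positive half-cycle; with Q2 and Q3 open, no state remains, which matches the output current collapsing toward zero.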
To simulate these faults, we developed a model in Simulink using sinusoidal pulse width modulation (SPWM) and voltage-current dual-loop control. The model represents a PV array of 47 series-connected modules per string and 10 parallel strings, with a maximum rated power of approximately 100 kW. The DC-side voltage is 750 V, and the grid-side line voltage is 380 V at 50 Hz. Under normal conditions, with an irradiance of 1000 W/m² and a temperature of 25 °C, the solar inverter output waveforms are sinusoidal. The total harmonic distortion (THD) of the output current, calculated using the fast Fourier transform (FFT), is 0.41%, well below the 5% threshold for grid compliance, which confirms the model’s validity for solar inverter applications.
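To make the THD figure concrete, the sketch below computes THD as the ratio of the root-sum-square of the harmonic amplitudes to the fundamental amplitude. It is an illustration only: it evaluates a direct DFT at the harmonic frequencies rather than an FFT (to stay dependency-free), it assumes the samples cover exactly one fundamental period, and the function names and the harmonic limit of 50 are assumptions, not values from the study.

```python
import cmath
import math


def harmonic_amplitude(samples, h):
    """Amplitude of the h-th harmonic of one uniformly sampled period."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * h * k / n)
        for k, x in enumerate(samples)
    )
    return 2.0 * abs(acc) / n


def thd(samples, max_harmonic=50):
    """THD = sqrt(sum of squared harmonic amplitudes, h >= 2) / fundamental."""
    fundamental = harmonic_amplitude(samples, 1)
    harmonics = math.sqrt(
        sum(harmonic_amplitude(samples, h) ** 2
            for h in range(2, max_harmonic + 1))
    )
    return harmonics / fundamental


if __name__ == "__main__":
    n = 2000  # samples per fundamental period (illustrative)
    # Synthetic current: fundamental plus 3% third and 4% fifth harmonic.
    wave = [
        math.sin(2 * math.pi * k / n)
        + 0.03 * math.sin(3 * 2 * math.pi * k / n)
        + 0.04 * math.sin(5 * 2 * math.pi * k / n)
        for k in range(n)
    ]
    print(f"THD = {100 * thd(wave):.2f}%")
```

The synthetic waveform is built so that its THD is sqrt(0.03² + 0.04²) = 5%, which is the compliance threshold quoted above.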
A fault module was designed using Constant, Switch, and Step blocks in Simulink to induce IGBT open-circuit faults at specific times (e.g., 0.54 s) by setting the gate signal to “0”. This module allows controlled fault injection without altering other circuit parameters. We considered 24 fault scenarios, including same-phase and cross-phase double-switch faults, each assigned a unique label from 0 to 23. For instance, same-phase faults in phase A for Q1-Q2, Q2-Q3, and Q3-Q4 are labeled 0, 1, and 2, respectively. Cross-phase faults between phases A and B, such as Q1-Q5, are labeled 9. This comprehensive fault set enables thorough testing of the diagnostic method for solar inverters.
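The behavior of the fault module can be mimicked outside Simulink in a few lines. This is a sketch of the idea only (the gate command passes through unchanged until the fault time and is forced to “0” afterwards); `faulted_gate`, `apply_fault`, and `KNOWN_LABELS` are hypothetical names, and the label table contains only the four labels stated in the text, since the rest of the mapping is not given.

```python
T_FAULT = 0.54  # fault inception time in seconds, as in the text

# Only the labels stated in the text; the other 20 are not listed there.
KNOWN_LABELS = {
    ("Q1", "Q2"): 0,  # same-phase, phase A
    ("Q2", "Q3"): 1,  # same-phase, phase A
    ("Q3", "Q4"): 2,  # same-phase, phase A
    ("Q1", "Q5"): 9,  # cross-phase, phases A and B
}


def faulted_gate(gate, t, t_fault=T_FAULT):
    """Gate signal seen by the IGBT: unchanged before t_fault, 0 afterwards."""
    return gate if t < t_fault else 0


def apply_fault(times, gates, t_fault=T_FAULT):
    """Apply the open-circuit fault to a sampled gate-signal sequence."""
    return [faulted_gate(g, t, t_fault) for t, g in zip(times, gates)]


if __name__ == "__main__":
    times = [0.50, 0.53, 0.54, 0.60]
    gates = [1, 0, 1, 1]
    print(apply_fault(times, gates))
    print(KNOWN_LABELS[("Q1", "Q5")])
```

Because only the gate signal is overridden, the rest of the circuit is left untouched, which is the property the fault module relies on.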
For fault feature selection, we leverage the three-phase currents at the solar inverter output, Concordia-transformed currents, active power, and reactive power. The Concordia transform, which converts three-phase currents to α-β coordinates, helps extract essential features while reducing data dimensionality. The transformation is given by:
$$ I_{\alpha} = \frac{2}{3} \left( i_A - \frac{1}{2} i_B - \frac{1}{2} i_C \right) $$
$$ I_{\beta} = \frac{2}{3} \left( \frac{\sqrt{3}}{2} i_B - \frac{\sqrt{3}}{2} i_C \right) $$
This transformation preserves the inherent characteristics of the solar inverter and simplifies subsequent analysis. Active power \( P \) and reactive power \( Q \) are derived from the output voltages and currents. Under fault conditions, these signals exhibit distinct patterns, as observed in simulation waveforms. For example, a fault in Q1-Q2 causes phase A current to lose its positive half-cycle, while Concordia currents show corresponding distortions. Active and reactive power plots also display deviations from normal operation. These features are crucial for machine learning-based diagnosis in solar inverters.
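The transform is a direct per-sample computation. The sketch below implements the two equations as written in this article; the function name `concordia` and the test currents are illustrative. For a balanced three-phase set the α-β trajectory is a circle, so any fault that removes a half-cycle of one phase shows up as a distortion of that circle.

```python
import math


def concordia(i_a, i_b, i_c):
    """Alpha-beta currents from instantaneous phase currents."""
    i_alpha = (2.0 / 3.0) * (i_a - 0.5 * i_b - 0.5 * i_c)
    i_beta = (2.0 / 3.0) * (math.sqrt(3.0) / 2.0) * (i_b - i_c)
    return i_alpha, i_beta


def balanced_currents(theta, amplitude=1.0):
    """Balanced three-phase currents at electrical angle theta."""
    shift = 2.0 * math.pi / 3.0
    return (
        amplitude * math.cos(theta),
        amplitude * math.cos(theta - shift),
        amplitude * math.cos(theta + shift),
    )


if __name__ == "__main__":
    for deg in (0, 90, 180, 270):
        theta = math.radians(deg)
        alpha, beta = concordia(*balanced_currents(theta))
        radius = math.hypot(alpha, beta)
        print(f"{deg:3d} deg: alpha={alpha:+.3f} beta={beta:+.3f} r={radius:.3f}")
```

In the balanced case the result reduces to \( I_{\alpha} = \cos\theta \) and \( I_{\beta} = \sin\theta \) for unit amplitude, so the radius stays constant.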
Data collection involved simulating each fault scenario under 20 different irradiance levels ranging from 200 to 1000 W/m² to mimic real-world conditions. The simulation time was 1 s, with a sampling frequency of 100 kHz. Data from the interval 0.5 s to 0.6 s, which brackets the fault inception at 0.54 s and corresponds to 10,000 points per signal, were extracted for three-phase currents, Concordia currents, active power, and reactive power. A total of 3,360 fault samples were gathered and stored in CSV format for further processing. This dataset covers a wide range of operating conditions for solar inverters.
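The windowing step amounts to index arithmetic on the sampled signals. A minimal sketch, assuming each signal is stored as a sequence sampled from t = 0 at the stated 100 kHz; `extract_window` is a hypothetical helper, not part of the original toolchain.

```python
FS = 100_000      # sampling frequency in Hz, as in the text
T_START = 0.5     # window start in seconds
T_STOP = 0.6      # window end in seconds
T_FAULT = 0.54    # fault inception in seconds


def extract_window(signal, fs=FS, t_start=T_START, t_stop=T_STOP):
    """Samples of `signal` with t_start <= t < t_stop (signal starts at t = 0)."""
    start = round(t_start * fs)
    stop = round(t_stop * fs)
    return signal[start:stop]


if __name__ == "__main__":
    one_second = list(range(FS))  # stand-in for a 1 s recording
    window = extract_window(one_second)
    fault_index = round(T_FAULT * FS) - round(T_START * FS)
    print(len(window), fault_index)
```

The window therefore holds 10,000 points per signal, with the fault occurring 4,000 samples in, so each window contains both pre-fault and post-fault behavior.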
To handle the high dimensionality of the feature set, we applied principal component analysis (PCA). PCA transforms the original features into a set of orthogonal components, retaining over 99% of the variance to preserve essential information. This reduction mitigates the risk of overfitting and improves computational efficiency for solar inverter fault diagnosis. The dataset was split into 70% for training and 30% for testing across all classifiers.
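The preprocessing and split described above can be sketched with scikit-learn. Everything data-related here is an assumption: the features are synthetic stand-ins (not the study's dataset), the stratified split and the random seeds are choices made for the example, and fitting PCA inside a pipeline on the training portion only is one reasonable way to avoid leaking test data, which the text does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data: 24 fault classes, 20 samples each, 50 features.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 24, 20, 50
y = np.repeat(np.arange(n_classes), per_class)
centers = rng.normal(0.0, 5.0, size=(n_classes, n_features))
X = centers[y] + rng.normal(0.0, 1.0, size=(y.size, n_features))

# 70/30 split as in the text; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A float n_components keeps the fewest components that explain
# at least that fraction of the variance (here 99%).
model = make_pipeline(
    PCA(n_components=0.99, svd_solver="full"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
n_kept = model.named_steps["pca"].n_components_
print(f"components kept: {n_kept}, test accuracy: {accuracy:.3f}")
```

Swapping `RandomForestClassifier` for another estimator in the pipeline keeps the PCA step and the split identical, which is what makes a classifier comparison fair.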
We evaluated four machine learning classifiers: decision tree (DT), k-nearest neighbors (KNN), LightGBM, and random forest (RF). The performance was assessed using confusion matrices, F1 scores, accuracy, recall, and area under the receiver operating characteristic curve (AUC-ROC). The RF classifier demonstrated superior results, with an accuracy of 99.90%, F1 score of 0.999, and recall of 99.90%. In contrast, DT and KNN showed lower accuracy (95.04% and 96.33%, respectively) and were more prone to misclassification, as evident in their confusion matrices. LightGBM performed well (99.50% accuracy) but slightly lagged behind RF. Total times for all classifiers stayed under 57 s and were dominated by the shared 55.44 s data-processing step; training took at most 1.48 s and testing at most 0.19 s, which supports the practicality of the approach for near-real-time solar inverter diagnostics.
| Classifier | F1 Score | Recall (%) | Accuracy (%) | Data Processing Time (s) | Training Time (s) | Testing Time (s) | Total Time (s) |
|---|---|---|---|---|---|---|---|
| DT | 0.950503 | 95.03 | 95.04 | 55.4414 | 0.1009 | 0.0006 | 55.5429 |
| KNN | 0.963205 | 96.32 | 96.33 | 55.4414 | 0.0006 | 0.1871 | 55.6291 |
| LightGBM | 0.995039 | 99.50 | 99.50 | 55.4414 | 0.6313 | 0.0180 | 56.0907 |
| RF | 0.999008 | 99.90 | 99.90 | 55.4414 | 1.4841 | 0.0149 | 56.9404 |
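
For reference, the accuracy, recall, and F1 figures in the table can all be derived from a confusion matrix. The sketch below uses macro averaging over classes, which is an assumption (the text does not state which averaging was used), and the 3-class matrix is a toy example rather than data from the study.

```python
def macro_metrics(cm):
    """Accuracy, macro recall, and macro F1 from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    recalls, f1_scores = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    return accuracy, sum(recalls) / n, sum(f1_scores) / n


if __name__ == "__main__":
    toy_cm = [
        [9, 1, 0],
        [0, 10, 0],
        [0, 0, 10],
    ]
    acc, rec, f1 = macro_metrics(toy_cm)
    print(f"accuracy={acc:.4f} recall={rec:.4f} f1={f1:.4f}")
```

A single misclassified sample lowers all three metrics slightly and by different amounts, which is why the table reports them separately.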
Feature importance analysis revealed that RF distributes importance more evenly across features, reducing reliance on any single signal and enhancing robustness to noise. DT and LightGBM, however, showed higher dependence on specific features, making them vulnerable to feature noise. For example, in RF, Concordia currents and active power had balanced importance scores, whereas in DT, one feature dominated. This uniformity in RF contributes to its stability in solar inverter fault diagnosis.
The AUC-ROC curves further validated RF’s superiority. RF achieved an AUC of 1.0, indicating perfect classification under varying thresholds. At high thresholds, RF maintained a high true positive rate with minimal false positives, whereas DT (AUC = 0.96) struggled with false positives at low thresholds and missed true positives at high thresholds. KNN and LightGBM also achieved AUCs of 1.0 but showed slight limitations in robustness compared to RF. These results highlight RF’s consistent performance across operational conditions for solar inverters.
In conclusion, we have presented a novel fault diagnosis approach for IGBT open-circuit faults in NPC three-level solar inverters. By combining inverter output three-phase currents, Concordia-transformed currents, active power, and reactive power with machine learning, we achieve high diagnostic accuracy. The RF classifier, in particular, excels with 99.90% accuracy, outperforming DT, KNN, and LightGBM. Its balanced feature importance and robust AUC-ROC characteristics make it ideal for solar inverter applications. Future work could explore real-time implementation and adaptation to other solar inverter topologies to further enhance reliability in renewable energy systems.
The mathematical foundation of the Concordia transform is critical for feature extraction in solar inverters. The α-β currents are derived as follows:
$$ I_{\alpha} = \frac{2}{3} \left( i_A - \frac{1}{2} i_B - \frac{1}{2} i_C \right) $$
$$ I_{\beta} = \frac{2}{3} \left( \frac{\sqrt{3}}{2} i_B - \frac{\sqrt{3}}{2} i_C \right) $$
Additionally, active power \( P \) and reactive power \( Q \) are calculated using:
$$ P = \frac{1}{T} \int_{0}^{T} (v_A i_A + v_B i_B + v_C i_C) \, dt $$
$$ Q = \frac{1}{T} \int_{0}^{T} \frac{1}{\sqrt{3}} \left[ (v_B - v_C) i_A + (v_C - v_A) i_B + (v_A - v_B) i_C \right] \, dt $$
where \( v_A, v_B, v_C \) are the phase voltages, and \( i_A, i_B, i_C \) are the phase currents. These equations form the basis for feature computation in solar inverter fault diagnosis.
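In sampled form, the two integrals become averages over one period. The following sketch assumes uniformly spaced samples covering exactly one fundamental period; `average_powers` and `balanced_set` are illustrative names, and the 311 V and 10 A amplitudes are example values, not parameters from the study.

```python
import math


def average_powers(v_abc, i_abc):
    """Active and reactive power averaged over one uniformly sampled period.

    v_abc and i_abc are sequences of (a, b, c) tuples of phase quantities.
    """
    n = len(v_abc)
    p_sum = 0.0
    q_sum = 0.0
    for (va, vb, vc), (ia, ib, ic) in zip(v_abc, i_abc):
        p_sum += va * ia + vb * ib + vc * ic
        q_sum += ((vb - vc) * ia + (vc - va) * ib + (va - vb) * ic) / math.sqrt(3.0)
    return p_sum / n, q_sum / n


def balanced_set(amplitude, lag, n=1000):
    """One period of a balanced three-phase set lagging the reference by `lag`."""
    shift = 2.0 * math.pi / 3.0
    return [
        tuple(
            amplitude * math.cos(2.0 * math.pi * k / n - lag - m * shift)
            for m in range(3)
        )
        for k in range(n)
    ]


if __name__ == "__main__":
    voltages = balanced_set(311.0, 0.0)
    currents = balanced_set(10.0, math.pi / 6.0)  # current lags by 30 degrees
    p, q = average_powers(voltages, currents)
    print(f"P = {p:.1f} W, Q = {q:.1f} var")
```

For a balanced set with voltage amplitude \( V \), current amplitude \( I \), and lag angle \( \varphi \), these averages reduce to \( P = \tfrac{3}{2} V I \cos\varphi \) and \( Q = \tfrac{3}{2} V I \sin\varphi \), which gives a quick sanity check on the sampled computation.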
The PCA transformation can be expressed as:
$$ \mathbf{Y} = \mathbf{X} \mathbf{W} $$
where \( \mathbf{X} \) is the original feature matrix, \( \mathbf{W} \) is the projection matrix composed of eigenvectors, and \( \mathbf{Y} \) is the transformed feature set. This reduction is essential for efficient machine learning processing in solar inverter systems.
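As a worked form of this step, one standard way to obtain \( \mathbf{W} \) is an eigendecomposition of the sample covariance matrix. This assumes \( \mathbf{X} \) has been mean-centered (the text does not say so explicitly); the symbols \( n \) (number of samples), \( d \) (number of original features), \( \mathbf{V} \), \( \boldsymbol{\Lambda} \), and \( k \) are notation introduced here for illustration:
$$ \mathbf{C} = \frac{1}{n-1} \mathbf{X}^{\top} \mathbf{X} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{\top}, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d $$
$$ k = \min \left\{ m : \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{d} \lambda_j} \ge 0.99 \right\}, \qquad \mathbf{W} = \left[ \mathbf{v}_1, \dots, \mathbf{v}_k \right] $$
Here \( \mathbf{v}_j \) is the eigenvector belonging to \( \lambda_j \), so \( \mathbf{Y} = \mathbf{X} \mathbf{W} \) has \( k \) columns and retains at least 99% of the total variance.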
In summary, the integration of these features with RF classification provides a robust solution for fault detection in solar inverters, ensuring high performance across diverse operating conditions. The methodology underscores the importance of feature engineering and classifier selection in advancing solar inverter reliability and grid stability.
