1. Introduction
In the context of global efforts toward carbon neutrality, solar energy systems have become indispensable components of clean energy infrastructure. Among these systems, the solar inverter plays a critical role in converting direct current (DC) generated by photovoltaic (PV) arrays into grid-compatible alternating current (AC). However, the reliability and efficiency of solar inverters are heavily dependent on the health of their switching components, particularly insulated gate bipolar transistors (IGBTs). Open-circuit faults in IGBTs can lead to increased harmonic distortion, degraded power quality, and even cascading failures. Traditional fault diagnosis methods, whether non-invasive (e.g., infrared thermography) or invasive (e.g., electrical parameter measurement), face challenges such as low accuracy, high computational cost, or operational disruptions. This study proposes a machine learning-driven approach to diagnose IGBT open-circuit faults in neutral-point-clamped (NPC) three-level solar inverters, leveraging a novel combination of fault features and the random forest (RF) classifier.

2. Methodology
2.1 System Architecture
The grid-connected PV system comprises:
- PV Array: Generates DC power under varying irradiance (200–1000 W/m²).
- Boost Converter: Elevates DC voltage to 750 V for inverter input.
- NPC Three-Level Inverter: Converts DC to AC using 12 IGBTs (4 per phase).
- LCL Filter: Reduces harmonic content in the output current.
- Control System: Implements dual-loop control (voltage and current) and maximum power point tracking (MPPT).
The NPC inverter operates in three states per phase: high (P), neutral (O), and low (N). IGBTs are labeled $Q_1$–$Q_{12}$, with switching states defined as:
$$\text{State} = \begin{cases} 1 & \text{(IGBT ON)} \\ 0 & \text{(IGBT OFF)} \end{cases}$$
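The text does not state which devices conduct in each of the P, O, and N states. As a minimal sketch, assuming the conventional NPC gating pattern (upper two devices for P, inner two for O, lower two for N), the following shows which states of one phase leg remain commandable when devices fail open. The names `STATE_TO_GATES` and `reachable_states` are hypothetical, and the model deliberately ignores current paths through the anti-parallel and clamping diodes, which depend on current polarity.

```python
# Conventional NPC gating for one phase leg (an assumption, not taken from the text).
# Positions 1..4 run from the positive DC rail down to the negative DC rail.
STATE_TO_GATES = {
    "P": (1, 1, 0, 0),  # upper two devices ON: output tied to the positive rail
    "O": (0, 1, 1, 0),  # inner two devices ON: output clamped to the neutral point
    "N": (0, 0, 1, 1),  # lower two devices ON: output tied to the negative rail
}

def reachable_states(open_positions):
    """Return the states whose required ON devices are all healthy."""
    states = []
    for name, gates in STATE_TO_GATES.items():
        needed = {pos for pos, gate in enumerate(gates, start=1) if gate == 1}
        if needed.isdisjoint(open_positions):
            states.append(name)
    return states

print(reachable_states(set()))   # healthy leg
print(reachable_states({1, 2}))  # upper pair open
print(reachable_states({2, 3}))  # inner pair open
```

Under this simplified view, losing the upper pair leaves only N commandable, while losing the inner pair removes all three states, which is one way to see why different double faults distort the phase currents differently.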
2.2 Fault Signal Combination
Four fault features are extracted for diagnosis:
- Three-Phase Currents ($i_A, i_B, i_C$): Sampled at 100 kHz.
- Concordia-Transformed Currents ($I_\alpha, I_\beta$):
$$I_\alpha = \frac{2}{3}\left(i_A - \frac{1}{2} i_B - \frac{1}{2} i_C\right), \qquad I_\beta = \frac{2}{3}\left(\frac{\sqrt{3}}{2} i_B - \frac{\sqrt{3}}{2} i_C\right)$$
- Active Power ($P$) and Reactive Power ($Q$): Calculated using instantaneous power theory.
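The feature extraction above can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: it uses the 2/3 (amplitude-invariant) scaling, assumes the voltages entering the power calculation are transformed the same way as the currents, and adopts one common sign convention for reactive power. The function names are hypothetical.

```python
import math

def concordia(x_a, x_b, x_c):
    """Alpha-beta transform of a three-phase quantity with 2/3 scaling."""
    x_alpha = (2.0 / 3.0) * (x_a - 0.5 * x_b - 0.5 * x_c)
    x_beta = (2.0 / 3.0) * (math.sqrt(3) / 2.0) * (x_b - x_c)
    return x_alpha, x_beta

def instantaneous_power(v_alpha, v_beta, i_alpha, i_beta):
    """Instantaneous p and q; the 3/2 factor compensates the 2/3 scaling.

    The sign of q is a convention (assumed here), not something the text fixes.
    """
    p = 1.5 * (v_alpha * i_alpha + v_beta * i_beta)
    q = 1.5 * (v_beta * i_alpha - v_alpha * i_beta)
    return p, q

def balanced(theta, amplitude=1.0):
    """One sample of a balanced three-phase set at electrical angle theta."""
    return tuple(amplitude * math.cos(theta - k * 2 * math.pi / 3) for k in range(3))

theta = math.pi / 6
i_al, i_be = concordia(*balanced(theta))   # currents
v_al, v_be = concordia(*balanced(theta))   # voltages in phase with the currents
p, q = instantaneous_power(v_al, v_be, i_al, i_be)
print(i_al, i_be, p, q)
```

For a healthy balanced set the transformed currents trace a circle (here $I_\alpha = \cos\theta$, $I_\beta = \sin\theta$), and in-phase voltage and current give constant $P$ with $Q$ near zero. An open-circuit fault distorts that circle, which is what makes these signals useful as features.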
2.3 Fault Simulation
A Simulink model replicates 24 types of double IGBT open-circuit faults (Table 1). Faults are introduced at 0.54 s during a 1 s simulation, with data collected from 0.5–0.6 s to capture transient effects.
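As a quick check of how much data each run contributes, the sketch below converts the stated times into sample indices. It assumes that every logged signal shares the 100 kHz rate quoted for the phase currents and that the window is half-open; the constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 100_000            # assumed equal to the stated current sampling rate
WINDOW_START_S, WINDOW_END_S = 0.5, 0.6
FAULT_TIME_S = 0.54

def to_index(t_seconds, rate_hz=SAMPLE_RATE_HZ):
    """Convert a time stamp to a sample index, rounding to avoid float drift."""
    return int(round(t_seconds * rate_hz))

start, stop, fault = (to_index(t) for t in (WINDOW_START_S, WINDOW_END_S, FAULT_TIME_S))
pre_fault_samples = fault - start
post_fault_samples = stop - fault
print(stop - start, pre_fault_samples, post_fault_samples)  # 10000 4000 6000
```

Under these assumptions each run yields 10,000 samples per signal, 4,000 before the fault and 6,000 after it, so the window contains both healthy and post-fault behavior.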
Table 1: IGBT Fault Labeling
| Fault Type | Phase | Faulty IGBTs | Label |
|---|---|---|---|
| In-Phase Double-Tube | A | $Q_1, Q_2$ | 0 |
| In-Phase Double-Tube | A | $Q_2, Q_3$ | 1 |
| … | … | … | … |
| Cross-Phase | B & C | $Q_5, Q_{12}$ | 23 |
3. Machine Learning Framework
3.1 Data Preprocessing
- Dimensionality Reduction: Principal component analysis (PCA) retains >99% of the variance.
- Dataset Split: 3,360 samples (70% training, 30% testing).
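The two preprocessing steps can be sketched with NumPy alone. The code below is an illustration on synthetic data, not the study's pipeline: the seven-column input, the random seed, and the helper names `pca_reduce` and `split_70_30` are assumptions.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.99):
    """Project X onto the fewest principal components reaching the variance target."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions; squared singular values scale with variance.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    return X_centered @ Vt[:k].T, k

def split_70_30(n_samples, seed=0):
    """Shuffle sample indices and cut them at 70 %."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    cut = int(round(0.7 * n_samples))
    return idx[:cut], idx[cut:]

# Synthetic stand-in: 3,360 samples, 7 correlated columns driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(3360, 2))
X = latent @ rng.normal(size=(2, 7)) + 0.01 * rng.normal(size=(3360, 7))

Z, k = pca_reduce(X)
train_idx, test_idx = split_70_30(len(X))
print(Z.shape, len(train_idx), len(test_idx))
```

With 3,360 samples, a 70/30 split gives 2,352 training and 1,008 test samples. In practice the PCA projection would be fitted on the training portion only and then applied to the test portion, to avoid leaking test statistics into the model.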
3.2 Classifiers
Four algorithms are evaluated:
- Decision Tree (DT)
- k-Nearest Neighbors (KNN)
- LightGBM
- Random Forest (RF)
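A comparison of this kind might be set up as in the following scikit-learn sketch. Everything specific here is an assumption: the synthetic four-class data stands in for the fault dataset, and the hyperparameters, stratified split, and seeds are illustrative choices, not values from the study. LightGBM is left out to keep the example dependent on scikit-learn only; its `LGBMClassifier` follows the same `fit`/`score` interface and could be added to the dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the fault dataset: 4 well-separated classes, 5 features.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 4, 100, 5
centers = 10.0 * np.eye(n_classes, n_features)
X = np.vstack([c + rng.normal(size=(per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

# 70/30 split, stratified so every class appears in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
# score() returns mean accuracy on the held-out samples.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
print(scores)
```

Because the synthetic classes are far apart, all three models score highly here; the differences reported for the real fault data cannot be reproduced from this toy example.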
3.3 Performance Metrics
- Accuracy: $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
- F1 Score: $F_1 = 2 \times \dfrac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
- AUC-ROC: Area under the receiver operating characteristic curve.
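The accuracy and F1 definitions are written for the binary case, while the task has 24 fault labels. One common way to extend them, assumed here because the text does not say which averaging was used, is to count TP, FP, and FN per class in a one-vs-rest fashion and average the per-class F1 scores (macro averaging). The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores from one-vs-rest counts."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy example: one sample of class 0 is misread as class 1.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In the toy example a single error lowers recall for class 0 and precision for class 1, so two of the three per-class F1 scores drop even though only one prediction is wrong.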
4. Results and Analysis
4.1 Classification Performance
Table 2: Classifier Comparison
| Classifier | F1 Score | Accuracy (%) | Training Time (s) |
|---|---|---|---|
| DT | 0.9505 | 95.04 | 0.1009 |
| KNN | 0.9632 | 96.33 | 0.0006 |
| LightGBM | 0.9950 | 99.50 | 0.6313 |
| RF | 0.9990 | 99.90 | 1.4841 |
The RF classifier achieves the highest accuracy (99.90%) and F1 score (0.9990) of the four models, which is attributed to its ensemble structure and robustness to noise. This comes at the cost of the longest training time (1.4841 s).
4.2 Feature Importance
RF distributes importance evenly across features (Figure 1), reducing dependency on any single parameter. In contrast, DT and LightGBM overemphasize specific features, increasing vulnerability to outliers.
Figure 1: Feature Importance Scores
| Feature | RF | DT | LightGBM |
|---|---|---|---|
| $i_A$ | 0.18 | 0.35 | 0.28 |
| $i_B$ | 0.17 | 0.22 | 0.31 |
| $I_\alpha$ | 0.20 | 0.18 | 0.15 |
| $P$ | 0.22 | 0.15 | 0.12 |
| | 0.23 | 0.10 | 0.14 |
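The claim that RF spreads importance more evenly can be quantified from the scores listed under Figure 1. The sketch below uses normalized Shannon entropy and the max/min ratio as evenness measures; both are illustrative choices rather than metrics used in the study, and `evenness` is a hypothetical helper.

```python
import math

# Importance scores copied from the Figure 1 listing, one value per row.
importances = {
    "RF": [0.18, 0.17, 0.20, 0.22, 0.23],
    "DT": [0.35, 0.22, 0.18, 0.15, 0.10],
    "LightGBM": [0.28, 0.31, 0.15, 0.12, 0.14],
}

def evenness(scores):
    """Shannon entropy divided by its maximum, so 1.0 means perfectly uniform."""
    total = sum(scores)
    probs = [s / total for s in scores if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(scores))

for name, scores in importances.items():
    ratio = max(scores) / min(scores)
    print(f"{name}: evenness={evenness(scores):.3f}, max/min={ratio:.2f}")
```

On these numbers RF comes closest to a uniform distribution, with its largest score only about 1.35 times its smallest, compared with 3.5 times for DT.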
4.3 ROC Analysis
The RF classifier attains an AUC of 1.0, demonstrating perfect separability between fault classes (Figure 2). DT (AUC = 0.96) shows higher false positives at extreme thresholds.
Figure 2: AUC-ROC Curves
| Classifier | AUC |
|---|---|
| DT | 0.96 |
| KNN | 1.00 |
| LightGBM | 1.00 |
| RF | 1.00 |
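AUC is defined for a binary decision, and the text does not say how it was extended to 24 fault classes. A minimal sketch, assuming a one-vs-rest scheme with unweighted averaging, computes each binary AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. Both function names are hypothetical.

```python
def binary_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, class_scores):
    """Average binary AUC over classes; class_scores[c][i] scores sample i for class c."""
    aucs = []
    for c, scores in class_scores.items():
        labels = [1 if t == c else 0 for t in y_true]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)

# One of the four positive/negative pairs is ranked the wrong way round.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC measures ranking rather than thresholded decisions, a classifier can reach an AUC that rounds to 1.00 without reaching 100% accuracy.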
5. Discussion
The proposed method addresses key limitations in solar inverter fault diagnosis:
- Efficiency: RF processes 3,360 samples in <57 s (under roughly 17 ms per sample on average), enabling real-time applications.
- Robustness: Concordia transformation and PCA mitigate noise and dimensionality issues.
- Scalability: The framework adapts to variable irradiance (200–1000 W/m²) and multi-level inverters.
Compared to existing techniques (e.g., SVM, CNN), RF avoids overfitting and computational overhead while maintaining high accuracy.
6. Conclusion
This study presents a robust machine learning framework for diagnosing IGBT open-circuit faults in NPC three-level solar inverters. By combining three-phase currents, Concordia-transformed signals, active power, and reactive power, the RF classifier achieves 99.90% accuracy, surpassing DT, KNN, and LightGBM. The method’s efficiency, scalability, and noise immunity make it suitable for real-world solar inverter maintenance, ensuring grid stability and longevity in renewable energy systems.
