The global energy landscape is undergoing a significant transformation, often termed the “third industrial revolution,” characterized by a decisive shift from fossil fuels towards clean and renewable energy sources. Within this transition, solar photovoltaic (PV) power generation has emerged as a critical component due to its vast potential. While a substantial portion of electricity is still derived from thermal power, the abundant solar resources present a formidable opportunity for sustainable energy substitution. This positive trend lays a solid foundation for the further exploitation of solar energy. However, the practical operation of PV power generation systems is often plagued by a substantial volume of anomalous data. These anomalies stem from various sources, including data transmission errors, sensor malfunctions, and component failures, making system maintenance increasingly crucial.
The photovoltaic inverter is a vital component within a PV generation system, acting as the critical interface between the solar panels and the power grid. Its primary function is to convert the direct current (DC) electricity produced by the panels into alternating current (AC) electricity suitable for grid injection or local consumption. The health status of the photovoltaic inverter directly impacts the safety, stability, and efficiency of the entire power system. Conventional maintenance strategies for photovoltaic inverters often rely on scheduled, periodic checks or reactive repairs after a fault has already occurred. This approach leaves maintenance personnel with limited visibility into the real-time health or impending failure conditions of the photovoltaic inverter. Therefore, developing advanced anomaly detection technology capable of accurately and promptly identifying the health or fault status of photovoltaic inverters is of paramount importance. Such technology can empower operators to formulate scientific and rational inspection and maintenance schedules, ensuring stable system operation while significantly reducing operational and repair costs.

Anomaly detection remains a primary challenge across numerous application domains, including power plants and smart factories. While analyzing data from installed sensors is a standard approach for identifying system irregularities, the increasing number of sensors makes this task progressively more difficult. To enhance the efficiency of analyzing multi-variate sensor data, deep learning-based anomaly detection methods have been developed, building upon traditional statistical fault diagnosis techniques. Deep learning algorithms offer distinct advantages over classical machine learning and statistical methods for anomaly detection, primarily due to their superior capability to extract complex, high-dimensional features from data. Among deep learning approaches, Generative Adversarial Networks (GANs) have become a mainstream method in image and video-based anomaly detection because of their ability to model complex, high-dimensional image distributions. The core assumption in methods like AnoGAN is that a generative model can learn the latent space of a normal data distribution. It identifies whether given data corresponds to the normal distribution by attempting to remap it back into the learned latent vector space. Furthermore, GANomaly builds upon AnoGAN by introducing an additional encoder network to achieve a more robust encoding of the latent space vectors for normal data.
On another front, Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are recognized as effective methods for processing sequential data through recurrent connections. However, these methods typically assume full connectivity between variables, making it difficult to interpret correlations between variables as their number increases. The Convolutional Long Short-Term Memory (ConvLSTM) network enhances the standard LSTM by incorporating convolutional operations, making it more effective for extracting spatiotemporal features from data like image sequences. To fully leverage the ConvLSTM structure and intuitively represent multi-variate time-series data, this work investigates transforming time-series data into two-dimensional images. For instance, the Gramian Angular Field (GAF) and Markov Transition Field (MTF) can encode univariate time-series into images by capturing temporal autocorrelation. For multi-variate time-series data, feature matrices based on inner products between variables have been proposed.
Inspired by these advancements, this paper focuses on the anomaly detection for photovoltaic inverter operational data. The core idea involves transforming the multi-variate time-series data from a photovoltaic inverter into a sequence of two-dimensional images and redesigning a GAN-based network structure specifically for anomaly detection, incorporating ConvLSTM. This enables the model to learn the mapping from a sequence of images to the next, thereby analyzing both the temporal correlations within the time-series data and the inter-variable correlations through convolutional filtering. The remainder of this paper is organized as follows: First, a novel anomaly detection framework for photovoltaic inverter multi-variate time-series, based on GAN and image encoding, is proposed. Then, the model’s training process is detailed. Finally, the effectiveness of the method is validated through experiments using real-world multi-variate time-series data from photovoltaic inverters.
Problem Formulation and Framework Overview
The operational data for a photovoltaic inverter predominantly consists of normal samples, with fault-related anomaly samples being rare. This creates a highly imbalanced dataset, making it suitable for approaches where a GAN learns the distribution of normal data to detect deviations. Since GAN-based methods excel at processing image data, this paper proposes converting inverter time-series data into image sequences using an angular field transformation. The framework is based on a modified GANomaly architecture, retrofitted with ConvLSTM layers to handle the sequential nature of the data effectively. ConvLSTM not only retains the advantages of LSTM in processing time-series data and extracting temporal features but also leverages convolutional computations to extract spatial features, thereby improving model accuracy.
Problem Statement
Definition of Anomalous Sample: As illustrated in the concept figure, a sliding window is applied to preprocess the time-series data. Let \( t \) denote a specific time point. The model takes the data from the \( w \) days preceding time \( t \) as a single sample. If a fault occurs within this \( w \)-day window—meaning the operational data exhibits anomalies or the photovoltaic inverter fails to operate normally—the sample is defined as anomalous. Otherwise, it is considered a normal sample.
Formal Description of the Photovoltaic Inverter Time-Series Anomaly Detection Problem: Consider a time-series dataset \( D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\} \) containing \( n \) samples. Here, \( X_i \in \mathbb{R}^{w \times d} \) represents a photovoltaic inverter time-series data segment, where \( d \) denotes the number of attributes (e.g., voltages, currents, temperature) measuring the health status, and \( w \) is the length of the time-series window. The label \( y_i \in \{0, 1\} \) indicates the health (0) or fault (1) status. The dataset \( D \) is split into a training set \( D_{\text{train}} \) and a test set \( D_{\text{test}} \). Crucially, \( D_{\text{train}} \) contains only healthy (normal) data, while \( D_{\text{test}} \) contains a mix of healthy and faulty data, with healthy samples vastly outnumbering faulty ones.
During the training phase, the model learns on \( D_{\text{train}} \) by minimizing a composite loss function, simultaneously learning the distribution of normal samples in both the sample space and a deep latent space. During the testing phase on \( D_{\text{test}} \), an anomaly score \( A(X) \) is computed for each sample. Samples not observed during training (i.e., faults) are expected to yield higher \( A(X) \) values. An optimal threshold \( \phi \) is selected using a criterion such as maximizing the F1-score. Samples with \( A(X) \geq \phi \) are classified as faulty, while those with \( A(X) < \phi \) are classified as healthy.
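The sliding-window preprocessing described above can be sketched in a few lines of NumPy; the helper `make_windows` and the boolean `fault_mask` are illustrative names, not part of the paper's pipeline:

```python
import numpy as np

def make_windows(series, fault_mask, w):
    """Slice a (T, d) multivariate series into overlapping w-step windows
    (stride 1); a window is labeled 1 (anomalous) if any step inside it
    is faulty, 0 (healthy) otherwise."""
    T = len(series)
    X = np.stack([series[t - w:t] for t in range(w, T + 1)])
    y = np.array([int(fault_mask[t - w:t].any()) for t in range(w, T + 1)])
    return X, y

series = np.random.default_rng(1).normal(size=(10, 3))   # toy 10-step, 3-attribute series
faults = np.array([0] * 8 + [1, 0], dtype=bool)          # one fault at step index 8
X, y = make_windows(series, faults, w=4)
```

Only the windows containing the faulty step receive label 1; all purely healthy windows would go into \( D_{\text{train}} \).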
Gramian Angular Field (GAF) Method
The Gramian Angular Field provides a novel way to represent a time-series by encoding it in a polar coordinate system rather than the Cartesian system. The transformation from an original time-series to a GAF image involves the following steps for a time-series \( X = \{x_1, x_2, \ldots, x_n\} \) of length \( n \):
First, the series is rescaled to the interval \([-1, 1]\) using min-max normalization:
$$ \tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} $$
Next, the rescaled time-series \( \tilde{X} \) is represented in polar coordinates by encoding the value as the angular cosine and the time stamp as the radius:
$$
\begin{cases}
\theta_i = \arccos(\tilde{x}_i), & -1 \le \tilde{x}_i \le 1, \tilde{x}_i \in \tilde{X} \\
r_i = \frac{t_i}{N}, & t_i \in \mathbb{N}
\end{cases}
$$
where \( t_i \) is the time stamp and \( N \) is a constant factor to adjust the span of the polar coordinate system.
This mapping is bijective for \( \theta \in [0, \pi] \) since \( \cos(\theta) \) is monotonic in this interval. It also preserves absolute temporal relations. The Gramian Angular Summation Field (GASF) is then defined as:
$$ G_{\text{GASF}} = \begin{bmatrix}
\cos(\theta_1 + \theta_1) & \ldots & \cos(\theta_1 + \theta_n) \\
\vdots & \ddots & \vdots \\
\cos(\theta_n + \theta_1) & \ldots & \cos(\theta_n + \theta_n)
\end{bmatrix} = \tilde{X}^T \cdot \tilde{X} - \sqrt{I - \tilde{X}^2}^T \cdot \sqrt{I - \tilde{X}^2} $$
where \( I \) is the unit row vector \([1, 1, \ldots, 1]\). The resulting GAF image is symmetric and encapsulates temporal correlations, allowing reconstruction of the time-series from the image.
For multi-variate photovoltaic inverter data at a single time step (a vector of attributes), a similar angular encoding can be applied to generate a 2D image representing the correlations between different attributes at that instance. This transformation effectively converts the operational state of the photovoltaic inverter into a visual representation, where normal and anomalous states often exhibit distinctly different patterns, which is highly beneficial for deep learning-based detection.
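As a minimal NumPy sketch of the GASF transform above (the helper name `gasf` and the small `eps` guard in the denominator are our own additions), a univariate window can be encoded as follows:

```python
import numpy as np

def gasf(x, eps=1e-8):
    """Encode a univariate series as a Gramian Angular Summation Field image."""
    x = np.asarray(x, dtype=float)
    # Min-max rescale to [-1, 1] using the formula from the text.
    x_t = ((x - x.max()) + (x - x.min())) / (x.max() - x.min() + eps)
    x_t = np.clip(x_t, -1.0, 1.0)        # guard against floating-point drift
    theta = np.arccos(x_t)               # encode the value as an angular cosine
    # G[i, j] = cos(theta_i + theta_j), computed via an outer sum of angles.
    return np.cos(theta[:, None] + theta[None, :])

img = gasf([0.0, 0.5, 1.0, 0.5, 0.0])    # symmetric n x n image for an n-step series
```

The resulting matrix is symmetric, and its main diagonal carries the self-correlation \( \cos(2\theta_i) \) of each rescaled value, which is what allows the series to be recovered from the image.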
Convolutional LSTM (ConvLSTM)
Long Short-Term Memory networks are adept at transmitting past information to the present and uncovering relationships in sequential data. However, traditional LSTMs use fully-connected layers, which do not explicitly capture spatial or local structural information. The ConvLSTM addresses this by replacing the fully-connected operations within the LSTM gates with convolutional operations. This allows the network to capture spatiotemporal correlations, making it particularly suitable for sequences of images or data with a spatial structure, such as our encoded time-series images.
The key equations for a ConvLSTM cell are as follows:
$$ i_t = \sigma(W_{xi} * x_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i) $$
$$ f_t = \sigma(W_{xf} * x_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f) $$
$$ C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * x_t + W_{hc} * H_{t-1} + b_c) $$
$$ o_t = \sigma(W_{xo} * x_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o) $$
$$ H_t = o_t \circ \tanh(C_t) $$
where \(*\) denotes the convolution operator, \( \circ \) denotes the Hadamard product, \( \sigma \) is the sigmoid activation function, \( x_t \) is the input tensor, \( H_t \) is the hidden state tensor, \( C_t \) is the cell state tensor, \( W \) terms are convolutional kernels, and \( b \) terms are bias terms.
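A single-channel NumPy sketch of one ConvLSTM step, following the five gate equations above (the naive `conv2d_same` helper and the toy 8×8 input size are our own; the peephole terms \( W_{c*} \) act element-wise, matching the Hadamard products):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2-D cross-correlation with zero 'same' padding."""
    kh, kw = k.shape
    p = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x, h, c, W, b):
    """One ConvLSTM step: input, forget, output gates and cell update."""
    i = sigmoid(conv2d_same(x, W['xi']) + conv2d_same(h, W['hi']) + W['ci'] * c + b['i'])
    f = sigmoid(conv2d_same(x, W['xf']) + conv2d_same(h, W['hf']) + W['cf'] * c + b['f'])
    c_new = f * c + i * np.tanh(conv2d_same(x, W['xc']) + conv2d_same(h, W['hc']) + b['c'])
    o = sigmoid(conv2d_same(x, W['xo']) + conv2d_same(h, W['ho']) + W['co'] * c_new + b['o'])
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
H = Wd = 8                                   # toy 8x8 single-channel images
W = {k: rng.normal(scale=0.1, size=(3, 3))
     for k in ('xi', 'hi', 'xf', 'hf', 'xc', 'hc', 'xo', 'ho')}
W.update({k: rng.normal(scale=0.1, size=(H, Wd)) for k in ('ci', 'cf', 'co')})
b = {k: 0.0 for k in 'ifco'}
h = c = np.zeros((H, Wd))
for _ in range(4):                           # roll the cell over a short image sequence
    h, c = convlstm_step(rng.normal(size=(H, Wd)), h, c, W, b)
```

Because each gate is a convolution over the hidden-state image rather than a fully-connected map, the hidden state \( H_t \) keeps the same spatial layout as the input images.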
Proposed Model Architecture
The proposed anomaly detection model is composed of four core subnetworks: a Discriminator (D), a primary Encoder (Enc1), a secondary Encoder (Enc2), and a Decoder (Dec). The overall architecture is depicted in the framework diagram.
1. The Generator (G) – Autoencoder Network: This subnetwork, formed by Enc1 and Dec, functions as the generator in the GAN framework. Its purpose is to learn the distribution of normal input data and reconstruct the input images. The process is formalized as:
$$ z = \text{Enc1}(x), \quad \hat{x} = \text{Dec}(z) $$
The primary encoder, Enc1, compresses the input image \( x \in \mathbb{R}^{w \times d} \) into a bottleneck latent vector \( z \in \mathbb{R}^l \) using ConvLSTM layers, BatchNorm, and LeakyReLU activations. This vector \( z \) is assumed to be the minimal, optimal representation of \( x \). The decoder, Dec, upsamples \( z \) back to the original image dimensions to produce the reconstruction \( \hat{x} \), using transposed convolutional layers, ReLU, BatchNorm, and a final tanh activation.
2. The Secondary Encoder (Enc2): This is a unique component of the proposed method. It has an identical structure to Enc1 but with independently learned parameters. Its role is to compress the reconstructed image \( \hat{x} \) back into a latent vector:
$$ \hat{z} = \text{Enc2}(\hat{x}) $$
The dimension of \( \hat{z} \) is the same as \( z \). The rationale is that for normal samples (seen during training), the reconstruction \( \hat{x} \) should be good, leading to similar latent representations \( z \) and \( \hat{z} \). For anomalous samples, the poor reconstruction should result in a significant discrepancy between \( z \) and \( \hat{z} \).
3. The Discriminator (D): This is a standard convolutional network (based on DCGAN’s discriminator) that classifies its input as real (from the true data distribution) or fake (generated by G). Its adversarial training with the generator ensures that the generator produces increasingly realistic samples.
Model Training and Anomaly Scoring
The model is trained end-to-end by optimizing a combination of three loss functions.
1. Adversarial Loss (\(L_{adv}\)): This ensures the generator produces realistic reconstructions that can fool the discriminator.
$$ L_{adv} = \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{x \sim p_x}[\log(1 - D(\hat{x}))] $$
The objective is \( \min_G \max_D L_{adv} \).
2. Contextual Loss (\(L_{con}\)): An L1 reconstruction loss that explicitly enforces similarity between the input and its reconstruction, leading to less blurry results than L2.
$$ L_{con} = \mathbb{E}_{x \sim p_x} \| x – \hat{x} \|_1 $$
3. Encoder Loss (\(L_{enc}\)): This is the core loss for anomaly detection. It minimizes the distance between the latent vectors from the original input and its reconstruction.
$$ L_{enc} = \mathbb{E}_{x \sim p_x} \| z – \hat{z} \|_2 $$
The total training loss is a weighted sum:
$$ L_{total} = \omega_{adv} L_{adv} + \omega_{con} L_{con} + \omega_{enc} L_{enc} $$
where \( \omega \) are weighting parameters, typically set to 1 for simplicity.
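A simplified single-sample sketch of the generator-side composite loss, assuming the non-saturating form of the adversarial term for the generator update (a common substitution for the minimax objective) and treating `z`, `z_hat` as flat latent vectors; the function name and signature are illustrative:

```python
import numpy as np

def generator_loss(x, x_hat, z, z_hat, d_fake, w_adv=1.0, w_con=1.0, w_enc=1.0):
    """Weighted sum of the three losses from the text, for one sample."""
    l_adv = -np.mean(np.log(d_fake + 1e-8))   # push D(x_hat) toward "real"
    l_con = np.mean(np.abs(x - x_hat))        # L1 contextual (reconstruction) term
    l_enc = np.linalg.norm(z - z_hat)         # L2 latent-consistency term
    return w_adv * l_adv + w_con * l_con + w_enc * l_enc

x = np.ones((4, 4))
# Perfect reconstruction, matched latents, confident discriminator -> near-zero loss.
loss = generator_loss(x, x, np.zeros(8), np.zeros(8), d_fake=np.array([1.0]))
```

Any degradation in the reconstruction or latent match increases the total, which is what drives the anomaly score at test time.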
Anomaly Score: After training on normal data only, an anomaly score \( A(x') \) is computed for a test sample \( x' \):
$$ A(x') = \lambda \cdot R(x') + (1 - \lambda) \cdot L(x') $$
where:
- \( R(x') = \| x' - \hat{x}' \|_1 \) is the reconstruction score (contextual loss).
- \( L(x') = \| z' - \hat{z}' \|_2 \) is the latent representation score (encoder loss).
- \( \lambda \) is a weighting parameter controlling the relative importance (empirically set to 0.9).
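A per-sample sketch of this score, assuming the L1 norm for \( R \) and the L2 norm for \( L \) as defined above (the function name is illustrative):

```python
import numpy as np

def anomaly_score(x, x_hat, z, z_hat, lam=0.9):
    """A(x') = lam * R(x') + (1 - lam) * L(x'), with lam = 0.9 as in the text."""
    r = np.sum(np.abs(x - x_hat))      # reconstruction score R(x'), L1 norm
    l = np.linalg.norm(z - z_hat)      # latent score L(x'), L2 norm
    return lam * r + (1 - lam) * l
```

For a sample reconstructed perfectly with matched latents, the score is exactly zero; both a worse reconstruction and a latent mismatch push it upward.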
The scores for the test set are then scaled to the range [0, 1]:
$$ \hat{A}(x') = \frac{A(x') - \min(A)}{\max(A) - \min(A)} $$
A threshold \( \phi \) is determined by maximizing the F1-score on the test set (or a validation set). Finally, a sample is classified as anomalous if \( \hat{A}(x') \geq \phi \), and normal otherwise.
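The min-max scaling and F1-maximizing threshold search can be sketched as a brute-force scan over every observed score as a candidate \( \phi \) (the function name is our own; labels use 1 = fault, 0 = healthy):

```python
import numpy as np

def best_f1_threshold(scores, labels):
    """Min-max scale anomaly scores to [0, 1], then return the threshold
    phi (and its F1) that maximizes F1 over all candidate thresholds."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    s = (scores - scores.min()) / (scores.max() - scores.min())
    best_phi, best_f1 = 0.0, -1.0
    for phi in np.unique(s):
        pred = (s >= phi).astype(int)
        tp = np.sum((pred == 1) & (labels == 1))
        fp = np.sum((pred == 1) & (labels == 0))
        fn = np.sum((pred == 0) & (labels == 1))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_phi, best_f1 = phi, f1
    return best_phi, best_f1

# Two faulty samples with clearly elevated scores are separated perfectly.
phi, f1 = best_f1_threshold([0.1, 0.2, 0.3, 0.9, 0.95], [0, 0, 0, 1, 1])
```

On larger score sets, scanning a fixed grid of thresholds instead of every unique score keeps the search cheap without changing the idea.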
Experimental Evaluation
Baseline Methods and Dataset
The proposed model, referred to as ConvLSTM-GAN, is compared against several baseline methods:
- Isolation Forest (IForest): An efficient ensemble method for anomaly detection based on isolating observations.
- Local Outlier Factor (LOF): A density-based method for identifying local outliers.
- Standard LSTM-based Autoencoder: A sequence-to-sequence model using LSTM layers in both encoder and decoder.
- Original GANomaly: The baseline GANomaly model which uses standard convolutional networks without sequential modeling.
Dataset: The experiments use real-world operational data from photovoltaic inverters across five similarly sized PV power stations in Ningxia, China. The multivariate data includes attributes such as three-phase voltages and currents, line-to-line voltages, inverter conversion efficiency, power factor, and internal temperature. Data was collected every 10 minutes from 8:00 to 18:00 daily. For anomaly detection timeliness, the 6 data points within an hour are treated as one sequence. Data from a 12-day period yielded 18,000 data points, forming 3,000 sequences. After preprocessing, the training set contained 2,400 sequences (14,400 points) of normal data only. The test set contained 600 sequences (3,600 points), including 18 anomalous sequences (123 points) related to actual inverter faults.
Experimental Setup and Results
The model was implemented in TensorFlow 2.2 and optimized using the Adam optimizer with a learning rate of \(2 \times 10^{-3}\) and momentum parameters \(\beta_1=0.5, \beta_2=0.999\). The loss weights were set to \(\omega_{adv}=\omega_{con}=\omega_{enc}=1\). The ConvLSTM hidden size was 250. Training was conducted for 2,000 epochs, with model checkpoints saved based on performance to avoid overfitting.
Performance was evaluated using the Area Under the ROC Curve (AUC), Precision (P), Recall (R), and the F1-score. Recall is particularly critical in photovoltaic inverter fault detection, as missing a fault (false negative) can lead to significant operational losses.
| Model | AUC | Precision (P) | Recall (R) | F1-Score |
|---|---|---|---|---|
| Isolation Forest (IForest) | — | 0.1715 | 0.358 | 0.232 |
| Local Outlier Factor (LOF) | — | 0.1724 | 0.041 | 0.066 |
| GANomaly | 0.611 | 0.848 | 0.228 | 0.359 |
| LSTM Autoencoder | 0.838 | 0.850 | 0.548 | 0.667 |
| Proposed ConvLSTM-GAN | 0.912 | 0.909 | 0.625 | 0.741 |
The results clearly demonstrate the superiority of the proposed method. The density-based LOF method performed poorly, likely due to the “curse of dimensionality” affecting distance calculations in high-dimensional space. IForest showed moderate recall but very low precision, indicating many false alarms, and is also sensitive to imbalanced data. The original GANomaly model achieved high precision (0.848) but very low recall (0.228). This suggests that while its convolutional networks are good at identifying clear anomalies, they fail to capture the temporal context crucial for the sequential photovoltaic inverter data, leading to many missed detections. The standard LSTM Autoencoder performed significantly better, confirming the importance of modeling temporal dependencies, achieving an F1-score of 0.667. Finally, the proposed ConvLSTM-GAN model achieved the best overall performance across all metrics, with the highest AUC (0.912), precision (0.909), recall (0.625), and F1-score (0.741). This validates the core hypothesis: combining the adversarial and latent-space consistency training of GANomaly with the spatiotemporal feature extraction capability of ConvLSTM, applied to imaged time-series data, creates a powerful framework for detecting anomalies in photovoltaic inverter operational data.
Conclusion
This paper presented a novel anomaly detection framework for photovoltaic inverters based on Generative Adversarial Networks combined with time-series imaging. The key innovation lies in transforming multivariate operational time-series data from a photovoltaic inverter into a sequence of two-dimensional images using the Gramian Angular Field, and processing this sequence with a deeply integrated ConvLSTM network within a GAN-based architecture. This design enables the model to effectively learn both the temporal dynamics and the inter-variable correlations present in normal photovoltaic inverter operation. The requirement for only normal data during training makes it practical for real-world scenarios where fault samples are scarce. Experimental results on a real photovoltaic inverter dataset confirm the framework’s effectiveness, significantly outperforming traditional and other deep learning-based baseline methods. Future work will focus on further refining the network architecture and exploring the integration of attention mechanisms to potentially enhance both the precision and recall of the anomaly detection system for photovoltaic inverters.
