With the acceleration of the global energy transition, renewable energy sources such as wind and solar power are accounting for an increasingly large share of modern power systems. While this transformation promotes the development of clean energy, the inherent randomness and volatility of renewable generation have also introduced strong nonlinearity and uncertainty into electricity spot market prices. Accurate day-ahead electricity price forecasting has therefore become a core requirement for power producer bidding strategies, grid dispatching, and market regulation. Based on the multi-source heterogeneous dataset provided by the 2025 Future Cup (Chinese Simpl.: 未来杯; Vietnamese: Cúp Vị Lai) National College Big Data Challenge, this project develops an end-to-end solution covering data governance, market mechanism analysis, and high-precision forecasting.
High-quality data is the cornerstone of successful modeling. To address the challenges posed by heterogeneous data sources—including meteorological variables, load demand, renewable generation, and historical electricity prices—we first constructed a robust data preprocessing pipeline. To handle missing values and noise in the raw data, an efficient self-supervised imputation framework combined with Hankel matrix regularization was adopted, enabling intelligent completion of missing values while preserving intrinsic temporal structures. Subsequently, a hierarchical temporal alignment strategy was applied to unify data sampled at different frequencies into a standardized 15-minute resolution, laying a solid foundation for subsequent analysis.
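The alignment step can be pictured with a small standard-library sketch. It assumes the simplest case, a lower-frequency series (for example hourly weather readings) placed onto a 15-minute grid by linear interpolation. The function name `upsample_to_15min` is hypothetical and this is not the code in `interpolate.py`; the hierarchical strategy described above may treat each source differently.

```python
from datetime import datetime, timedelta


def upsample_to_15min(timestamps, values):
    """Linearly interpolate a sorted series onto a 15-minute grid.

    Illustrative only: assumes at least two samples with strictly
    increasing timestamps and no missing values.
    """
    step = timedelta(minutes=15)
    grid, out = [], []
    t = timestamps[0]
    i = 0  # index of the left-hand source sample
    while t <= timestamps[-1]:
        # Move to the source interval [timestamps[i], timestamps[i + 1]] containing t.
        while i + 1 < len(timestamps) - 1 and timestamps[i + 1] <= t:
            i += 1
        t0, t1 = timestamps[i], timestamps[i + 1]
        w = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float in [0, 1]
        grid.append(t)
        out.append(values[i] + w * (values[i + 1] - values[i]))
        t += step
    return grid, out
```

Calling `upsample_to_15min` on three hourly samples returns nine points, one every 15 minutes from the first timestamp to the last. Sources sampled faster than 15 minutes would need aggregation (for example averaging) instead, which this sketch does not cover.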
Before model construction, an in-depth statistical attribution analysis of electricity market operation mechanisms was conducted. In addition to characterizing the spatiotemporal distribution of extreme electricity prices, a Mann–Whitney U test revealed a statistically significant lag effect of approximately 8 hours between market states and electricity price deviations. This finding provided a critical temporal window for feature engineering. To capture complex market regimes, spectral clustering was employed to restructure multidimensional features, identifying multiple representative operating scenarios (e.g., “high wind–solar output” or “high-temperature high-load” conditions). These scenario labels were incorporated as key model features. Combined with SHAP-based interpretability analysis, the suppressive effect of wind and solar generation on electricity prices and the nonlinear influence of temperature were quantitatively assessed, ensuring model transparency.
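To make the lag analysis concrete, the sketch below scans candidate lags and runs a two-sided Mann–Whitney U test at each one. It compares price deviations observed `k` steps after the market state was 1 against those observed `k` steps after it was 0. It is a standard-library approximation (normal approximation, no tie or continuity correction), not the code in `mwu.py`. The names `lag_scan`, `state`, and `deviation` are hypothetical, and in practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided test; returns (U statistic of x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


def lag_scan(state, deviation, max_lag):
    """For each lag k, test deviation[t + k] grouped by state[t] (1 vs 0)."""
    results = {}
    for k in range(max_lag + 1):
        n = len(state) - k
        on = [deviation[t + k] for t in range(n) if state[t] == 1]
        off = [deviation[t + k] for t in range(n) if state[t] == 0]
        if on and off:  # both groups are needed for a test
            results[k] = mann_whitney_u(on, off)
    return results
```

The lag with the smallest p-value is the natural candidate for the feature window. If the series are on the 15-minute grid described earlier, an 8-hour lag corresponds to `k = 32`. Scanning many lags performs many tests, so the p-values are best read comparatively rather than against a fixed threshold.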
The core contribution of this project lies in the construction of a feature-fusion time series forecasting model based on Transformers and attention mechanisms. Given that a single model architecture struggles to simultaneously capture long-term trends and abrupt local fluctuations, a novel three-branch parallel architecture was designed. The first branch employs a Transformer encoder–decoder to model long-range temporal dependencies; the second branch utilizes cross-temporal feature attention to retrieve historically similar meteorological scenarios for analogical reasoning; and the third branch applies a progressive multilayer perceptron (MLP) to directly learn nonlinear mappings between features and prices. This design enables the model to simultaneously exhibit capabilities analogous to “macroscopic trend analysis,” “historical experience analogy,” and “intuitive reaction.” A dynamic fusion module adaptively integrates the outputs of the three branches according to sample characteristics, significantly enhancing robustness in the presence of price spikes and severe volatility.
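The fusion idea can be sketched without any deep learning framework. The example assumes the three branches have already produced their forecasts and that a gate has produced one unnormalized score per branch for the current sample. A softmax turns the scores into weights, and the fused forecast is the weighted sum at each time step. The names `fuse` and `gate_scores` are hypothetical. In the real model the scores would come from a learned module conditioned on the sample, and the fusion in `tptsm.py` may differ in form.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branch_preds, gate_scores):
    """Combine per-branch forecasts with sample-specific weights.

    branch_preds: one forecast sequence per branch, all of equal length
                  (for example 96 points for a day at 15-minute resolution).
    gate_scores:  one unnormalized score per branch for this sample
                  (assumed to be given here; learned in a real model).
    """
    weights = softmax(gate_scores)
    horizon = len(branch_preds[0])
    fused = [
        sum(w * pred[t] for w, pred in zip(weights, branch_preds))
        for t in range(horizon)
    ]
    return fused, weights
```

Because the weights are non-negative and sum to one, the fused value at each step always lies between the smallest and largest branch prediction. When one score dominates, the output approaches that branch alone, which is how a gate of this kind can lean on a single branch for samples with price spikes.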
In summary, this project delivers a closed-loop electricity market forecasting system and successfully produces high-precision 96-point day-ahead electricity price prediction sequences on the test set, demonstrating the superiority of deeply fused models for complex time series regression tasks. All engineering components—including data preprocessing scripts, trained model weights, and inference code—have been fully open-sourced to provide a reproducible reference for big data analytics and AI applications in electricity markets.
Only major files are listed; non-essential files are omitted.
Electricity-Price-Forecasting
│ B250068.pdf <-- Competition paper (PDF)
│ LICENSE
│ README
│
├─mechine_learning
│ │ data3.csv
│ │ spectral_clusting_findK.py <-- Spectral clustering: k selection
│ │ test.py
│ │ xgboost_best.py <-- ML models for Questions 3 and 4
│ │ xgboost_try.py
│ │
│ └─output_k=6_best
│
└─three_path_model
  │ data.csv <-- Preprocessed dataset
  │ draw_stat_chart.py <-- Statistical visualization
  │ imputation.py <-- Missing value imputation (training + application)
  │ interpolate.py <-- Temporal interpolation and alignment
  │ mwu.py <-- Mann–Whitney U test (p-value, e-value, heatmaps)
  │ scaler.pt <-- Feature normalization parameters
  │ three_path_model_epoch80.pth <-- Best model weights (192 time steps)
  │ tptsm.py <-- Day-ahead electricity price forecasting (training & inference)
  │
  └─images
II. Heatmap of the Relationship Between Unit Commitment Status and Real-Time / Day-Ahead Price Differences Based on the Mann–Whitney U Test ($p$-values)
This figure illustrates key factors influencing unit commitment decisions in electricity trading. Specifically, for each time step within a ±100-step window centered on the target time, it shows how strongly the difference between day-ahead and real-time electricity prices is associated with unit commitment status.
III. SHAP Analysis of Exogenous Variable Impacts on Electricity Prices Based on XGBoost and Spectral Clustering
Results of accuracy experiments, ablation studies, and gradient attribution analyses are provided in the paper.
Copyright © 2025–2026 HE Feifan (Chinese Simpl.: 何非凡; Vietnamese: HÀ Phi Phàm), DU Yu (Chinese Simpl.: 杜宇; Vietnamese: ĐỖ Vũ), School of Electronic and Information Engineering, Lanzhou Jiaotong University (Chinese Simpl.: 兰州交通大学电子与信息工程学院; Vietnamese: Đại Học Giao thông Lan Châu, Học Viện Điện Tử Và Công Nghệ Thông Tin)
