README.md: 4 additions and 3 deletions
```diff
@@ -18,17 +18,18 @@ This method allows for the unrestricted creation of high-quality time series dat
 
 ### 🔥 News
 
+**[Feb. 2026]** Since any weakly stationary time series can be represented, up to a deterministic component, as the output of a linear time-invariant system excited by white noise (Wold's decomposition), we propose [a learnable series generation method](https://github.com/wwhenxuan/S2Generator/blob/main/s2generator/simulator/arima.py) based on the ARIMA model. The method is designed so that the generated series closely match the input series in autocorrelation and power spectral density.
+
 **[Sep. 2025]** Our paper "Synthetic Series-Symbol Data Generation for Time Series Foundation Models" has been accepted by **NeurIPS 2025**. In it, **[*SymTime*](https://arxiv.org/abs/2502.15466)**, pre-trained on the $S^2$ synthetic dataset, achieved state-of-the-art (SOTA) results when fine-tuned for forecasting, classification, imputation, and anomaly detection.
 
 ## 🚀 Installation <a id="Installation"></a>
 
-We have highly encapsulated the algorithm and uploaded the code to PyPI. Users can download the code through `pip`.
-
+We have packaged the algorithm and published it on PyPI:
 ~~~
 pip install s2generator
 ~~~
 
-We only used [`NumPy`](https://numpy.org/), [`Scipy`](https://scipy.org/) and [`matplotlib`](https://matplotlib.org/) when developing the project.
+We used [`NumPy`](https://numpy.org/), [`Pandas`](https://pandas.pydata.org/), and [`SciPy`](https://scipy.org/) to build the data-science environment, [`Matplotlib`](https://matplotlib.org/) for data visualization, and [`Statsmodels`](https://www.statsmodels.org/stable/index.html) for time series analysis and statistical processing.
```
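The Feb. 2026 entry rests on the idea that a stationary series can be obtained by passing white noise through a linear time-invariant (LTI) filter, and that an ARIMA(p, d, q) series is such an ARMA(p, q) output integrated d times. The numpy-only sketch below illustrates that idea with hand-picked ARMA(1, 1) coefficients (`phi=0.6`, `theta=0.3`, chosen for illustration); it does not fit anything to data and is not the `s2generator` implementation. The helpers `lti_filter`, `simulate_arima`, and `sample_acf` are hypothetical names.

```python
import numpy as np


def lti_filter(b, a, x):
    """Run a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k].

    This is the same recursion scipy.signal.lfilter(b, a, x) computes, written out for clarity.
    """
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc / a[0]
    return y


def simulate_arima(phi, theta, d, n, burn_in=500, seed=0):
    """Hypothetical helper: white noise -> ARMA(p, q) via an LTI filter -> integrate d times."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n + burn_in)
    a = np.r_[1.0, -np.asarray(phi, dtype=float)]  # AR polynomial: 1 - phi_1 z^-1 - ...
    b = np.r_[1.0, np.asarray(theta, dtype=float)]  # MA polynomial: 1 + theta_1 z^-1 + ...
    arma = lti_filter(b, a, noise)[burn_in:]  # drop the start-up transient
    series = arma
    for _ in range(d):
        series = np.cumsum(series)  # each integration adds one unit root (non-stationary)
    return arma, series


def sample_acf(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(xc[lag:], xc[:-lag]) / np.dot(xc, xc))


phi, theta = 0.6, 0.3
arma, arima = simulate_arima([phi], [theta], d=1, n=20_000)

# Theoretical lag-1 autocorrelation of an ARMA(1, 1) process
rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)
print(f"sample ACF(1) = {sample_acf(arma, 1):.3f}, theoretical = {rho1:.3f}")
```

Differencing `arima` once recovers the stationary ARMA part, which is why the differencing order d can be chosen separately from (p, q).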
```diff
     Based on these two points, we can use the ARIMA model to generate non-stationary time series data.
     Compared to previous data generation methods, we can further fit the statistical characteristics of real time series data through the ARIMA model, thereby generating more realistic time series data.
+
+    Since this generation method involves fitting an ARIMA model, the underlying linear-algebra operations may raise exceptions such as `LinAlgError`, causing generation to fail.
+    This is usually related to the input series and the order of the ARIMA model. The most common causes we have identified are:
+
+    1. The data is completely constant (variance = 0);
+    2. The input series is too short;
+    3. The input series contains obvious extreme values or outliers after standardization;
+    4. An excessively high (p, q) order leads to a matrix dimension mismatch or a singular matrix.
+
+    In addition, the `ARIMA` implementation in `statsmodels` has limited ability to handle ill-conditioned (e.g., nearly singular) matrices.
+    Even when the data looks normal, LU decomposition may still fail due to floating-point precision issues.
     """
 
     def __init__(
```
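The four failure causes listed in the docstring can be screened before fitting, and a `LinAlgError` that still occurs can be handled by retrying with a smaller order. The sketch below uses hypothetical helpers `check_arima_input` and `fit_with_order_fallback`, which are not part of `s2generator`; the thresholds (`min_length=50`, `z_threshold=6.0`, at least ten observations per coefficient) are illustrative assumptions, not values taken from the project. `fit` stands for any callable that fits an ARIMA(p, d, q) model, for example a thin wrapper around `statsmodels`.

```python
import numpy as np


def check_arima_input(x, p, q, min_length=50, z_threshold=6.0):
    """Return readable problems that commonly lead to LinAlgError (empty list means OK)."""
    x = np.asarray(x, dtype=float)
    if not np.all(np.isfinite(x)):
        return ["series contains NaN or infinite values"]
    problems = []
    # 1. Completely constant data (variance = 0)
    if np.ptp(x) == 0:
        problems.append("series is constant (variance = 0)")
    else:
        # 3. Extreme values after standardization
        z = (x - x.mean()) / x.std()
        n_extreme = int(np.sum(np.abs(z) > z_threshold))
        if n_extreme:
            problems.append(f"{n_extreme} extreme value(s) with |z| > {z_threshold}")
    # 2. Series too short
    if len(x) < min_length:
        problems.append(f"series too short ({len(x)} < {min_length})")
    # 4. Order too high for the data (heuristic: at least 10 points per coefficient)
    if 10 * (p + q + 1) > len(x):
        problems.append(f"order (p={p}, q={q}) too high for {len(x)} observations")
    return problems


def fit_with_order_fallback(fit, x, p, d, q):
    """Call fit(x, (p, d, q)); on LinAlgError, retry with progressively smaller p and q."""
    while True:
        try:
            return fit(x, (p, d, q)), (p, d, q)
        except np.linalg.LinAlgError:
            if p == 0 and q == 0:
                raise  # nothing smaller left to try
            # Shrink the larger of the two orders first
            if p >= q:
                p -= 1
            else:
                q -= 1


# Usage with a stand-in fit function that fails whenever p + q > 2
def flaky_fit(x, order):
    p, _, q = order
    if p + q > 2:
        raise np.linalg.LinAlgError("LU decomposition error")
    return f"fitted ARIMA{order}"


x = np.random.default_rng(0).standard_normal(200)
print(check_arima_input(x, p=2, q=2))  # []
print(fit_with_order_fallback(flaky_fit, x, p=3, d=1, q=2))  # ('fitted ARIMA(1, 1, 1)', (1, 1, 1))
```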
```diff
@@ -41,10 +53,17 @@ def __init__(
         max_q: int = 5,
         signif: float = 0.05,
         not_white_alarm: bool = True,
+        revin: bool = True,
         random_state: Optional[int] = 42,
     ) -> None:
         """
-        :param order: A tuple specifying the (p, d, q) order of the ARIMA model.
+        :param max_p: Maximum AR order (p) to consider when fitting the ARIMA model.
+        :param max_d: Maximum differencing order (d) to consider when fitting the ARIMA model.
+        :param max_q: Maximum MA order (q) to consider when fitting the ARIMA model.
+        :param signif: Significance level for the ADF test to determine stationarity.
+        :param not_white_alarm: Whether to issue a warning when the residuals of the fitted model are not white noise.
+        :param revin: Whether to apply reversible normalization (RevIN) to the time series data.
+        :param random_state: Random state for reproducibility when generating new time series data.
         """
         self.max_p = max_p
         self.max_d = max_d
```
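The `max_p`, `max_d`, and `max_q` arguments bound an order search instead of fixing a single `(p, d, q)` tuple. As a simplified, numpy-only illustration of such a bounded search (not the class's actual procedure, which presumably relies on `statsmodels` and on the ADF test at level `signif`), the sketch below fits pure AR(p) models by least squares for every `p <= max_p` and keeps the one with the lowest AIC. The function names `fit_ar_ols` and `select_ar_order` are hypothetical.

```python
import numpy as np


def fit_ar_ols(x, p, max_p):
    """Least-squares AR(p) fit on a common sample (first max_p points dropped) so AICs are comparable."""
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    # Design matrix: intercept column plus lags 1..p
    cols = [np.ones(len(y))] + [x[max_p - k : len(x) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = float(np.mean(resid**2))
    return coef, sigma2, len(y)


def select_ar_order(x, max_p=5):
    """Return (best_p, aic_by_p) using AIC = n * log(sigma^2) + 2 * (p + 1)."""
    aic = {}
    for p in range(max_p + 1):
        _, sigma2, n = fit_ar_ols(x, p, max_p)
        aic[p] = n * np.log(sigma2) + 2 * (p + 1)
    best_p = min(aic, key=aic.get)
    return best_p, aic


# Usage: a simulated AR(2) series with x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + e_t
rng = np.random.default_rng(1)
e = rng.standard_normal(3000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]

best_p, aic = select_ar_order(x, max_p=5)
print(best_p, {p: round(float(v), 1) for p, v in aic.items()})
```

AIC tends to err toward slightly larger orders, which is one reason to cap the search with an explicit maximum such as `max_p`.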
```diff
@@ -56,6 +75,12 @@ def __init__(
         # Whether to issue a warning when residuals are not white noise
         self.not_white_alarm = not_white_alarm
 
+        # Whether to apply reversible normalization (RevIN) to the time series data.
+        # If True, the time series data are normalized to zero mean and unit variance,
+        # and the original mean and standard deviation are recorded for the inverse transformation.
```
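Reversible normalization amounts to a z-score whose statistics are stored and later inverted, which matches the `* self.std + self.mean` step that `transform` applies when `revin` is True. A minimal sketch, assuming a hypothetical `ReversibleNorm` class (not part of `s2generator`) and a small `eps` guard against zero variance that the original code may not include:

```python
import numpy as np


class ReversibleNorm:
    """Hypothetical helper: z-score a series, remember its statistics, and undo it later."""

    def __init__(self, eps: float = 1e-8):
        self.eps = eps  # guards against division by zero for (near-)constant series
        self.mean = None
        self.std = None

    def normalize(self, x: np.ndarray) -> np.ndarray:
        x = np.asarray(x, dtype=float)
        self.mean = float(x.mean())
        self.std = float(x.std()) + self.eps
        return (x - self.mean) / self.std

    def denormalize(self, z: np.ndarray) -> np.ndarray:
        if self.mean is None:
            raise ValueError("normalize() must be called before denormalize().")
        return np.asarray(z, dtype=float) * self.std + self.mean


rng = np.random.default_rng(0)
series = 100.0 + 5.0 * rng.standard_normal(500)

norm = ReversibleNorm()
z = norm.normalize(series)  # a model would be fitted on this zero-mean, unit-variance copy
restored = norm.denormalize(z)  # model output is mapped back to the original scale
```

Storing the statistics on the instance is what makes the operation reversible: any series produced in the normalized space can be rescaled with the same mean and standard deviation.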
```diff
                 f"Warning: Model residuals may not be white noise (mean p-value={mean_p_value:.4f} < significance level={self.signif}), please re-evaluate the model order or parameters."
             )
 
```
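The warning in this fragment compares a mean p-value from a residual whiteness test with `signif`. Assuming the test is Ljung-Box with p-values averaged over lags 1..`max_lag` (the excerpt does not show which test is used or how the mean is taken), the check can be sketched with numpy and the standard library only. `chi2_sf` uses the closed-form recursion for the chi-square survival function instead of SciPy, and `ljung_box_pvalues` and `residuals_look_white` are hypothetical names. For residuals of a fitted ARMA(p, q) model the degrees of freedom are usually reduced by p + q; the sketch ignores that for simplicity.

```python
import math

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df.

    Uses Q(df + 2) = Q(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) for odd df or Q(0) = 0 for even df.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = 0.0, 0
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def ljung_box_pvalues(resid, max_lag: int = 10) -> np.ndarray:
    """p-values of the Ljung-Box Q statistic for lags 1..max_lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    n = len(r)
    denom = float(np.dot(r, r))
    q_stat, pvals = 0.0, []
    for k in range(1, max_lag + 1):
        rho_k = float(np.dot(r[k:], r[:-k])) / denom
        q_stat += rho_k**2 / (n - k)
        pvals.append(chi2_sf(n * (n + 2) * q_stat, df=k))
    return np.array(pvals)


def residuals_look_white(resid, signif: float = 0.05, max_lag: int = 10) -> bool:
    """Hypothetical check mirroring the warning: mean p-value >= signif means 'looks white'."""
    mean_p_value = float(np.mean(ljung_box_pvalues(resid, max_lag)))
    if mean_p_value < signif:
        print(f"Warning: residuals may not be white noise (mean p-value={mean_p_value:.4f})")
    return mean_p_value >= signif


# Usage: strongly autocorrelated "residuals" from an AR(1) process with phi = 0.7
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
ar1 = np.zeros_like(e)
for t in range(1, len(e)):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]

print(residuals_look_white(ar1))  # prints the warning line, then False
```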
```diff
@@ -132,7 +165,35 @@ def transform(
             ),
         )
 
-        return generated_series.values.T
+        return (
+            generated_series.values.T * self.std + self.mean
+            if self.revin
+            else generated_series.values.T
+        )
+
+    @property
+    def param_names(self) -> List[str]:
+        """Return the names of the parameters in the fitted ARIMA model."""
+        if not hasattr(self, "model"):
+            raise ValueError("The model must be fitted before calling param_names.")
+
+        return self.model.param_names
+
+    @property
+    def params(self) -> Union[np.ndarray, pd.Series]:
+        """Return the parameter values of the fitted ARIMA model."""
+        if not hasattr(self, "model"):
+            raise ValueError("The model must be fitted before calling params.")
+
+        return self.model.params
+
+    @property
+    def param_items(self) -> List[Tuple[str, float]]:
+        """Return a list of (parameter name, parameter value) tuples for the fitted ARIMA model."""
+        if not hasattr(self, "model"):
+            raise ValueError("The model must be fitted before calling param_items.")
```