I'm using Prophet to forecast monthly expenses from time series data, but the forecasts show unexpected spikes in future values. Here's a brief overview of my process:
- Data: I have a time series dataset of monthly expenses spanning several years. The data is pre-processed with engineered features: lag variables (`lag_1` to `lag_6`), moving averages (`ma_6`, `ma_3`), and rolling standard deviations.
- Seasonality: I analyzed seasonality dynamically using `seasonal_decompose` and added seasonal components for yearly and quarterly patterns.
- Modeling: I fit a `Prophet` model whose seasonality configuration is based on that analysis. The model is trained with regressors such as `tahun` (year), `bulan` (month), `quarter`, `ma_6`, and the lag features.
- Prediction: After training, I predict future values (42 months ahead), updating the lags with the latest prediction during each iteration.
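A minimal sketch of how features like these could be built, assuming a training frame with Prophet's usual `ds` and `y` columns. The helper name `build_features` and the demo data are hypothetical, and this is not necessarily how the original features were computed. The sketch derives every feature from past values only (via `shift`), and uses `ddof=0` so the rolling standard deviation matches what `np.std` returns by default in a recursive prediction loop.

```python
import numpy as np
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag, moving-average and rolling-std columns built from past values of y."""
    out = df.copy()
    for k in range(1, 7):
        out[f"lag_{k}"] = out["y"].shift(k)           # value k months earlier
    past = out["y"].shift(1)                          # excludes the current month
    out["ma_6"] = past.rolling(6).mean()              # mean of lag_1..lag_6
    out["ma_3"] = past.rolling(3).mean()              # mean of lag_1..lag_3
    out["rolling_std"] = past.rolling(6).std(ddof=0)  # population std, like np.std
    out["tahun"] = out["ds"].dt.year
    out["bulan"] = out["ds"].dt.month
    out["quarter"] = out["ds"].dt.quarter
    return out.dropna().reset_index(drop=True)        # drop rows without full history

# Hypothetical demo data: 12 months with y = 1, 2, ..., 12
dates = pd.date_range("2020-01-01", periods=12, freq="MS")
demo = pd.DataFrame({"ds": dates, "y": np.arange(1.0, 13.0)})
features = build_features(demo)
print(features[["ds", "y", "lag_1", "ma_6", "rolling_std"]].head())
```

If the training features were instead computed with the current month included, or with pandas' default `ddof=1`, the values fed to the model at prediction time would be defined differently from the ones it was trained on.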
Despite the model fitting reasonably well on the training and test data, the forecasted values exhibit spikes, especially in the later periods. I'm trying to understand why the model behaves this way and how I can improve the predictions.
Here's a simplified version of my prediction loop for the future periods:

    # Future prediction loop
    for i in range(future_periods):
        future_input = {
            'ds': lags['ds'],
            'tahun': lags['ds'].year,
            'bulan': lags['ds'].month,
            'quarter': lags['ds'].quarter,
            'ma_6': lags['ma_6'],
            'ma_3': lags['ma_3'],
            'rolling_std': lags['rolling_std']
        }
        for lag in range(1, 7):
            future_input[f'lag_{lag}'] = lags[f'lag_{lag}']
        future_df = pd.DataFrame([future_input])

        # Prophet prediction
        prophet_forecast = prophet_model.predict(future_df)
        final_prediction = max(prophet_forecast['yhat'].values[0], 0)
        predictions.append(final_prediction)

        # Update lags
        lags['ds'] = forecast_dates[i + 1] if i + 1 < future_periods else lags['ds']
        for j in range(6, 1, -1):
            lags[f'lag_{j}'] = lags[f'lag_{j-1}']
        lags['lag_1'] = final_prediction
        lags['ma_6'] = np.mean([lags[f'lag_{k}'] for k in range(1, 7)])
        lags['ma_3'] = np.mean([lags[f'lag_{k}'] for k in range(1, 4)])
        lags['rolling_std'] = np.std([lags[f'lag_{k}'] for k in range(1, 7)])
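To illustrate how feeding predictions back in as lags can compound over a long horizon, here is a toy linear recurrence. It is not the fitted Prophet model: the function `recursive_forecast`, the intercept, and the weights are all hypothetical and chosen only for illustration. The sketch relies on the fact that Prophet's extra regressors enter the forecast as linear terms, so a loop of this kind behaves much like an autoregressive recurrence on the lag features.

```python
def recursive_forecast(history, intercept, weights, steps):
    """Roll a linear lag model forward, feeding each prediction back in as lag_1.

    weights[0] multiplies lag_1, weights[1] multiplies lag_2, and so on.
    All numbers are hypothetical; this is not a fitted Prophet model.
    """
    lags = list(reversed(history))[:len(weights)]  # lags[0] is the most recent value
    preds = []
    for _ in range(steps):
        yhat = intercept + sum(w * x for w, x in zip(weights, lags))
        preds.append(yhat)
        lags = [yhat] + lags[:-1]  # shift the lags, as in the prediction loop
    return preds

history = [100.0, 102.0, 98.0, 101.0, 99.0, 103.0]
damped = recursive_forecast(history, 10.0, [0.5, 0.3, 0.1], steps=42)     # weights sum to 0.9
explosive = recursive_forecast(history, 10.0, [0.7, 0.3, 0.1], steps=42)  # weights sum to 1.1
print(round(damped[-1], 1), round(explosive[-1], 1))
```

With these nonnegative weights, the forecast settles toward a fixed level when the weights sum to less than 1 and grows at every step when they sum to more than 1. Checking the fitted regressor coefficients against this picture may help explain why problems appear only in the later periods.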
Observations:
- The model seems to react sharply to minor fluctuations in the lags and moving averages in the most recent months, resulting in large changes in predictions.
Things I've tried:
- I added advanced features and lags to account for more granular patterns, but the spikes persist.
- I tried adjusting the changepoint prior scale and seasonality prior scale, but it didn't resolve the issue.
- I capped predictions at a maximum value to avoid extreme spikes, but the forecasts remain erratic.
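For reference, a sketch of where these settings live in Prophet's API. `changepoint_prior_scale` and `seasonality_prior_scale` regularise the trend and the seasonal terms; extra regressors take their own `prior_scale` through `add_regressor`, which falls back to `holidays_prior_scale` when omitted. The helper `build_model`, the regressor list, the `fourier_order`, and the value `0.5` are assumptions made for illustration, not the configuration actually used.

```python
# Hypothetical regressor list, based on the columns named in this question
REGRESSORS = ["tahun", "bulan", "quarter", "ma_6", "ma_3", "rolling_std"] + [
    f"lag_{k}" for k in range(1, 7)
]

def build_model(changepoint_prior_scale=0.05,
                seasonality_prior_scale=10.0,
                regressor_prior_scale=0.5):
    """Configure (but do not fit) a Prophet model with explicitly regularised regressors."""
    from prophet import Prophet  # imported here so this module loads without prophet

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,   # monthly data has no weekly or daily cycle
        daily_seasonality=False,
        changepoint_prior_scale=changepoint_prior_scale,
        seasonality_prior_scale=seasonality_prior_scale,
    )
    # Quarterly pattern as a custom seasonality (period in days)
    model.add_seasonality(name="quarterly", period=91.25, fourier_order=3)
    for name in REGRESSORS:
        # A smaller prior_scale shrinks the regressor's coefficient toward zero
        model.add_regressor(name, prior_scale=regressor_prior_scale)
    return model
```

Calling `build_model().fit(train_df)` would then require `train_df` to contain `ds`, `y`, and every column in `REGRESSORS`.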
Questions:
- Why does the model show such spikes in future predictions, especially after a certain point?
- How can I adjust the model or preprocessing steps to smooth out the spikes and generate more stable forecasts?
- Could the dynamic updating of the lag values lead to the model's instability, and if so, how can I address this issue?