I have a dataset containing a very long time series of hourly traffic congestion in a certain city, covering a period of about 22 years (roughly 24 × 365 × 22 = 192,720 data points). I want to use this series to forecast future hourly congestion values. The series has two types of missing values:
- A "single" missing value: about 30 individual hours are missing with no apparent pattern, i.e. they are scattered sporadically across the series.
- A missing day: 20 days are missing entirely, with not a single data point for those days. Ten of these 20 days are scattered sporadically across the series, while the other 10 are consecutive (10 days in a row).
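To make the two kinds of gaps concrete, here is a minimal sketch of how they can be told apart. It assumes the data sit in a pandas Series with a sorted, unique hourly DatetimeIndex, and that hours absent from the index count as missing. The function name `classify_gaps` and the 24-hour threshold for "missing day" are illustrative choices, not anything standard.

```python
import numpy as np
import pandas as pd


def classify_gaps(series: pd.Series) -> pd.DataFrame:
    """Group consecutive missing hours into runs and label each run.

    Assumes `series` has a sorted, unique hourly DatetimeIndex. Hours that
    are absent from the index, or present with a NaN value, count as missing.
    """
    # Rebuild the full hourly grid so that absent rows show up as NaN.
    full_index = pd.date_range(
        series.index.min(), series.index.max(), freq="h", name="timestamp"
    )
    missing = series.reindex(full_index).isna()

    # A new run id starts wherever the missing/observed flag flips.
    run_id = missing.ne(missing.shift(fill_value=False)).cumsum()

    # Keep only the missing hours, then summarise each run.
    runs = (
        run_id[missing]
        .rename("run")
        .reset_index()
        .groupby("run")["timestamp"]
        .agg(start="min", end="max", hours="count")
        .reset_index(drop=True)
    )

    # Illustrative threshold: a run of 24+ hours is treated as missing day(s).
    runs["kind"] = np.where(runs["hours"] >= 24, "missing day(s)", "single/short gap")
    return runs
```

On the real data this would be expected to return roughly 30 one-hour runs plus the runs for the missing days, with the 10 consecutive days showing up as a single 240-hour run.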
The overall missing-value rate is around 0.25% (about 30 + 20 × 24 = 510 missing hours), so I'm not worried about removing them for descriptive statistics etc. I'm just wondering whether it's correct to remove them for the forecasting part. I'm also not sure whether I should treat the two types of missing values differently.
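One concrete way removal can matter for forecasting: many models use lagged inputs such as "same hour yesterday". If rows are deleted, a positional lag like pandas' `shift(24)` moves 24 rows rather than 24 hours, so near each gap it silently points at the wrong hour. A toy sketch with made-up values (not the real data), assuming pandas:

```python
import numpy as np
import pandas as pd

# Toy data (made up): 3 days of hourly values, value = hours since start.
idx = pd.date_range("2020-01-01", periods=72, freq="h")
y = pd.Series(np.arange(72, dtype=float), index=idx)

# Option 1: delete the missing hour (2020-01-02 05:00) from the table.
dropped = y.drop(pd.Timestamp("2020-01-02 05:00"))
# shift(24) moves 24 *rows*, which is no longer 24 hours near the gap.
lag24_dropped = dropped.shift(24)

# Option 2: keep a regular hourly grid and leave the gap as NaN.
regular = dropped.asfreq("h")
# Here shift(24) really means "same hour, previous day".
lag24_regular = regular.shift(24)

t = pd.Timestamp("2020-01-02 12:00")
print(lag24_dropped[t], lag24_regular[t])  # prints: 11.0 12.0
```

With the regular grid the gap stays visible as explicit NaNs that the model or an imputation step then has to handle, instead of misaligning the lags for the following day.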
Thanks!