Imagine a data frame with multiple columns, where each column is a time series holding daily returns for an individual stock index. Additionally, the data frame holds a date column.
I want to use block bootstrapping to create synthetic data. Each block spans n consecutive rows across all return columns, so that the cross-correlations between the indices are preserved.
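For reference, here is a minimal sketch of the plain block bootstrap I have in mind, assuming all return columns are complete (no NaN values) and the date column has been set aside. The function name `block_bootstrap`, the column names, and the random sample data are placeholders for illustration, not my actual data.

```python
import numpy as np
import pandas as pd


def block_bootstrap(returns: pd.DataFrame, block_size: int,
                    rng: np.random.Generator) -> pd.DataFrame:
    """Resample whole row blocks so every column is drawn from the same dates."""
    n_rows = len(returns)
    n_blocks = -(-n_rows // block_size)  # ceiling division
    # Random block start positions; each block is block_size consecutive rows.
    starts = rng.integers(0, n_rows - block_size + 1, size=n_blocks)
    row_idx = np.concatenate([np.arange(s, s + block_size) for s in starts])
    # Trim the last block so the result has as many rows as the input.
    return returns.iloc[row_idx[:n_rows]].reset_index(drop=True)


# Hypothetical sample data: 1000 days of returns for three indices.
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0, 0.01, size=(1000, 3)),
                       columns=["idx_a", "idx_b", "idx_c"])
synthetic = block_bootstrap(returns, block_size=200, rng=rng)
```

Because whole rows are copied, each synthetic row keeps the same-day returns of all indices together, which is what preserves the cross-correlations within a block.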
This all seems fairly straightforward as long as the different stock indices (time series) are of the same length. In reality this is rarely the case. Indices with a later inception date have consecutive leading NaN values up until their actual start date, and these can span thousands of rows.
Block bootstrapping is then no longer that straightforward. Let's say a block spans 200 rows. If the first block of the original data (which contains leading NaN values for some indices) is picked after a block that contains values for all indices, you end up inserting a block with NaN values in the middle of an otherwise regular-looking time series.
How can I solve this problem?
My actual goal is to produce synthetic data that has the same structure as the original data frame, i.e. NaN values in the synthetic data frame should be in exactly the same positions as in the original data frame.
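To make that requirement concrete, the check I would like the synthetic data to pass can be sketched roughly as follows. The helper name `same_nan_structure` and the tiny example frames are made up for illustration, and only the return columns are compared.

```python
import numpy as np
import pandas as pd


def same_nan_structure(original: pd.DataFrame, synthetic: pd.DataFrame) -> bool:
    """True if both frames have the same shape and NaNs in exactly the same cells."""
    if original.shape != synthetic.shape:
        return False
    return bool(np.array_equal(original.isna().to_numpy(),
                               synthetic.isna().to_numpy()))


# Hypothetical example: idx_b starts two rows later than idx_a.
original = pd.DataFrame({
    "idx_a": [0.01, -0.02, 0.005, 0.0],
    "idx_b": [np.nan, np.nan, 0.01, -0.01],
})
# Naively reordering row blocks moves the leading NaNs to the end.
shuffled = original.iloc[[2, 3, 0, 1]].reset_index(drop=True)

print(same_nan_structure(original, original.copy()))  # True
print(same_nan_structure(original, shuffled))         # False
```

The second call shows the failure I am trying to avoid: a naive reordering of blocks moves the leading NaN values away from the start of the series.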