2

I have a dataframe with a multiIndex of state and town names. The columns are quarterly housing data that were created via PeriodIndex. I want to create a ratio of the data in a new column:

housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])

Whenever I try and create this new column, I get an error:

DateParseError: Unknown datetime string format, unable to parse: P Ratio

Full Code:

# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv'    #from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file,header=0,usecols=1,2,*range(51,251)],index_col=[1,0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)

housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()


rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))

housing_data_compact_df = housing_data_df[[start_col,end_col]]

#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])

Here is some other data that may/may not be helpful:

[In]: print(housing_data_compact_df.head())

                                  2000Q1         2001Q1
State        RegionName                                
New York     New York      503933.333333  465833.333333
California   Los Angeles   502000.000000  413633.333333
Illinois     Chicago       237966.666667  219633.333333
Pennsylvania Philadelphia  118233.333333  116166.666667
Arizona      Phoenix       205300.000000  168200.000000



[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']


[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')

What I've Tried:

It seems that my issue has something to do with the PeriodIndex columns. I have tried converting the data via direct casts:

[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))

TypeError: cannot convert the series to <class 'float'>

I've also tried using .astype(), but I get the same error as without the conversion:

[In]: housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(start_col_name)].astype(float).div(housing_data_compact_df[pd.Period(end_col_name)].astype(float))

DateParseError: Unknown datetime string format, unable to parse: P Ratio

I have also reset the keys in an attempt to break the PeriodIndex, and then reindex after the operation is done. However, this does not seem to work on all the systems I test it on, and also seems like a roundabout way to fix what I believe should be a simple solution.

Question:

How can I create a new column as the ratio of the data from these PeriodIndex columns?

Thanks in advance for any help.

Doug B
  • 347
  • 1
  • 2
  • 8

1 Answers1

2

You need strftime for convert Periodindex to string and add copy:

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')

All code: (small changes only for working for me, use your code (is nice ;)))

zillow_file = 'http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv'
zillow_df = pd.read_csv(zillow_file,header=0,
                        usecols=[1,2] + list(range(51,251)), #changed for python 3
                        index_col=[1,0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
#no states in question, so commented
#zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)

housing_data_df=zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()

rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))
#add copy
#http://stackoverflow.com/q/42438987/2901002
housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())
                      2016Q3         2001Q1
State RegionName                           
NY    New York      599850.0            NaN
CA    Los Angeles   588750.0  233000.000000
IL    Chicago       207600.0  156933.333333
PA    Philadelphia  129950.0   55333.333333
AZ    Phoenix       197800.0  119600.000000

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

a = housing_data_compact_df[pd.Period(anal_start_col_name)]
                              .div(housing_data_compact_df[pd.Period(anal_end_col_name)])
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = a
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846

Another possible solution is:

housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = housing_data_compact_df[anal_start_col_name]
                                       .div(housing_data_compact_df[anal_end_col_name])

print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks a lot, @jezrael, this worked perfectly. I'd literally spent days looking through the pandas docs and stackoverflow to try and figure out how to fix this to no avail. – Doug B Mar 05 '17 at 15:48
  • 1
    yes, I always think solution is `df.columns = df.columns.astype(str)` and I was surprised it doesnt work. But strftime works perfect. Btw, in official docs is nothing, I found only [this](http://nullege.com/codes/search/pandas.tseries.period.Period.strftime) – jezrael Mar 05 '17 at 15:51
  • Thanks for the link, it looks pretty useful. Next time I am pulling my hair out in the Pandas docs, I'll check on the site you suggested. – Doug B Mar 05 '17 at 16:26
  • I only google it, ithink pandas docs is obviously better, this is exception only. – jezrael Mar 05 '17 at 16:33