I have 4 columns: "Country", "Year", "GDP" and "CO2 emissions".
I want to compute the Pearson correlation between GDP and CO2 emissions for each country.
The Country column contains every country in the world, and Year takes the values "1990, 1991, ..., 2018".
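For reference, for a single country with n yearly observations (x_i = GDP, y_i = CO2 emissions, and bars denoting the per-country means), the Pearson coefficient computed by the answers below is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```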
You can use groupby combined with corr() as the aggregation function:
import pandas as pd

country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
# Prints a 2x2 GDP/CO2 correlation matrix for each country
print(df.groupby('country')[['GDP','CO2']].corr())
With a little reshaping, this output can be turned into a one-column table of correlations:
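# Keep only the GDP/CO2 cell of each country's 2x2 matrix.
# Note the double brackets: selecting columns with a plain tuple ['GDP','CO2'] fails in recent pandas.
df_corr = (df.groupby('country')[['GDP','CO2']].corr()
           .drop(columns='GDP')
           .drop('CO2', level=1)
           .rename(columns={'CO2': 'Correlation'}))
# The unnamed inner index level becomes 'level_1' after reset_index; drop it and index by country
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country', drop=True)
print(df_corr)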
Output:

         Correlation
country
China       0.999581
India       0.932202
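If you only need one number per country, a shorter route (a sketch, not the only way) is to let each group compute the GDP/CO2 correlation directly with `Series.corr`, which uses Pearson by default. This reuses the same toy data as above; selecting the two columns before `apply` keeps the grouping column out of the function:

```python
import pandas as pd

# Same toy data as in the example above
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# One Pearson coefficient per country (Series.corr defaults to method='pearson')
corr_by_country = (
    df.groupby('country')[['GDP', 'CO2']]
      .apply(lambda g: g['GDP'].corr(g['CO2']))
      .rename('Correlation')
)
print(corr_by_country)
```

The result is a Series indexed by country, so there is no MultiIndex to clean up afterwards.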
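If you also want the p-value for each country, you can loop over the countries with scipy's pearsonr and store the results in a dictionary. The example uses generic column names: column1 plays the role of the country, and column3/column4 are the two variables being correlated.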
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({"column1": ["value 1", "value 1", "value 1", "value 1", "value 2", "value 2", "value 2", "value 2"],
                   "column2": [1, 2, 3, 4, 5, 1, 2, 3],
                   "column3": [10, 30, 50, 60, 80, 10, 90, 20],
                   "column4": [1, 3, 5, 6, 8, 5, 2, 3]})

results = {}
for country in df.column1.unique():
    mask = df["column1"] == country
    # pearsonr returns (coefficient, two-sided p-value)
    r, p = pearsonr(df.loc[mask, "column3"], df.loc[mask, "column4"])
    results[country] = {"pearson": r, "pvalue": p}

print(results["value 1"])
# pearson ≈ 1.0, pvalue ≈ 0.0 (column4 is exactly column3 / 10)
print(results["value 2"])
# pearson ≈ 0.0926, pvalue ≈ 0.907
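The same idea can be applied to the question's own layout without an explicit loop. The sketch below assumes columns named 'country', 'GDP' and 'CO2' (the toy data is borrowed from the first answer, and `pearson_with_pvalue` is a hypothetical helper name); it returns one row per country with both the coefficient and its p-value:

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy data with the question's column layout (assumed column names)
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

def pearson_with_pvalue(group):
    """Hypothetical helper: Pearson r and two-sided p-value for one country's rows."""
    r, p = pearsonr(group['GDP'], group['CO2'])
    return pd.Series({'pearson': r, 'pvalue': p})

# One row per country, with one column for the coefficient and one for its p-value
results = df.groupby('country')[['GDP', 'CO2']].apply(pearson_with_pvalue)
print(results)
```

With only 5 years per country in this toy data the p-values are not very meaningful; with the full 1990 to 2018 range each country would contribute 29 observations.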