Capturing Values Of Variables Across Dataframes From A Dict Of Dataframes In Python
I have 3 dataframes in a dict, where each key is a month identifier and each value is the corresponding dataframe:
Below is a snapshot of the dataframes along with the keys:
Now, for each unique variable I want to capture its correlation strength across all the months/dataframes. If a variable has a correlation value in a given df, that value should be captured; otherwise the value should be 0. Something like VLOOKUP in Excel.
The final dataframe would look like below:
This seems very complicated to implement in Python, so can someone please help me with this?
Below is the code to generate the sample data and create the dict of dataframes:
import pandas as pd
import numpy as np
df1 = pd.DataFrame([{'Variable_Name':'Pending_Disconnect','correlation': 0.553395448},
{'Variable_Name':'status_Active','correlation': 0.539464806},
{'Variable_Name':'days_active','correlation':0.414774231},
{'Variable_Name':'days_pend_disco','correlation':0.392915837},
{'Variable_Name':'prop_tenure','correlation':0.074321692},
{'Variable_Name':'abs_change_3m','correlation':0.062267386}
])
df2 = pd.DataFrame([{'Variable_Name':'Pending_Change','correlation': 0.043461995},
{'Variable_Name':'status_Active','correlation': 0.038057697},
{'Variable_Name':'ethnic','correlation':0.037503202},
{'Variable_Name':'days_active','correlation':0.037227245},
{'Variable_Name':'archetype_grp','correlation':0.035761434},
{'Variable_Name':'age_nan','correlation':0.035761434}
])
df3 = pd.DataFrame([{'Variable_Name':'active_frq_N','correlation':0.025697016},
{'Variable_Name':'active_frq_Y','correlation': 0.025697016},
{'Variable_Name':'ethnic','correlation':0.025195149},
{'Variable_Name':'ecgroup','correlation':0.023192408},
{'Variable_Name':'age','correlation':0.023121305},
{'Variable_Name':'archetype_nan','correlation':0.023121305}
])
dfs = [df1,df2,df3]
months = ['Jan - Feb 2018','Jan - Mar 2018','Jan - Apr 2018']
sample_dict = dict(zip(months,dfs))
Answer
You can rename the correlation column of each dataframe to include its month key and then use pd.concat
to combine the dataframes.
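for key, df in sample_dict.items():
    df.rename(columns={'correlation': 'correlation ' + key}, inplace=True)
# Index by Variable_Name and concatenate column-wise so each month becomes a column;
# variables missing from a month come out as NaN, which fillna(0) turns into 0.
result = pd.concat([df.set_index('Variable_Name') for df in dfs], axis=1).fillna(0)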
EDIT: You can also omit the dictionary and do this directly from the list of dataframes and the list of months.
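for month, df in zip(months, dfs):
    df.rename(columns={'correlation': 'correlation ' + month}, inplace=True)
# Same column-wise concat as above: align on Variable_Name and fill the gaps with 0.
result = pd.concat([df.set_index('Variable_Name') for df in dfs], axis=1).fillna(0)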
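For reference, here is a minimal self-contained sketch of the VLOOKUP-style result the question asks for. It assumes a small stand-in dict (the names `frames`, `jan_feb`, `jan_mar` and `wide` are made up for illustration, with only a few of the question's rows) and it does not rename columns in place; instead it passes a dict to pd.concat so the month keys become the column labels. Treat it as one possible way to do this rather than the only one.

```python
import pandas as pd

# Hypothetical stand-in data: a small subset of the question's frames.
jan_feb = pd.DataFrame({'Variable_Name': ['status_Active', 'days_active'],
                        'correlation': [0.539464806, 0.414774231]})
jan_mar = pd.DataFrame({'Variable_Name': ['status_Active', 'ethnic'],
                        'correlation': [0.038057697, 0.037503202]})
frames = {'Jan - Feb 2018': jan_feb, 'Jan - Mar 2018': jan_mar}

# Index each frame by variable name and keep only the correlation Series.
# Passing a dict to pd.concat with axis=1 uses the dict keys as column labels
# and aligns rows on the union of the variable names.
wide = pd.concat(
    {month: df.set_index('Variable_Name')['correlation']
     for month, df in frames.items()},
    axis=1,
).fillna(0)  # variables absent from a month get 0 instead of NaN

# Turn the variable names back into an ordinary column.
wide = wide.rename_axis('Variable_Name').reset_index()
print(wide)
```

Because the original dataframes are left untouched, this can be re-run safely, and the same pattern should work unchanged on `sample_dict` from the question.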