Capturing Values Of Variables Across Dataframes From A Dict Of Dataframes In Python

- 1 answer

I have 3 dataframes in a dict, where each key is a month identifier and each value is the corresponding dataframe:

[Image: the dict of dataframes]

Below is a snapshot of the dataframes along with the keys:

[Image: snapshot of the dataframes with their keys]

Now, for each unique variable, I want to capture its correlation strength across all the months/dataframes. If a variable has a correlation value in a given dataframe, that value should be captured; otherwise the value should be 0. Something like VLOOKUP in Excel.

The final dataframe would look like below:

[Image: expected final dataframe]

This seems complicated to implement in Python. Can someone please help me with this?

Below is the code to generate the sample data and create the dict of dataframes:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([{'Variable_Name':'Pending_Disconnect','correlation': 0.553395448},
                    {'Variable_Name':'status_Active','correlation': 0.539464806},
                    {'Variable_Name':'days_active','correlation':0.414774231},
                    {'Variable_Name':'days_pend_disco','correlation':0.392915837},
                    {'Variable_Name':'prop_tenure','correlation':0.074321692},
                    {'Variable_Name':'abs_change_3m','correlation':0.062267386}
                    ])


df2 = pd.DataFrame([{'Variable_Name':'Pending_Change','correlation': 0.043461995},
                    {'Variable_Name':'status_Active','correlation': 0.038057697},
                    {'Variable_Name':'ethnic','correlation':0.037503202},
                    {'Variable_Name':'days_active','correlation':0.037227245},
                    {'Variable_Name':'archetype_grp','correlation':0.035761434},
                    {'Variable_Name':'age_nan','correlation':0.035761434}
                    ])


df3 = pd.DataFrame([{'Variable_Name':'active_frq_N','correlation':0.025697016},
                    {'Variable_Name':'active_frq_Y','correlation': 0.025697016},
                    {'Variable_Name':'ethnic','correlation':0.025195149},
                    {'Variable_Name':'ecgroup','correlation':0.023192408},
                    {'Variable_Name':'age','correlation':0.023121305},
                    {'Variable_Name':'archetype_nan','correlation':0.023121305}
                    ])

dfs = [df1,df2,df3]
months = ['Jan - Feb 2018','Jan - Mar 2018','Jan - Apr 2018']

sample_dict = dict(zip(months,dfs))

Answer

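You can rename the correlation column of each dataframe after its month, then use pd.concat along the columns so that rows are aligned on Variable_Name. Variables that are missing from a month come out as NaN, which fillna(0) turns into 0.

for key, df in sample_dict.items():
    df.rename(columns={'correlation': 'correlation ' + key}, inplace=True)
# axis=1 aligns rows on Variable_Name and gives one correlation column per month
result = pd.concat([df.set_index('Variable_Name') for df in dfs], axis=1).fillna(0)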

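EDIT: You can also omit the dictionary and do this directly from the lists of months and dataframes.

for month, df in zip(months, dfs):
    df.rename(columns={'correlation': 'correlation ' + month}, inplace=True)
# axis=1 aligns rows on Variable_Name; variables missing from a month become 0
result = pd.concat([df.set_index('Variable_Name') for df in dfs], axis=1).fillna(0)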
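
For reference, here is a minimal sketch of the same rename-and-concatenate idea that leaves the input dataframes unmodified. It assumes each dataframe lists a given variable at most once. The helper name combine_correlations and the two-month toy data are illustrative only and are not part of the original question.

```python
import pandas as pd


def combine_correlations(frames_by_month):
    """Return one row per variable and one correlation column per month.

    Hypothetical helper: assumes every dataframe has the columns
    'Variable_Name' and 'correlation', with no repeated variable names.
    """
    # One Series per month, indexed by variable name; the dict keys
    # become the column names of the combined table.
    columns = {
        "correlation " + month: df.set_index("Variable_Name")["correlation"]
        for month, df in frames_by_month.items()
    }
    # axis=1 aligns the Series on their index (outer join), so a variable
    # that is absent from a month gets NaN, which is then replaced by 0.
    wide = pd.concat(columns, axis=1).fillna(0)
    return wide.rename_axis("Variable_Name").reset_index()


# Toy data for illustration (values rounded from the question's sample).
jan_feb = pd.DataFrame({"Variable_Name": ["status_Active", "days_active"],
                        "correlation": [0.54, 0.41]})
jan_mar = pd.DataFrame({"Variable_Name": ["status_Active", "ethnic"],
                        "correlation": [0.04, 0.03]})
toy = {"Jan - Feb 2018": jan_feb, "Jan - Mar 2018": jan_mar}

result = combine_correlations(toy)
print(result)
```

Called as combine_correlations(sample_dict) on the data from the question, this should give one row per unique variable and three correlation columns, with 0 wherever a variable does not appear in a month.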
source: stackoverflow.com