How To Create A MultiIndex (hierarchical Index) Dataframe Object From Another Df's Column's Unique Values?
I'm trying to create a pandas multiIndexed dataframe that is a summary of the unique values in each column.
Is there an easier way to have this information summarized besides creating this dataframe?
Either way, it would be nice to know how to complete this code challenge. Thanks for your help! Here is the toy dataframe and the solution I attempted using a for loop with a dictionary and a value_counts dataframe. Not sure if it's possible to incorporate MultiIndex.from_frame or .from_product here somehow...
Original Dataframe:
data = pd.DataFrame({'A': ['case', 'case', 'case', 'case', 'case'],
'B': [2001, 2002, 2003, 2004, 2005],
'C': ['F', 'M', 'F', 'F', 'M'],
'D': [0, 0, 0, 1, 0],
'E': [1, 0, 1, 0, 1],
'F': [1, 1, 0, 0, 0]})
A B C D E F
0 case 2001 F 0 1 1
1 case 2002 M 0 0 1
2 case 2003 F 0 1 0
3 case 2004 F 1 0 0
4 case 2005 M 1 1 0
Desired outcome:
unique percent
A case 100
B 2001 20
2002 20
2003 20
2004 20
2005 20
C F 60
M 40
D 0 80
1 20
E 0 40
1 60
F 0 60
1 40
My failed for loop attempt:
def unique_values(df):
values = {}
columns = []
df = pd.DataFrame(values, columns=columns)
for col in data:
df2 = data[col].value_counts(normalize=True)*100
values = values.update(df2.to_dict)
columns = columns.append(col*len(df2))
return df
unique_values(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-a341284fb859> in <module>
11
12
---> 13 unique_values(data)
<ipython-input-84-a341284fb859> in unique_values(df)
5 for col in data:
6 df2 = data[col].value_counts(normalize=True)*100
----> 7 values = values.update(df2.to_dict)
8 columns = columns.append(col*len(df2))
9 return df
TypeError: 'method' object is not iterable
Let me know if there's something obvious I'm missing! Still relatively new to EDA and pandas, any pointers appreciated.
Answer
This is a fairly straightforward application of .melt
:
data.melt().reset_index().groupby(['variable', 'value']).count()/len(data)
output
index
variable value
A case 1.0
B 2001 0.2
2002 0.2
2003 0.2
2004 0.2
2005 0.2
C F 0.6
M 0.4
D 0 0.8
1 0.2
E 0 0.4
1 0.6
F 0 0.6
1 0.4
Related Questions
- → Length of Values Not Matching Length of Index
- → Historical price per minute between two timestamps for a cryptocurrency
- → How to add on to parameter names in functions?
- → Errors when load selected raw data into dataframe
- → Error using Santiment sanpy library for cryptocurrency data analysis
- → Trying to expand row data and convert to DataFrame, getting this error: AttributeError: 'float' object has no attribute 'keys'
- → How to retrieve and store multiple values from a python Data Frame?
- → Finding the closest date given a date in a groupby dataframe (Python)
- → Converting year and day of year into datetime index in pandas
- → Dates to Durations in Pandas
- → Stripping all trailing empty spaces in a column of a pandas dataframe
- → issue with cross_validation in pandas_ml
- → Stack columns in pandas dataframe to achieve record format