
Convert A List Of Lists Of Items To Dummies In Pandas


I have a list of lists of items like this:

lgenre[8:15]

[['Action'],
 ['Action', 'Adventure', 'Thriller'],
 ['Comedy', 'Drama', 'Romance'],
 ['Comedy', 'Horror'],
 ['Animation', "Children's"],
 ['Drama'],
 ['Action', 'Adventure', 'Romance']]

What I want is:

   id  Action  Adventure  Thriller  Comedy  Drama  Romance  Horror  Animation  Children's
0   0       1          0         0       0      0        0       0          0           0
1   1       1          1         1       0      0        0       0          0           0
2   2       0          0         0       1      1        1       0          0           0
3   3       0          0         0       1      0        0       1          0           0
4   4       0          0         0       0      0        0       0          1           1
5   5       0          0         0       0      1        0       0          0           0
6   6       1          1         0       0      0        1       0          0           0

What I tried is to write a double loop which looks like this:

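import pandas as pd

stor = pd.DataFrame({'id': range(len(lgenre[8:15]))})
for num, genres in enumerate(lgenre[8:15]):  # avoid shadowing the built-in list
    for item in genres:
        if item not in stor.columns:
            stor[item] = 0           # create the column the first time a genre appears
        stor.loc[num, item] = 1      # .loc avoids chained assignment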

It runs, but it is too slow to be practical. Is there a more efficient way to do this, such as a better algorithm or a built-in method?


Answer

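Build a DataFrame from the nested list and use pd.get_dummies. Each list position becomes its own column before encoding, so a genre that appears at more than one position (Drama here) ends up in more than one output column:

df = pd.get_dummies(pd.DataFrame(lgenre[8:15]), dtype=int)  # dtype=int gives 0/1; pandas 2.0+ defaults to booleans
df.columns = df.columns.str.split("_").str[-1]  # strip the "0_", "1_", ... position prefixes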

     Action  Animation  Comedy  Drama  Adventure  Children's  Drama  Horror  \
0       1          0       0      0          0           0      0       0   
1       1          0       0      0          1           0      0       0   
2       0          0       1      0          0           0      1       0   
3       0          0       1      0          0           0      0       1   
4       0          1       0      0          0           1      0       0   
5       0          0       0      1          0           0      0       0   
6       1          0       0      0          1           0      0       0   

   Romance  Thriller  
0        0         0  
1        0         1  
2        1         0  
3        0         0  
4        0         0  
5        0         0  
6        1         0  
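
If one column per genre is wanted (the output above lists Drama twice), one possible alternative is to join each inner list into a delimited string and use Series.str.get_dummies. This is only a sketch and not part of the original answer: the names genres and dummies are made up for illustration, and it assumes no genre label contains the '|' character.

```python
import pandas as pd

# Sample data copied from the question (the lgenre[8:15] slice).
genres = [
    ['Action'],
    ['Action', 'Adventure', 'Thriller'],
    ['Comedy', 'Drama', 'Romance'],
    ['Comedy', 'Horror'],
    ['Animation', "Children's"],
    ['Drama'],
    ['Action', 'Adventure', 'Romance'],
]

# Join each inner list into one '|'-separated string, then let
# Series.str.get_dummies split on '|' and build one 0/1 column per label.
dummies = pd.Series(genres).str.join('|').str.get_dummies(sep='|')

# Optional: add the id column the question asked for.
dummies.insert(0, 'id', range(len(dummies)))
print(dummies)
```

The genre columns come out in alphabetical order rather than in order of first appearance, so reorder them afterwards if the order matters.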
source: stackoverflow.com