Pandas - How To Convert Row Data To Columns


I want to group my data by one column (No.) and spread the values of the columns date_1 and results for each group across separate columns.

Here is an example input with the corresponding expected output:

[Image: example input and expected output]

I've added a little more data. The real dataset is much larger.



Here is one way to do it:

from datetime import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({'No.': ['s1', 's2', 's2'],
                   'date_1': [datetime.now() for _ in range(3)],
                   'results': [1.2, 9.73, 3.71]})

# Use groupby to collect the dates and results of each group into lists
result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : list, 'results' : list})
# if you are running a pandas version <0.24.2 uncomment the following line and comment the one above
#result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : lambda x: list(x), 'results' : lambda x: list(x)})

# The largest group decides how many (date, results) column pairs are needed
len_max = np.max([len(x) for x in result['results']])

# Create the columns for the 2nd, 3rd, ... row of each group (0 where a group has fewer rows)
for i in range(1,len_max):
    result['date__{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['date_1']]
    result['results_{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['results']]

# Replace the lists still held in the first two columns with their first element
result['date_1'] = [x[0] for x in result['date_1']]
result['results'] = [x[0] for x in result['results']]

Output:

                        date_1  results                     date__2  results_2
s1  2019-07-29 08:00:45.878494     1.20                           0       0.00
s2  2019-07-29 08:00:45.878499     9.73  2019-07-29 08:00:45.878500       3.71
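Since the real dataset is large, the Python loops above can also be replaced by a vectorized reshape. The sketch below is an alternative technique, not the method used in this answer: it numbers the rows inside each group with `groupby().cumcount()` and moves that number into the columns with `unstack`. The fixed sample timestamps and the shortened column name `date` are assumptions made for illustration.

```python
import pandas as pd

# Sample data shaped like the example above; the fixed timestamps are an
# assumption so that the output is reproducible.
df = pd.DataFrame({
    'No.': ['s1', 's2', 's2'],
    'date_1': pd.to_datetime(['2019-07-29 08:00:45',
                              '2019-07-29 08:01:00',
                              '2019-07-29 08:02:00']),
    'results': [1.2, 9.73, 3.71],
})

# Position of each row inside its group: 1 for the first row, 2 for the second, ...
df['n'] = df.groupby('No.').cumcount() + 1

# Move the position into the columns: one (date, results) pair per position.
# Groups with fewer rows get NaT / NaN in the missing slots.
wide = (df.rename(columns={'date_1': 'date'})
          .set_index(['No.', 'n'])[['date', 'results']]
          .unstack('n'))

# Flatten the (name, position) labels and interleave them:
# date_1, results_1, date_2, results_2, ...
wide.columns = [f'{name}_{n}' for name, n in wide.columns]
order = [f'{name}_{n}' for n in range(1, df['n'].max() + 1)
         for name in ['date', 'results']]
wide = wide[order]
print(wide)
```

Unlike the loop version, missing entries come out as NaT/NaN rather than 0, so the date columns keep a datetime dtype. The number of column pairs equals the largest group size, which can also be read directly with `df.groupby('No.').size().max()`.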