How to Make Code Faster by Removing a For Loop

I am trying to analyze a large amount of data. First I apply groupby to split the data into groups; then, for each group, I check some conditions and, if they are satisfied, calculate the mean, max and some other characteristics. The code below works, but it is very slow: in my case there are more than 50,000 groups.

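import pandas as pd

# df is assumed to have a DatetimeIndex and columns A, B, C, D.
# base=30 was removed in pandas 2.0; offset='30min' is the documented replacement.
groups = df.groupby(pd.Grouper(freq='10min', offset='30min', label='right'))
results = {}
for name, g in groups:  # use g so the loop does not shadow the original df
    min_x = g['A'].min()
    y_max = g['B'].max()
    z_max = g['C'].max()
    if z_max < 60 and min_x > 2 and y_max < 35:
        results[name] = g['D'].mean()  # keep one mean per qualifying group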

This code gives the correct output, but it is very slow. I need a faster way to do this.

Answer

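One idea is to compute all of the per-group statistics in a single groupby/agg call and then filter the aggregated rows with a boolean mask, instead of looping over the groups in Python: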

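import pandas as pd

# One aggregation pass computes every statistic for every 10-minute bin
df1 = (df.groupby(pd.Grouper(freq='10min', offset='30min', label='right'))
         .agg({'A': 'min', 'B': 'max', 'C': 'max', 'D': 'mean'}))

# The boolean mask replaces the if-statement; s holds the mean of D for qualifying bins
s = df1.loc[(df1['C'] < 60) & (df1['A'] > 2) & (df1['B'] < 35), 'D']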
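A quick way to confirm that the aggregated version selects the same bins and means as the loop is to run both on synthetic data. This is a sketch under stated assumptions: the one-reading-per-minute DatetimeIndex and the random columns A to D are hypothetical sample data, not the asker's, and it assumes pandas 1.1 or later, where offset='30min' stands in for the deprecated base=30.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one reading per minute for two days,
# with ranges chosen so that some, but not all, bins pass the filter.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=2 * 24 * 60, freq="min")
df = pd.DataFrame({
    "A": rng.uniform(1, 10, len(idx)),
    "B": rng.uniform(0, 36, len(idx)),
    "C": rng.uniform(0, 62, len(idx)),
    "D": rng.normal(size=len(idx)),
}, index=idx)


def make_grouper():
    # A fresh Grouper per groupby call keeps the two passes independent.
    return pd.Grouper(freq="10min", offset="30min", label="right")


# Loop version (from the question), storing each qualifying mean.
loop_result = {}
for name, g in df.groupby(make_grouper()):
    if g["C"].max() < 60 and g["A"].min() > 2 and g["B"].max() < 35:
        loop_result[name] = g["D"].mean()
loop_s = pd.Series(loop_result, dtype=float)

# Aggregated version (from the answer).
df1 = df.groupby(make_grouper()).agg({"A": "min", "B": "max", "C": "max", "D": "mean"})
s = df1.loc[(df1["C"] < 60) & (df1["A"] > 2) & (df1["B"] < 35), "D"]

# Same bins, same means (within floating-point tolerance).
pd.testing.assert_series_equal(s, loop_s, check_names=False, check_freq=False)
print(len(s), "of", len(df1), "bins qualify")
```

The speedup comes from agg running each named reduction in compiled code across all groups at once, rather than calling min, max and mean separately for every group from a Python loop. The aggregated version does compute the mean of D for bins that are later discarded, but that extra work is typically far cheaper than the per-group Python overhead.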
source: stackoverflow.com