Group By Month And Year In Pandas Dataframe

I have the data set below, consisting of card numbers and the times at which they were swiped. The output should be the total number of cards swiped per month and year.

Card No Date Time
34235   9/17/2018 5:19
56438   9/17/2018 5:57
634787  9/17/2018 5:58
79749   9/17/2018 5:59
48947   9/17/2018 6:00
3776    9/17/2018 6:07
34235   9/17/2018 6:20
56438   9/17/2018 6:23
634787  9/17/2018 6:29
79749   9/17/2018 6:35
48947   9/17/2018 6:43
3776    9/17/2018 7:05
34235   9/17/2018 7:06
56438   9/20/2018 14:25
634787  9/20/2018 14:25
79749   9/20/2018 14:26
48947   9/20/2018 14:27
3776    9/20/2018 14:28
34235   9/20/2018 14:29
56438   9/20/2018 14:32
634787  9/20/2018 14:34
79749   11/21/2018 7:58
48947   11/21/2018 8:02
3776    11/21/2018 8:02
634787  11/21/2018 8:05
79749   11/21/2018 8:11
48947   11/21/2018 8:13
3776    11/21/2018 8:20
34235   12/4/2018 14:36
56438   12/4/2018 14:37
634787  12/4/2018 14:44
79749   12/4/2018 14:44
48947   12/4/2018 14:52
3776    12/4/2018 14:54

Output

Month/Year Count
Sep/2018 21
Nov/2018 7
Dec/2018 6

I have tried using groupby but could not reach the expected output.

df1 = pd.DataFrame(data1, columns=['Card No', 'Date Time'])

df2 = df1.groupby([df1['Date Time'].dt.year.rename('year'), df1['Date Time'].dt.month.rename('month')]).agg({'count'})

How do I include the month name?

Answer

Since you made an attempt, this is how I would do it for your expected output:

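# assumes 'Date Time' is already a datetime column (e.g. converted with pd.to_datetime)
df['month_'] = df['Date Time'].dt.strftime('%b')  # abbreviated month name, e.g. 'Sep'
df['year_'] = df['Date Time'].dt.strftime('%Y')   # four-digit year
new_df = df.groupby(["month_", "year_"])["Card No"].count().reset_index().sort_values(
    "Card No", ascending=False)
print(new_df)

  month_ year_  Card No
2    Sep  2018       21
1    Nov  2018        7
0    Dec  2018        6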

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.strftime.html for more information.
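As a quick illustration of the format codes, here is a minimal sketch on a small hand-made Series. The three timestamps are taken from the question; the variable names and the explicit `format=` string are assumptions made for the example. Note that `%b` gives the locale's abbreviated month name, so it may differ under a non-English locale.

```python
import pandas as pd

# Three timestamps copied from the question's data
s = pd.to_datetime(
    pd.Series(["9/17/2018 5:19", "11/21/2018 7:58", "12/4/2018 14:36"]),
    format="%m/%d/%Y %H:%M",
)

month_name = s.dt.strftime("%b").tolist()  # abbreviated month names (locale dependent)
month_num = s.dt.strftime("%m").tolist()   # zero-padded month numbers, as strings
year = s.dt.strftime("%Y").tolist()        # four-digit years, as strings

print(month_name, month_num, year)
```

All three results are strings, which matters when sorting: month names sort alphabetically, while zero-padded month numbers sort in calendar order.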

Edit

To sort by month, you need a value that sorts in calendar order, such as the zero-padded month number (although others may know a better way):

df['month_'] = df['Date Time'].dt.strftime('%m')  # change %b to %m
df['year_'] = df['Date Time'].dt.strftime('%Y')
new_df = df.groupby(["month_", "year_"])["Card No"].count().reset_index().sort_values(
    "month_")
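If the goal is the "Sep/2018"-style label from the question in chronological order, one possible alternative (not part of the answer above) is to group on a monthly period and format the label afterwards. This is only a sketch: it uses five rows picked from the question rather than the full data set, and the names `counts` and `result` are made up for the example.

```python
import pandas as pd

# A handful of rows from the question; the real data has more
data = {
    "Card No": [34235, 56438, 79749, 48947, 34235],
    "Date Time": ["9/17/2018 5:19", "9/20/2018 14:25", "11/21/2018 7:58",
                  "11/21/2018 8:02", "12/4/2018 14:36"],
}
df = pd.DataFrame(data)
df["Date Time"] = pd.to_datetime(df["Date Time"], format="%m/%d/%Y %H:%M")

# Group on a monthly Period so groups come out in chronological order,
# including across year boundaries
counts = (
    df.groupby(df["Date Time"].dt.to_period("M"))["Card No"]
      .count()
      .reset_index(name="Count")
)

# Build the display label only after the order is settled
counts["Month/Year"] = counts["Date Time"].dt.strftime("%b/%Y")
result = counts[["Month/Year", "Count"]]
print(result)
```

Grouping on the period rather than on separate month and year strings keeps the sort key and the display label apart, so the label format can change without affecting the order.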
source: stackoverflow.com