How To Transform A DataFrame To Get The Times Of Various Occurrences Of Events?
Given the following DataFrame:
+-------+-----+-------+-----+--------+---------------------------+
| DID | CID | Event | OID | Source | TimeStamp |
+-------+-----+-------+-----+--------+---------------------------+
| 25078 | 14 | QBT | 0 | EMS | 2019-10-15 10:54:35 +0000 |
| 25078 | 14 | NDOBT | 0 | EMS | 2019-10-15 10:54:48 +0000 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15 10:54:52 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:01 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:26 +0000 |
| 25078 | 14 | SBT-1 | 0 | ECS | 2019-10-15 11:00:50 +0000 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:27:45 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:27:45 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:28:16 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:28:16 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T14:28:18 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T14:28:18 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:28:44 |
| 25078 | 14 | SBT | 0 | EMS | 2019-10-15T14:28:44 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:30:55 |
| 25078 | 14 | LSFA | 0 | SPDLS | 2019-10-15T14:30:55 |
| 25078 | 14 | SBT | 0 | EMS-1 | 2019-10-15T15:28:43 |
| 25078 | 14 | SBT | 0 | EMS-1 | 2019-10-15T15:29:02 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T15:30:51 |
| 25078 | 14 | FEAR | 0 | CBS | 2019-10-15T15:30:51 |
| 25078 | 14 | DBT | 0 | RS | 2019-10-15T15:44:23 |
| 25078 | 14 | QBT | 0 | EMS-1 | 2019-10-15T16:02:16 |
+-------+-----+-------+-----+--------+---------------------------+
I want to get the first and last occurrences of some events and sources so that the final output looks something like this:
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
| DID | CID | Event-QBT-Last-DT | Event-QBT-First-DT | Event-SBT-Last-DT | Event-SBT-First-DT | Screen-ECS-First-DT | Screen-ECS-Last-DT | FirstTimeUsage | LastTime Usage |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
| 25078 | 14 | 2019-10-15T16:02:16 | 10/15/19 10:54 AM | 2019-10-15T15:29:02 | 10/15/19 10:54 AM | 2019-10-15 11:00:01 +0000 | 2019-10-15 11:00:50 +0000 | 2019-10-15 10:54:35 +0000 | 2019-10-15T16:02:16 |
+-------+-----+---------------------+--------------------+---------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------+
How can I achieve this using pandas?
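For anyone reproducing the setup, the sample above can be put into a DataFrame roughly like this; only a handful of the rows are copied here, and the timestamps are kept as the raw strings shown in the table:
import pandas as pd

# A hand-copied subset of the rows from the table above; TimeStamp stays a plain string column.
df = pd.DataFrame({
    'DID':    [25078] * 6,
    'CID':    [14] * 6,
    'Event':  ['QBT', 'NDOBT', 'SBT', 'SBT-1', 'SBT', 'QBT'],
    'OID':    [0] * 6,
    'Source': ['EMS', 'EMS', 'EMS', 'ECS', 'EMS-1', 'EMS-1'],
    'TimeStamp': ['2019-10-15 10:54:35 +0000', '2019-10-15 10:54:48 +0000',
                  '2019-10-15 10:54:52 +0000', '2019-10-15 11:00:01 +0000',
                  '2019-10-15T15:29:02', '2019-10-15T16:02:16'],
})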
Answer
The idea is to filter rows by boolean indexing with DataFrame.isin to get the first and last occurrences of the selected events, then use GroupBy.agg with first and last, reshape by DataFrame.unstack, and finally flatten the MultiIndex in the columns:
L = ['QBT','SBT']
df1 = (df[df['Event'].isin(L)]
.groupby(['OID','DID','CID','Event'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)
For the overall first and last occurrence, use the same solution with no filter, no Event in the groupby, and no unstack:
df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
.agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
#print (df2)
Last, join both together with DataFrame.join:
df = df1.join(df2).reset_index()
print (df)
OID DID CID Event-QBT-First-DT Event-QBT-Last-DT \
0 0 25078 14 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
Event-SBT-First-DT Event-SBT-Last-DT FirstTimeUsage \
0 2019-10-15 10:54:52 +0000 2019-10-15T15:29:02 2019-10-15 10:54:35 +0000
LastTime Usage
0 2019-10-15T16:02:16
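Note that the first and last aggregations pick the first and last values in the order the rows appear in each group, not the chronological minimum and maximum. In the sample the rows already come sorted by time, so this works as-is; if that is not guaranteed, one option is to parse TimeStamp and sort before the groupby steps. A minimal sketch, assuming utc=True is acceptable for reconciling the tz-aware '+0000' values with the tz-naive ones:
# Parse the mixed-format strings; utc=True localizes the tz-naive values to UTC
# so they can share one datetime64 column with the '+0000' ones.
# On pandas 2.x, format='mixed' may also be needed.
df['TimeStamp'] = pd.to_datetime(df['TimeStamp'], utc=True)
df = df.sort_values('TimeStamp')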
EDIT: To also process the Source column, generate df1 as before and add a second frame, df11, filtered on Source:
L = ['QBT','SBT']
df1 = (df[df['Event'].isin(L)]
.groupby(['OID','DID','CID','Event'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df1.columns = [f'Event-{b}-{a}' for a, b in df1.columns]
#print (df1)
L2 = ['ECS']
df11 = (df[df['Source'].isin(L2)]
.groupby(['OID','DID','CID','Source'])['TimeStamp']
.agg([('Last-DT','last'), ('First-DT','first')])
.unstack()
.sort_index(axis=1, level=1))
df11.columns = [f'Screen-{b}-{a}' for a, b in df11.columns]
df2 = (df.groupby(['OID','DID','CID'])['TimeStamp']
.agg([('FirstTimeUsage','first'), ('LastTime Usage','last')]))
And last, put everything together with concat:
df = pd.concat([df1, df11, df2], axis=1).reset_index()
print (df)
OID DID CID Event-QBT-First-DT Event-QBT-Last-DT \
0 0 25078 14 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
Event-SBT-First-DT Event-SBT-Last-DT Screen-ECS-First-DT \
0 2019-10-15 10:54:52 +0000 2019-10-15T15:29:02 2019-10-15 11:00:01 +0000
Screen-ECS-Last-DT FirstTimeUsage LastTime Usage
0 2019-10-15 11:00:50 +0000 2019-10-15 10:54:35 +0000 2019-10-15T16:02:16
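If the exact column order from the expected output is needed, selecting the columns explicitly should finish the job; the names below are copied from the question's output, and OID, which was not requested, is dropped:
# Column order copied from the expected output in the question; OID is dropped.
cols = ['DID', 'CID',
        'Event-QBT-Last-DT', 'Event-QBT-First-DT',
        'Event-SBT-Last-DT', 'Event-SBT-First-DT',
        'Screen-ECS-First-DT', 'Screen-ECS-Last-DT',
        'FirstTimeUsage', 'LastTime Usage']
out = df[cols]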