Change Value Of A Slice In Pandas Depending On The Number Of Rows In The Slice
I have a pandas dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'Timestamp': ['1642847484', '1642847484', '1642847484', '1642847484', '1642847487', '1642847487','1642847487','1642847487','1642847487','1642847487','1642847487','1642847487', '1642847489', '1642847489', '1642847489'],
'value': [11, 10, 14, 20, 3, 2, 9, 48, 5, 20, 12, 20, 56, 12, 8]})
The data is collected in batches, which results in multiple rows having the same timestamp. I need to index the dataframe by time, and to do so the index must have unique values.
The problems, as you can see, are:
- The step between timestamps varies
- The number of rows for each timestamp varies
The approach I tried is:
- Multiply the timestamp by 1000 to get milliseconds
- Calculate the time between timestamp i and the next timestamp j: delta = j - i
- Count the number of rows n that share timestamp i
- For each row in that batch, add (delta / n) * rank, where rank is the row's 0-based position within the batch
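As a worked instance of the last step, reading the per-row offset as rank * delta / n (an interpretation that matches the expected output below; `batch_offsets_ms` is a hypothetical helper name used only for illustration):

```python
def batch_offsets_ms(delta_ms, n_rows):
    """Offsets in milliseconds for the rows of one batch.

    Assumes offset = rank * delta / n, floored to whole milliseconds,
    where rank is the 0-based position of the row inside the batch.
    """
    return [rank * delta_ms // n_rows for rank in range(n_rows)]


# First batch: 4 rows, 3 seconds until the next timestamp
print(batch_offsets_ms(3000, 4))  # [0, 750, 1500, 2250]

# Last batch: 3 rows, assuming 1 second until the next timestamp
print(batch_offsets_ms(1000, 3))  # [0, 333, 666]
```

This only illustrates the arithmetic for a single batch; it is not meant as an efficient way to process the whole dataframe.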
expected output:
Timestamp value
0 1642847484000 11
1 1642847484750 10
2 1642847485500 14
3 1642847486250 20
4 1642847487000 3
5 1642847487250 2
6 1642847487500 9
7 1642847487750 48
8 1642847488000 5
9 1642847488250 20
10 1642847488500 12
11 1642847488750 20
12 1642847489000 56
13 1642847489333 12
14 1642847489666 8
15 1642847490000 4
But I couldn't find a way to do that efficiently. I used loops, but I have 15M+ rows.
Is there a simpler way to do it? Thank you.
Answer
IIUC, you want to de-duplicate using interpolated values.
A simple way would be to mask the duplicates and to interpolate:
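s = df['Timestamp'].astype('int64')
df['Timestamp2'] = (s.mul(1000)             # seconds to milliseconds
                     .mask(s.duplicated())  # replace repeated timestamps with NaN
                     .interpolate()         # linear interpolation by row position
                     .round()
                     .astype('int64')       # whole milliseconds as integers
                     .astype(str)           # back to string
                   )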
output:
Timestamp value Timestamp2
0 1642847484 11 1642847484000
1 1642847484 10 1642847484750
2 1642847484 14 1642847485500
3 1642847484 20 1642847486250
4 1642847487 3 1642847487000
5 1642847487 2 1642847487250
6 1642847487 9 1642847487500
7 1642847487 48 1642847487750
8 1642847487 5 1642847488000
9 1642847487 20 1642847488250
10 1642847487 12 1642847488500
11 1642847487 20 1642847488750
12 1642847489 56 1642847489000
13 1642847489 12 1642847489000
14 1642847489 8 1642847489000
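Note that in the output above the last batch (rows 12 to 14) keeps a duplicated value, because there is no later timestamp to interpolate toward. One possible workaround, sketched here under the assumption that the last batch spans a fixed gap (`LAST_GAP_MS` is a hypothetical constant, not part of the original answer), is to append a sentinel end value before interpolating and drop it afterwards:

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': ['1642847484'] * 4 + ['1642847487'] * 8 + ['1642847489'] * 3,
    'value': [11, 10, 14, 20, 3, 2, 9, 48, 5, 20, 12, 20, 56, 12, 8],
})

LAST_GAP_MS = 1000  # assumption: the final batch spans one second

s = df['Timestamp'].astype('int64') * 1000   # seconds to milliseconds
masked = s.mask(s.duplicated())              # NaN for every repeated timestamp

# Assumed end of the last batch, appended as one extra row
sentinel = pd.Series([s.iloc[-1] + LAST_GAP_MS], dtype='float64')

filled = (pd.concat([masked, sentinel], ignore_index=True)
            .interpolate()                   # linear interpolation by row position
            .iloc[:-1]                       # drop the sentinel row again
            .round()
            .astype('int64'))

df['Timestamp2'] = filled.to_numpy()
print(df['Timestamp2'].is_unique)
```

Like the answer's approach, this assumes the rows are already sorted by time and that each timestamp appears in one contiguous block. Here `Timestamp2` is left as an integer column; add `.astype(str)` if strings are needed.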
source: stackoverflow.com