Intersection Of Multiple Rows In Single DataFrame
I have a DataFrame of Temperature 1000s of rows(Time series data) and 40 columns(40 points in a catchment ). Entries in this DataFrame are zeros and one (1 means active part of catchment and zero means non-active part). I want to place number of intersected values in a separate column(named inter) in the same DataFrame .
I expect the output in this way [attached image]
value in the first row of inter should be zero as all entries are zero and no part is active on day first
value in the 2nd row of inter should be 4 as four parts are active on day 2.
value in the 3rd row of inter should be 3 (number of intersected values of all above rows including 3rd row)[enter image description here][1]. Green boxes in image show the value for 3rd row
value in 4th row of inter should be number of intersected values of all above rows (yellow shaded area in the image).
similarly blue boxes show the value for 5th row and red boxes show the value for sixth row and so on
Note: for every row I will count the intersection of all above rows
Answer
I deserve a reward for this :) Here is you answer:
import pandas as pd
import numpy as np
# setup test data
data = {'0': [0, 0, 0, 1, 0], '1': [0, 0, 1, 0, 1], '2': [0, 0, 0, 1, 0], '3': [0, 0, 1, 1, 1], '4': [0, 1, 1, 1, 0]
, '5': [0, 0, 0, 0, 1], '6': [0, 1, 1, 1, 0], '7': [0, 0, 1, 0, 1], '8': [0, 1, 0, 1, 0], '9': [0, 1, 1, 0, 0],
'10': [0, 0, 1, 0, 0], '11': [0, 0, 0, 1, 1], '12': [0, 0, 0, 1, 1]}
data = pd.DataFrame(data=data)
# collect inter data
inter_data = []
for main_index, main_row in data.iterrows():
# select data for calculations
selected_data = data.loc[0:main_index,:]
# handle firs row with 0 values
if not 1 in main_row.values:
inter_data.append(0)
else:
# handle second row
if selected_data.shape[0] == 2:
inter_data.append(selected_data[1:2].values[0].sum())
# handle rest of data
else:
# drop last row from selected data
selected_data = selected_data[:-1]
# sum selected data
summed_data = 0
for index, row in selected_data.iterrows():
summed_data += row.values
# get position of 1
positions = np.where(main_row.values == 1)
# get summed data based on position
positions_data = summed_data[positions[0]]
# sum occurance in data
inter_data.append((positions_data >= 1).sum())
# add inter data to raw data
data['inter'] = pd.DataFrame(inter_data)
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12 inder
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 1 0 1 1 0 0 0 4
2 0 1 0 1 1 0 1 1 0 1 1 0 0 3
3 1 0 1 1 1 0 1 0 1 0 0 1 1 4
4 0 1 0 1 0 1 0 1 0 0 0 1 1 5
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module