Ad
Join Columns And Reshape In Row
i have data like this:
data = {'Host' : ['A','A','A','A','A','A','B','B','B'], 'Duration' : ['1','2',None,'4','5',None,'7','8',None], 'Predict' : [None,None,'3',None,None,'6',None,None,'9']}
df = pd.DataFrame(data)
It looks like:
Host Duration Predict
0 A 1 None
1 A 2 None
2 A None 3
3 A 4 None
4 A 5 None
5 A None 6
6 B 7 None
7 B 8 None
8 B None 9
What i expected to get:
A 1, 2, 3
A 4, 5, 6
B 7, 8, 9
I got what I wanted, but the way that I decided I do not like:
def create_vector(group):
result = []
df_array = []
for index, item in enumerate(group.Duration.ravel()):
if (item != None):
result.append(item)
else:
result.append(group.Predict.ravel()[index])
result.append(-1)
result = np.array(list(map(int, result)))
splitted = np.split(result, np.where(result == -1)[0] + 1)
for arr in splitted:
if (len(arr) > 3):
seq = ', '.join(str(e) for e in arr[:-1])
df_array.append(seq)
return pd.DataFrame(df_array,columns=['seq'])
Minimal length of arr must be one 'Duration' plus one 'Predict'
df= df.groupby(['host']).apply(create_vector)
df= df.reset_index().rename(columns={'level_1':'Index'})
df= df.drop(columns = {'Index'})
Would like to solve this problem using pandas. Waiting for comments and advice
Ad
Answer
I believe you can replace missing values from Duration
by Predict
column, so solution is simplify:
df['new'] = df['Duration'].fillna(df['Predict']).astype(str)
If need groups each 3 values by Host
groups:
g = df.groupby('Host').cumcount() // 3
Or if need groups by Predict
column with None
separator - only necessary default index:
g = df.index.where(df['Predict'].notna()).to_series().bfill()
#if always unique values in Predic column
#g = df['Predict'].bfill()
df = (df.groupby(['Host', g])['new']
.apply(', '.join)
.reset_index(level=1, drop=True)
.reset_index(name='Seq'))
print (df)
Host Seq
0 A 1, 2, 3
1 A 4, 5, 6
2 B 7, 8, 9
Another solution with reshape by DataFrame.stack
- None
s or missing values are removed by default with again aggregate join
:
g = df.groupby('Host').cumcount() // 3
df = (df.set_index(['Host', g])
.stack()
.astype(str)
.groupby(level=[0,1])
.apply(', '.join)
.reset_index(level=1, drop=True)
.reset_index(name='Seq')
)
print (df)
Host Seq
0 A 1, 2, 3
1 A 4, 5, 6
2 B 7, 8, 9
Ad
source: stackoverflow.com
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module
Ad