Pandas - Mapping Dict With Multiple Index To Column
I have two large data sets which I can't do the aggregations by combining two dataframes. I have to do the aggregation on df_train
first, then map the values to the df_test
.
df_train
and df_test
have the same exact id1
and id2
, but the df_test
have more samples. I'm computing the target mean on id1
and id2
and store it as a dictionary for memory issues.
target_mean = df_train.groupby(['id1', 'id2'])['target'].mean().to_dict()
This is the output of the aggregation. The keys are tuple pairs with id1
as the first element and id2
as the second element, and the values are target means of the groups.
{(0, 0): 146.45497131347656,
(1, 0): 74.86539459228516,
(2, 0): 14.551384925842285,
(3, 0): 235.5499725341797,
(4, 0): 976.5567626953125,
(5, 0): 17.894445419311523,
(6, 0): 64.06660461425781,
(7, 0): 350.33416748046875,
(7, 1): 3097.043701171875,
(8, 0): 256.92779541015625,
(9, 0): 72.7147445678711 }
How can I map those values to id1
and id2
columns properly?
(There are 60 million rows of data, 1449id1
and 4id2
values, so speed is important)
EDIT:
df_train[['id1', 'id2']].map(target_mean)
I tried this, but map
is only supported by pd.Series
.
Answer
I think better is use DataFrame.join
here:
target_mean = df_train.groupby(['id1', 'id2'])['target'].mean().rename('avg')
df_test = df_test.join(target_mean, on=['id1', 'id2'])
Your solution is possible, but I guess slowier with map
by MultiIndex
:
target_mean = df_train.groupby(['id1', 'id2'])['target'].mean().to_dict()
df_test['avg'] = df_test.set_index(['id1', 'id2']).index.map(target_mean)
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module