Pandas Linear Interpolate Between Different Dataframes Given Values In Each Row
We have a few dataframes for the price of apples in different countries (say, China, USA, India etc) according to tenors, which look like the below. Taking China as example:
tenors = pd.Series(['1W', '1M', '1Y'])
apples_china = pd.Series([5.1, 6.2, 7.1])
days = pd.Series([7, 30, 365])
data = {'tenors': tenors,
'apples_china': apples_china,
'days' : days}
apples_china = pd.DataFrame(data)
Then we would have equivalent ones for the US (apples_usa), Europe (apples_eu) etc.
We then have a larger 'portfolio' dataframe that looks like:
country = pd.Series(['china', 'usa', 'europe',' china', 'china', 'india'])
days = pd.Series([12, 45, 99, 101, 102, 300 ])
portfolio = {'country': country,
'days' : days}
df_portfolio = pd.DataFrame(portfolio)
I would like to add a column:
df_portfolio['price']
that looks at the value in df_portfolio['country'] and, if it is 'china' for instance, takes the number of days (12 in row[0]) and linearly interpolates inside the apples_china dataframe. Thus it would be a value between 5.1 and 6.2 on row[0], a value between 6.2 and 7.1 on row[3] and so forth.
For row[1], it would be looking inside the analogous apples_usa dataframe etc.
What i tried is:
from scipy.interpolate import interp1d
test = interp1d(apples_china['apples_china'], apples_china['days']) #arrays Y and X
df_portfolio['price'] = np.where(df_portfolio['country']=='china', test(df_portfolio['days']), 0)
But it returned a ValueError
with the x range.
Answer
A solution with a couple of choices is suitable for numpy.where
treatment.
However, for a scalable solution involving many countries, a dictionary may be more useful. Below I lay out the necessary steps:
Step 1
Create a dictionary mapping country to your country-specific data.
country_map = {'china': apples_china.sort_values('days')}
Note that we need to ensure that your country dataframes are sorted by days
to ensure that np.interp
in Step 2 works as required.
Step 2
Define a custom function which takes a row of data from your portfolio dataframe, and also the mapping dictionary from Step 1, then use np.interp
to perform the mapping. You can use a try
/ except
clause to catch instances where a country does not exist in your dictionary.
def interpolator(row, mapper):
days = row['days']
country = row['country']
try:
return np.interp(days, mapper[country]['days'].values,
mapper[country]['apples_china'].values)
except KeyError:
return np.nan
Step 3
Use pd.DataFrame.apply
to apply the function from Step 2 by row.
df_portfolio['price'] = df_portfolio.apply(interpolator, mapper=country_map, axis=1)
Result
print(df_portfolio)
country days price
0 china 12 5.339130
1 usa 45 NaN
2 europe 99 NaN
3 china 101 6.390746
4 china 102 6.393433
5 india 300 NaN
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module