
Fastest Way To Iterate Over Dataframe Column To Find Match In Strings

Here's a very truncated extract from a large dataframe:

name   age  city
ben    66   NY
rob    45   LON
james  22   LA

I also have numerous strings. Each contains different words, but exactly one of the values in the name column.

For example:

  1. "rob was born in London"
  2. "ben once lived in New York"

For each string, I want to search the "name" column for the name that appears in the string and return that person's age.

So in the first example the desired result is 45 and in the second example the desired result is 66.

I am new to Pandas and am struggling. Can anyone point me in the right direction?

Answer

Data

import pandas as pd

s = pd.Series(['rob was born in London', "ben once lived in New York"])
df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                   'age': [66, 45, 22],
                   'city': ['NY', 'LON', 'LA']})

Solution

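# One capture group per name, i.e. '(ben)|(rob)|(james)'; bfill moves the matched group into column 0
who = s.str.extract('(' + ')|('.join(df.name) + ')').bfill(axis=1)[0]
# Look up the age for each extracted name
age_by_name = dict(zip(df.name, df.age))
pd.DataFrame({'text': s, 'age': who.map(age_by_name)})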


                         text  age
0      rob was born in London   45
1  ben once lived in New York   66

Explanation

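Use .str.extract with a regex built from the name column to pull the matching name out of each string, then map that name to its age using a dictionary built from the dataframe. This avoids looping over the dataframe for every string.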
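
As a variation on the same idea, here is a minimal sketch that uses a single capture group instead of one group per name. It assumes the names are unique and may contain regex special characters, so it escapes them with re.escape, and it adds word boundaries so that "rob" does not match inside a longer word such as "robert". The variable names pattern and result are illustrative only.

```python
import re
import pandas as pd

s = pd.Series(['rob was born in London', 'ben once lived in New York'])
df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                   'age': [66, 45, 22],
                   'city': ['NY', 'LON', 'LA']})

# Single capture group with escaped alternatives and word boundaries,
# i.e. r'\b(ben|rob|james)\b' for this data.
pattern = r'\b(' + '|'.join(map(re.escape, df['name'])) + r')\b'

# expand=False returns a Series of matched names (NaN where nothing matches).
who = s.str.extract(pattern, expand=False)

# Map each matched name to its age (assumes names are unique).
age_by_name = dict(zip(df['name'], df['age']))
result = pd.DataFrame({'text': s, 'age': who.map(age_by_name)})
print(result)
```

Strings that contain none of the names get NaN in the age column, which makes unmatched rows easy to spot with result['age'].isna().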

source: stackoverflow.com