Extracting Specific Text From Column In Dataframe In Pandas

- 1 answer

I have a pandas DataFrame with a State column. Using a regular expression, I need to extract the word that ends in one of [ft, mi, FT, MI] from each value and store it in another column.

 import pandas as pd

 df1 = pd.DataFrame({
    'State':['Arizona 4.47ft','Georgia 1023mi','Newyork 2022 NY 74.6 FT','Indiana 747MI(In)','Florida 453mi FL']})

Expected output

               State  Distance
0     Arizona 4.47ft  4.47ft
1     Georgia 1023mi  1023mi
2  Newyork NY 74.6ft  74.6ft
3  Indiana 747MI(In)   747MI
4   Florida 453mi FL   453mi

Would anyone please help?

Answer

Build a regex pattern from the list l, then use str.extract to pull the first match of that pattern from the State column:

l = ['ft','mi','FT','MI']
df1['Distance'] = df1['State'].str.extract(r'(\S+(?:%s))\b' % '|'.join(l))

                    State Distance
0          Arizona 4.47ft   4.47ft
1          Georgia 1023mi   1023mi
2  Newyork 2022 NY 74.6FT   74.6FT
3       Indiana 747MI(In)    747MI
4        Florida 453mi FL    453mi
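
pandas string methods use Python's re module, so the pattern can be checked on plain strings before it is run on the frame. The sketch below is not part of the original answer. It applies the answer's pattern to the strings exactly as posted in the question, where the third row has a space before FT (the output above shows that row without the space). The names units, rows, spaced_pattern and first_match are illustrative assumptions, and spaced_pattern is one possible variant for the spaced case, not the answerer's method.

```python
import re

units = ['ft', 'mi', 'FT', 'MI']  # same values as the list `l` in the answer

# Strings copied from the question, including the space in '74.6 FT'.
rows = ['Arizona 4.47ft', 'Georgia 1023mi', 'Newyork 2022 NY 74.6 FT',
        'Indiana 747MI(In)', 'Florida 453mi FL']

# Pattern from the answer: a run of non-space characters ending in a unit.
answer_pattern = r'(\S+(?:%s))\b' % '|'.join(units)

# Hypothetical variant: a number, optional whitespace, then a unit.
spaced_pattern = r'(\d+(?:\.\d+)?\s*(?:%s))\b' % '|'.join(units)


def first_match(pattern, text):
    """Return the first captured group, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


answer_hits = [first_match(answer_pattern, s) for s in rows]
spaced_hits = [first_match(spaced_pattern, s) for s in rows]

print(answer_hits)  # ['4.47ft', '1023mi', None, '747MI', '453mi']
print(spaced_hits)  # ['4.47ft', '1023mi', '74.6 FT', '747MI', '453mi']
```

The answer's pattern needs at least one non-space character directly before the unit, so a value written as '74.6 FT' gives no match (NaN in the Distance column). If the variant suits the data, it can be passed to str.extract in the same way, for example df1['State'].str.extract(spaced_pattern, expand=False). The captured text keeps the space, so it may need a follow-up str.replace if '74.6FT' is wanted.
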
source: stackoverflow.com