How To Extract Rows With Some Processing Steps Using Python Pandas?

My dataframe:

| query_name | position_description |
|------------|----------------------|
| A1         | [1-10]               |
| A1         | [3-5]                |
| A2         | [1-20]               |
| A3         | [1-15]               |
| A4         | [10-20]              |
| A4         | [1-15]               |
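
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: it assumes `position_description` holds plain strings such as `"[1-10]"` (not lists) and that the frame has a default integer index.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: the ranges are stored as strings, not as Python lists.
df = pd.DataFrame({
    "query_name": ["A1", "A1", "A2", "A3", "A4", "A4"],
    "position_description": ["[1-10]", "[3-5]", "[1-20]",
                             "[1-15]", "[10-20]", "[1-15]"],
})
print(df)
```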

I would like to remove rows that (i) share a query_name with another row and (ii) have a position_description range that lies entirely inside that other row's range. How can I do this?

Desired output:

| query_name | position_description |
|------------|----------------------|
| A1         | [1-10]               |
| A2         | [1-20]               |
| A3         | [1-15]               |
| A4         | [10-20]              |
| A4         | [1-15]               |

Answer

If, within a query_name, no more than one row can be contained in another, we can use:

import pandas as pd
from ast import literal_eval

df2 = pd.DataFrame(df['position_description'].str.replace('-', ',')  # '[1-10]' -> '[1,10]'
                                             .apply(literal_eval).tolist(),  # parse into [1, 10]
                   index=df.index).sort_values(0)  # sort by start position
print(df2)

    0   1
0   1  10
2   1  20
3   1  15
5   1  15
1   3   5
4  10  20
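
As a possible alternative to the replace/literal_eval parse above, the two bounds can also be pulled out with a regular expression. This is a sketch, not part of the original answer: it assumes every value looks like `[start-end]` with non-negative integers, and the column names `start` and `end` are made up for illustration.

```python
import pandas as pd

# Same sample data as in the question (assumed to be strings).
df = pd.DataFrame({
    "query_name": ["A1", "A1", "A2", "A3", "A4", "A4"],
    "position_description": ["[1-10]", "[3-5]", "[1-20]",
                             "[1-15]", "[10-20]", "[1-15]"],
})

# Capture the integers on either side of the hyphen; each capture
# group becomes one column (labelled 0 and 1) of the result.
bounds = (df["position_description"]
          .str.extract(r"\[(\d+)-(\d+)\]")
          .astype(int))
bounds.columns = ["start", "end"]  # hypothetical, illustrative names
print(bounds)
```

A value that does not match the pattern would yield NaN, and `astype(int)` would then raise, which at least makes malformed rows visible early.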

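check = df2.groupby(df['query_name']).shift()  # previous row's bounds within the same query_name
df.loc[~(df2[0].gt(check[0]) & df2[1].lt(check[1]))]  # drop rows lying strictly inside the previous row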

  query_name position_description
0         A1               [1-10]
2         A2               [1-20]
3         A3               [1-15]
4         A4              [10-20]
5         A4               [1-15]
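
The shift-based check above compares each row only with the row just before it in its group. If a row might be contained in any other row of the same query_name, one option is to compare all pairs within each group. The following is a sketch of that different technique, not the method above; it assumes string ranges like `[1-10]`, a default unnamed index, and uses hypothetical column names (`start`, `end`).

```python
import pandas as pd

df = pd.DataFrame({
    "query_name": ["A1", "A1", "A2", "A3", "A4", "A4"],
    "position_description": ["[1-10]", "[3-5]", "[1-20]",
                             "[1-15]", "[10-20]", "[1-15]"],
})

# Parse the bounds into integer columns next to query_name.
bounds = df["position_description"].str.extract(r"\[(\d+)-(\d+)\]").astype(int)
work = df[["query_name"]].assign(start=bounds[0], end=bounds[1]).reset_index()

# Pair every row with every row that has the same query_name,
# then discard the pairs that match a row with itself.
pairs = work.merge(work, on="query_name", suffixes=("", "_other"))
pairs = pairs[pairs["index"] != pairs["index_other"]]

# Strict containment, as in the comparison above:
# the other row starts earlier and ends later.
contained = pairs[(pairs["start"] > pairs["start_other"])
                  & (pairs["end"] < pairs["end_other"])]

result = df.drop(index=contained["index"].unique())
print(result)
```

The self-merge grows quadratically with group size, so this is probably only reasonable when each query_name has a modest number of rows. Switching to `>=` and `<=` would also catch ranges that share an endpoint, but identical ranges would then remove each other, so that case would need separate handling.
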
source: stackoverflow.com