
How to remove duplicate values in one column but keep the rows in pandas?


I have a dataframe like this:

Country         Country code  Port Name
China           CN            Yantian
China           CN            Shekou
China           CN            Quanzhou
United Kingdom  UK            Plymouth
United Kingdom  UK            Cardiff
United Kingdom  UK            Bird port

I want to remove the duplicate values in the first two columns but keep every row, like this:

Country         Country code  Port Name
China           CN            Yantian
                              Shekou
                              Quanzhou
United Kingdom  UK            Plymouth
                              Cardiff
                              Bird port

I have tried df.drop_duplicates, but it drops the whole rows.


Answer

You could use the pd.Series.duplicated method:

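import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'CN', 'Yantian'],
        ['China', 'CN', 'Shekou'],
        ['China', 'CN', 'Quanzhou'],
        ['United Kingdom', 'UK', 'Plymouth'],
        ['United Kingdom', 'UK', 'Cardiff'],
        ['United Kingdom', 'UK', 'Bird port']
    ],
    columns=['Country', 'Country code', 'Port Name']
)

# Blank out repeated values column by column; every row is kept.
for col in ['Country', 'Country code']:
    # duplicated() is True for every occurrence after the first one.
    # Assign through .loc to avoid chained assignment.
    df.loc[df[col].duplicated(), col] = np.nan
print(df)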

prints

          Country Country code  Port Name
0           China           CN    Yantian
1             NaN          NaN     Shekou
2             NaN          NaN   Quanzhou
3  United Kingdom           UK   Plymouth
4             NaN          NaN    Cardiff
5             NaN          NaN  Bird port
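
The question asks for blank cells rather than NaN. One possible variation, sketched here as an assumption and not as part of the original answer, is to fill the repeats with an empty string using Series.mask. The helper name blank_repeats and its fill parameter are hypothetical, chosen only for illustration.

```python
import pandas as pd


def blank_repeats(frame, cols, fill=""):
    """Return a copy of frame with repeated values in cols replaced by fill.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = frame.copy()
    for col in cols:
        # duplicated() flags every occurrence after the first;
        # mask() replaces the flagged entries with fill.
        out[col] = out[col].mask(out[col].duplicated(), fill)
    return out


df = pd.DataFrame(
    [
        ["China", "CN", "Yantian"],
        ["China", "CN", "Shekou"],
        ["China", "CN", "Quanzhou"],
        ["United Kingdom", "UK", "Plymouth"],
        ["United Kingdom", "UK", "Cardiff"],
        ["United Kingdom", "UK", "Bird port"],
    ],
    columns=["Country", "Country code", "Port Name"],
)

result = blank_repeats(df, ["Country", "Country code"])
print(result)
```

Because the helper works on a copy, the original df is left unchanged. Note that duplicated() looks at the whole column, so a value that reappears later in a non-adjacent group would be blanked there as well.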
source: stackoverflow.com