Remove Duplicate Values In A Tuple Array In Python


I have parties (customers) who purchase products. Every time a customer purchases a product, a new row is generated with the same party number.

I have grouped the products by party number, and I am now stuck with a column that contains tuples of product names:

Party Nbr  Product
1          (a, a, a, a, b, c)
2          (a, d, a, a)
3          (a, a, b, b, b)

I can't figure out how to remove the duplicates from each row of the Product column.

Code for the groupby:

pf = prod.groupby(['Party Nbr'])['Product name'].apply(tuple).reset_index().rename(columns= {'Product name': 'Product'})

pf['Product'] = tuple(set(pf['Product']))

ValueError: Length of values (4663) does not match length of index (32539)

Is someone able to help me?



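Assuming you are using pandas, I recreated your table as a DataFrame to show how you could do the transformation. Your attempt fails because set(pf['Product']) deduplicates across the whole column rather than within each row, so the result has fewer values (4663) than the index has rows (32539).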

In [10]: import pandas as pd

In [11]: df = pd.DataFrame({
              "party": [1, 2, 3], 
              "product": [
                  ("a", "a", "a", "a", "b", "c"),
                  ("a", "d", "a", "a"),
                  ("a", "a", "b", "b", "b")]})

In [12]: df
   party             product
0      1  (a, a, a, a, b, c)
1      2        (a, d, a, a)
2      3     (a, a, b, b, b)

In [13]: df["product"] = df["product"].apply(set).apply(tuple)

In [14]: df
   party    product
0      1  (c, b, a)
1      2     (a, d)
2      3     (b, a)

Note: as mentioned in the comments, the order of the products is not preserved, because a set is unordered. If you want to preserve the order, you can use a custom function in place of chaining set and tuple.
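One possible custom function is sketched below. It relies on dict keys being unique and keeping insertion order (guaranteed since Python 3.7). The name dedupe_keep_order is a hypothetical helper introduced here for illustration, not something from the question or from pandas.

```python
import pandas as pd


def dedupe_keep_order(items):
    """Return a tuple of the unique items, in first-seen order.

    Hypothetical helper: dict.fromkeys keeps one key per distinct
    value, in the order the values first appear.
    """
    return tuple(dict.fromkeys(items))


# Same sample data as in the answer above.
df = pd.DataFrame({
    "party": [1, 2, 3],
    "product": [
        ("a", "a", "a", "a", "b", "c"),
        ("a", "d", "a", "a"),
        ("a", "a", "b", "b", "b"),
    ],
})

# Apply the helper to each row instead of chaining set and tuple.
df["product"] = df["product"].apply(dedupe_keep_order)
print(df)
```

With the sample data, the rows become (a, b, c), (a, d) and (a, b). The same helper should also work directly in the original groupby, by passing it to apply in place of tuple, which would avoid the second pass over the column.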