Custom Grouping For All Possible Groups When Having Missing Values
I have a dictionary which represents a set of products, and I need to find all duplicate products among them. Products that have the same product_type, color and size are duplicates. I could easily group by ('product_type', 'color', 'size') if it weren't for one problem: some values are missing. So I have to find all possible groups of products that might be duplicates of each other, which means that some elements can appear in multiple groups.
Let me illustrate:
import pandas as pd

def main():
    data = {'product_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
            'product_type': ['shirt', 'shirt', 'shirt', 'shirt', 'shirt', 'hat', 'hat', 'hat', 'hat', 'hat', 'hat'],
            'color': [None, None, None, 'red', 'blue', None, 'blue', 'blue', 'blue', 'red', 'red'],
            'size': [None, 's', 'xl', None, None, 's', None, 's', 'xl', None, 'xl'],
            }
    print(data)

if __name__ == '__main__':
    main()
For this data:
I need this result: a list of possibly duplicate products for each possible group (keeping only the biggest supergroups):
So, for example, let's take the shirt with id=1.
This product has neither a color nor a size, so it can appear in a possible "duplicates group" together with shirt #2 (which has size "s" but no color) and shirt #4 (which has color "red" but no size). So these three shirts (1, 2, 4) are possible duplicates with color "red" and size "s".
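The matching rule can be stated as a small pairwise check: two values conflict only when both are known and differ. The sketch below is illustrative; the function name compatible is hypothetical and not from the question. It also shows why a plain group-by cannot work: compatibility is not transitive. Shirt 1 is compatible with both shirt 4 (red) and shirt 5 (blue), but shirts 4 and 5 are not compatible with each other, so the groups overlap.

```python
def compatible(a, b):
    """Return True if products a and b could be the same product.

    a and b are (product_type, color, size) tuples where None means
    "unknown". Two values conflict only when both are known and differ.
    """
    return all(x is None or y is None or x == y for x, y in zip(a, b))


# (product_type, color, size) for shirts 1, 2, 4 and 5 from the question's data
shirt_1 = ('shirt', None, None)
shirt_2 = ('shirt', None, 's')
shirt_4 = ('shirt', 'red', None)
shirt_5 = ('shirt', 'blue', None)

print(compatible(shirt_1, shirt_4))  # True
print(compatible(shirt_2, shirt_4))  # True
print(compatible(shirt_1, shirt_5))  # True
print(compatible(shirt_4, shirt_5))  # False: red vs blue
```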
I tried to implement it by looping through all possible combinations of missing values, but it feels wrong and overly complex.
Is there a way to get the desired result?
Answer
You can create all possible keys that contain no None and then check which item falls into which key, treating a None in an item as matching any value:
from itertools import product

data = {'product_id'  : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
        'product_type': ['shirt', 'shirt', 'shirt', 'shirt', 'shirt', 'hat',
                         'hat', 'hat', 'hat', 'hat', 'hat'],
        'color'       : [None, None, None, 'red', 'blue', None, 'blue',
                         'blue', 'blue', 'red', 'red'],
        'size'        : [None, 's', 'xl', None, None, 's', None, 's', 'xl', None, 'xl']}

# create all keys without None in them
p = product((t for t in set(data['product_type']) if t),
            (c for c in set(data['color']) if c),
            (s for s in set(data['size']) if s))

# create the things you have in stock as (id, type, color, size) tuples
inventar = list(zip(data['product_id'], data['product_type'], data['color'], data['size']))

d = {}
# sort every item into each category it could belong to
for cat in p:
    d.setdefault(cat, set())  # uses a set to collect the IDs
    for item in inventar:
        TY, CO, SI = cat
        ID, TYPE, COLOR, SIZE = item
        # (TYPE or TY) substitutes TY for any TYPE that is None, etc.
        if (TYPE or TY) == TY and (COLOR or CO) == CO and (SIZE or SI) == SI:
            d[cat].add(ID)

print(d)
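Output (reformatted for readability; the key order can differ between runs because string hashing is randomized):
# category key            ids that match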
{('shirt', 'blue', 's') : {1, 2, 5},
('shirt', 'blue', 'xl'): {1, 3, 5},
('shirt', 'red', 's') : {1, 2, 4},
('shirt', 'red', 'xl') : {1, 3, 4},
('hat', 'blue', 's') : {8, 6, 7},
('hat', 'blue', 'xl') : {9, 7},
('hat', 'red', 's') : {10, 6},
('hat', 'red', 'xl') : {10, 11}}
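The d above holds one group per fully specified key, while the question asks for only the biggest supergroups. For this data no group is contained in another, but with other data a key can end up with an empty group, or with a group that is a strict subset of another one (for instance, a color that occurs only for hats still produces shirt keys, which then collect only the shirts with a missing color). A minimal post-processing sketch, assuming d has the shape printed above; the function name maximal_groups and the 'green' keys in the example are hypothetical.

```python
def maximal_groups(groups):
    """Keep only the groups that are not strictly contained in another group.

    groups: dict mapping a (type, color, size) key to a set of product ids,
    like the d built above. Empty groups are dropped, and when several keys
    produce the same id set, only the first such key is kept.
    """
    # Deduplicate identical id sets, remembering the first key for each.
    unique = {}
    for key, ids in groups.items():
        if ids:
            unique.setdefault(frozenset(ids), key)
    # A group survives if no other group is a strict superset of it.
    return {key: set(ids)
            for ids, key in unique.items()
            if not any(ids < other for other in unique)}


# Two groups from the output above plus two hypothetical keys
# ('green' does not occur in the question's data).
example = {
    ('hat', 'blue', 's'): {6, 7, 8},
    ('hat', 'blue', 'xl'): {7, 9},
    ('hat', 'green', 's'): {6},      # strict subset of {6, 7, 8}
    ('hat', 'green', 'xl'): set(),   # empty group
}
print(maximal_groups(example))
```

Applied to the d printed above, it returns all eight groups unchanged, since none of them is contained in another. The subset test compares every pair of groups, which is quadratic in the number of groups but simple and fine for small inventories.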