Random Sample Of N Distinct Permutations Of A List
Suppose I have a Python list of arbitrary length k
. Now, suppose I would like a random sample of n
, (where n <= k!) distinct permutations of that list. I was tempted to try:
import random
import itertools
k = 6
n = 10
mylist = list(range(0, k))
j = random.sample(list(itertools.permutations(mylist)), n)
for i in j:
print(i)
But, naturally, this code becomes unusably slow when k
gets too large. Given that the number of permutations that I may be looking for n
is going to be relatively small compared to the total number of permutations, computing all of the permutations is unnecessary. Yet it's important that none of the permutations in the final list are duplicates.
How would you achieve this more efficiently? Remember, mylist
could be a list of anything, I just used list(range(0, k))
for simplicity.
Answer
Naïve implementation
Bellow the naïve implementation I did (well implemented by @Tomothy32, pure PSL using generator):
import numpy as np
mylist = np.array(mylist)
perms = set()
for i in range(n): # (1) Draw N samples from permutations Universe U (#U = k!)
while True: # (2) Endless loop
perm = np.random.permutation(k) # (3) Generate a random permutation form U
key = tuple(perm)
if key not in perms: # (4) Check if permutation already has been drawn (hash table)
perms.update(key) # (5) Insert into set
break # (6) Break the endless loop
print(i, mylist[perm])
It relies on numpy.random.permutation
which randomly permute a sequence.
The key idea is:
- to generate a new random permutation (index randomly permuted);
- to check if permutation already exists and store it (as
tuple
ofint
because it must hash) to prevent duplicates; - Then to permute the original list using the index permutation.
This naïve version does not directly suffer to factorial complexity O(k!)
of itertools.permutations
function which does generate all k!
permutations before sampling from it.
About Complexity
There is something interesting about the algorithm design and complexity...
If we want to be sure that the loop could end, we must enforce N <= k!
, but it is not guaranteed. Furthermore, assessing the complexity requires to know how many time the endless-loop will actually loop before a new random tuple is found and break it.
Limitation
Let's encapsulate the function written by @Tomothy32:
import math
def get_perms(seq, N=10):
rand_perms = perm_generator(mylist)
return [next(rand_perms) for _ in range(N)]
For instance, this call work for very small k<7
:
get_perms(list(range(k)), math.factorial(k))
But will fail before O(k!)
complexity (time and memory) when k
grows because it boils down to randomly find a unique missing key when all other k!-1
keys have been found.
Always look on the bright side...
On the other hand, it seems the method can generate a reasonable amount of permuted tuples in a reasonable amount of time when N<<<k!
. Example, it is possible to draw more than N=5000
tuples of length k
where 10 < k < 1000
in less than one second.
When k
and N
are kept small and N<<<k!
, then the algorithm seems to have a complexity:
- Constant versus
k
; - Linear versus
N
.
This is somehow valuable.
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module