Generating Seemingly Random Looking Unique Numbers For Url Purposes
So I was wondering about youtube's url. Specially the video id watch?v=XZmGGAbHqa0
. Same for similar tinyurl services.
I came across Tom's video of Will YouTube Ever Run Out Of Video IDs? That was quite interesting. But generating a seemingly random number in a random way, check for duplicates looked quite expensive.
In case you haven't watched the video, youtube generates a large unique number and base64 encode it. Not sure if the whole process is that simple inside youtube but I'm trying to write something similar.
I came across a mathematical function called Modular Multiplicative Inverse.
Using this along with a bit of Extended Euclidean algorithm will generate a unique number from our given input. Inverse is possible too, But someone won't be easily able to inverse or brute force without knowing the seeds. So we can easily get a large random number from our subsequent id in database. Maybe even base64 it.
But I have confusions.
As tom mentioned in the video, syncing a subsequent id across multiple servers can lead to problems and overheads. But if we don't make it subsequent, won't it be a tedious and expensive task to find a unique id? We have to check if the id is available in db.
Should I avoid mixing in url with subsequent ids? Or masking it with something like that Multiplicative Inverse is good?
Why would I want to base64 it? To save storage or to save url space? I would still need to generate something seemingly random, unique and be able to quickly generate it without duplicating and search it.
Any better ways to achieve what I'm trying? I did checked all the similar so questions and a few blog posts. On of which is this
Sorry If I couldn't explain well, I'm quite confused what to ask.
Answer
This is why you bypass the issue of ensuring uniqueness of randomly generated numbers entirely and use a pseudorandom permutation (PRP) family to transform a pre-coordinated counter.
A PRP family is like a PRF family in that it is a cipher that takes a key and a plaintext and produces a ciphertext except that it has a one-to-one map between plaintexts and ciphertexts.
This lets one use a Twitter Snowflake like design and then simply encode the internal sequential identifiers into non-sequential external identifiers.
See Algorithm to turn numeric IDs in to short, different alphanumeric codes for an implementation in Python.
Also
M. Luby and C. Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Comput., 17(2):373–386, Apr. 1988.
Related Questions
- → What are the pluses/minuses of different ways to configure GPIOs on the Beaglebone Black?
- → Django, code inside <script> tag doesn't work in a template
- → React - Django webpack config with dynamic 'output'
- → GAE Python app - Does URL matter for SEO?
- → Put a Rendered Django Template in Json along with some other items
- → session disappears when request is sent from fetch
- → Python Shopify API output formatted datetime string in django template
- → Can't turn off Javascript using Selenium
- → WebDriver click() vs JavaScript click()
- → Shopify app: adding a new shipping address via webhook
- → Shopify + Python library: how to create new shipping address
- → shopify python api: how do add new assets to published theme?
- → Access 'HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT' with Python Shopify Module