How To Find All Websites Under A Certain URL
I really want to know how to find all pages under a certain URL. For example, I have a URL, https://a.b/c, and I want to find all the pages under it, such as https://a.b/c/d and https://a.b/c/d/e. Are there any methods to do this? Thanks so much!
If the pages are reachable through hyperlinks starting from the root page, you can spider the site by following internal links: load the root page, parse its hyperlinks, load those pages, and repeat until no new links are found. You will need cycle detection so that you do not crawl pages you have already visited. Operating a spider politely is not trivial. Many sites use robots.txt files or other metadata to indicate which parts of the site they do not want indexed, and a well-behaved spider crawls slowly to avoid consuming excessive server resources. You should respect these norms.
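The loop described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a production crawler. The names `LinkParser`, `fetch`, `crawl`, and `crawl_politely`, the one-second default delay, and the `"*"` user agent are all assumptions made for the example.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Hypothetical helper: download a page and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(root, fetch_page=fetch, allowed=lambda url: True, delay=1.0):
    """Breadth-first crawl of the linked pages whose URL lies under root."""
    prefix = root.rstrip("/") + "/"
    seen = {root}          # cycle detection: every URL ever queued
    queue = deque([root])
    found = []             # pages that were actually fetched
    while queue:
        url = queue.popleft()
        if not allowed(url):       # e.g. disallowed by robots.txt
            continue
        try:
            html = fetch_page(url)
        except OSError:            # network and HTTP errors from urlopen
            continue
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)          # crawl slowly to spare the server
    return found


def crawl_politely(root):
    """Crawl root while honouring the site's robots.txt (assumed location)."""
    robots = RobotFileParser(urljoin(root, "/robots.txt"))
    robots.read()
    return crawl(root, allowed=lambda url: robots.can_fetch("*", url))
```

Calling `crawl_politely("https://a.b/c")` would return only the pages that are reachable by links from the root and permitted by robots.txt. Passing `fetch_page` as a parameter is just a convenience so the loop can be tried against canned HTML without touching the network.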
However, note that there is no general-purpose way to enumerate pages that are not explicitly linked from the site. Doing so would require one of the following:
- that the site enables directory listing, so that you can identify all files stored under those paths (most sites do not provide this); or
- cooperation from the operator of the site or the web server to list all pages under those paths; or
- a brute-force search of all possible URLs under those paths, which is an effectively unbounded set. Such a search would not be polite to the operator of the site, is prohibitive in terms of time and effort, and cannot be exhaustive.