How To Scale A Node Server That Will Have Long-Running Tasks For Each User

I am building a service where each user will be able to have a long-running task (think web scraper). An Express API will be used to manage users and spin up processes.

My initial plan was to have an API that spins up child processes, which is fine for a small number of users. After a certain point, though, the server will reach capacity, and I am not sure what I can do then.

Answer

If you're really going to have a long-running process per user, then you will have to test how many of these processes a single server can handle, and then add servers (perhaps with a form of clustering, or with custom code that starts processes on other servers) when you scale beyond that tested limit.

If the long-running task is really something like web scraping, then you may want to create an asynchronous design for this work (rather than a process per user) and use a work queue to divvy up any CPU-intensive work among worker processes. node.js is very good at running lots and lots of asynchronous I/O-related operations at once. Where you have to get other CPUs involved is CPU-intensive work. Without understanding precisely what the long-running work is, it's hard for us to make a very specific suggestion.
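As a rough sketch of the work-queue idea, here is a minimal in-process queue with a fixed concurrency cap. `WorkQueue` is an illustrative helper, not any specific library; in a real system each task would hand its CPU-intensive part to a worker process.

```javascript
// Minimal sketch of a work queue that caps how many jobs run at once.
// WorkQueue is a hypothetical name, not a real library API.
class WorkQueue {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.running = 0;
    this.pending = []; // jobs waiting for a free slot
  }

  // push() returns a promise for the job's result, however long the
  // job has to wait in the queue before a slot frees up.
  push(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this._drain();
    });
  }

  _drain() {
    while (this.running < this.concurrency && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => {
          this.running--;
          this._drain(); // a slot freed up; start the next waiting job
        });
    }
  }
}
```

Because callers just `await queue.push(...)`, the Express request handlers stay simple: load beyond the cap accumulates in `pending` instead of spawning more processes.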

In a nutshell, as much as possible should use asynchronous I/O, which scales really well by itself in node.js. CPU-intensive work should be farmed out to other CPUs via some sort of work queue. Single-host clustering (the node.js cluster module) can apply multiple CPUs even to the asynchronous work. A work queue and multiple processes can apply multiple CPUs to the CPU-intensive work. If you need more scale than that, then you will want a work queue manager that can farm jobs out to other servers in your pod.

It is possible to create a work-queue design where, as the load gets higher and higher, your server resources run at max capacity doing as much as they can, and the only effect is that response times increase as each request takes longer to complete (and the work queue gets longer and longer), but the server is still working on all the requests. This is preferable to a design that creates a new process per user and eventually overwhelms the server by consuming too many resources.
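That degradation mode can be seen in a toy sketch: with a single worker slot, each job's latency grows with the queue length, but every job still completes. `makeSerialRunner` is illustrative only, not a real API.

```javascript
// Toy demonstration of graceful degradation: jobs are serialized, so
// under load each new job waits longer, but none are dropped.
function makeSerialRunner() {
  let chain = Promise.resolve();
  return (task) => {
    const start = Date.now();
    const result = chain.then(task).then((value) => ({
      value,
      waitedMs: Date.now() - start, // includes time spent queued
    }));
    // Keep the chain alive even if a task rejects.
    chain = result.catch(() => {});
    return result;
  };
}

const run = makeSerialRunner();
const slow = (ms, value) =>
  () => new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Three 30ms jobs: 'a' waits ~30ms, 'b' ~60ms, 'c' ~90ms, all complete.
Promise.all([run(slow(30, 'a')), run(slow(30, 'b')), run(slow(30, 'c'))])
  .then((results) => {
    console.log(results.map((r) => `${r.value}:${r.waitedMs}ms`).join(' '));
  });
```

The contrast with process-per-user is that the cost of overload here is latency, which the system recovers from on its own, rather than memory and process exhaustion, which it does not.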

source: stackoverflow.com