Horizontal Scaling With A Node.js App & Socket Io
My team and I are working on a digital signage platform.
We have ~ 2000 Raspberry Pi around the world connected to a Nodejs server using Socket IO. The Raspberries are initiating the connection.
We would like to be able to scale horizontally our application on multiple servers but we have a problem that we can’t figure out.
Basically, the application stores the sockets of the connected Raspberry in an array. We have an external program that calls the API within the server, this results by the server searching which sockets will be "impacted" by the API call and send them the informations.
After lots of search, we assume that we have to stores the sockets (or their ID) elsewhere (Redis ?), to make the application stateless. Then, any server can respond to a API call and look the sockets in a central place.
Unfortunately, we can’t find any detailed example on how to do that.
Can you please help us ?
(You can't store sockets from multiple server instances in a shared datastore like redis: they only make sense in the context of the server where they were initiated).
You will need a cluster of node.js servers to handle this. There are various ways to make a cluster. They all involve directing incoming connections from your RPis to a "generic" hostname, for example server.example.com. Behind that server.example.com hostname will be multiple node.js servers.
Each incoming connection from each RPi connects to just one of those multiple servers. (You know this, I believe.) This means one node.js server in your cluster "owns" each individual RPi.
(Telling you how to rig up a cluster of node.js servers is beyond the scope of this answer. Hints: round-robin DNS or a reverse-proxy nginx front end.)
Then, you want to route -- to fan out -- the incoming data from each API call to each server in the cluster, so the server can route it to the RPis it owns.
Here's a good way to handle that:
- Set up a redis cache or other shared data store. It can be very small.
- When each node.js server starts, have it register itself as active. That is, have it place its own specific address for handling API calls into the shared server. The specific address is probably of the form
188.8.131.52:3000: that is, an IP address and port.
- Have each server update that address every so often, once a minute or so, to show it is still alive.
- When an API call arrives at server.example.com, it will come to a more-or-less randomly chosen node.js server instance.
- Get that server to read the list of server addresses from the redis cache
- Get that server to repeat the API call to all servers except itself. Add a parameter like
repeated=yesto the repeated API calls.
- Then, each server looks at its list of connected sockets and does what your application requires.
- On server shutdown, have the server unregister itself -- remove its address from redis -- if possible.
In other words, build a way of fanning out the API calls to all active node.js servers in your cluster.
If this must scale up to a very large number (more than a hundred or so) node.js servers, or to many hundreds of API calls a minute, you probably should investigate using message queuing software.
SECURE YOUR REDIS server from random cybercreeps on the internet.
- → Maximum call stack exceeded when instantiating class inside of a module
- → Browserify api: how to pass advanced option to script
- → Node.js Passing object from server.js to external modules?
- → gulp-rename makes copies, but does not replace
- → requiring RX.js in node.js
- → Remove an ObjectId from an array of objectId
- → Can not connect to Redis
- → React: How to publish page on server using React-starter-kit
- → Express - better pattern for passing data between middleware functions
- → Can't get plotly + node.js to stream data coming through POST requests
- → IsGenerator implementation
- → Async/Await not waiting
- → (Socket.io on nodejs) Updating div with mysql data stops without showing error