How To Scale A NodeJS Application (2024)

Let’s build a NodeJS application and load test it to see how it performs. We are going to use the amazing Fastify framework, since it’s a simple, very low-overhead framework to get started with. I am going to use the hello-world server example provided on their site.
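
The original listing isn’t reproduced here, but a minimal version along the lines of the Fastify docs’ hello-world example (assuming Fastify v4’s listen signature) looks like this:

// index.js — hello-world server, adapted from the Fastify docs
const fastify = require('fastify')()

// Return a hello-world JSON payload
fastify.get('/', async (request, reply) => {
  return { hello: 'world' }
})

// Start listening on port 3000
fastify.listen({ port: 3000 }, (err) => {
  if (err) {
    console.error(err)
    process.exit(1)
  }
})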

This is a very simple Fastify application that listens on port 3000 and returns a hello-world JSON response. Now let’s run the server and load test it with wrk2. I have a 12-core machine, so I am going to run the wrk command with 12 threads and 1,000 connections for 30 seconds, at a target rate of 200,000 requests per second.

wrk -t12 -c1000 -d30s -R200000 http://localhost:3000/

The result shows that the vanilla Fastify server can handle around 25,000 requests per second.

Running 30s test @ http://localhost:3000/
  12 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.30s     4.94s   26.17s   58.16%
    Req/Sec     2.08k      1.66     2.09k   91.67%
  753447 requests in 30.03s, 134.37MB read
Requests/sec:  25087.51
Transfer/sec:      4.47MB

The cluster module, introduced around Node version 0.8, can launch a cluster of Node processes. The master process can fork and launch child processes, which then all run in parallel. Let’s see it in action.
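
The original listing isn’t shown here either; a minimal sketch of the clustered version, wrapping the same hello-world server, would look roughly like this (the file name cluster.js is an assumption):

// cluster.js — a sketch of clustering the Fastify server
const cluster = require('cluster')
const os = require('os')

const numCPUs = os.cpus().length

if (numCPUs > 1 && cluster.isMaster) {
  // Master process: fork one worker per CPU core
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork()
  }
} else {
  // Worker process (or a single-core machine): run the server as usual
  const fastify = require('fastify')()

  fastify.get('/', async (request, reply) => {
    return { hello: 'world' }
  })

  fastify.listen({ port: 3000 }, (err) => {
    if (err) {
      console.error(err)
      process.exit(1)
    }
    console.log(`server listening on 3000 and worker ${process.pid}`)
  })
}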

Here we use the os module to detect the number of CPU cores the system has. If the number of cores is 1, it simply runs the application as before. If there are more cores, it uses the cluster module to check whether the running process is the master process. It then loops over the number of CPUs and forks the current process using the cluster.fork() method.

What fork() really does is launch another Node process running the same program, similar to running node index.js again. When a child process executes, the cluster module’s isMaster returns false, and the program runs as usual.

The master process listens on our HTTP server’s port and load balances all requests among the workers. The output looks like this:

server listening on 3000 and worker 280474
server listening on 3000 and worker 280473
server listening on 3000 and worker 280483
server listening on 3000 and worker 280480
server listening on 3000 and worker 280492
server listening on 3000 and worker 280503
server listening on 3000 and worker 280510
server listening on 3000 and worker 280517
server listening on 3000 and worker 280504
server listening on 3000 and worker 280533
server listening on 3000 and worker 280526
server listening on 3000 and worker 280536

The output will vary from machine to machine depending on the number of cores in the system. I have 12 cores in my system, so it runs 12 processes.

When we now hit the web server multiple times, the requests get handled by different worker processes with different process IDs. The master distributes the load among the workers in round-robin fashion.
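
As a quick sanity check (not part of the original code), you can have each worker include its process ID in the response:

// In the worker's route handler, return the pid along with the payload
fastify.get('/', async (request, reply) => {
  return { hello: 'world', pid: process.pid }
})

Repeated curl http://localhost:3000/ calls should then show the pid cycling through the workers.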

Now let’s load test this again:

Running 30s test @ http://localhost:3000/
  12 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.86s     2.69s   15.56s   60.54%
    Req/Sec     9.19k   427.09    10.17k   70.83%
  3262051 requests in 30.00s, 581.74MB read
Requests/sec: 108729.29
Transfer/sec:     19.39MB

We see a drastic performance improvement over the single-process NodeJS server. We are now able to serve over 100,000 requests a second, more than 4 times the previous throughput. This is a real gain, achieved without any external mechanism, just with built-in tools.

And now you have a NodeJS application scaled to run on all cores of a machine!

When we run a single instance of the Node server, we have to restart it when it crashes or when we deploy new code. Running multiple processes alleviates this issue: we can simply fork a new process when one crashes. Let’s see it in action.

The full file is available here. Here, we simulate a random crash and make sure the crash happens in one of the worker processes, not in the master process. We still have to restart the application if the master process crashes, but for a child process we can fork again when we see a crash. We add a condition before forking to make sure the exit was an actual crash, not a worker that was killed or disconnected by the master.
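
A sketch of that logic follows; the crash is simulated with a random timeout, which is an assumption since the original file isn’t reproduced here:

// crash-cluster.js — a sketch of reforking workers after a crash
const cluster = require('cluster')
const os = require('os')

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork()
  }

  cluster.on('exit', (worker, code, signal) => {
    // Refork only on a real crash, not when the master
    // killed or disconnected the worker on purpose.
    if (code !== 0 && !worker.exitedAfterDisconnect) {
      console.log(`Worker ${worker.id} has exited.`)
      cluster.fork()
    }
  })
} else {
  const fastify = require('fastify')()

  fastify.get('/', async (request, reply) => {
    return { hello: 'world' }
  })

  // Simulate a random crash at some point in the first 10 seconds
  setTimeout(() => process.exit(1), Math.random() * 10000)

  fastify.listen({ port: 3000 }, (err) => {
    if (err) {
      console.error(err)
      process.exit(1)
    }
    console.log(`server listening on 3000 and worker ${process.pid}`)
  })
}

When we run it, we see: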

server listening on 3000 and worker 287932
server listening on 3000 and worker 287921
server listening on 3000 and worker 287928
server listening on 3000 and worker 287922
server listening on 3000 and worker 287940
server listening on 3000 and worker 287959
server listening on 3000 and worker 287967
server listening on 3000 and worker 287984
server listening on 3000 and worker 287951
server listening on 3000 and worker 287981
server listening on 3000 and worker 287973
server listening on 3000 and worker 287952
Worker 5 has exited.
server listening on 3000 and worker 288053
Worker 1 has exited.
server listening on 3000 and worker 288064
Worker 7 has exited.
server listening on 3000 and worker 288075
Worker 14 has exited.
server listening on 3000 and worker 288086
Worker 10 has exited.
server listening on 3000 and worker 288097

Every time a worker exits, a new one is spun up. This is good: we have made sure the application keeps running regardless of crashes. We still need to find the root cause and fix it soon, but this keeps the app afloat in the meantime.

There are tools built on this mechanism that do the heavy lifting for us in a production environment. Let’s explore one of them, PM2. There is an enterprise version, but the free version is all we need here. Let’s start the original Fastify example with pm2:

npm i -g pm2
pm2 start index.js -i max

First, we install pm2 globally. Then we run the pm2 start command, which spins up one instance of the NodeJS application per CPU core. On my machine I see:

[Screenshot: pm2 process list showing 12 online instances]

So, 12 instances of the application are spun up. Let’s load test again:

Running 30s test @ http://localhost:3000/
  12 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.29s     3.46s   20.43s   59.21%
    Req/Sec     6.55k    52.83     6.66k   66.67%
  2305130 requests in 30.00s, 411.09MB read
Requests/sec:  76837.29
Transfer/sec:     13.70MB

This is about 3x the original single-core performance, though not quite what we saw when we spun up the worker processes ourselves. If we run pm2 monit during the load test, we can see that pm2 is not using the full power of the CPUs, but rather 70–80%:

[Screenshot: pm2 monit showing each process at roughly 70–80% CPU]

This is good as well, since in a production environment we do not want to sit at 100% CPU; there could be load balancer rules that spin up another virtual machine if we stay close to 80% for a sustained period. This is a good setup: pm2 restarts a process automatically if it crashes, and its reload command enables zero-downtime deployments.
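
For example, two pm2 commands worth knowing (the app name index is an assumption, taken from starting index.js):

pm2 reload index   # zero-downtime, rolling reload of all instances
pm2 monit          # live per-process CPU and memory dashboard

With that, the clustered setup we built by hand earlier comes essentially for free.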
