I feel somewhat ashamed of myself that I’m only now learning about the problem(s) with process shutdown in Node & Docker. After finding Bret Fisher’s talk(s) about Node & Docker best practices, I couldn’t believe that there’s a bit of an issue with process signals making it all the way through to the application.
I was already aware of some subject areas such as the Linux kernel, zombies & orphans. By no means am I an expert on those subjects, but I had a vague idea of what they were & whatnot.
But through this research I discovered that using npm start is a terrible idea with Node & containers. Prior to my research I didn’t even think this would ever be an issue, I just thought y’know, run npm start & boom, you have a functional, running application.
If only it were that simple! 😅
To provide a bit of basic knowledge: when you want to stop an application without Docker, it’s simple, you can run something like kill <pid>. Simple stuff.
However, when your application is running inside a container, it lives in an isolated PID namespace, so you can’t just signal it by PID from outside in the same way. Your best bet is to use the Docker CLI to shut down your application & stop the container.
When you stop a container, Docker sends a SIGTERM signal to the process with PID 1. After a timeout period, if your application hasn’t shut down gracefully, Docker will force the application to terminate with a SIGKILL signal. Now, this might seem pretty simple so far, but as it turns out, SIGKILL is handled directly by the Linux kernel & is never delivered to your application, so your application has no way of detecting or handling it. So it’s safe to conclude that SIGKILL is a last resort sorta approach, it’s not clean, it’s not nice, it’s pretty nasty stuff! 😐
If your application is not PID 1, i.e. another process is running as PID 1, the signal only reaches your application if that other process forwards it. If it isn’t forwarded, your application never sees SIGTERM & will not shut down gracefully, that is a simple fact. So Docker has no other option than to force your application to shut down, yikes.
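To make the PID 1 point a bit more tangible, here’s a minimal sketch of a startup diagnostic that logs whether the Node process is PID 1 or not. The helper name describePid is made up for illustration, it isn’t from any library, & the wording of the messages is just my own.

```javascript
// Hypothetical helper (not part of any library): describes how a stop
// signal would reach a process with the given PID inside a container.
function describePid(pid) {
  if (pid === 1) {
    return 'running as PID 1: SIGTERM from docker stop arrives here directly';
  }
  return `running as PID ${pid}: SIGTERM must be forwarded by whatever is PID 1`;
}

// process.pid is the PID as seen from inside the container's PID namespace.
console.log(describePid(process.pid));
```

Dropping something like this at the top of your entry file is a cheap way to spot a stray shell or npm sitting in front of your application.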
Using a solution much like one that I’ve previously done, e.g.
As it turns out, to the untrained eye, myself included there, this seems perfectly fine. Seems simple enough, it’ll install the application dependencies, run some migration scripts, run some seeds, simples! 😀
But, because the shell will take up PID 1 in this example, Node will become a child process, so there are now two processes running within the container. This isn’t a good thing because we want our containers to be as small & lightweight as possible, it really is that simple. Worse, when the shell receives SIGTERM, it will not relay the signal to the child process. This is bad. 😕
Regardless of the previous information, in the context of running a Node application, I think it’s reasonable to conclude that you wouldn’t want the shell to be the parent process of Node, because you want your Node.js application running standalone. You have the Node.js runtime, you have the code, why bother with a shell?
It’s reasonable for an application developer such as myself to make this mistake, after all it’s pretty much the industry standard to use npm start.
In the container environment, this isn’t the way to do it. Why, you may ask? Well, a Docker image essentially specifies how the application should start, so whatever you put there becomes the process that Docker signals. But NPM is not a process manager & it will not give a damn about any process signals that are running around the place, hence why npm start should not exist within a Dockerfile. ❗
Now let’s just say for argument’s sake that you got it right & your application is running as PID 1, awesome, this now means that your application should theoretically shut down gracefully. Huh, turns out it isn’t that simple either. 😂
This requires a little more context. In a non-container environment, during boot time of a Linux based operating system, the kernel starts an init process & assigns it PID 1. Init is a process manager that’s responsible for removing zombie & orphan processes, among other things. For clarification, a zombie process is a process that has stopped & is waiting to be removed from the kernel process table by its parent. Whereas an orphan process is a process where its parent has died, but this child process is still running. Therefore you can have a zombie orphan, I know that sounds kinda funny… 🧟‍♂️ … But in all seriousness, this means that you have a process that has stopped, but its parent is no more.
By applying the logic of init, what’s meant to happen here exactly? We can’t remove this orphan process because the parent has vanished?! Well as it turns out, when the Linux kernel sees an orphaned process, it assigns PID 1 as the parent. This process is now responsible for cleaning up the adopted child process.
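The Linux kernel also protects PID 1 from signals that would kill other processes: a signal whose default action is to terminate the process is ignored for PID 1 unless a handler has been installed for it. So unless you explicitly handle SIGTERM in your code, your application will not quit when it’s running as PID 1.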
We also don’t want to waste time & resources by having a bunch of zombies taking up slots in the Linux kernel process table. When this table is full, the kernel won’t be able to spawn any more processes. Almost like a star that has died, things start to explode in a magnificent fashion! 💥
Anyway, without delving too deep into how the Linux kernel works & whatnot, the short conclusion is simple: do not use npm start within your Dockerfile. Instead, I would recommend using something more like this:
The reason why you’d want to use dumb-init is that it will proxy signals to its child process & in this context, dumb-init will run your Node application as its one & only child process. You can also use alternatives such as Tini, which is a slim process manager, but the big benefit to using dumb-init is that it supports signal rewriting whereas Tini does not.
You should use dumb-init or something similar because it means you don’t need to make any changes to your application code, so the application code continues to work & behave as expected.
With more up to date versions of Docker, you can use docker run --init <my image name>, which will essentially inject an init process into the container as the main process (PID 1). This init process takes care of forwarding signals & reaping zombies when you want to stop or kill the container.
Either way, this has been a fun little research area that I found myself going into, but personally I prefer the idea of using something like dumb-init over using the --init flag with Docker, because it ensures that if some numpty came along & removed the --init flag from your CD task, the container will continue to shut down with grace. 👍
In development, it’s awesome using tools like PM2 or Nodemon, etc. But I cannot believe that there are people out there that use tools like Nodemon or PM2 inside of their containers, why?
Now fair enough, there may be some very solid reasons, but for the 90% of use cases where it’s a typical Node based web application, just why? Even though I love using the likes of PM2, I’d only use that if I was running it on bare metal or on a VM. Even before my recent research, I wouldn’t have done this. 😅
By all means, use the likes of Nodemon for development purposes, but as soon as you start tinkering with containers, ditch it! Trust me, you’re asking for a world of pain if you plan to do otherwise! 😶