The many and varied ways to kill an OTP Process
Summary
When you come to think about it, there are lots of different ways to kill an OTP process. Each has its own subtleties in the behaviour it displays, even without bringing in the impact on linked processes, which is a topic for a later post.
This post is largely notes for myself on what to expect when a process is terminated in a particular way, as I can’t find this all documented in one place.
Executing as LiveBook
This post is written as an executable LiveBook. You can execute it by:
- Installing LiveBook following the instructions here: https://github.com/elixir-nx/livebook#escript.
- Downloading the live markdown from here
- Running livebook server and opening the ‘varied-ways-to-kill.livemd’ file you have downloaded.
As an alternative to step 2, you could clone this blog to your local machine and from the root run ./bin/live-blog; then open ‘varied-ways-to-kill.livemd’
A GenServer for the experiments
This is the GenServer we will kill in all the different ways.
defmodule Life do
  use GenServer

  def start do
    GenServer.start(__MODULE__, {})
  end

  def init(_) do
    {:ok, %{}}
  end

  def trapping_exits?(pid) do
    with {:trap_exit, val} <- Process.info(pid, :trap_exit), do: val
  end

  def trap_exits(server) do
    :ok = GenServer.call(server, :trap_exits)
  end

  def stop_trapping_exit(server) do
    :ok = GenServer.call(server, :stop_trapping_exits)
  end

  def stop(server, reason) do
    :ok = GenServer.call(server, {:stop, reason})
  end

  def execute_in_process(server, execute_me) do
    GenServer.call(server, {:execute, execute_me})
  end

  def alive_after_wait_for_death?(pid, count \\ 50)
  def alive_after_wait_for_death?(pid, 0), do: Process.alive?(pid)

  def alive_after_wait_for_death?(pid, count) do
    case Process.alive?(pid) do
      true ->
        :timer.sleep(1)
        alive_after_wait_for_death?(pid, count - 1)

      _ ->
        false
    end
  end

  def handle_call({:execute, execute_me}, _, s) do
    result = execute_me.()
    {:reply, result, s}
  end

  def handle_call(:trap_exits, _, s) do
    Process.flag(:trap_exit, true)
    {:reply, :ok, s}
  end

  def handle_call(:stop_trapping_exits, _, s) do
    Process.flag(:trap_exit, false)
    {:reply, :ok, s}
  end

  def handle_call({:stop, reason}, _, s) do
    {:stop, reason, :ok, s}
  end

  def handle_info(event, s) do
    IO.inspect({self(), event}, label: :handle_info)
    {:noreply, s}
  end

  def terminate(reason, _s) do
    IO.inspect({self(), reason}, label: :terminate)
  end
end
Process.exit/2 (called from another process)
(Process.exit/2 is a delegate for :erlang.exit/2)
Arguably, the first killing method to spring to mind would be to call Process.exit/2. Here’s a summary of what happens.
Trapping exits? | Reason | Exits? | Message? | terminate/2 callback? | Error logged? | GenServer.call/2 exit? |
---|---|---|---|---|---|---|
no | :normal | no | no | no | no | n/a |
no | any reason but :normal | yes | no | no | no | n/a |
yes | any reason but :kill | no | yes | no | no | n/a |
yes | :kill | yes | no | no | no | n/a |
I use the same table structure throughout this post; in case it’s not clear, here are the longer meanings of the columns:
- Trapping exits? - is the process trapping exits, having invoked Process.flag(:trap_exit, true)?
- Reason - the exit reason, in this case the second argument to Process.exit/2
- Exits? - does the process exit?
- Message? - is a message sent to the process, in a GenServer handled by the handle_info/2 callback?
- terminate/2 callback? - is the (optional) terminate/2 callback called, if present?
- Error logged? - does OTP (proc_lib) log an error due to the exit?
- GenServer.call/2 exit? - This is only relevant when a process exits during a GenServer.call/2, while OTP is monitoring the process: :erlang.exit/1 is called on receipt of a DOWN message before the reply message is received.
It’s worth noting that if you are relying on the terminate/2 callback to clean up after your process then you may have backed the wrong horse.
And now for the code illustrating the summary:
{:ok, l1} = Life.start()
Process.exit(l1, :normal)
Process.alive?(l1)
As documented in Process.exit/2, sending a :normal exit signal will have no effect unless it is being sent to self().
for reason <- [:other_reason, :shutdown, {:shutdown, "a reason"}, :kill] do
  {:ok, l1} = Life.start()
  Process.exit(l1, reason)
  Process.alive?(l1)
end
Calling Process.exit/2 on a process that is not trapping exits, with any other reason, will cause it to exit without logging any errors.
{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :normal)
Process.alive?(l1)
Sending a :normal exit signal to a process that is trapping exits still won’t kill it, but it will get the message, handled here by handle_info/2. (The message is a 3-element tuple: :EXIT, the pid of the sending process, and the exit reason.)
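The shape of that message can also be seen without a GenServer. The following is a minimal sketch using a plain spawned process; the variable names and the :trapping and :trapped messages are made up for this illustration and are not part of the post's Life module.

```elixir
# Hypothetical stand-alone example; none of these names come from the Life module.
parent = self()

trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    # Tell the parent that exits are now being trapped.
    send(parent, :trapping)

    receive do
      # A trapped exit signal arrives as an ordinary message.
      {:EXIT, from, reason} -> send(parent, {:trapped, from, reason})
    end
  end)

# Wait until the flag is set; a :normal signal sent earlier would be ignored.
receive do
  :trapping -> :ok
end

Process.exit(trapper, :normal)

trapped =
  receive do
    {:trapped, from, reason} -> {from, reason}
  after
    1_000 -> :timeout
  end

# The sender is the process that called Process.exit/2.
IO.inspect(trapped == {parent, :normal}, label: :trapped_as_message)
```

In a GenServer the same tuple is what ends up as the first argument to handle_info/2.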
{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :other_reason)
Process.alive?(l1)
Of course, the same thing will happen to a process trapping exits when the exit reason is something other than :normal (or :kill): a message is received, it does not exit, and nothing is logged.
{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :kill)
Process.alive?(l1)
The :kill reason is untrappable.
Process.exit/2 called from the same process
When a process calls Process.exit(self(), reason), the behaviour differs from calling Process.exit/2 on a different process.
Trapping exits? | Reason | Exits? | Message? | terminate/2 callback? | Error logged? | GenServer.call/2 exit? |
---|---|---|---|---|---|---|
no | any reason | yes | no | no | no | yes |
yes | :normal, :shutdown, or {:shutdown, reason} | yes | no | yes | no | no |
yes | :kill | yes | no | no | no | yes |
yes | any other reason | yes | no | yes | yes | no |
for reason <- [:other_reason, :normal, :shutdown, {:shutdown, "a reason"}, :kill] do
  {:ok, l1} = Life.start()

  try do
    Life.execute_in_process(l1, fn -> Process.exit(l1, reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end
If a process indulges in calling Process.exit/2 on itself, with any reason, it will be killed with only itself() to blame. Note that, as in this case the process exits during a handle_call/3, the exit is propagated by GenServer.call/2 and we need the try/catch above to continue processing. No error messages are logged and the terminate/2 callback is not invoked.
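If the try/catch gets repetitive, it can be wrapped up. This is a minimal sketch, not something the original notebook uses: Crashy and safe_call are hypothetical names, and the pattern assumes the exit raised by GenServer.call/2 has the shape {reason, {GenServer, :call, args}}.

```elixir
# Hypothetical module for illustration; it is not the Life module from this post.
defmodule Crashy do
  use GenServer

  def init(arg), do: {:ok, arg}

  # Exits from inside the callback, so the caller's GenServer.call/2 exits too.
  def handle_call({:exit, reason}, _from, _state), do: exit(reason)
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

# Turns an exit raised by GenServer.call/2 into a tagged tuple.
safe_call = fn server, request ->
  try do
    {:ok, GenServer.call(server, request)}
  catch
    :exit, {reason, {GenServer, :call, _args}} -> {:exit, reason}
  end
end

{:ok, pid} = GenServer.start(Crashy, nil)

ping_result = safe_call.(pid, :ping)
exit_result = safe_call.(pid, {:exit, :shutdown})

IO.inspect({ping_result, exit_result}, label: :safe_call)
```

Using :shutdown as the reason keeps the example quiet; other reasons would also be caught, but may log an error first.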
for reason <- [:other_reason, :normal, :shutdown, {:shutdown, "a reason"}] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)

  :okey_dokey =
    Life.execute_in_process(l1, fn ->
      Process.exit(l1, reason)
      :okey_dokey
    end)

  Life.alive_after_wait_for_death?(l1)
end
Should a process send an exit signal to itself while it is trapping exits, and the reason is not :kill (see below), then the process will exit asynchronously (after processing the current message) and the terminate/2 callback will be invoked. An error will be logged unless the reason is :normal, :shutdown, or {:shutdown, reason}.
{:ok, l1} = Life.start()
Life.trap_exits(l1)

try do
  Life.execute_in_process(l1, fn -> Process.exit(l1, :kill) end)
catch
  :exit, val ->
    IO.inspect(val, label: :caught_exit)
end

Process.alive?(l1)
If the process is sending :kill to itself, it makes no difference whether the process is trapping exits. It is terminated immediately with no callbacks to terminate/2.
Note that the (caught) exit reason is :killed, not :kill; this is significant when we come to look at linked processes in a later post.
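The :kill to :killed translation can also be observed from the outside with a monitor. A minimal sketch, using a plain spawned process with made-up variable names rather than the Life GenServer:

```elixir
# Hypothetical stand-alone example; the names are for illustration only.
victim =
  spawn(fn ->
    # Trapping exits makes no difference to :kill.
    Process.flag(:trap_exit, true)
    Process.sleep(:infinity)
  end)

ref = Process.monitor(victim)
Process.exit(victim, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^victim, reason} -> reason
  after
    1_000 -> :timeout
  end

# The signal was sent with :kill, but the process is reported as :killed.
IO.inspect(down_reason, label: :down_reason)
```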
Kernel.exit/1
(Kernel.exit/1 is just a call to :erlang.exit/1)
From the Elixir docs
Stops the execution of the calling process with the given reason.
The documentation goes into the impact on linked processes, which I will go into in a later post. The behaviour does differ from Process.exit(self(), reason) in that, regardless of whether exits are being trapped, the process will trigger a terminate/2 callback, and GenServer.call/2 will raise the exit to a calling process.
Trapping exits? | Reason | Exits? | Message? | terminate/2 callback? | Error logged? | GenServer.call/2 exit? |
---|---|---|---|---|---|---|
yes and no | :normal, :shutdown, {:shutdown, reason} | yes | no | yes | no | yes |
yes and no | other reasons, including :kill | yes | no | yes | yes | yes |
for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()

  try do
    Life.execute_in_process(l1, fn -> exit(reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end
exit/1 terminates the process synchronously. The terminate/2 callback is called, even if the exit reason is :kill.
An exit will be logged unless the reason is one of :normal, :shutdown, or {:shutdown, reason}.
for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)

  try do
    Life.execute_in_process(l1, fn -> exit(reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end
Trapping exits has no impact on exit/1; the behaviour is the same as when the process is not trapping exits.
Returning :stop from a message callback
To me, returning {:stop, reason, state} or {:stop, reason, reply, state} from a message handling callback feels like the usual way to terminate a GenServer.
Trapping exits? | Reason | Exits? | Message? | terminate/2 callback? | Error logged? | GenServer.call/2 exit? |
---|---|---|---|---|---|---|
yes and no | :normal, :shutdown, {:shutdown, reason} | yes | no | yes | no | no |
yes and no | other reasons, including :kill | yes | no | yes | yes | no |
As I also mention below, the behaviour is the same as exit/1 except that the exit is not propagated to a process invoking GenServer.call/2.
for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.stop(l1, reason)
  Process.alive?(l1)
end

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)
  Life.stop(l1, reason)
  Process.alive?(l1)
end
Returning a :stop from a messaging callback behaves pretty much like exit/1: terminate/2 is called (if present); the process always exits; errors are not logged for :normal, :shutdown, and {:shutdown, reason} reasons; errors are logged for other reasons.
One difference from exit/1 is that the exit is not raised over a GenServer.call/2.
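To make the contrast with an external Process.exit/2 concrete, here is a minimal sketch that reports whether terminate/2 ran. The Notifier module, the observer argument and the terminate_ran? helper are all made up for this illustration; the 100 ms wait is an arbitrary assumption.

```elixir
# Hypothetical module for illustration; it is not the Life module from this post.
defmodule Notifier do
  use GenServer

  def init(observer), do: {:ok, observer}

  def handle_call(:stop, _from, observer), do: {:stop, :normal, :ok, observer}

  # Report back to the observer so we can check whether the callback ran.
  def terminate(reason, observer), do: send(observer, {:terminated, reason})
end

# Returns true if a terminate/2 notification arrives within 100 ms.
terminate_ran? = fn ->
  receive do
    {:terminated, _reason} -> true
  after
    100 -> false
  end
end

# Stopped by returning :stop from handle_call/3.
{:ok, stopped} = GenServer.start(Notifier, self())
:ok = GenServer.call(stopped, :stop)
ran_on_stop = terminate_ran?.()

# Killed from outside with Process.exit/2 while not trapping exits.
{:ok, killed} = GenServer.start(Notifier, self())
ref = Process.monitor(killed)
Process.exit(killed, :shutdown)

receive do
  {:DOWN, ^ref, :process, ^killed, _reason} -> :ok
end

ran_on_external_exit = terminate_ran?.()

IO.inspect({ran_on_stop, ran_on_external_exit}, label: :terminate_ran)
```

The first element should be true and the second false, matching the tables for returning :stop and for Process.exit/2 called from another process.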
GenServer.stop/2
for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  GenServer.stop(l1, reason)
  Process.alive?(l1)
end
Stopping a process with GenServer.stop/2 has exactly the same behaviour as returning a :stop from a messaging callback.
Exception / errors
Errors kill processes in much the same way as Kernel.exit/1.
Trapping exits? | Reason | Exits? | Message? | terminate/2 callback? | Error logged? | GenServer.call/2 exit? |
---|---|---|---|---|---|---|
yes and no | n/a | yes | no | yes | yes | yes |
{:ok, l1} = Life.start()

try do
  Life.execute_in_process(l1, fn -> raise "hell" end)
catch
  :exit, reason ->
    IO.inspect(reason, label: :caught_exit)
end

Process.alive?(l1)
One difference between exiting with a raised exception/error and Kernel.exit/1 is that the propagated message, raised by GenServer.call/2 but also sent to linked and monitoring processes, includes a stack trace. In Learn You Some Erlang, Fred Hébert explains that this is extra overhead, as the stack trace needs to be copied to each receiving process. Unless your processes are raising exceptions all over the place I doubt this will have much impact; and if they are, you clearly have other problems.
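To see where the stack trace sits in the caught value, the exit can be pattern matched. This is a minimal sketch with a hypothetical Raiser module; it assumes the exit reason for a raised exception is an {exception, stacktrace} tuple, wrapped by GenServer.call/2 as in the output above. Running it will also log the crash.

```elixir
# Hypothetical module for illustration; it is not the Life module from this post.
defmodule Raiser do
  use GenServer

  def init(arg), do: {:ok, arg}

  def handle_call(:raise, _from, _state), do: raise("hell")
end

{:ok, pid} = GenServer.start(Raiser, nil)

{exception, stacktrace} =
  try do
    GenServer.call(pid, :raise)
  catch
    # The exit reason pairs the exception struct with its stack trace.
    :exit, {{exception, stacktrace}, {GenServer, :call, _args}} ->
      {exception, stacktrace}
  end

IO.inspect(Exception.message(exception), label: :message)
IO.inspect(length(stacktrace), label: :stacktrace_entries)
```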
return_val = fn x -> x end
{:ok, l1} = Life.start()

try do
  Life.trap_exits(l1)
  Life.execute_in_process(l1, fn -> :ok = return_val.(:not_ok) end)
catch
  :exit, reason ->
    IO.inspect(reason, label: :caught_exit)
end

Process.alive?(l1)
As you would expect, there is no difference between explicitly raising an exception and one arising “naturally” through programmer “error”, as above.
Next…
I am not sure how helpful this is to others but it has clarified some areas that were a bit fuzzy to me, and I have a quick reference for next time I can’t remember exactly what to expect when a process dies. It has been interesting to write this as a Live Book, though challenging to publish on a blog.
There is an important part missing, which is the impact of an exit on linked processes. I will write something about that soon.
This series
Starting out looking at exit signals and OTP process death has turned into a small series of posts, including this one. These are:
- The many and varied ways to kill an OTP Process: investigation of different ways to cause (or fail to cause) a process to exit.
- What happens when a linked process dies: the impact of a process exiting on processes that are linked to it, excluding OTP processes with a parent/child relationship.
- Death, Children, and OTP: the impact on an OTP process when the process that spawned it (its parent) exits, particularly when the child is trapping exits.
Updates
- 2021-06-29: included the section linking to posts in this series.