The many and varied ways to kill an OTP Process

Summary

When you come to think about it, there are lots of different ways to kill an OTP process. Each has its own subtleties in the behaviour it displays, even without bringing in the impact on linked processes, which is a topic for a later post.

This post is largely notes for myself on what to expect when a process is terminated in a particular way, as I can’t find this all documented in one place.

Executing as LiveBook

This post is written as an executable LiveBook. You can execute it by:

1. Installing LiveBook, following the instructions here: https://github.com/elixir-nx/livebook#escript.
2. Downloading the live markdown from here.
3. Running the LiveBook server and opening the ‘varied-ways-to-kill.livemd’ file you have downloaded.

As an alternative to step 2, you could clone this blog to your local machine and from the root run ./bin/live-blog; then open ‘varied-ways-to-kill.livemd’

A GenServer for the experiments

This is the GenServer we will kill in all the different ways.

defmodule Life do
  use GenServer

  def start do
    GenServer.start(__MODULE__, {})
  end

  def init(_) do
    {:ok, %{}}
  end

  def trapping_exits?(pid) do
    with {:trap_exit, val} <- Process.info(pid, :trap_exit), do: val
  end

  def trap_exits(server) do
    :ok = GenServer.call(server, :trap_exits)
  end

  def stop_trapping_exit(server) do
    :ok = GenServer.call(server, :stop_trapping_exits)
  end

  def stop(server, reason) do
    :ok = GenServer.call(server, {:stop, reason})
  end

  def execute_in_process(server, execute_me) do
    GenServer.call(server, {:execute, execute_me})
  end

  def alive_after_wait_for_death?(pid, count \\ 50)
  def alive_after_wait_for_death?(pid, 0), do: Process.alive?(pid)

  def alive_after_wait_for_death?(pid, count) do
    case Process.alive?(pid) do
      true ->
        :timer.sleep(1)
        alive_after_wait_for_death?(pid, count - 1)

      _ ->
        false
    end
  end

  def handle_call({:execute, execute_me}, _, s) do
    result = execute_me.()
    {:reply, result, s}
  end

  def handle_call(:trap_exits, _, s) do
    Process.flag(:trap_exit, true)
    {:reply, :ok, s}
  end

  def handle_call(:stop_trapping_exits, _, s) do
    Process.flag(:trap_exit, false)
    {:reply, :ok, s}
  end

  def handle_call({:stop, reason}, _, s) do
    {:stop, reason, :ok, s}
  end

  def handle_info(event, s) do
    IO.inspect({self(), event}, label: :handle_info)
    {:noreply, s}
  end

  def terminate(reason, _s) do
    IO.inspect({self(), reason}, label: :terminate)
  end
end

Process.exit/2 (called from another process)

(Process.exit/2 is a delegate for :erlang.exit/2)

Arguably, the first killing method to spring to mind would be to call Process.exit/2. Here’s a summary of what happens.

Trapping exits?	Reason	Exits?	Message?	terminate/2 callback?	Error logged?	GenServer.call/2 exit?
no	`:normal`	no	no	no	no	n/a
no	any reason but `:normal`	yes	no	no	no	n/a
yes	any reason but `:kill`	no	yes	no	no	n/a
yes	`:kill`	yes	no	no	no	n/a

I use the same table structure throughout this post; in case it’s not clear here are the longer meanings of the columns:

Trapping exits? - is the process trapping exits, having invoked Process.flag(:trap_exit, true)?
Reason - the exit reason, in this case the second argument to Process.exit/2
Exits? - does the process exit?
Message? - is a message sent to the process? In a GenServer this is handled by the handle_info/2 callback.
terminate/2 callback? - is the (optional) terminate/2 callback called, if present?
Error logged? - does OTP (proc_lib) log an error due to the exit?
GenServer.call/2 exit? - this is only relevant when a process exits during a GenServer.call/2, when OTP is monitoring the process: :erlang.exit/1 is called in the caller on receipt of a DOWN message before the reply message is received.
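To make that last column concrete, here is a minimal sketch of what the caller catches when the server dies mid-call. The Doomed module is hypothetical, made up for this illustration, and is not the Life server used in the rest of the post.

```elixir
defmodule Doomed do
  use GenServer

  # Hypothetical server: it exits while handling the only call it knows.
  def init(_), do: {:ok, nil}

  def handle_call(:die, _from, _state), do: exit(:shutdown)
end

{:ok, pid} = GenServer.start(Doomed, nil)

# The caller never gets a reply; GenServer.call/2 exits in the caller instead.
caught =
  try do
    GenServer.call(pid, :die)
  catch
    :exit, value -> value
  end

# The caught value pairs the server's exit reason with a description of the call.
{reason, {GenServer, :call, call_args}} = caught
IO.inspect(reason, label: :server_exit_reason)
IO.inspect(call_args, label: :call_args)
```

The second element of the caught tuple describes the call that was in flight, including the default timeout, which is handy when working out which call blew up.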

It’s worth noting that if you are relying on the terminate/2 callback to clean up after your process then you may have backed the wrong horse.
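One alternative is to put the clean-up in a separate process that monitors the worker, since a monitor fires however the worker dies. The following is only a sketch: the Janitor module and its watch/2 function are names I have invented for illustration.

```elixir
defmodule Janitor do
  # Hypothetical helper: spawns a process that monitors `pid` and runs
  # `cleanup` with the exit reason once `pid` is down, however it died.
  # Returns only once the monitor is in place, so no exit can be missed.
  def watch(pid, cleanup) when is_pid(pid) and is_function(cleanup, 1) do
    caller = self()
    tag = make_ref()

    janitor =
      spawn(fn ->
        ref = Process.monitor(pid)
        send(caller, {tag, :watching})

        receive do
          {:DOWN, ^ref, :process, ^pid, reason} -> cleanup.(reason)
        end
      end)

    receive do
      {^tag, :watching} -> janitor
    end
  end
end

parent = self()
worker = spawn(fn -> Process.sleep(:infinity) end)
Janitor.watch(worker, fn reason -> send(parent, {:cleaned_up, reason}) end)

# An untrappable :kill gives the worker no chance to clean up after itself,
# but the monitoring process still gets its DOWN message.
Process.exit(worker, :kill)

cleanup_reason =
  receive do
    {:cleaned_up, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect(cleanup_reason, label: :cleanup_reason)
```

The trade-off is that the clean-up code no longer has access to the worker's state, so anything it needs has to be passed in up front.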

And now for the code illustrating the summary:

{:ok, l1} = Life.start()
Process.exit(l1, :normal)
Process.alive?(l1)

As documented in Process.exit/2, sending a :normal exit signal will have no effect unless it is being sent to self().

for reason <- [:other_reason, :shutdown, {:shutdown, "a reason"}, :kill] do
  {:ok, l1} = Life.start()
  Process.exit(l1, reason)
  Process.alive?(l1)
end

Calling Process.exit/2 on a process that is not trapping exits, with any reason other than :normal, will cause it to exit without logging any errors.
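Process.alive?/1 only tells us that the process has gone. As a small sketch, using plain spawned processes rather than the Life server, a monitor shows that the process exits with exactly the reason it was sent:

```elixir
reasons = [:other_reason, :shutdown, {:shutdown, "a reason"}]

observed =
  for reason <- reasons do
    # A bare process that is not trapping exits and would otherwise never stop.
    {pid, ref} = spawn_monitor(fn -> Process.sleep(:infinity) end)
    Process.exit(pid, reason)

    receive do
      {:DOWN, ^ref, :process, ^pid, down_reason} -> down_reason
    after
      1_000 -> :timeout
    end
  end

IO.inspect(observed == reasons, label: :reasons_passed_through)
```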

{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :normal)
Process.alive?(l1)

Sending a :normal exit signal to a process that is trapping exits still won’t kill it, but it will get the message, handled here by handle_info/2. (The message is a 3-part tuple: :EXIT, the pid of the sending process, and the exit reason.)
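The shape of that message is easier to see without a GenServer in the way. This is a sketch with a bare process that traps exits and forwards whatever it receives; the :ready and :trapped message names are just ones I made up for the example.

```elixir
parent = self()

trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      {:EXIT, from, reason} -> send(parent, {:trapped, from, reason})
    end
  end)

# Wait until the flag is set, so the signal cannot arrive too early.
receive do
  :ready -> :ok
end

Process.exit(trapper, :normal)

trapped =
  receive do
    {:trapped, from, reason} -> {from, reason}
  after
    1_000 -> :timeout
  end

# The sender in the :EXIT tuple is the process that called Process.exit/2.
IO.inspect(trapped == {parent, :normal}, label: :trapped_normal_from_parent)
```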

{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :other_reason)
Process.alive?(l1)

Of course, the same thing will happen to a process trapping exits when the exit reason is something other than :normal (or :kill): a message is received, it does not exit, and nothing is logged.

{:ok, l1} = Life.start()
Life.trap_exits(l1)
Process.exit(l1, :kill)
Process.alive?(l1)

The :kill reason is untrappable.
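Here is a sketch of the same thing with a bare process, assuming nothing beyond the standard Process functions. The process traps exits and would forward any :EXIT message it received, but for :kill it never gets the chance.

```elixir
parent = self()

trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      # Never reached for :kill, as the signal is not converted to a message.
      {:EXIT, _from, reason} -> send(parent, {:trapped, reason})
    end
  end)

receive do
  :ready -> :ok
end

ref = Process.monitor(trapper)
Process.exit(trapper, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^trapper, reason} -> reason
  after
    1_000 -> :timeout
  end

got_exit_message? =
  receive do
    {:trapped, _} -> true
  after
    50 -> false
  end

IO.inspect({down_reason, got_exit_message?}, label: :kill_result)
```

The monitor reports the reason as :killed rather than :kill.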

Process.exit/2 called from the same process

When a process calls Process.exit(self(), reason), the behaviour differs from calling Process.exit/2 on a different process.

Trapping exits?	Reason	Exits?	Message?	terminate/2 callback?	Error logged?	GenServer.call/2 exit?
no	any reason	yes	no	no	no	yes
yes	`:normal`, `:shutdown`, or `{:shutdown, reason}`	yes	no	yes	no	no
yes	`:kill`	yes	no	no	no	yes
yes	any other reason	yes	no	yes	yes	no

for reason <- [:other_reason, :normal, :shutdown, {:shutdown, "a reason"}, :kill] do
  {:ok, l1} = Life.start()

  try do
    Life.execute_in_process(l1, fn -> Process.exit(l1, reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end

If a process indulges in calling Process.exit/2 on itself, with any reason, it will be killed with only itself() to blame. Note that, as in this case the process exits during a handle_call/3, the exit is propagated by GenServer.call/2 and we need the try/catch above to continue processing. No error messages are logged and the terminate/2 callback is not invoked.

for reason <- [:other_reason, :normal, :shutdown, {:shutdown, "a reason"}] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)

  :okey_dokey =
    Life.execute_in_process(l1, fn ->
      Process.exit(l1, reason)
      :okey_dokey
    end)

  Life.alive_after_wait_for_death?(l1)
end

Should a process send an exit signal to itself while it is trapping exits, and the reason is not :kill (see below), then the process will exit asynchronously (after processing the current message) and the terminate/2 callback will be invoked. An error will be logged unless the reason is :normal, :shutdown, or {:shutdown, reason}.
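The asynchronous part makes more sense once you see what a bare process gets in this situation. In this sketch (plain spawn, no GenServer, and a :self_exit message name of my own choosing) the signal simply lands in the process's own mailbox as an :EXIT message, and the process carries on until it reads it.

```elixir
parent = self()

spawn(fn ->
  Process.flag(:trap_exit, true)

  # Because exits are trapped, this does not stop the process here.
  Process.exit(self(), :other_reason)

  receive do
    {:EXIT, from, reason} ->
      send(parent, {:self_exit, from == self(), reason})
  end
end)

result =
  receive do
    {:self_exit, _from_self?, _reason} = msg -> msg
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: :self_exit)
```

What the GenServer does with that message is then up to OTP; the behaviour described above is what I observed with Life.start/0.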

{:ok, l1} = Life.start()
Life.trap_exits(l1)

try do
  Life.execute_in_process(l1, fn -> Process.exit(l1, :kill) end)
catch
  :exit, val ->
    IO.inspect(val, label: :caught_exit)
end

Process.alive?(l1)

If the process is sending :kill to itself, it makes no difference if the process is trapping exits. It is terminated immediately with no callbacks to terminate/2.

Note that the (caught) exit reason is :killed not :kill; this is significant when we come to look at linked processes in a later post.

Kernel.exit/1

(Kernel.exit/1 is just a call to :erlang.exit/1)

From the Elixir docs

Stops the execution of the calling process with the given reason.

The documentation goes into the impact on linked processes, which I will go into in a later post. The behaviour does differ from Process.exit(self(), reason) in that regardless of whether exits are being trapped, the process will trigger a terminate/2 callback, and GenServer.call/2 will raise the exit to a calling process.

Trapping exits?	Reason	Exits?	Message?	terminate/2 callback?	Error logged?	GenServer.call/2 exit?
yes and no	`:normal`, `:shutdown`, `{:shutdown, reason}`	yes	no	yes	no	yes
yes and no	other reasons, including `:kill`	yes	no	yes	yes	yes

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()

  try do
    Life.execute_in_process(l1, fn -> exit(reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end

exit/1 terminates the process synchronously. The terminate/2 callback is called, even if the exit reason is :kill.

An exit will be logged unless the reason is one of :normal, :shutdown, or {:shutdown, reason}.
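With exit/1, :kill is just another reason term rather than an untrappable signal. A small sketch with bare processes shows the two things that follow from that: the exit can be caught in the calling process, and if it is not caught the process ends with :kill itself, not :killed.

```elixir
# exit/1 can be caught in the calling process, whatever the reason.
caught =
  try do
    exit(:kill)
  catch
    :exit, reason -> reason
  end

# Uncaught, the process ends with exactly the reason given to exit/1.
{pid, ref} = spawn_monitor(fn -> exit(:kill) end)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect({caught, down_reason}, label: :exit_kill)
```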

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)

  try do
    Life.execute_in_process(l1, fn -> exit(reason) end)
  catch
    :exit, val ->
      IO.inspect(val, label: :caught_exit)
  end

  Process.alive?(l1)
end

Trapping exits has no impact on exit/1; the behaviour is the same as when the process is not trapping exits.

Returning :stop from a message callback

To me, returning {:stop, reason, state} or {:stop, reason, reply, state} from a message handling callback feels like the usual way to terminate a GenServer.

Trapping exits?	Reason	Exits?	Message?	terminate/2 callback?	Error logged?	GenServer.call/2 exit?
yes and no	`:normal`, `:shutdown`, `{:shutdown, reason}`	yes	no	yes	no	no
yes and no	other reasons, including `:kill`	yes	no	yes	yes	no

As I also mention below, the behaviour is the same as exit/1 except that the exit is not propagated to a process invoking GenServer.call/2.

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.stop(l1, reason)
  Process.alive?(l1)
end

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  Life.trap_exits(l1)
  Life.stop(l1, reason)
  Process.alive?(l1)
end

Returning a :stop from a messaging callback behaves pretty much like exit/1: terminate/2 is called (if present); the process always exits; errors are not logged for :normal, :shutdown, and {:shutdown, reason} reasons; errors are logged for other reasons.

One difference with exit/1 is that the exit is not raised over a GenServer.call/2.

GenServer.stop/2

for reason <- [:normal, :shutdown, {:shutdown, "something"}, :other_reason, :kill] do
  {:ok, l1} = Life.start()
  GenServer.stop(l1, reason)
  Process.alive?(l1)
end

Stopping a process with GenServer.stop/2 has exactly the same behaviour as returning a :stop from a messaging callback.
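For a check that does not rely on reading the log, here is a sketch with a hypothetical Reporter server (not part of this post's Life module) that tells its owner why terminate/2 was called.

```elixir
defmodule Reporter do
  use GenServer

  # Hypothetical server for illustration: the state is the pid to notify.
  def init(owner), do: {:ok, owner}

  def terminate(reason, owner), do: send(owner, {:terminated, reason})
end

{:ok, pid} = GenServer.start(Reporter, self())
ref = Process.monitor(pid)

# GenServer.stop/2 blocks until the server has exited, then returns :ok.
stop_result = GenServer.stop(pid, :shutdown)

terminate_reason =
  receive do
    {:terminated, reason} -> reason
  after
    1_000 -> :timeout
  end

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect({stop_result, terminate_reason, down_reason}, label: :stop)
```

I used :shutdown here so that nothing is logged; swapping in :other_reason should give the same messages plus an error in the log.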

Exception / errors

Errors kill processes in much the same way as Kernel.exit/1.

Trapping exits?	Reason	Exits?	Message?	terminate/2 callback?	Error logged?	GenServer.call/2 exit?
yes and no	n/a	yes	no	yes	yes	yes

{:ok, l1} = Life.start()

try do
  Life.execute_in_process(l1, fn -> raise "hell" end)
catch
  :exit, reason ->
    IO.inspect(reason, label: :caught_exit)
end

Process.alive?(l1)

One difference between exiting with a raised exception/error and Kernel.exit/1 is that the propagated exit reason, raised by GenServer.call/2 but also sent to linked and monitoring processes, includes a stack trace. In Learn You Some Erlang, Fred Hébert explains that this is extra overhead as the stack trace needs to be copied to each receiving process. Unless your processes are raising exceptions all over the place I doubt this will have much impact; and if they are, you clearly have other problems.
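The stack trace is easy to see with a monitor on a bare process. This is only a sketch outside of a GenServer, and running it will also log the crash.

```elixir
{pid, ref} = spawn_monitor(fn -> raise "hell" end)

reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

# For a raised exception the exit reason is a tuple of exception and stack trace.
{exception, stacktrace} = reason
IO.inspect(exception, label: :exception)
IO.inspect(length(stacktrace), label: :stacktrace_entries)
```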

return_val = fn x -> x end
{:ok, l1} = Life.start()

try do
  Life.trap_exits(l1)
  Life.execute_in_process(l1, fn -> :ok = return_val.(:not_ok) end)
catch
  :exit, reason ->
    IO.inspect(reason, label: :caught_exit)
end

Process.alive?(l1)

As you would expect, there is no difference between explicitly raising an exception and one arising “naturally” through programmer “error”, as above.

Next…

I am not sure how helpful this is to others but it has clarified some areas that were a bit fuzzy to me, and I have a quick reference for the next time I can’t remember exactly what to expect when a process dies. It has been interesting to write this as a LiveBook, though challenging to publish on a blog.

There is an important part missing, which is the impact of an exit on linked processes. I will write something about that soon.

This series

Starting out looking at exit signals and OTP process death has turned into a small series of posts, including this one. These are:

The many and varied ways to kill an OTP Process: investigation of different ways to cause (or fail to cause) a process to exit.
What happens when a linked process dies: the impact of a process exiting on processes that are linked to it, excluding OTP processes with a parent/child relationship.
Death, Children, and OTP: the impact on an OTP process when the process that spawned it (its parent) exits, particularly when the child is trapping exits.

Updates

2021-06-29: included the section linking to posts in this series.