What happens when a linked process dies

Summary

Previously we looked at different ways we can kill (or attempt to kill) a process and what happens in each case. Now let’s see what happens when a linked process dies. The tl;dr is in the table below.

Trapping exits?	Reason for linked process exit	Exit message received?	Surviving process exits?
no	`:normal`	no	no
no	any reason other than `:normal`	no	yes
yes	any reason including `:normal`	yes	no

Note that the table describes what happens when the exiting linked process is _not_ the parent process. We look at that case in a subsequent post.
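
As a reading aid, the table above can be restated as a small pure function. This is only a sketch: the module name `LinkedExitTable` and the map keys are made up for illustration and appear nowhere else in this post.

```elixir
defmodule LinkedExitTable do
  # Hypothetical helper: restates the summary table for a process whose
  # (non-parent) linked process has just exited with `reason`.
  # First argument: is the surviving process trapping exits?

  # Not trapping, :normal exit: no message, and the process lives on.
  def outcome(false, :normal), do: %{exit_message?: false, exits?: false}

  # Not trapping, any other reason: no message, the process exits too.
  def outcome(false, _reason), do: %{exit_message?: false, exits?: true}

  # Trapping: the exit arrives as a message and the process lives on.
  def outcome(true, _reason), do: %{exit_message?: true, exits?: false}
end
```

For example, `LinkedExitTable.outcome(false, :whatever)` returns `%{exit_message?: false, exits?: true}`.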

This page can be downloaded as a LiveBook for execution. The raw markdown is here, or you can follow the previous instructions for cloning the blog and executing the LiveBook pages.

When the final value of a snippet is significant, I have matched against it so that the output is clear when reading the blog rather than executing the page, e.g.

true = Linky.alive_after_wait_for_death?(l1)

A GenServer for the experiments

defmodule Linky do
  use GenServer

  def start do
    GenServer.start(__MODULE__, {})
  end

  def init(_), do: {:ok, %{}}

  def link(server, other) do
    :ok = GenServer.call(server, {:link, other})
  end

  def trap_exits(server) do
    :ok = GenServer.call(server, :trap_exits)
  end

  def alive_after_wait_for_death?(pid, count \\ 50)
  def alive_after_wait_for_death?(pid, 0), do: Process.alive?(pid)

  def alive_after_wait_for_death?(pid, count) do
    case Process.alive?(pid) do
      true ->
        :timer.sleep(1)
        alive_after_wait_for_death?(pid, count - 1)

      _ ->
        false
    end
  end

  def handle_call({:link, other}, _, s) do
    Process.link(other)
    {:reply, :ok, s}
  end

  def handle_call(:trap_exits, _, s) do
    Process.flag(:trap_exit, true)
    {:reply, :ok, s}
  end

  def handle_info(event, s) do
    IO.inspect({self(), event}, label: :handle_info)
    {:noreply, s}
  end

  def terminate(reason, _s) do
    IO.inspect({self(), reason}, label: :terminate)
  end
end

The GenServer above is used in the illustrative code that follows.

Linked process exits with :normal reason

I’ve read in more than one place that linking two processes ties their lifecycles together, so that when one dies the other pops off too. This is not the full story.

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)

GenServer.stop(l2, :normal)

true = Linky.alive_after_wait_for_death?(l1)

When we execute the above code, we discover that when a linked process exits with a :normal reason then the other process does not die.

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)
Linky.trap_exits(l1)

GenServer.stop(l2, :normal)
true = Linky.alive_after_wait_for_death?(l1)

We still get a message from a linked process’s :normal exit (eg {:EXIT, #PID<0.136.0>, :normal}) when we are trapping exits. In fact, with one exception, given linked processes l1 and l2, l1 experiences l2’s exit exactly as if l2 had called Process.exit/2 on l1 with its exit reason.
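
To make that comparison concrete, here is a minimal sketch using plain processes rather than Linky. The module `ExitSignalComparison` and its function names are hypothetical. Each experiment runs inside a throwaway Task so that the caller’s own trap_exit flag and mailbox are left alone.

```elixir
defmodule ExitSignalComparison do
  # Reason seen by a trapping process when a linked process exits with `reason`.
  def via_link(reason) do
    run(fn -> spawn_link(fn -> exit(reason) end) end)
  end

  # Reason seen by a trapping process when another, unlinked process calls
  # Process.exit/2 on it. Do not pass :kill here: that signal is untrappable,
  # so it would kill the Task and, through the Task's link, the caller too.
  def via_process_exit(reason) do
    run(fn ->
      target = self()
      spawn(fn -> Process.exit(target, reason) end)
    end)
  end

  defp run(start_other) do
    task =
      Task.async(fn ->
        Process.flag(:trap_exit, true)
        # start_other runs inside the Task, so self() there is the Task.
        other = start_other.()

        receive do
          {:EXIT, ^other, reason} -> reason
        after
          1_000 -> :no_exit_message
        end
      end)

    Task.await(task)
  end
end
```

Both `via_link/1` and `via_process_exit/1` hand back the reason they were given, for :normal as well as for an arbitrary reason such as :whatever.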

Linked processes that exit by a :kill

Remember that :kill is an untrappable exit when calling Process.exit/2? Its untrappability does not cascade to linked processes.

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)
Linky.trap_exits(l1)

Process.exit(l2, :kill)

true = Linky.alive_after_wait_for_death?(l1)

Interestingly, the reason received by the linked process is :killed, not :kill; :killed would of course be trappable if sent via Process.exit/2. That is one explanation for :kills not cascading through linked processes.
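
The :killed reason can be observed directly with a trapping process and a plain spawned one. This is a small sketch; `KilledReasonDemo` is a hypothetical name, and the Task only keeps the trap_exit flag away from the calling process.

```elixir
defmodule KilledReasonDemo do
  # Returns the reason a trapping process sees after its linked process
  # has been killed from outside with Process.exit(pid, :kill).
  def reason_after_kill do
    task =
      Task.async(fn ->
        Process.flag(:trap_exit, true)
        victim = spawn_link(fn -> Process.sleep(:infinity) end)
        Process.exit(victim, :kill)

        receive do
          {:EXIT, ^victim, reason} -> reason
        after
          1_000 -> :no_exit_message
        end
      end)

    Task.await(task)
  end
end
```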

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)
Linky.trap_exits(l1)

GenServer.stop(l2, :kill)

true = Linky.alive_after_wait_for_death?(l1)

It would be an explanation for kills not cascading, except that it is perfectly possible, as above, for a process to exit with the reason :kill and have the :kill signal trapped by a linked process. (Possible, but it would be a curious thing to actually do in production code.)
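
The same effect can be sketched without a GenServer: a plain process that calls exit(:kill) on itself terminates with the reason :kill, and a linked process that is trapping exits receives that reason unchanged. `KillReasonDemo` is a hypothetical name used only here.

```elixir
defmodule KillReasonDemo do
  # Returns the reason a trapping process sees when its linked process
  # terminates itself with exit(:kill), rather than being killed from outside.
  def reason_after_self_exit do
    task =
      Task.async(fn ->
        Process.flag(:trap_exit, true)
        quitter = spawn_link(fn -> exit(:kill) end)

        receive do
          {:EXIT, ^quitter, reason} -> reason
        after
          1_000 -> :no_exit_message
        end
      end)

    Task.await(task)
  end
end
```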

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)

Process.exit(l2, :kill)

false = Linky.alive_after_wait_for_death?(l1)

Of course, if we are not trapping exits and a linked process is killed, then the other process also dies.

Linked processes exiting with other reasons

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

Linky.link(l1, l2)

GenServer.stop(l2, :whatever)

false = Linky.alive_after_wait_for_death?(l1)

For completeness, the above code shows that if you are not trapping exits then linked processes will exit when the other exits, as long as the reason is not :normal.
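
A monitor makes it possible to see not just that the other process died but with which reason. The sketch below is an illustration under assumptions: `CascadeDemo` is a hypothetical module, and the 200 ms wait is an arbitrary "long enough" value for deciding that the process survived.

```elixir
defmodule CascadeDemo do
  # Starts a non-trapping process ("middle") that links to a process exiting
  # with `reason`, then reports what happened to middle.
  def outcome_for(reason) do
    {middle, ref} =
      spawn_monitor(fn ->
        spawn_link(fn -> exit(reason) end)
        Process.sleep(:infinity)
      end)

    receive do
      {:DOWN, ^ref, :process, ^middle, down_reason} -> {:exited, down_reason}
    after
      200 ->
        # Middle survived the linked exit; clean it up ourselves.
        Process.demonitor(ref, [:flush])
        Process.exit(middle, :kill)
        :still_alive
    end
  end
end
```

With :whatever the middle process exits with that same reason, whereas with :normal it is still alive when the wait runs out.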

Linked (non GenServer / OTP) processes that simply exit

{:ok, l1} = Linky.start()
{:ok, l2} = Linky.start()

spawny =
  spawn(fn ->
    receive do
      :bye ->
        IO.puts("Goodbye sweet world 😿")
    end
  end)

Linky.link(l1, spawny)
Linky.link(l2, spawny)

Linky.trap_exits(l1)

send(spawny, :bye)

[true, true] = for pid <- [l1, l2], do: Linky.alive_after_wait_for_death?(pid)

When any process’s function returns normally, the process exits with the :normal reason. Linked processes behave accordingly: they receive a :normal exit notification from the linked process if they are trapping exits; even if they are not trapping exits, they do not exit. This is why it’s safe, for instance, to start a task with Task.start_link/1 or spawn_link/1.
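
A short sketch of the Task.start_link/1 case follows. `TaskLinkDemo` and the :go and :work_done messages are invented for the example. The task waits for :go so that the monitor is in place before it finishes.

```elixir
defmodule TaskLinkDemo do
  # Starts a linked task from a process that is not trapping exits, lets the
  # task finish, and returns the task's exit reason as seen through a monitor.
  def run do
    caller = self()

    {:ok, task} =
      Task.start_link(fn ->
        receive do
          :go -> send(caller, :work_done)
        end
      end)

    ref = Process.monitor(task)
    send(task, :go)

    receive do
      :work_done -> :ok
    after
      1_000 -> :ok
    end

    receive do
      {:DOWN, ^ref, :process, ^task, reason} -> reason
    after
      1_000 -> :timeout
    end
  end
end
```

The call returns :normal, and the fact that it returns at all shows that the linked caller outlived the task.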

A silly mistake I made that resulted in a process leak

For reasons I created a helper process for a LiveView page in the mount/3 callback. I wrote something like

def mount(_params, _session, socket) do
  {:ok, debouncer} = Debouncer.start_link(self(), 500)
  # mount/3 must return {:ok, socket}
  {:ok, assign(socket, debouncer: debouncer)}
end

You can see the problem here, right? Yes, mount/3 is called twice: once when the initial HTML is rendered and again when the socket connects, so an extra Debouncer is started.

This would be a waste of CPU cycles, but no more, except that the initial rendering call is initiated by a Cowboy process that exits with a :normal reason. The Debouncer process does not exit, but stays orphaned, taking up resources.
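
The leak can be reproduced in miniature without LiveView or Cowboy. In this sketch an Agent stands in for the Debouncer and a plain spawned process stands in for the short-lived rendering process. Both stand-ins, and the module name `OrphanDemo`, are assumptions made for illustration.

```elixir
defmodule OrphanDemo do
  # Returns true if a helper started with start_link is still alive after
  # the process that started it has exited with :normal.
  def helper_survives_normal_exit? do
    caller = self()

    {short_lived, ref} =
      spawn_monitor(fn ->
        {:ok, helper} = Agent.start_link(fn -> :some_state end)
        send(caller, {:helper, helper})
        # Returning here makes this process exit with :normal.
      end)

    # Wait until the short-lived process is definitely gone.
    receive do
      {:DOWN, ^ref, :process, ^short_lived, _reason} -> :ok
    after
      1_000 -> :ok
    end

    receive do
      {:helper, helper} ->
        alive? = Process.alive?(helper)
        # Stop the orphan explicitly; nothing else would.
        if alive?, do: Agent.stop(helper)
        alive?
    after
      1_000 -> false
    end
  end
end
```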

def mount(_params, _session, socket) do
  if connected?(socket) do
    {:ok, debouncer} = Debouncer.start_link(self(), 500)
    {:ok, assign(socket, debouncer: debouncer)}
  else
    {:ok, socket}
  end
end

The above mount/3 does not leak processes, as the LiveView’s socket process exits with {:shutdown, reason}, eg {:shutdown, :closed}, causing the linked Debouncer to also exit. It would be even safer, and proof against LiveView changes, to trap exits in Debouncer and voluntarily exit with a :stop return on the callback; I may do just that.
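
A rough sketch of that idea is below. It is not the actual Debouncer. `OwnedHelper` is a hypothetical module that traps exits and stops itself when its owner exits for any reason, including :normal. It assumes the helper is started with GenServer.start/3 and linked explicitly in init/1, so that the owner is an ordinary linked process rather than the GenServer’s parent, which is the case this post leaves to a later one.

```elixir
defmodule OwnedHelper do
  use GenServer

  # `owner` is the process whose exit should stop this helper.
  def start(owner), do: GenServer.start(__MODULE__, owner)

  @impl true
  def init(owner) do
    # Trap exits first so that the owner's exit arrives as a message.
    Process.flag(:trap_exit, true)
    Process.link(owner)
    {:ok, %{owner: owner}}
  end

  @impl true
  def handle_info({:EXIT, owner, _reason}, %{owner: owner} = state) do
    # The owner is gone, whatever the reason: stop voluntarily.
    {:stop, :normal, state}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

With this in place, an owner that exits with :normal no longer leaves the helper behind: the helper receives the exit as a message and stops with :normal itself.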

I thought about adding this story at the beginning of the post, but I thought it might make it like one of those cooking articles that people complain about - the ones where you have to scroll through paragraphs of prose before getting to the actual recipe. (Also it doesn’t reflect well on me and people might get bored before reading this far.)

This series

Starting out looking at exit signals and OTP process death has turned into a small series of posts, including this one. These are:

The many and varied ways to kill an OTP Process: investigation of different ways to cause (or fail to cause) a process to exit.
What happens when a linked process dies: the impact of a process exiting on processes that are linked to it, excluding OTP processes with a parent/child relationship.
Death, Children, and OTP: the impact on an OTP process when the process that spawned it (its parent) exits, particularly when the child is trapping exits.

Updates

2020-06-28: added a check to the code to show that a process trapping exits does not die when a linked process dies with :normal, just like when not trapping exits.
2021-06-29: included the section linking to posts in this series.
2021-06-29: added a note that this post does not look at linked processes with a parent/child relationship.