Debugging
=========

This page details various ways to debug LLDB itself and other LLDB tools. If
you want to know how to use LLDB in general, please refer to
:doc:`/use/tutorial`.

As LLDB is generally split into two tools, ``lldb`` and ``lldb-server``
(``debugserver`` on macOS), the techniques shown here will not always apply to
both. With some knowledge of them all, you can mix and match as needed.

In this document we refer to the initial ``lldb`` as the "debugger" and the
program being debugged as the "inferior".

Building For Debugging
----------------------

To build LLDB with debugging information add the following to your CMake
configuration:

::

  -DCMAKE_BUILD_TYPE=Debug \
  -DLLDB_EXPORT_ALL_SYMBOLS=ON

Note that the ``lldb`` you will use to do the debugging does not itself need to
have debug information.

Then build as you normally would according to :doc:`/resources/build`.

If you are going to debug in a way that doesn't need debug info (printf, strace,
etc.) we recommend adding ``LLVM_ENABLE_ASSERTIONS=ON`` to Release build
configurations. This will make LLDB fail earlier instead of continuing with
invalid state (assertions are enabled by default for Debug builds).

Debugging ``lldb``
------------------

The simplest scenario is where we want to debug a local execution of ``lldb``
like this one:

::

  ./bin/lldb test_program

LLDB is like any other program, so you can debug it the same way, by running it
under another ``lldb``.

::

  ./bin/lldb -- ./bin/lldb /tmp/test.o

That's it. At least, that's the minimum. There's nothing special about LLDB
being a debugger that means you can't attach another debugger to it like any
other program.

What can be an issue is that both debuggers have command line interfaces, which
makes it easy to confuse one with the other:

::

  (the debugger)
  (lldb) run
  Process 1741640 launched: '<...>/bin/lldb' (aarch64)
  Process 1741640 stopped and restarted: thread 1 received signal: SIGCHLD

  (the inferior)
  (lldb) target create "/tmp/test.o"
  Current executable set to '/tmp/test.o' (aarch64).

Another issue is that when you resume the inferior, it will not print the
``(lldb)`` prompt because as far as it knows it hasn't changed state. A quick
way around that is to type something that is clearly not a command and hit
enter.

::

  (lldb) Process 1742266 stopped and restarted: thread 1 received signal: SIGCHLD
  Process 1742266 stopped
  * thread #1, name = 'lldb', stop reason = signal SIGSTOP
      frame #0: 0x0000ffffed5bfbf0 libc.so.6`__GI___libc_read at read.c:26:10
  (lldb) c
  Process 1742266 resuming
  notacommand
  error: 'notacommand' is not a valid command.
  (lldb)

You could just remember whether you are in the debugger or the inferior, but
that is one more thing to keep track of, and for interrupt based events you may
simply not be able to know.

Here are some better approaches. First, you could use another debugger like GDB
to debug LLDB. Alternatively, use an IDE like Xcode or Visual Studio Code, which
runs LLDB under the hood so you don't have to type commands into the debugger
yourself.

Or you could change the prompt text for the debugger and/or inferior.

::

  $ ./bin/lldb -o "settings set prompt \"(lldb debugger) \"" -- \
    ./bin/lldb -o "settings set prompt \"(lldb inferior) \"" /tmp/test.o
  <...>
  (lldb) settings set prompt "(lldb debugger) "
  (lldb debugger) run
  <...>
  (lldb) settings set prompt "(lldb inferior) "
  (lldb inferior)

If you want spatial separation you can run the inferior in one terminal then
attach to it in another. Remember that while paused in the debugger, the inferior
will not respond to input, so you will have to ``continue`` in the debugger
first.

::

  (in terminal A)
  $ ./bin/lldb /tmp/test.o

  (in terminal B)
  $ ./bin/lldb ./bin/lldb --attach-pid $(pidof lldb)

Placing Breakpoints
*******************

Generally you will want to hit some breakpoint in the inferior ``lldb``. To place
that breakpoint you must first stop the inferior.

If you're debugging from another window this is done with ``process interrupt``.
The inferior will stop, you place the breakpoint and then ``continue``. Go back
to the inferior and input the command that should trigger the breakpoint.

If you are running the debugger and inferior in the same window, input
``ctrl+c`` instead of ``process interrupt`` and then follow the rest of the
steps.

If you are doing this with ``lldb-server`` and find your breakpoint is never
hit, check that you are breaking in code that is actually run by
``lldb-server``. There are cases where code only used by ``lldb`` ends up
linked into ``lldb-server``, so the debugger can break there but the breakpoint
will never be hit.

Debugging ``lldb-server``
-------------------------

Note: If you are on macOS you are likely using ``debugserver`` instead of
``lldb-server``. The spirit of these instructions applies but the specifics will
be different.

We suggest you read :doc:`/use/remote` before attempting to debug ``lldb-server``
as working out exactly what you want to debug requires that you understand its
various modes and behaviour. While you may not be literally debugging on a
remote target, think of your host machine as the "remote" in this scenario.

The ``lldb-server`` options for your situation will depend on what part of it
or mode you are interested in. To work out what those are, recreate the scenario
first without any extra debugging layers. Let's say we want to debug
``lldb-server`` during the following command:

::

  $ ./bin/lldb /tmp/test.o

We can treat ``lldb-server`` as we treated ``lldb`` before, running it under
``lldb``. The equivalent to having ``lldb`` launch the ``lldb-server`` for us is
to start ``lldb-server`` in the ``gdbserver`` mode.

The following commands recreate that, while debugging ``lldb-server``:

::

  $ ./bin/lldb -- ./bin/lldb-server gdbserver :1234 /tmp/test.o
  (lldb) target create "./bin/lldb-server"
  Current executable set to '<...>/bin/lldb-server' (aarch64).
  <...>
  Process 1742485 launched: '<...>/bin/lldb-server' (aarch64)
  Launched '/tmp/test.o' as process 1742586...

  (in another terminal)
  $ ./bin/lldb /tmp/test.o -o "gdb-remote 1234"

Note that the first ``lldb`` is the one debugging ``lldb-server``. The second
``lldb`` is debugging ``/tmp/test.o`` and is only used to trigger the
interesting code path in ``lldb-server``.

This is another case where you may want to lay out your terminals in a
predictable way, or change the prompt of one or both copies of ``lldb``.

If you are debugging a scenario where ``lldb-server`` starts in ``platform``
mode, but you want to debug the ``gdbserver`` mode, you'll have to work out what
subprocess it's starting for the ``gdbserver`` part. One way is to look at the
list of running processes and take the command line from there.
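
On Linux, one way to take a command line from a running process is to read
``/proc/<pid>/cmdline``. The following is a minimal sketch of that idea, not
something LLDB provides. The function names and the sample data (including the
path and the flag in it) are hypothetical and only illustrate the file format.

```python
import shlex
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    # Every argument is NUL-terminated, so splitting leaves one empty
    # trailing field, which is dropped here.
    fields = raw.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()
    return [field.decode(errors="replace") for field in fields]


def command_line(pid: int) -> str:
    """Return a shell-quoted command line for a running process (Linux only)."""
    raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    return shlex.join(parse_cmdline(raw))


# Hypothetical sample data, not real lldb-server output.
sample = b"/path/to/lldb-server\0gdbserver\0--example-flag\0some value\0"
print(shlex.join(parse_cmdline(sample)))
```

Calling ``command_line`` with the PID of the subprocess you found gives a string
that can be pasted after ``./bin/lldb --`` to run that command under the
debugger.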

In theory it should be possible to use LLDB's
``target.process.follow-fork-mode`` or GDB's ``follow-fork-mode`` to
automatically debug the ``gdbserver`` process as it's created. However this
author has not been able to get either to work in this scenario so we suggest
making a more specific command wherever possible instead.

Another option is to let ``lldb-server`` start up, then attach to the process
that's interesting to you. It's less automated and won't work if the bug occurs
during startup. However it is a good way to know you've found the right one,
then you can take its command line and run that directly.

Output From ``lldb-server``
***************************

As ``lldb-server`` often launches subprocesses, output messages may be hidden
if they are emitted from the child processes.

You can tell it to enable logging using the ``--log-channels`` option. For
example ``--log-channels "posix ptrace"``. However that is not passed on to the
child processes.

The same goes for ``printf``. If it's called in a child process you won't see
the output.

In these cases consider debugging ``lldb-server`` interactively or
working out a more specific command such that it does not have to spawn a
subprocess. For example if you start with ``platform`` mode, work out what
``gdbserver`` mode process it spawns and run that command instead.

Another option if you have ``strace`` available is to trace the whole process
tree and inspect the logs after the session has ended. ::

  $ strace -ff -o log -p $(pidof lldb-server)

This will log all syscalls made by ``lldb-server`` and processes that it forks.
``-ff`` tells ``strace`` to trace child processes and write the results to a
separate file for each process, named using the prefix given by ``-o``.

Search the log files for specific terms to find the process you're interested
in. For example, to find a process that acted as a ``gdbserver`` instance::

  $ grep "gdbserver" log.*
  log.<N>:execve("<...>/lldb-server", [<...> "gdbserver", <...>) = 0
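
If there are many log files, the same search can be scripted. This is a sketch
that assumes the ``log.<pid>`` file naming that ``strace -ff -o log`` produces.
The function name ``find_logs`` is hypothetical.

```python
from pathlib import Path


def find_logs(directory: str, prefix: str, term: str) -> dict[int, list[str]]:
    """Map PID to the execve lines mentioning term, one entry per log file."""
    matches = {}
    for path in sorted(Path(directory).glob(f"{prefix}.*")):
        suffix = path.name[len(prefix) + 1:]
        if not suffix.isdigit():
            continue  # Not a per-process strace log.
        lines = [
            line
            for line in path.read_text(errors="replace").splitlines()
            if "execve(" in line and term in line
        ]
        if lines:
            matches[int(suffix)] = lines
    return matches
```

For example, ``find_logs(".", "log", "gdbserver")`` returns the PIDs whose logs
contain an ``execve`` of a ``gdbserver`` mode command, along with the matching
lines.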

Remote Debugging
----------------

If you want to debug part of LLDB running on a remote machine, the principles
are the same but we will have to start debug servers, then attach debuggers to
those servers.

In the example below we're debugging an ``lldb-server`` ``gdbserver`` mode
command running on a remote machine.

For simplicity we'll use the same ``lldb-server`` as the debug server
and the inferior, but it doesn't need to be that way. You can use ``gdbserver``
(as in, GDB's debug server program) or a system installed ``lldb-server`` if you
suspect your local copy is not stable, as is the case in many of these
scenarios.

::

  $ <...>/bin/lldb-server gdbserver 0.0.0.0:54322 -- \
    <...>/bin/lldb-server gdbserver 0.0.0.0:54321 -- /tmp/test.o

Now we have a debug server listening on port 54322 of our remote (``0.0.0.0``
means it's listening for external connections). This is where we will connect
``lldb`` to, to debug the second ``lldb-server``.

To trigger behaviour in the second ``lldb-server``, we will connect a second
``lldb`` to port 54321 of the remote.

This is the final configuration:

::

  Host                                        | Remote
  --------------------------------------------|--------------------
  lldb A debugs lldb-server on port 54322 ->  | lldb-server A
                                              |  (which runs)
  lldb B debugs /tmp/test.o on port 54321 ->  |    lldb-server B
                                              |      (which runs)
                                              |        /tmp/test.o

You would use ``lldb A`` to place a breakpoint in the code you're interested in,
then ``lldb B`` to trigger ``lldb-server B`` to go into that code and hit the
breakpoint. ``lldb-server A`` is only here to let us debug ``lldb-server B``
remotely.

Debugging The Remote Protocol
-----------------------------

LLDB mostly follows the `GDB Remote Protocol <https://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html>`_.
Where there are differences it tries to handle both LLDB and GDB behaviour.

LLDB does have extensions to the protocol which are documented in
`lldb-gdb-remote.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-gdb-remote.txt>`_
and `lldb/docs/lldb-platform-packets.txt <https://github.com/llvm/llvm-project/blob/main/lldb/docs/lldb-platform-packets.txt>`_.

Logging Packets
***************

If you just want to observe packets, you can enable the ``gdb-remote packets``
log channel.

::

  (lldb) log enable gdb-remote packets
  (lldb) run
  lldb             <   1> send packet: +
  lldb             history[1] tid=0x264bfd <   1> send packet: +
  lldb             <  19> send packet: $QStartNoAckMode#b0
  lldb             <   1> read packet: +

You can do this on the ``lldb-server`` end as well by passing the option
``--log-channels "gdb-remote packets"``. Then you'll see both sides of the
connection.

Some packets may be printed in a nicer way than others. For example XML packets
will print the literal XML and some binary packets may be decoded. Others will
just be printed unmodified, so do check what format you expect. A common one is
hex encoded bytes.
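
When reading these logs by hand it can help to check a packet's framing or
decode a hex encoded payload yourself. The sketch below follows the GDB Remote
Protocol's ``$<payload>#<checksum>`` framing, where the checksum is the sum of
the payload bytes modulo 256. It ignores the escaping of special characters that
real packets may need, and the function names are hypothetical.

```python
def checksum(payload: str) -> int:
    """Sum of the payload bytes modulo 256, as sent after '#' in a packet."""
    return sum(payload.encode("ascii")) % 256


def frame(payload: str) -> str:
    """Wrap a payload as $<payload>#<two hex digit checksum>."""
    return f"${payload}#{checksum(payload):02x}"


def decode_hex(text: str) -> bytes:
    """Decode a hex encoded field, two hex digits per byte."""
    return bytes.fromhex(text)


print(frame("QStartNoAckMode"))  # $QStartNoAckMode#b0, as in the log above
print(decode_hex("2f746d702f746573742e6f"))  # b'/tmp/test.o'
```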

You can enable this logging even when you are connecting to an ``lldb-server``
in platform mode, as this protocol is used for that too.

Debugging Packet Exchanges
**************************

Say you want to make ``lldb`` send a packet to ``lldb-server``, then debug
how the latter builds its response. Maybe even see how ``lldb`` handles it once
it's sent back.

That all takes time, so LLDB will likely time out and think the remote has gone
away. You can change the ``plugin.process.gdb-remote.packet-timeout`` setting
to prevent this.

Here's an example. First we'll start an ``lldb-server`` being debugged by
``lldb``, placing a breakpoint on a packet handler we know will be hit once
another ``lldb`` connects.

::

  $ lldb -- lldb-server gdbserver :1234 -- /tmp/test.o
  <...>
  (lldb) b GDBRemoteCommunicationServerCommon::Handle_qSupported
  Breakpoint 1: where = <...>
  (lldb) run
  <...>

Next we connect another ``lldb`` to this, with a timeout of 5 minutes:

::

  $ lldb /tmp/test.o
  <...>
  (lldb) settings set plugin.process.gdb-remote.packet-timeout 300
  (lldb) gdb-remote 1234

Doing so triggers the breakpoint in ``lldb-server``, bringing us back into
``lldb``. Now we've got 5 minutes to do whatever we need before LLDB decides
the connection has failed.

::

  * thread #1, name = 'lldb-server', stop reason = breakpoint 1.1
      frame #0: 0x0000aaaaaacc6848 lldb-server<...>
  lldb-server`lldb_private::process_gdb_remote::GDBRemoteCommunicationServerCommon::Handle_qSupported:
  ->  0xaaaaaacc6848 <+0>:  sub    sp, sp, #0xc0
  <...>
  (lldb)

Once you're done simply ``continue`` the ``lldb-server``. Back in the other
``lldb``, the connection process will continue as normal.

::

  Process 2510266 stopped
  * thread #1, name = 'test.o', stop reason = signal SIGSTOP
      frame #0: 0x0000fffff7fcd100 ld-2.31.so`_start
  ld-2.31.so`_start:
  ->  0xfffff7fcd100 <+0>: mov    x0, sp
  <...>
  (lldb)

Reducing Bugs
-------------

This section covers reducing a bug that happens in LLDB itself, or where you
suspect that LLDB causes something else to behave abnormally.

Since bugs vary wildly, the advice here is general and incomplete. Let your
instincts guide you and don't feel the need to try everything before reporting
an issue or asking for help. This is simply inspiration.

Reduction
*********

The first step is to reduce unneeded complexity where it is cheap to do so. If
something is easily removed or frozen to a certain value, do so. The goal is to
keep the failure mode the same, with fewer dependencies.

This includes, but is not limited to:

* Removing test cases that don't crash.
* Replacing dynamic lookups with constant values.
* Replacing supporting functions with stubs that do nothing.
* Moving the test case to a less unique system. If your machine has an exotic
  extension, try it on a readily available commodity machine.
* Removing irrelevant parts of the test program.
* Reproducing the issue without using the LLDB test runner.
* Converting a remote debugging scenario into a local one.

Now we hopefully have a smaller reproducer than we started with. Next we need to
find out what components of the software stack might be failing.

Some examples are listed below with suggestions for how to investigate them.

* Debugger

  * Use a `released version of LLDB <https://github.com/llvm/llvm-project/releases>`_.

  * If on macOS, try the system ``lldb``.

  * Try GDB or any other system debugger you might have e.g. Microsoft Visual
    Studio.

* Kernel

  * Start a virtual machine running a different version. ``qemu-system`` is
    useful here.

  * Try a different physical system running a different version.

  * Remember that for most kernels, userspace crashing the kernel is always a
    kernel bug, even if the userspace program is doing something unconventional.
    So there could be a bug in both the application and the kernel.

* Compiler and compiler options

  * Try other versions of the same compiler or your system compiler.

  * Emit older versions of DWARF info, for example DWARFv4 instead of DWARFv5,
    as some tools did/do not understand the newer constructs.

  * Reduce optimisation options as much as possible.

  * Try all the language modes e.g. C++17/20 for C++.

  * Link against LLVM's libcxx if you suspect a bug involving the system C++
    library.

  * For languages other than C/C++ e.g. Rust, try making an equivalent program
    in C/C++. LLDB tends to try to fit other languages into a C/C++ mould, so
    porting the program can make triage and reporting much easier.

* Operating system

  * Use docker to try various versions of Linux.

  * Use ``qemu-system`` to emulate other operating systems e.g. FreeBSD.

* Architecture

  * Use `QEMU user space emulation <https://www.qemu.org/docs/master/user/main.html>`_
    to quickly test other architectures. Note that ``lldb-server`` cannot be used
    with this as the ptrace APIs are not emulated.

  * If you need to test a big endian system use QEMU to emulate s390x (user
    space emulation for just ``lldb``, ``qemu-system`` for testing
    ``lldb-server``).

.. note:: When using QEMU you may need to use the built in GDB stub, instead of
          ``lldb-server``. For example if you wanted to debug ``lldb`` running
          inside ``qemu-user-s390x`` you would connect to the GDB stub provided
          by QEMU.

          The same applies if you want to see how ``lldb`` would debug a test
          program that is running on s390x. It's not totally accurate because
          you're not using ``lldb-server``, but this is fine for features that
          are mostly implemented in ``lldb``.

          If you are running a full system using ``qemu-system``, you likely
          want to connect to the ``lldb-server`` running within the userspace
          of that system.

          If your test program is bare metal (meaning it requires no supporting
          operating system) then connect to the built in GDB stub. This can be
          useful when testing embedded systems or kernel debugging.

Reducing Ptrace Related Bugs
****************************

This section is written for Linux but the same can likely be done on
other Unix or Unix like operating systems.

Sometimes you will find ``lldb-server`` doing something with ptrace that causes
a problem. If your reproducer involves running ``lldb`` as well, it is not going
to go over well with kernel developers and is generally more difficult to
explain if you want to get help with it.

If you think you can get your point across without a standalone reproducer,
there is no need to write one. If you're pretty sure you have, for example,
found a Linux Kernel bug, writing one greatly increases the chances it'll get
fixed.

We'll remove the LLDB dependency by making a smaller standalone program that
does the same actions. Starting with a skeleton program that forks and debugs
the inferior process.

The program presented `here <https://eli.thegreenplace.net/2011/01/23/how-debuggers-work-part-1>`_
(`source <https://github.com/eliben/code-for-blog/blob/master/2011/simple_tracer.c>`_)
is a great starting point. There is also an AArch64 specific example in
`the LLDB examples folder <https://github.com/llvm/llvm-project/tree/main/lldb/examples/ptrace_example.c>`_.

For either, you'll need to modify that to fit your architecture. A tip for this
is to take any constants used in it, find in which function(s) they are used in
LLDB and then you'll find the equivalent constants in the same LLDB functions
for your architecture.

Once that is running as expected we can convert ``lldb-server``'s ptrace calls
into calls in this program. To get a log of those, run ``lldb-server`` with
``--log-channels "posix ptrace"``. You'll see output like:

::

  $ lldb-server gdbserver :1234 --log-channels "posix ptrace" -- /tmp/test.o
  1694099878.829990864 <...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0
  1694099878.830722332 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF7CC, 0x0000FFFFD14BF7D0, 16)=0x0
  1694099878.831967115 <...> ptrace(16900, 2659963, 0x0000FFFFD14BF66C, 0x0000FFFFD14BF630, 16)=0xffffffffffffffff
  1694099878.831982136 <...> ptrace() failed: Invalid argument
  Launched '/tmp/test.o' as process 2659963...

Each call is logged with its parameters, and its result after the ``=`` at the
end.

From here you will need to use a combination of the `ptrace documentation <https://man7.org/linux/man-pages/man2/ptrace.2.html>`_
and Linux Kernel headers (``uapi/linux/ptrace.h`` mainly) to figure out what
the calls are.

The most important parameter is the first, which is the request number. In the
example above ``16896``, which is hex ``0x4200``, is ``PTRACE_SETOPTIONS``.
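
Looking up request numbers by hand gets tedious, so a small script can annotate
the log for you. This is a sketch that assumes the log line layout shown above,
with the request number first and the PID second. The table holds only a subset
of the request values from ``uapi/linux/ptrace.h``, and the ``describe`` helper
is hypothetical.

```python
import re

# A subset of the request numbers defined in Linux's uapi/linux/ptrace.h.
PTRACE_REQUESTS = {
    0x4200: "PTRACE_SETOPTIONS",
    0x4201: "PTRACE_GETEVENTMSG",
    0x4202: "PTRACE_GETSIGINFO",
    0x4203: "PTRACE_SETSIGINFO",
    0x4204: "PTRACE_GETREGSET",
    0x4205: "PTRACE_SETREGSET",
    0x4206: "PTRACE_SEIZE",
    0x4207: "PTRACE_INTERRUPT",
}

# Matches the logged form: ptrace(request, pid, addr, data, size)=result
LOG_LINE = re.compile(
    r"ptrace\((\d+), (\d+), 0x[0-9A-Fa-f]+, 0x[0-9A-Fa-f]+, \d+\)=(0x[0-9A-Fa-f]+)"
)


def describe(line: str):
    """Return a readable summary of one logged ptrace call, or None."""
    match = LOG_LINE.search(line)
    if match is None:
        return None  # For example the "ptrace() failed" lines.
    request = int(match.group(1))
    name = PTRACE_REQUESTS.get(request, "unknown request")
    return f"{name} (0x{request:x}) pid={match.group(2)} result={match.group(3)}"


line = "<...> ptrace(16896, 2659963, 0x0000000000000000, 0x000000000000007E, 0)=0x0"
print(describe(line))  # PTRACE_SETOPTIONS (0x4200) pid=2659963 result=0x0
```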

Luckily, you don't usually have to figure out all those early calls. Our
skeleton program will be doing all that, successfully we hope.

What you should do is record just the bit that is interesting to you. Let's say
something odd is happening when you read the ``tpidr`` register (this is an
AArch64 register, just for example purposes).

First, go to the ``lldb-server`` terminal and press enter a few times to put
some blank lines after the last logging output.

Then go to your ``lldb`` and:

::

  (lldb) register read tpidr
  tpidr = 0x0000fffff7fef320

You'll see this from ``lldb-server``:

::

  <...> ptrace(16900, 2659963, 0x0000FFFFD14BF6CC, 0x0000FFFFD14BF710, 8)=0x0

If you don't see that, it may be because ``lldb`` has cached it. The easiest way
to clear that cache is to step. Remember that some registers are read every
step, so you'll have to adjust depending on the situation.

Assuming you've got that line, you would look up what ``16900`` is. This is
``0x4204`` in hex, which is ``PTRACE_GETREGSET``. As we expected.

The following parameters are not as we might expect because what we log is a bit
different from the literal ptrace call. See your platform's definition of
``PtraceWrapper`` for the exact form.

The point of all this is that by doing a single action you can get a few
isolated ptrace calls and you can then fill in the blanks and write
equivalent calls in the skeleton program.

The final piece of this is likely breakpoints. Assuming your bug does not
require a hardware breakpoint, you can get software breakpoints by inserting
a break instruction into the inferior's code at compile time. Usually by using
an architecture specific assembly statement, as you will need to know exactly
how many instructions to overwrite later.

Doing it this way instead of exactly copying what LLDB does will save a few
ptrace calls. The AArch64 example program shows how to do this.

* The inferior contains ``BRK #0`` then ``NOP``.
* Two 4 byte instructions means 8 bytes of data to replace, which matches the
  minimum size you can write with ``PTRACE_POKETEXT``.
* The inferior runs to the ``BRK``, which brings us into the debugger.
* The debugger reads ``PC`` and writes ``NOP`` then ``NOP`` to the location
  pointed to by ``PC``.
* The debugger then single steps the inferior to the next instruction
  (this is not required in this specific scenario, you could just continue but
  it is included because this more closely matches what ``lldb`` does).
* The debugger then continues the inferior.
* The inferior exits, and the whole program exits.
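
To make the size arithmetic in the steps above concrete, here is a sketch of how
the 8 byte replacement value can be worked out. It assumes little endian AArch64
and a 64-bit word, and uses the standard encodings of ``BRK #0`` and ``NOP``.
The variable names are illustrative only.

```python
import struct

BRK_0 = 0xD4200000  # AArch64 encoding of BRK #0
NOP = 0xD503201F    # AArch64 encoding of NOP

# What the inferior contains at PC when it stops: BRK #0 then NOP.
original = struct.pack("<II", BRK_0, NOP)
# What the debugger writes back: NOP then NOP.
replacement = struct.pack("<II", NOP, NOP)

# Both instructions fit in one 8 byte word, so a single write covers them.
word = int.from_bytes(replacement, "little")
print(len(replacement), hex(word))  # 8 0xd503201fd503201f
```

The value of ``word`` is what the skeleton program would pass as the data
argument when writing to the address held in ``PC``.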

Using this technique you can emulate the usual "run to main, do a thing" type
reproduction steps.

Finally, that "thing" is the ptrace calls you got from the ``lldb-server`` logs.
Add those to the debugger function and you now have a reproducer that doesn't
need any part of LLDB.

Debugging Tests
---------------

See :doc:`/resources/test`.