On Ftrace, kprobes, tracepoints

Posted on Wed 09 August 2017 in Performance

Note: This starts out as more of a brain dump. I will iterate on it over time until it reaches a more consumable form.

Most of Linux dynamic tracing is built around core kernel support called Ftrace. It started as a function-tracing subsystem but is considerably more involved now. Major tools like LTTng, SystemTap, and the more recent BCC make use of this infrastructure, or of the probe mechanisms around it, and then build upon it. In fact, some kernel developments, such as uprobes, grew out of work in the SystemTap project, which was also an early heavy user of kprobes.

I found a good presentation that provides a historical perspective on how many of these projects got started. I have also created a clipboard that gives a timeline of Linux tracing and the evolution of BPF support. This helped me understand why some of the utilities in bcc won't run on my Ubuntu 16.04 system: they depend on BPF features added in kernels newer than the one I run.

Arguably one of the best resources (if not the best) on Linux tracing is Brendan Gregg's blog.

Coming back to specifics, it is possible to 'trace' the following:

A vast majority of kernel functions: those listed in /sys/kernel/debug/tracing/available_filter_functions. (This assumes the tracing directory is available at /sys/kernel/debug/tracing, provided by tracefs on more recent kernels and by debugfs on slightly older ones.) It is possible to trace only a subset of those functions, or only the functions belonging to a particular subsystem such as net. The kernel's documentation is a good starting point: see Documentation/trace/ftrace.txt and a few other files.
kprobes provide a mechanism to trace both the entry and the exit of a function (the exit through return probes). Without ftrace integration, using them was somewhat involved and typically meant writing a kernel module. Now that ftrace supports kprobes, through an interface similar to the function tracing above, they have become very useful.
In addition, a number of tracepoints are defined in various subsystems. It is not quite clear to me, though, in which use cases it would make sense to use this mechanism as opposed to one of the above, which seem very flexible.
perf events (not fully understood yet).
Userspace probing (not fully understood yet).
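
To make the first two items a little more concrete, here is a minimal Python sketch of how one might pick a subset of traceable functions and prepare kprobe definitions for the ftrace interface. It assumes the tracing directory sits at /sys/kernel/debug/tracing as described above; the helper names, the "demo" group, and the event names are made up for illustration, and the part that writes to the tracing files needs root.

```python
import fnmatch
from pathlib import Path

# Assumption: the tracing directory is reachable at this path, as in the text above.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def parse_function_list(text):
    """Return function names from available_filter_functions-style text.

    Each line holds a function name, optionally followed by the name of
    the module it belongs to in square brackets.
    """
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def select_functions(names, pattern):
    """Keep only the names matching a shell-style glob such as 'tcp_*'."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def kprobe_definitions(function, group="demo"):
    """Build kprobe_events lines for the entry and the return of one function.

    The group and event names are hypothetical, chosen for this example.
    """
    return [
        f"p:{group}/{function}_entry {function}",
        f"r:{group}/{function}_exit {function} ret=$retval",
    ]


def enable_function_tracer(functions, tracing_dir=TRACING_DIR):
    """Restrict the function tracer to `functions` and switch it on (needs root)."""
    tracing_dir = Path(tracing_dir)
    # One function name per line; writing the file replaces any earlier filter.
    (tracing_dir / "set_ftrace_filter").write_text("\n".join(functions) + "\n")
    (tracing_dir / "current_tracer").write_text("function\n")
```

A typical use would be to read available_filter_functions, pass its contents through parse_function_list and select_functions with a pattern like "tcp_*", and hand the result to enable_function_tracer. The strings returned by kprobe_definitions are in the format that the kprobe_events file in the same directory accepts, one definition per write.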

The more recent eBPF has made tracing a lot more interesting. Essentially, eBPF allows code supplied from userspace to be loaded into the kernel at runtime and hooked into the ftrace mechanisms above. bcc, mentioned above, has built many useful tools on this mechanism.
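
As a rough illustration of that flow, here is a small sketch using bcc's Python bindings. It assumes bcc is installed, the kernel is recent enough for kprobe-attached BPF programs, and the script runs as root; the function name trace_clone and the choice of the clone syscall are only examples.

```python
# Restricted C for the in-kernel side; bcc compiles it to eBPF at load time.
BPF_PROGRAM = r"""
int trace_clone(void *ctx)
{
    bpf_trace_printk("clone() called\n");
    return 0;
}
"""


def main():
    # Imported here so this file can be loaded on systems without bcc.
    from bcc import BPF

    bpf = BPF(text=BPF_PROGRAM)
    # Attach the eBPF function to a kprobe on the entry of the clone syscall.
    # get_syscall_fnname resolves the kernel's actual symbol name for it.
    bpf.attach_kprobe(event=bpf.get_syscall_fnname("clone"), fn_name="trace_clone")
    # Print lines from the kernel's trace pipe until interrupted.
    bpf.trace_print()
```

Calling main() as root should print one line each time a process is created through clone. The C snippet is what gets added to the kernel at runtime, and the kprobe is what ties it back to the tracing machinery described earlier.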

I will keep updating this article as we go along.