Why is it hard?

Portability:

The paths passed to the kernel through syscalls may be relative to the current directory or contain “..” components. Resolving such a path is a complex process that the kernel completes before doing the actual work on the file. This means that we cannot simply hook system calls; we also need to attach to symbols deeper in the kernel. These symbols may be added, renamed, or even removed from one kernel version to the next, which becomes very challenging if you want to support a wide range of kernels (CentOS/RHEL 7 uses a 3.10-based kernel, released in 2013).
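To make the resolution problem concrete, here is a small user-space sketch. It is not what the kernel runs; it only approximates the effect with os.path.realpath, and the directory layout and the helper name resolve_like_kernel are made up for the illustration. It shows three different strings that a process could pass to a syscall and that all end up designating the same file.

```python
import os
import tempfile


def resolve_like_kernel(cwd: str, path: str) -> str:
    """User-space approximation of path resolution (hypothetical helper):
    anchor a relative path at the caller's working directory, then collapse
    "." and ".." components and follow symlinks."""
    if not os.path.isabs(path):
        path = os.path.join(cwd, path)
    return os.path.realpath(path)


def resolve_demo() -> set:
    # Hypothetical layout: <root>/etc/shadow and <root>/home/user.
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        os.makedirs(os.path.join(root, "etc"))
        os.makedirs(os.path.join(root, "home", "user"))
        target = os.path.join(root, "etc", "shadow")
        open(target, "w").close()

        cwd = os.path.join(root, "home", "user")
        # Three different strings a process could hand to a syscall.
        candidates = ["../../etc/shadow", "./../../etc/./shadow", target]
        resolved = {resolve_like_kernel(cwd, p) for p in candidates}
        # Report the results relative to the temporary root.
        return {os.path.relpath(p, root) for p in resolved}
```

A hook placed at the syscall boundary only sees the raw strings in candidates, which is why matching on them directly is not enough.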

In addition, the configuration options used to compile the kernel also vary. For instance, on Red Hat kernels, CONFIG_SECURITY_PATH is not enabled, which rules out the security_path_* family of functions.
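One way to find out before attaching anything is to read the kernel build configuration. The sketch below parses text in the usual kernel config format; the sample content is invented for the example, and where the real file lives (commonly /boot/config-<release> or /proc/config.gz) depends on the distribution.

```python
def kernel_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) or built as a module (=m).

    Disabled options appear as comments ("# CONFIG_X is not set") or are
    absent altogether; both cases yield False.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value in ("y", "m")
    return False


# Invented sample, shaped like a kernel config file.
SAMPLE_CONFIG = """\
CONFIG_SECURITY=y
# CONFIG_SECURITY_PATH is not set
CONFIG_BPF_SYSCALL=y
"""
```

With this sample, kernel_option_enabled(SAMPLE_CONFIG, "CONFIG_SECURITY_PATH") is False, so a loader would have to fall back to other hook points.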

Compiler optimizations such as function inlining, which is common for small functions, can also make a symbol unreliable even though it appears in /proc/kallsyms: the compiler can choose to inline only some calls to a function, depending on the size of the calling function. A probe on that symbol then silently misses the inlined calls.
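A quick sanity check is to look a symbol up in the /proc/kallsyms listing, keeping in mind that a hit is necessary but not sufficient. The sketch below works on text in that format (address, type, name, optional module). The addresses and entries in the sample are invented, and the dot-suffix handling assumes the compiler names its specialized copies of a function as name.suffix, which is worth verifying on the kernels you target.

```python
def symbol_variants(kallsyms_text: str, symbol: str) -> list:
    """Return every listed name that is `symbol` itself or `symbol`
    followed by a dot suffix (for example a compiler-generated clone)."""
    found = []
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not an "address type name" line
        name = fields[2]
        if name == symbol or name.startswith(symbol + "."):
            found.append(name)
    return found


# Invented sample in the /proc/kallsyms format.
SAMPLE_KALLSYMS = """\
ffffffff812a3b40 T vfs_open
ffffffff812a4c10 t do_dentry_open.isra.0
ffffffffc0a01000 t ext4_file_open\t[ext4]
"""
```

An empty result means the symbol cannot be probed at all. A suffixed result means the name differs from what the source code suggests. Even an exact match says nothing about call sites that were inlined.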

Hard links and mounts:

To identify files, mere mortals use paths, such as /etc/shadow, but the kernel uses inodes. The very same file can be accessed using different paths in the case of hard links. If we want to fully monitor a file, we need to monitor all the hard links to this file. It would be simple if we could list all the paths that identify the same file, but unfortunately this is not the case: we can only know how many hard links exist for a file.
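The following sketch reproduces the situation in a temporary directory (the file names are arbitrary). It creates a file, adds a hard link to it, and compares what stat reports for both paths.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file and a hard link to it, then compare their identity.

    Returns (same_file, link_count): whether both paths share the same
    (device, inode) pair, and the link count reported for the file.
    """
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "shadow")
        alias = os.path.join(root, "innocent-name")
        with open(original, "w") as f:
            f.write("secret\n")
        os.link(original, alias)  # second name for the same inode

        a, b = os.stat(original), os.stat(alias)
        same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
        return same_file, a.st_nlink
```

Both paths resolve to the same inode, and st_nlink tells us that two names exist, but nothing in the stat result says where the other name lives.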

The filesystem hierarchy is composed of folders and files. Mount points allow replacing folders or files with another hierarchy, which can be a disk or a folder from somewhere else in the hierarchy (the latter is called a bind mount). Let’s consider this example: you are monitoring the file /etc/shadow. If the /etc folder is bind mounted to a different location, for instance /tmp/etc, the same file can be accessed from both /etc/shadow and /tmp/etc/shadow.

Considering the inodes instead of the paths in eBPF programs may help to solve both problems.
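As a rough user-space model of that idea, the sketch below keys a watch list on the (device, inode) pair instead of the path. The class name InodeWatchList is hypothetical, and a real eBPF program would read the inode from kernel structures rather than call stat. Only the hard link case is exercised here, since creating a bind mount requires privileges.

```python
import os
import tempfile


class InodeWatchList:
    """Hypothetical watch list keyed by (device, inode) rather than path."""

    def __init__(self):
        self._keys = set()

    def watch(self, path: str) -> None:
        st = os.stat(path)
        self._keys.add((st.st_dev, st.st_ino))

    def matches(self, path: str) -> bool:
        st = os.stat(path)
        return (st.st_dev, st.st_ino) in self._keys


def watch_list_demo():
    with tempfile.TemporaryDirectory() as root:
        watched = os.path.join(root, "shadow")
        alias = os.path.join(root, "alias")
        other = os.path.join(root, "other")
        for path in (watched, other):
            open(path, "w").close()
        os.link(watched, alias)  # same inode, different path

        watch_list = InodeWatchList()
        watch_list.watch(watched)
        return watch_list.matches(alias), watch_list.matches(other)
```

The alias is caught even though its path was never registered, while the unrelated file is not.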

 

Performance:

The code handling the syscalls is highly optimized. Even though eBPF bytecode is turned into native code at runtime, and only a limited number of instructions are allowed, placing kprobes can still hurt the performance of the system. Therefore, one needs to take great care over what is done inside the eBPF programs, for example by limiting the number of eBPF maps accessed by the program.

Thousands of syscalls can be issued per second, so passing all these events to user space through a perf ring buffer for processing may not be an option: if the user space reader is too slow, the ring buffer fills up and events are dropped. This is a problem for a security solution.
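The effect is easy to reproduce with a toy model. The sketch below is not the perf ring buffer; it is a simplified bounded queue, with made-up rates and capacity, in which new events are discarded whenever the queue is full.

```python
from collections import deque


class BoundedEventBuffer:
    """Toy fixed-capacity buffer: new events are dropped when it is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._events = deque()
        self.dropped = 0

    def push(self, event) -> bool:
        if len(self._events) >= self.capacity:
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def pop_batch(self, max_events: int) -> list:
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch


def simulate(ticks: int, produced_per_tick: int,
             consumed_per_tick: int, capacity: int):
    """Return (delivered, dropped) after `ticks` rounds of produce/consume."""
    buffer = BoundedEventBuffer(capacity)
    delivered = 0
    for tick in range(ticks):
        for i in range(produced_per_tick):
            buffer.push((tick, i))
        delivered += len(buffer.pop_batch(consumed_per_tick))
    return delivered, buffer.dropped
```

With a producer emitting 100 events per tick, a reader draining 20, and room for 64, simulate(10, 100, 20, 64) delivers 200 events and drops 756. A reader that keeps up, as in simulate(10, 10, 20, 64), loses nothing.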

Implementing some kind of filtering in-kernel is a must-have to achieve good performance.
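To give an idea of how much a simple filter can save, here is a Python model of the decision. In a real implementation the check would live in the eBPF program itself (for instance as a lookup in a map of watched inodes); the event shape, the inode numbers, and the function names below are all invented for the illustration.

```python
def synthetic_events(count: int, watched_every: int) -> list:
    """Build fake events: every `watched_every`-th one touches inode 42,
    the others touch unrelated inodes (1000 and up)."""
    events = []
    for i in range(count):
        inode = 42 if i % watched_every == 0 else 1000 + i
        events.append({"syscall": "open", "inode": inode})
    return events


def prefilter(events: list, watched_inodes: set) -> list:
    """Keep only the events that concern a watched inode. Everything else
    is discarded before it would be sent to user space."""
    return [event for event in events if event["inode"] in watched_inodes]
```

Out of 1000 synthetic events where one in 50 touches the watched inode, prefilter(synthetic_events(1000, 50), {42}) keeps 20, so the user space reader only has to process 2% of the traffic.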

Conclusion:

We have seen that the eBPF approach can address most of the problems that legacy solutions encounter. The eBPF constraints ensure safety while limiting the in-kernel overhead. Of course, there is still the challenge of collecting and analyzing the information coming from the kernel, but we have seen that even naive in-kernel pre-filtering can help in this area. Finally, being able to run code that collects the events and context information in a safe way is something that only eBPF-based solutions can provide. For more information on how such a solution is implemented, you can check out the File Integrity Monitoring feature implemented in the Datadog Agent.

 


THE POST WAS CO-AUTHORED BY:
GUILLAUME FOURNIER
SYLVAIN AFCHAIN
SYLVAIN BAUBEAU